Characterising Measures of Lexical Distributional Similarity
Julie Weeds, David Weir and Diana McCarthy
Department of Informatics, University of Sussex, Brighton, BN1 9QH, UK
{juliewe,davidw,dianam}@sussex.ac.uk
Abstract
This work investigates the variation in a word’s distributionally nearest neighbours with respect to the similarity measure used. We identify one type of variation as being the relative frequency of the neighbour words with respect to the frequency of the target word. We then demonstrate a three-way connection between relative frequency of similar words, a concept of distributional generality and the semantic relation of hyponymy. Finally, we consider the impact that this has on one application of distributional similarity methods (judging the compositionality of collocations).
1 Introduction
Over recent years, many Natural Language Processing (NLP) techniques have been developed that might benefit from knowledge of distributionally similar words, i.e., words that occur in similar contexts. For example, the sparse data problem can make it difficult to construct language models which predict combinations of lexical events. Similarity-based smoothing (Brown et al., 1992; Dagan et al., 1999) is an intuitively appealing approach to this problem, in which probabilities of unseen co-occurrences are estimated from probabilities of seen co-occurrences of distributionally similar events.
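The core of similarity-based smoothing can be illustrated with a minimal sketch. It assumes a similarity-weighted average in the spirit of Dagan et al. (1999) and omits their combination with a back-off model. The names `conditional_probs`, `smoothed_prob`, `PAIRS` and `NEIGHBOURS` are hypothetical, and the counts and similarity weights are toy values, not data from this work.

```python
from collections import Counter, defaultdict

# Hypothetical toy (verb, object) co-occurrences; not data from this paper.
PAIRS = [
    ("drink", "beer"), ("drink", "beer"), ("drink", "wine"),
    ("sip", "wine"), ("sip", "tea"),
    ("eat", "bread"), ("eat", "cake"),
]


def conditional_probs(pairs):
    """Maximum-likelihood estimates of P(obj | verb) from observed pairs."""
    by_verb = defaultdict(Counter)
    for verb, obj in pairs:
        by_verb[verb][obj] += 1
    return {
        verb: {obj: n / sum(counts.values()) for obj, n in counts.items()}
        for verb, counts in by_verb.items()
    }


def smoothed_prob(verb, obj, probs, neighbours):
    """Estimate P(obj | verb) as a similarity-weighted average of P(obj | v')
    over the distributional neighbours v' of `verb`.

    `neighbours` maps a word to {neighbour: similarity weight}; the weights
    are normalised to sum to one before averaging.
    """
    sims = neighbours.get(verb, {})
    total = sum(sims.values())
    if total == 0:
        return 0.0  # no neighbours, so no evidence to average over
    return sum(
        (weight / total) * probs.get(nb, {}).get(obj, 0.0)
        for nb, weight in sims.items()
    )


probs = conditional_probs(PAIRS)
# Hypothetical similarity weights; a real system would derive them from a
# distributional similarity measure.
NEIGHBOURS = {"sip": {"drink": 0.8, "eat": 0.2}}

print(probs["sip"].get("beer", 0.0))  # unseen pair, so the MLE is 0.0
print(smoothed_prob("sip", "beer", probs, NEIGHBOURS))  # about 0.533
```

The unseen pair (sip, beer) receives non-zero probability because "sip" is assumed to be distributionally similar to "drink", which has been seen with "beer".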
Other potential applications apply the hypothesised relationship (Harris, 1968) between distributional similarity and semantic similarity; i.e., similarity in the meaning of words can be predicted from their distributional similarity. One advantage of automatically generated thesauruses (Grefenstette, 1994; Lin, 1998; Curran and Moens, 2002) over large-scale manually created thesauruses such as WordNet (Fellbaum, 1998) is that they might be tailored to a particular genre or domain.
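An automatically generated thesaurus entry is essentially a ranked list of a word's distributionally nearest neighbours. The minimal sketch below builds one, assuming raw co-occurrence counts as features and cosine as the similarity measure; both choices are made for brevity, and other measures weight and compare features differently. The observations and the helper names `context_vectors`, `cosine` and `nearest_neighbours` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (word, context) observations, e.g. nouns paired with the verbs
# they are direct objects of; not data from this paper.
OBSERVATIONS = [
    ("beer", "dobj:drink"), ("beer", "dobj:drink"), ("beer", "dobj:brew"),
    ("wine", "dobj:drink"), ("wine", "dobj:pour"), ("wine", "dobj:brew"),
    ("tea", "dobj:drink"), ("tea", "dobj:pour"),
    ("bread", "dobj:eat"), ("bread", "dobj:bake"),
]


def context_vectors(observations):
    """Map each word to a Counter of its context co-occurrence counts."""
    vectors = defaultdict(Counter)
    for word, context in observations:
        vectors[word][context] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(target, vectors, k=3):
    """Rank all other words by cosine similarity to `target`; keep the top k."""
    target_vec = vectors.get(target, Counter())
    scored = [
        (word, cosine(target_vec, vec))
        for word, vec in vectors.items()
        if word != target
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


vectors = context_vectors(OBSERVATIONS)
for word, score in nearest_neighbours("beer", vectors):
    print(f"{word}\t{score:.3f}")
```

Swapping `cosine` for a different measure can reorder or replace the neighbours, which is the kind of variation this work sets out to characterise.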
However, due to the lack of a tight definition for the concept of distributional similarity and the broad range of potential applications, a large number of measures of distributional similarity have been proposed or adopted (see Section 2). Previous work on the evaluation of distributional similarity methods tends either to compare sets of distributionally similar words to a manually created semantic resource (Lin, 1998; Curran and Moens, 2002) or to be oriented towards a particular task such as language modelling (Dagan et al., 1999; Lee, 1999). The first approach is not ideal since it assumes that the goal of distributional similarity methods is to predict semantic similarity and that the semantic resource used is a valid gold standard. The second approach, by contrast, is clearly advantageous when one wishes to apply distributional similarity methods in a particular application area. However, it is not at all obvious that one universally best measure exists for all applications (Weeds and Weir, 2003). Thus, applying a distributional similarity technique to a new application necessitates evaluating a large number of distributional similarity measures in addition to evaluating the new model or algorithm.
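The first evaluation approach can be sketched as scoring each measure's neighbour list against a manually created resource. This sketch assumes simple precision over the top k neighbours. The cited studies use their own scoring schemes, and the rankings, the gold-standard entry and the helper `precision_at_k` are hypothetical.

```python
# Hypothetical ranked neighbour lists for "beer" under two unnamed measures.
RANKINGS = {
    "measure_a": ["wine", "ale", "drink", "tea", "bottle"],
    "measure_b": ["lager", "ale", "glass", "wine", "pint"],
}

# Hypothetical gold-standard entry standing in for a manually built thesaurus.
GOLD = {"beer": {"ale", "lager", "stout", "porter", "wine"}}


def precision_at_k(neighbours, gold, k):
    """Fraction of the top-k neighbours that the gold standard lists."""
    top = neighbours[:k]
    if not top:
        return 0.0
    return sum(1 for word in top if word in gold) / len(top)


for name, ranked in RANKINGS.items():
    print(name, precision_at_k(ranked, GOLD["beer"], k=5))
```

A neighbour such as "drink" counts as a miss here even though it might be useful in some applications. The score is therefore only as meaningful as the assumption that the resource is a valid gold standard.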
We propose a shift in focus from attempting to discover the overall best distributional similarity measure to analysing the statistical and linguistic properties of sets of distributionally similar words returned by different measures. This will make it possible to predict in advance of any experimental evaluation which distributional similarity measures might be most appropriate for a particular application.
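One statistical property of a neighbour set is the frequency of the neighbours relative to the target word. The minimal sketch below profiles two neighbour sets this way. The two summary statistics (the mean log2 frequency ratio and the share of neighbours more frequent than the target) are assumptions chosen for illustration and are not necessarily the measurements used in this work. The frequencies, neighbour lists and the helper `relative_frequency_profile` are hypothetical.

```python
import math
from statistics import mean

# Hypothetical corpus frequencies; not figures reported in this paper.
FREQ = {"beer": 800, "wine": 1600, "drink": 5000,
        "ale": 100, "lager": 150, "stout": 50}

# Hypothetical top-3 neighbour lists for "beer" under two unnamed measures.
NEIGHBOURS = {
    "measure_a": ["wine", "drink", "ale"],
    "measure_b": ["ale", "lager", "stout"],
}


def relative_frequency_profile(target, neighbours, freq):
    """Summarise how frequent a neighbour set is relative to its target word.

    mean_log2_ratio > 0 means the neighbours' geometric-mean frequency exceeds
    the target's; share_higher is the fraction of neighbours more frequent
    than the target.
    """
    ratios = [freq[word] / freq[target] for word in neighbours]
    return {
        "mean_log2_ratio": mean(math.log2(r) for r in ratios),
        "share_higher": sum(r > 1 for r in ratios) / len(ratios),
    }


for name, ranked in NEIGHBOURS.items():
    print(name, relative_frequency_profile("beer", ranked, FREQ))
```

Profiles like these can be computed for any measure without a task-specific evaluation. That is what makes such properties candidates for predicting a measure's suitability in advance.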
Further, we explore a problem faced by the automatic thesaurus generation community, which is that distributional similarity methods do not seem to offer any obvious way to distinguish between the semantic relations of synonymy, antonymy and hyponymy. Previous work on this problem (Caraballo, 1999; Lin et al., 2003) involves identifying specific phrasal patterns within text, e.g., “Xs and other Ys” is used as evidence that X is a hyponym of Y. Our work explores the connection between relative