
Lecture: Word Senses


word senses, lexical semantics, homonymy, polysemy, metonymy, meronymy, antonymy, synonymy, hyponymy, hypernymy, wordnet, mesh, babelnet, lemma, wordform, zeugma test, senseval, selectional restrictions, membership meronymy, part-whole meronymy, semantic analysis, language technology


  1. Semantic Analysis in Language Technology
     http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
     Word Senses
     Marina Santini, santinim@stp.lingfil.uu.se
     Department of Linguistics and Philology
     Uppsala University, Uppsala, Sweden
     Spring 2016
  2. Previous Lecture: Sentiment Analysis
     • Affective meaning is a kind of connotational meaning
     • The importance of sentiment lexicons
     • Methods for the automatic expansion of manually annotated lexicons
     • A baseline algorithm: Naive Bayes
     • Practical activity: ML-based sentiment classifier: movie reviews, product reviews, restaurant reviews…
     • Results… uhm… somehow biased (short texts vs long texts); never a full positive polarity; etc.
     Lecture 4: Word Senses
  3. How is *meaning* handled in Semantic-Based LT-Applications?
     • Semantic Role Labelling/Predicate-Argument Structure
       • Main trend:
         • creation of annotated resources (PropBank, FrameNet, etc.);
         • use of supervised machine learning: classifiers are trained on annotated resources, such as PropBank and FrameNet.
     • Sentiment Analysis
       • Main trends:
         • identification of sentiment-bearing features or identification of representative features for the problem
         • use of supervised learning → cf. the results of the NLTK classifier
     • Word sense disambiguation (???)
     • Information extraction (???)
     • Question Answering (???)
     • Ontologies (???)
  4. Reminder: Glossary Entries
     • Which concepts are the most salient in Lect 3?
     • Update your Glossary…
  5. Previous lecture: end
  6. Word Senses
     • Master Students → NLP course
     • Bachelor Students → Semantics
  7. From Lect 2: PropBank & Selectional Restrictions
     • PropBank is organized by word senses: word senses are different aspects of meaning of a word
     • Selectional restrictions… we can use semantic constraints to disambiguate senses. Ex: eat, serve… bear me some patience…
  8. Acknowledgements
     Most slides borrowed from: Dan Jurafsky and James H. Martin
     Some slides borrowed from D. Jurafsky and C. Manning and D. Radev (Coursera)
     J&M (2015, draft): https://web.stanford.edu/~jurafsky/slp3/
  9. Outline
     • Word Meaning
     • WordNet and Other Lexical Resources
     • Selectional Restrictions
  10. Logic: meaning representation: uppercase words! (Formal Semantics)
      • Constants
      • Variables
      • Predicates
      • Boolean connectives
      • Quantifiers
      • Brackets and commas to group the symbols together
      • Ex: A woman crosses Sunset Boulevard
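As a worked illustration of slide 10, the example sentence could be written in first-order logic roughly as follows. This is only a sketch: the predicate names WOMAN and CROSS and the constant SUNSET_BOULEVARD are assumptions chosen for the example (following the slide's "uppercase words" convention), not notation taken from the lecture.

```latex
% One existentially quantified variable x, two predicates, one constant,
% and a Boolean connective joining the two predications.
\exists x \, \bigl( \mathrm{WOMAN}(x) \land \mathrm{CROSS}(x, \mathrm{SUNSET\_BOULEVARD}) \bigr)
```

Here x is the variable, SUNSET_BOULEVARD is the constant, and the brackets and comma group the symbols as listed on the slide.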
  11. Definitions
      • Lexical semantics is the study of the meaning of words and the systematic meaning-related connections between words.
      • A word sense is the locus of word meaning; definitions and meaning relations are defined at the level of the word sense rather than wordforms.
      • Homonymy is the relation between unrelated senses that share a form.
      • Polysemy is the relation between related senses that share a form.
      • Synonymy holds between different words with the same meaning.
      • Hyponymy and hypernymy relations hold between words that are in a class inclusion relationship.
      • Meronymy is a type of hierarchy that deals with part–whole relationships.
      • WordNet is a large database of lexical relations for English.
  12. Word Meaning and Similarity: Word Senses and Word Relations
  13. Reminder: lemma and wordform
      • A lemma or citation form
        • Same stem, part of speech, rough semantics
      • A wordform
        • The “inflected” word as it appears in text

      Wordform    Lemma
      banks       bank
      sung        sing
      duermes     dormir

      Cf. type/token ratio, a crude measure of lexical density: If a text is 1,000 words long, it is said to have 1,000 "tokens". But a lot of these words will be repeated, and there may be only say 400 different words in the text. "Types", therefore, are the different words. The ratio between types and tokens in this example would be 40%. (source: WordSmith Tools)
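The type/token ratio described on slide 13 can be sketched in a few lines of Python. This is a minimal illustration, not the WordSmith Tools implementation: the function name `type_token_ratio` and the crude regex tokenizer are assumptions made for the example, and no lemmatization is done (so "bank" and "banks" would count as different types).

```python
import re


def type_token_ratio(text: str) -> float:
    """Return (number of distinct word types) / (number of tokens)."""
    # Crude tokenization (an assumption): lowercase the text and keep
    # runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Types are the distinct tokens.
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "the bank by the river bank"
    # 6 tokens, 4 distinct types (the, bank, by, river)
    print(type_token_ratio(sample))
```

With the slide's own figures, a 1,000-token text containing 400 types gives 400 / 1000 = 0.4, i.e. 40%.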
  14. Lemmas have senses
      • One lemma “bank” can have many meanings:
        • Sense 1: “…a bank can hold the investments in a custodial account…”
        • Sense 2: “…as agriculture burgeons on the east bank the river will shrink even more”
      • Sense (or word sense)
        • A discrete representation of an aspect of a word’s meaning.
      • The lemma bank here has two senses
  15. Other examples?
  16. Homonymy
      Homonyms: words that share a form but have unrelated, distinct meanings:
      • bank1: financial institution, bank2: sloping land
      • bat1: club for hitting a ball, bat2: nocturnal flying mammal
      1. Homographs (bank/bank, bat/bat)
      2. Homophones:
         1. Write and right
         2. Piece and peace
  17. Homonymy causes problems for NLP applications
      • Information retrieval
        • “bat care”
      • Machine Translation
        • bat: murciélago (animal) or bate (for baseball)
      • Text-to-Speech
        • bass (stringed instrument) vs. bass (fish)
      • There would be no ambiguity for Speech to Text: why?
  18. Polysemy
      • 1. The bank was constructed in 1875 out of local red brick.
      • 2. I withdrew the money from the bank
      • Are those the same sense?
        • Sense 2: “A financial institution”
        • Sense 1: “The building belonging to a financial institution”
      • A polysemous word has related meanings
        • Most non-rare words have multiple meanings
  19. Polysemy
  20. Metonymy or Systematic Polysemy: A systematic relationship between senses
      • Lots of types of polysemy are systematic
        • School, university, hospital
        • All can mean the institution or the building.
      • A systematic relationship:
        • Building ↔ Organization
      • Other such kinds of systematic polysemy:
        • Author (Jane Austen wrote Emma) ↔ Works of Author (I love Jane Austen)
        • Tree (Plums have beautiful blossoms) ↔ Fruit (I ate a preserved plum)
  21. How do we know when a word has more than one sense?
  22. How do we know when a word has more than one sense?
      • The “zeugma” test: Two senses of serve?
        • Which flights serve breakfast?
        • Does Lufthansa serve Philadelphia?
        • ?Does Lufthansa serve breakfast and San Jose?
      • Since this conjunction sounds weird, we say that these are two different senses of “serve”
  23. Synonyms
      • Words that have the same meaning in some or all contexts.
        • filbert / hazelnut
        • couch / sofa
        • big / large
        • automobile / car
        • vomit / throw up
        • water / H2O
      • Two lexemes are synonyms
        • if they can be substituted for each other in all situations
        • If so they have the same propositional meaning
  24. Synonyms
      • But there are few (or no) examples of perfect synonymy.
        • Even if many aspects of meaning are identical
        • Substitution still may not preserve acceptability, because of differences in politeness, slang, register, genre, etc.
      • Example:
        • Water/H2O
        • Big/large
        • Brave/courageous (high brow: latinate words)
  25. Synonymy is a relation between senses rather than words
      • Consider the words big and large
      • Are they synonyms?
        • How big is that plane?
        • Would I be flying on a large or small plane?
      • How about here:
        • Miss Nelson became a kind of big sister to Benjamin.
        • ?Miss Nelson became a kind of large sister to Benjamin.
      • Why?
        • big has a sense that means being older, or grown up
        • large lacks this sense
  26. Synonymy: Summary
  27. Other semantic relations…
  28. Antonyms
      • Senses that are opposites with respect to one feature of meaning
      • Otherwise, they are very similar!
        dark/light, short/long, fast/slow, rise/fall, hot/cold, up/down, in/out
      • More formally: antonyms can
        • define a binary opposition or be at opposite ends of a scale
          • long/short, fast/slow
        • be reversives:
          • rise/fall, up/down
  29. Hyponymy and Hypernymy
      • One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
        • car is a hyponym of vehicle
        • mango is a hyponym of fruit
      • Conversely hypernym/superordinate (“hyper is super”)
        • vehicle is a hypernym of car
        • fruit is a hypernym of mango

      Superordinate/hypernym    vehicle   fruit   furniture
      Subordinate/hyponym       car       mango   chair
  30. Hyponymy more formally
      • Extensional:
        • The class denoted by the superordinate extensionally includes the class denoted by the hyponym
      • Entailment:
        • A sense A is a hyponym of sense B if being an A entails being a B
      • Hyponymy is usually transitive
        • (A hypo B and B hypo C entails A hypo C)
      • Another name: the IS-A hierarchy
        • A IS-A B (or A ISA B)
        • B subsumes A
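The transitivity of hyponymy on slide 30 can be illustrated with a toy IS-A table. This is a sketch under stated assumptions: the dictionary `DIRECT_HYPERNYM` and the helper names are made up for the example, each sense is given a single direct hypernym for simplicity, and the entry "convertible" is a hypothetical addition (not from the slides) so that there is a chain of length two to walk.

```python
# Toy IS-A links: hyponym -> direct hypernym.
# car/vehicle, mango/fruit and chair/furniture come from the slides;
# "convertible" is a hypothetical extra level added for illustration.
DIRECT_HYPERNYM = {
    "convertible": "car",
    "car": "vehicle",
    "mango": "fruit",
    "chair": "furniture",
}


def hypernyms(sense, direct=DIRECT_HYPERNYM):
    """Return all hypernyms of `sense`, nearest first, by following IS-A links."""
    chain = []
    seen = {sense}
    while sense in direct:
        sense = direct[sense]
        if sense in seen:  # guard against accidental cycles in the table
            break
        seen.add(sense)
        chain.append(sense)
    return chain


def is_hyponym_of(a, b, direct=DIRECT_HYPERNYM):
    """True if A IS-A B directly or through intermediate senses (transitivity)."""
    return b in hypernyms(a, direct)


if __name__ == "__main__":
    print(hypernyms("convertible"))
    print(is_hyponym_of("convertible", "vehicle"))
    print(is_hyponym_of("vehicle", "car"))
```

Because the check walks the whole chain, "convertible IS-A car" and "car IS-A vehicle" together yield "convertible IS-A vehicle", while the reverse direction does not hold.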
  31. Hyponyms and Instances
      • WordNet has both classes and instances.
      • An instance is an individual, a proper noun that is a unique entity
        • San Francisco is an instance of city
      • But city is a class
        • city is a hyponym of municipality...location...
  32. WordNet and other Online Thesauri
  33. Applications of Thesauri and Ontologies
      • Information Extraction
      • Information Retrieval
      • Question Answering
      • Bioinformatics and Medical Informatics
      • Machine Translation
  34. WordNet
  35. Synsets
  36. How is “sense” defined in WordNet?
      • The synset (synonym set), the set of near-synonyms, instantiates a sense or concept, with a gloss
      • Example: chump as a noun with the gloss:
        “a person who is gullible and easy to take advantage of” (gullible = naive)
      • This sense of “chump” is shared by 9 words:
        chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2
      • Each of these senses has this same gloss
      • (Not every sense; sense 2 of gull is the aquatic bird)
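The idea on slide 36, that a synset groups specific senses of several lemmas under one gloss, can be sketched as a small data structure. This is a toy model, not WordNet's actual storage format: the `Synset` class and the `gloss_for` helper are hypothetical names introduced for the example; only the gloss and the nine (lemma, sense number) pairs come from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Synset:
    """A set of near-synonymous word senses sharing one gloss (toy model)."""
    gloss: str
    members: Tuple[Tuple[str, int], ...]  # (lemma, sense number) pairs


# The "chump" synset as listed on the slide.
CHUMP = Synset(
    gloss="a person who is gullible and easy to take advantage of",
    members=(
        ("chump", 1), ("fool", 2), ("gull", 1), ("mark", 9), ("patsy", 1),
        ("fall guy", 1), ("sucker", 1), ("soft touch", 1), ("mug", 2),
    ),
)


def gloss_for(lemma: str, sense_number: int,
              synsets: Sequence[Synset]) -> Optional[str]:
    """Return the gloss of the synset containing this particular sense, if any."""
    for synset in synsets:
        if (lemma, sense_number) in synset.members:
            return synset.gloss
    return None


if __name__ == "__main__":
    print(gloss_for("gull", 1, [CHUMP]))  # sense 1 of gull is in the synset
    print(gloss_for("gull", 2, [CHUMP]))  # sense 2 (the bird) is not
```

Keying members by (lemma, sense number) rather than by lemma alone is what makes the gloss apply to gull1 but not gull2. For the real resource, NLTK exposes synsets through `nltk.corpus.wordnet` (for example `wordnet.synsets("chump")`), provided the WordNet data has been downloaded.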
  37. Tree-like Structure
  38. WordNet: bar 1/6
  39. WordNet: bar 2/6
  40. WordNet: bar 3/6
  41. WordNet: bar 4/6
  42. WordNet: bar 5/6
  43. WordNet: bar 6/6
  44. Polysemy
  45. WordNet 3.0
      • A hierarchically organized lexical database
      • On-line thesaurus + aspects of a dictionary
      • Some other languages available or under development
        • (Arabic, Finnish, German, Portuguese…)

      Category     Unique Strings
      Noun         117,798
      Verb         11,529
      Adjective    22,479
      Adverb       4,481
  46. Senses of “bass” in WordNet
  47. WordNet Hypernym Hierarchy for “bass”
  48. WordNet Noun Relations
  49. WordNet 3.0
      • Where it is:
        • http://wordnetweb.princeton.edu/perl/webwn
      • Libraries
        • Python: WordNet from NLTK
          • http://www.nltk.org/Home
        • Java:
          • JWNL, extJWNL on SourceForge
  50. MeSH: Medical Subject Headings thesaurus from the National Library of Medicine
      • MeSH (Medical Subject Headings)
        • 177,000 entry terms that correspond to 26,142 biomedical “headings”
      • Hemoglobins (synset)
        Entry Terms: Eryhem, Ferrous Hemoglobin, Hemoglobin
        Definition: The oxygen-carrying proteins of ERYTHROCYTES. They are found in all vertebrates and some invertebrates. The number of globin subunits in the hemoglobin quaternary structure differs between species. Structures range from monomeric to a variety of multimeric arrangements
  51. The MeSH Hierarchy
  52. Uses of the MeSH Ontology
      • Provide synonyms (“entry terms”)
        • E.g., glucose and dextrose
      • Provide hypernyms (from the hierarchy)
        • E.g., glucose ISA monosaccharide
      • Indexing in MEDLINE/PubMed database
        • NLM’s bibliographic database:
          • 20 million journal articles
          • Each article hand-assigned 10-20 MeSH terms
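The two uses of MeSH on slide 52 (mapping entry terms to a heading, and moving up the hierarchy) can be sketched with two small lookup tables. This is a toy illustration only: the dictionaries, the function names `normalize` and `broader`, and the choice of "glucose" as the heading with "dextrose" as its entry term are assumptions made for the example; the code does not read real MeSH data.

```python
# Toy tables built from the slide's two examples (not real MeSH records).
ENTRY_TERM_TO_HEADING = {
    "glucose": "glucose",
    "dextrose": "glucose",  # assumed: dextrose is an entry term for glucose
}
PARENT_HEADING = {
    "glucose": "monosaccharide",  # glucose ISA monosaccharide
}


def normalize(term):
    """Map an entry term (synonym) to its heading, or None if unknown."""
    return ENTRY_TERM_TO_HEADING.get(term.lower())


def broader(term):
    """Return the hypernym headings above a term, nearest first."""
    heading = normalize(term)
    chain = []
    while heading in PARENT_HEADING:
        heading = PARENT_HEADING[heading]
        chain.append(heading)
    return chain


if __name__ == "__main__":
    print(normalize("Dextrose"))
    print(broader("dextrose"))
```

An indexing or retrieval system could use `normalize` to treat "dextrose" and "glucose" as the same concept, and `broader` to also match articles indexed under the more general heading.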
  53. Other resources
  54. BabelNet
  55. Selectional Restrictions
  56. Selectional Restrictions
      Consider the two interpretations of: I want to eat someplace nearby.
      a) Sensible: eat is intransitive and “someplace nearby” is a location adjunct
      b) Speaker is Godzilla, a monster that likes eating buildings: eat is transitive and “someplace nearby” is a direct object
      How do we know the speaker didn’t mean b)? Because the THEME of eating tends to be something edible
  57. Selectional restrictions are associated with senses
      • The restaurant serves green-lipped mussels.
        • THEME is some kind of food
      • Which airlines serve Denver?
        • THEME is an appropriate location
      (apply zeugma test)
  58. Representing selectional restrictions
      Instead of representing “eat” as:
          ∃e,x,y Eating(e) ∧ Agent(e,x) ∧ Theme(e,y)
      (With this representation, all we know about y, the filler of the THEME role, is that it is associated with an Eating event through the Theme relation.)
      Just add:
          ∃e,x,y Eating(e) ∧ Agent(e,x) ∧ Theme(e,y) ∧ EdibleThing(y)
      (To stipulate the selectional restriction that y must be something edible, we simply add a new term to that effect.)
      And “eat a hamburger” becomes:
          ∃e,x,y Eating(e) ∧ Eater(e,x) ∧ Theme(e,y) ∧ EdibleThing(y) ∧ Hamburger(y)
      (This representation is perfectly reasonable since the membership of y in the category Hamburger is consistent with its membership in the category EdibleThing, assuming …)
      Figure 22.6: Evidence from WordNet that hamburgers are edible.
          … => dish => nutriment, nourishment, nutrition... => food, nutrient => substance => matter => physical entity => entity
      But this assumes we have a large knowledge base of facts about edible things and hamburgers and whatnot.
  59. Let’s use WordNet synsets to specify selectional restrictions
      • The THEME of eat must be WordNet synset {food, nutrient}
        “any substance that can be metabolized by an animal to give energy and build tissue”
      • Similarly
        THEME of imagine: synset {entity}
        THEME of lift: synset {physical entity}
        THEME of diagonalize: synset {matrix}
      • This allows
        imagine a hamburger and lift a hamburger,
      • Correctly rules out
        diagonalize a hamburger.
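The synset-based check on slide 59 amounts to asking whether the required synset appears somewhere on the hypernym chain of the verb's THEME. A minimal sketch, under stated assumptions: the chain for "hamburger" is a simplified version of the one in Figure 22.6 (levels between hamburger and dish are omitted, and each synset is reduced to one name), the chain for "matrix" is collapsed to a single hypothetical link, and the names `HYPERNYM`, `THEME_RESTRICTION` and `satisfies` are made up for the example. It does not query WordNet itself.

```python
# Simplified hypernym links (hyponym -> hypernym), one name per synset.
HYPERNYM = {
    "hamburger": "dish",        # intermediate levels omitted (assumption)
    "dish": "nutriment",
    "nutriment": "food",
    "food": "substance",
    "substance": "matter",
    "matter": "physical entity",
    "physical entity": "entity",
    "matrix": "entity",         # collapsed chain (assumption)
}

# Synset that the THEME of each verb must fall under (from the slide).
THEME_RESTRICTION = {
    "eat": "food",
    "imagine": "entity",
    "lift": "physical entity",
    "diagonalize": "matrix",
}


def satisfies(verb, noun):
    """True if `noun`, or one of its hypernyms, is the synset `verb` requires."""
    required = THEME_RESTRICTION[verb]
    node = noun
    while node is not None:
        if node == required:
            return True
        node = HYPERNYM.get(node)  # climb one level; None at the root
    return False


if __name__ == "__main__":
    for verb in ("eat", "imagine", "lift", "diagonalize"):
        print(verb, "a hamburger:", satisfies(verb, "hamburger"))
```

Under these toy tables the check accepts eat, imagine and lift with "hamburger" and rejects diagonalize, matching the slide; the quality of a real system would depend on WordNet's coverage and on first picking the right sense of the noun.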
  60. The end
