Encoding syntactic dependencies by vector permutation

Distributional approaches are based on a simple hypothesis: the meaning of a word can be inferred from its usage. The application of that idea to the vector space model makes possible the construction of a WordSpace in which words are represented by mathematical points in a geometric space. Similar words are represented by points that are close to each other in this space, and the definition of "word usage" depends on the definition of the context used to build the space, which can be the whole document, the sentence in which the word occurs, a fixed window of words, or a specific syntactic context. However, in its original formulation WordSpace can take into account only one definition of context at a time. We propose an approach based on vector permutation and Random Indexing to encode several syntactic contexts in a single WordSpace. Moreover, we propose some operations in this space and report the results of an evaluation performed using the GEMS 2011 Shared Evaluation data.

  1. Encoding syntactic dependencies by vector permutation. Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro, Department of Computer Science, University of Bari "Aldo Moro" (Italy). GEMS 2011: GEometrical Models of Natural Language Semantics, Edinburgh, Scotland, July 31st, 2011.
  2. Motivation
     • meaning is its use
     • the meaning of a word is determined by the set of textual contexts in which it appears
     • one definition of context at a time
  3. Building blocks
     • Random Indexing
     • dependency parser
     • vector permutation
  4. Random Indexing
     • assign a context vector to each context element (e.g. document, passage, term, …)
     • the term vector is the sum of the context vectors of the contexts in which the term occurs
       – sometimes the context vector can be boosted by a score (e.g. term frequency, PMI, …)
  5. Context vector
     (0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, -1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1)
     • sparse
     • high dimensional
     • ternary {-1, 0, +1}
     • small number of randomly distributed non-zero elements
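A minimal sketch of how such a sparse ternary index vector might be generated; the dimension, the number of non-zero elements, and the class and method names below are illustrative assumptions, not taken from the slides:

```java
import java.util.Random;

public class ContextVectors {
    /**
     * Build a sparse ternary random index vector: mostly zeros, with a few
     * +1/-1 entries at random positions (a common Random Indexing recipe;
     * the parameter values are illustrative).
     */
    public static int[] randomIndexVector(int dimension, int nonZero, Random rnd) {
        int[] v = new int[dimension];
        int placed = 0;
        while (placed < nonZero) {
            int pos = rnd.nextInt(dimension);
            if (v[pos] == 0) {                       // keep the non-zero positions distinct
                v[pos] = rnd.nextBoolean() ? 1 : -1; // +1 or -1 with equal probability
                placed++;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] v = randomIndexVector(1000, 10, new Random(42));
        System.out.println(java.util.Arrays.stream(v).filter(x -> x != 0).count() + " non-zero entries");
    }
}
```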
  6. Random Indexing (formal)
     B(n,k) = A(n,m) · R(m,k),  with k << m
     • B preserves the distance between points (Johnson-Lindenstrauss lemma): dr ≈ c · d
  7. Dependency parser
     John eats a red apple.
     • subject: subj(eat, John)
     • object: obj(eat, apple)
     • modifier: mod(apple, red)
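As a data-structure sketch, the parse above can be stored as relation/head/dependent triples; the record and relation names below are illustrative assumptions:

```java
import java.util.List;

public class Dependencies {
    /** One syntactic dependency: a relation linking a head word to a dependent word. */
    record Dependency(String relation, String head, String dependent) {}

    public static void main(String[] args) {
        // "John eats a red apple." as produced by a dependency parser (e.g. MINIPAR)
        List<Dependency> parse = List.of(
            new Dependency("subj", "eat", "John"),
            new Dependency("obj",  "eat", "apple"),
            new Dependency("mod",  "apple", "red")
        );
        parse.forEach(d -> System.out.println(d.relation() + "(" + d.head() + ", " + d.dependent() + ")"));
    }
}
```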
  8. Vector permutation
     • use permutations of the elements of the random vector to encode several contexts
       – right shift of n elements to encode dependents (permutation)
       – left shift of n elements to encode heads (inverse permutation)
     • choose a different n for each kind of dependency
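A minimal sketch of the shift operation, assuming a circular (wrap-around) rotation; the class and helper names are illustrative:

```java
public class Permutations {
    /**
     * Rotate a vector n positions to the right (n > 0) or to the left (n < 0).
     * The right rotation encodes dependents; the inverse (left) rotation encodes heads.
     */
    public static int[] rotate(int[] v, int n) {
        int dim = v.length;
        int[] out = new int[dim];
        for (int i = 0; i < dim; i++) {
            out[Math.floorMod(i + n, dim)] = v[i];  // wrap around at the borders
        }
        return out;
    }

    public static void main(String[] args) {
        int[] red = {0, 0, 0, 1, 0, 0, 0, -1, 0, 0};
        // right shift by 3: reproduces the first summand shown in the example on slide 11
        System.out.println(java.util.Arrays.toString(rotate(red, 3)));
        // left shift by 3: the inverse permutation undoes it
        System.out.println(java.util.Arrays.toString(rotate(rotate(red, 3), -3)));
    }
}
```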
  9. Method
     • assign a context vector to each term
     • assign a shift function (Πn) to each kind of dependency
     • each term is represented by a vector which is
       – the sum of the permuted vectors of all its dependent terms
       – plus the sum of the inverse-permuted vectors of all its head terms
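Putting the pieces together, a hedged sketch of how the term vectors could be accumulated from dependency triples. The shift values for obj and mod follow the example on slide 10; the subj value, the class and map names, and the reuse of the Permutations helper sketched above are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

public class SyntacticSpace {
    static final int DIM = 1000;
    // one shift offset per dependency type (obj and mod as in the slide example, subj illustrative)
    static final Map<String, Integer> SHIFT = Map.of("subj", 5, "obj", 7, "mod", 3);

    /** Accumulate dst += src element-wise. */
    static void addInPlace(int[] dst, int[] src) {
        for (int i = 0; i < dst.length; i++) dst[i] += src[i];
    }

    /**
     * Build the space of term vectors from (relation, head, dependent) triples:
     * each head receives the right-shifted index vector of its dependent, and
     * each dependent receives the left-shifted index vector of its head.
     * Assumes every term already has a random index vector in indexVectors.
     */
    static Map<String, int[]> build(Iterable<String[]> triples, Map<String, int[]> indexVectors) {
        Map<String, int[]> termVectors = new HashMap<>();
        for (String[] t : triples) {
            String rel = t[0], head = t[1], dep = t[2];
            int n = SHIFT.getOrDefault(rel, 1);
            int[] headVec = termVectors.computeIfAbsent(head, k -> new int[DIM]);
            int[] depVec  = termVectors.computeIfAbsent(dep,  k -> new int[DIM]);
            addInPlace(headVec, Permutations.rotate(indexVectors.get(dep), n));   // dependents: right shift
            addInPlace(depVec,  Permutations.rotate(indexVectors.get(head), -n)); // heads: inverse shift
        }
        return termVectors;
    }
}
```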
  10. Example: John eats a red apple
      John  → (0, 0, 0, 0, 0, 0, 1, 0, -1, 0)
      eat   → (1, 0, 0, 0, -1, 0, 0, 0, 0, 0)
      red   → (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)
      apple → (1, 0, 0, 0, 0, 0, 0, -1, 0, 0)
      mod → Π3;  obj → Π7
      (apple) = Π3(red) + Π−7(eat) = …
  11. Example (continued)
      (apple) = Π3(red) + Π−7(eat) = (−1, 0, 0, 0, 0, 0, 1, 0, 0, 0) + (0, 0, 0, 1, 0, 0, 0, −1, 0, 0)
      (red shifted 3 positions to the right; eat shifted 7 positions to the left)
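As a check, the same sum can be reproduced with the rotation helper sketched earlier; the 10-dimensional vectors are the ones from the slide, everything else is illustrative:

```java
public class AppleExample {
    public static void main(String[] args) {
        int[] red = {0, 0, 0, 1, 0, 0, 0, -1, 0, 0};
        int[] eat = {1, 0, 0, 0, -1, 0, 0, 0, 0, 0};

        int[] p3red  = Permutations.rotate(red, 3);   // Π3(red)  = (-1, 0, 0, 0, 0, 0, 1, 0, 0, 0)
        int[] pm7eat = Permutations.rotate(eat, -7);  // Π−7(eat) = (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)

        int[] apple = new int[10];
        for (int i = 0; i < 10; i++) apple[i] = p3red[i] + pm7eat[i];

        System.out.println(java.util.Arrays.toString(apple));
        // → [-1, 0, 0, 1, 0, 0, 1, -1, 0, 0]
    }
}
```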
  12. Output
      • R: vector space of random context vectors
      • B: vector space of terms
  13. Query 1/4
      • similarity between terms
        – cosine similarity between term vectors in B
        – terms are similar if they occur in similar syntactic contexts
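A minimal cosine-similarity sketch over the integer term vectors; the class name is illustrative:

```java
public class Cosine {
    /** Cosine similarity between two vectors of equal length. */
    public static double similarity(int[] a, int[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;  // guard against all-zero vectors
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```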
  14. Query 2/4: words similar to "provide"
      offer     0.855
      supply    0.819
      deliver   0.801
      give      0.787
      contain   0.784
      require   0.782
      present   0.778
  15. Query 3/4
      • similarity between terms exploiting dependencies
        What are the objects of the word "provide"?
        1. get the term vector for "provide" in B
        2. compute the similarity with all the permuted vectors in R, using the permutation assigned to the "obj" relation
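A hedged sketch of that query, reusing the cosine and rotation helpers above; the shift table, the class and variable names, and the overall wiring are assumptions:

```java
import java.util.Map;

public class ObjectQuery {
    /**
     * Rank candidate terms by how well their permuted random index vectors (from R)
     * match the syntactic vector of the target word (from B), using the shift
     * assigned to the given relation; the top-scoring terms are the typical
     * fillers of that relation (e.g. the objects of "provide").
     */
    public static void printFillers(String target, String relation, int topK,
                                    Map<String, int[]> B, Map<String, int[]> R) {
        int shift = SyntacticSpace.SHIFT.get(relation);
        int[] targetVec = B.get(target);
        R.entrySet().stream()
         .map(e -> Map.entry(e.getKey(),
                             Cosine.similarity(targetVec, Permutations.rotate(e.getValue(), shift))))
         .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
         .limit(topK)
         .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }

    // usage: printFillers("provide", "obj", 5, termSpaceB, randomSpaceR);
}
```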
  16. Query 4/4: what are the objects of the word "provide"?
      information  0.344
      food         0.208
      support      0.143
      energy       0.143
      job          0.142
  17. Compositional semantics 1/2
      • words are represented in isolation
      • representing complex structures (phrases or sentences) is a challenging task
        – IR, QA, IE, Text Entailment, …
      • how to combine words
        – tensor product of words
        – Clark and Pulman suggest taking into account symbolic features (syntactic dependencies)
  18. Compositional semantics 2/2
      "man reads magazine" (Clark and Pulman):
      man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
  19. Similarity between structures
      "man reads magazine" vs. "woman browses newspaper"
      man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
      woman ⊗ subj ⊗ browse ⊗ obj ⊗ newspaper
  20. … a bit of math
      (w1 ⊗ w2) · (w3 ⊗ w4) = (w1 · w3) × (w2 · w4)
      (man · woman) × (read · browse) × (magazine · newspaper)
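In practice this identity means the similarity of two dependency structures can be computed as the product of the pairwise word similarities, without ever materialising the tensor products. A hedged sketch under that reading; class and parameter names are illustrative:

```java
import java.util.List;
import java.util.Map;

public class StructureSimilarity {
    /**
     * Similarity between two aligned word sequences (e.g. subject-verb-object triples),
     * computed as the product of the pairwise cosine similarities of the word vectors:
     * sim(man reads magazine, woman browses newspaper)
     *   = cos(man, woman) * cos(read, browse) * cos(magazine, newspaper)
     */
    public static double similarity(List<String> s1, List<String> s2, Map<String, int[]> space) {
        double sim = 1.0;
        for (int i = 0; i < s1.size(); i++) {
            sim *= Cosine.similarity(space.get(s1.get(i)), space.get(s2.get(i)));
        }
        return sim;
    }

    // usage: similarity(List.of("man", "read", "magazine"),
    //                   List.of("woman", "browse", "newspaper"), termVectors);
}
```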
  21. System setup
      • implemented in Java
      • two corpora
        – TASA: 800K sentences and 9M dependencies
        – a portion of ukWaC: 7M sentences and 127M dependencies
        – 40,000 most frequent words
      • dependency parser: MINIPAR
  22. Evaluation
      • GEMS 2011 Shared Task for compositional semantics
        – a list of pairs of word combinations, rated by humans (5,833 ratings)
        – encoded dependencies: subj, obj, mod, nn
        – GOAL: compare the system performance against human scores
          • Spearman correlation
  23. Results (old)
      Corpus   Combination       ρ
      TASA     verb-obj          0.260
               adj-noun          0.637
               compound nouns    0.341
               overall           0.275
      ukWaC    verb-obj          0.292
               adj-noun          0.445
               compound nouns    0.227
               overall           0.261
  24. Results (new)
      Corpus   Combination       ρ
      TASA     verb-obj          0.160
               adj-noun          0.435
               compound nouns    0.243
               overall           0.186
      ukWaC    verb-obj          0.190
               adj-noun          0.303
               compound nouns    0.159
               overall           0.179
  25. Conclusion and future work
      • Conclusion
        – encode syntactic dependencies using vector permutations and Random Indexing
        – an early attempt at semantic composition
      • Future work
        – deeper evaluation (in vivo)
        – a more formal study of semantic composition
        – tackle the scalability problem
        – try to encode other kinds of context
  26. That's all folks!
