
Mikel Artetxe - 2017 - Learning bilingual word embeddings with (almost) no bilingual data

ACL 2017 - P17-1042


  1. 1. Learning bilingual word embeddings with (almost) no bilingual data Mikel Artetxe, Gorka Labaka, Eneko Agirre IXA NLP group – University of the Basque Country (UPV/EHU)
  2. 2. Who cares?
  3. 3. Who cares? word embeddings are useful!
  7. 7. Who cares? - inherently crosslingual tasks word embeddings are useful!
  8. 8. Who cares? - inherently crosslingual tasks - crosslingual transfer learning word embeddings are useful!
  9. 9. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training word embeddings are useful!
  11. 11. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training Previous work word embeddings are useful!
  12. 12. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training Previous work - parallel corpora word embeddings are useful!
  13. 13. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training Previous work - parallel corpora - comparable corpora word embeddings are useful!
  14. 14. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training Previous work - parallel corpora - comparable corpora - (big) dictionaries word embeddings are useful!
  16. 16. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training This talk Previous work - parallel corpora - comparable corpora - (big) dictionaries word embeddings are useful!
  17. 17. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training This talk - 25 word dictionary Previous work - parallel corpora - comparable corpora - (big) dictionaries word embeddings are useful!
  18. 18. Who cares? - inherently crosslingual tasks - crosslingual transfer learning bilingual signal for training This talk - 25 word dictionary - numerals (1, 2, 3…) Previous work - parallel corpora - comparable corpora - (big) dictionaries word embeddings are useful!
  19. 19. Bilingual embedding mappings
  20. 20. Bilingual embedding mappings: 𝑋 (Basque embeddings), 𝑍 (English embeddings)
  21. 21. Bilingual embedding mappings: 𝑋 (Basque), 𝑍 (English), plus a seed dictionary
  22. 22. Bilingual embedding mappings: seed dictionary entries Txakur-Dog, Sagar-Apple, …, Egutegi-Calendar
  23. 23. Bilingual embedding mappings: learn a mapping 𝑊 from the seed dictionary
  24. 24. Bilingual embedding mappings: apply the mapping to place 𝑋𝑊 in the English embedding space
  25. 25. Bilingual embedding mappings: for the seed dictionary entries, the mapped source rows should match the target rows, $\begin{pmatrix} X_{1*} \\ X_{2*} \\ \vdots \\ X_{n*} \end{pmatrix} W \approx \begin{pmatrix} Z_{1*} \\ Z_{2*} \\ \vdots \\ Z_{n*} \end{pmatrix}$
  27. 27. Bilingual embedding mappings: learn the mapping by solving $\arg\min_{W} \sum_i \lVert X_{i*} W - Z_{j*} \rVert^2$ with $W$ constrained to be orthogonal, where $j$ is the seed translation of word $i$
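For concreteness, here is a minimal numpy sketch of the closed-form solution of this kind of orthogonality-constrained least-squares problem (orthogonal Procrustes via SVD). The function name, index arrays and toy data are illustrative assumptions, not the exact setup of the paper:

```python
import numpy as np

def learn_orthogonal_mapping(X, Z, src_idx, trg_idx):
    """Minimize sum_k ||X[src_idx[k]] @ W - Z[trg_idx[k]]||^2 over orthogonal W.
    X: (n_src, d) source embeddings, Z: (n_trg, d) target embeddings,
    (src_idx[k], trg_idx[k]): the k-th seed dictionary pair.
    Closed form (orthogonal Procrustes): W = U @ Vt with U, S, Vt = svd(X_D.T @ Z_D)."""
    M = X[src_idx].T @ Z[trg_idx]      # d x d cross-covariance over the dictionary
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt                      # orthogonal d x d mapping matrix

# Toy usage with random, length-normalized embeddings and a 25-pair seed dictionary.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Z = rng.normal(size=(1200, 50)); Z /= np.linalg.norm(Z, axis=1, keepdims=True)
W = learn_orthogonal_mapping(X, Z, np.arange(25), np.arange(25))
mapped = X @ W                         # source embeddings placed in the target space
```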
  31. 31. Bilingual embedding mappings
  32. 32. Bilingual embedding mappings Monolingual embeddings
  33. 33. Bilingual embedding mappings: start from monolingual embeddings and a dictionary
  35. 35. Bilingual embedding mappings: learn a mapping from the current dictionary
  37. 37. Bilingual embedding mappings: use the mapping to induce a new dictionary
  38. 38. Bilingual embedding mappings: the induced dictionary is better!
  40. 40. Bilingual embedding mappings: learn a new mapping from the induced dictionary
  42. 42. Bilingual embedding mappings: induce yet another dictionary
  43. 43. Bilingual embedding mappings: even better!
  45. 45. Bilingual embedding mappings: learn another mapping
  47. 47. Bilingual embedding mappings: induce another dictionary
  48. 48. Bilingual embedding mappings: even better!
  49. 49. Bilingual embedding mappings: the loop is monolingual embeddings + dictionary → mapping → dictionary → mapping → …
  50. 50. Bilingual embedding mappings: this is the proposed self-learning method
  51. 51. Bilingual embedding mappings: proposed self-learning method, based on the mapping method of Artetxe et al. (2016); formalization and implementation details in the paper
  52. 52. Bilingual embedding mappings: Too good to be true?
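To make the loop above concrete, a minimal numpy sketch of one way to implement the self-learning alternation (solve the mapping on the current dictionary, then re-induce the dictionary by nearest neighbours). The fixed iteration count, the brute-force similarity matrix and all names are simplifying assumptions rather than the paper's exact implementation, which adds further details and a proper convergence criterion:

```python
import numpy as np

def self_learning(X, Z, seed_src, seed_trg, n_iter=10):
    """X: (n_src, d) and Z: (n_trg, d) length-normalized monolingual embeddings;
    seed_src / seed_trg: parallel index arrays of the (tiny) seed dictionary."""
    src, trg = np.asarray(seed_src), np.asarray(seed_trg)
    for _ in range(n_iter):
        # Mapping step: orthogonal Procrustes on the current dictionary.
        U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
        W = U @ Vt
        # Dictionary step: nearest target neighbour of every mapped source word.
        sims = (X @ W) @ Z.T             # (n_src, n_trg) dot-product similarities
        src = np.arange(X.shape[0])
        trg = sims.argmax(axis=1)
    return W, list(zip(src.tolist(), trg.tolist()))
```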
  54. 54. Experiments
  55. 55. Experiments • Dataset by Dinu et al. (2015)
  56. 56. Experiments • Dataset by Dinu et al. (2015) English-Italian
  57. 57. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish English-Italian
  58. 58. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish English-Italian English-German English-Finnish
  59. 59. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) English-Italian English-German English-Finnish
  60. 60. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs
  61. 61. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs
  62. 62. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals
  63. 63. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals ⇒ Test dictionary: 1,500 word pairs
  64. 64. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals ⇒ Test dictionary: 1,500 word pairs; evaluation: word translation induction on English-Italian, English-German and English-Finnish
  65. 65. Experiments: word translation induction, comparing Mikolov et al. (2013a), Xing et al. (2015), Zhang et al. (2016), Artetxe et al. (2016) and our method
  67. 67. Experiments: word translation induction accuracy by seed dictionary (5,000 word pairs / 25 word pairs / numerals)

     | Method | EN-IT 5,000 | EN-IT 25 | EN-IT num. | EN-DE 5,000 | EN-DE 25 | EN-DE num. | EN-FI 5,000 | EN-FI 25 | EN-FI num. |
     | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
     | Mikolov et al. (2013a) | 34.93% | 0.00% | 0.00% | 35.00% | 0.00% | 0.07% | 25.91% | 0.00% | 0.00% |
     | Xing et al. (2015) | 36.87% | 0.00% | 0.13% | 41.27% | 0.07% | 0.53% | 28.23% | 0.07% | 0.56% |
     | Zhang et al. (2016) | 36.73% | 0.07% | 0.27% | 40.80% | 0.13% | 0.87% | 28.16% | 0.14% | 0.42% |
     | Artetxe et al. (2016) | 39.27% | 0.07% | 0.40% | 41.87% | 0.13% | 0.73% | 30.62% | 0.21% | 0.77% |
     | Our method | 39.67% | 37.27% | 39.40% | 40.87% | 39.60% | 40.27% | 28.72% | 28.16% | 26.47% |
  72. 72. Experiments: word translation induction on English-Italian, highlighted: with the 5,000-pair / 25-pair / numeral seed dictionaries our method reaches 39.67% / 37.27% / 39.40%, while the best baseline (Artetxe et al., 2016) drops from 39.27% to 0.07% and 0.40%
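As a side note, the translation induction numbers above correspond to nearest-neighbour retrieval accuracy over a held-out test dictionary. A small illustrative sketch of that evaluation, assuming length-normalized embeddings and made-up names for the test data structures:

```python
import numpy as np

def translation_accuracy(X, Z, W, test_src, test_trg_sets):
    """P@1 word translation induction: for each test source word, take the nearest
    target-language neighbour of its mapped embedding and check it against the set
    of gold translations test_trg_sets[k] for source index test_src[k]."""
    mapped = X[np.asarray(test_src)] @ W        # map only the test words
    nn = (mapped @ Z.T).argmax(axis=1)          # nearest neighbour (cosine, if normalized)
    hits = [int(pred) in gold for pred, gold in zip(nn, test_trg_sets)]
    return float(np.mean(hits))
```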
  73. 73. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals
  74. 74. Experiments • Dataset by Dinu et al. (2015) extended to German and Finnish ⇒ Monolingual embeddings (CBOW + negative sampling) ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals crosslingual word similarity
  75. 75. Experiments: crosslingual word similarity on EN-IT (WS, RG) and EN-DE (WS), comparing Luong et al. (2015), trained on Europarl, against the dictionary-based mapping methods
  80. 80. Experiments: crosslingual word similarity results by bilingual data used

     | Method | Bi. data | EN-IT WS | EN-IT RG | EN-DE WS |
     | --- | --- | --- | --- | --- |
     | Luong et al. (2015) | Europarl | 33.1% | 33.5% | 35.6% |
     | Mikolov et al. (2013a) | 5k dict | 62.7% | 64.3% | 52.8% |
     | Xing et al. (2015) | 5k dict | 61.4% | 70.0% | 59.5% |
     | Zhang et al. (2016) | 5k dict | 61.6% | 70.4% | 59.6% |
     | Artetxe et al. (2016) | 5k dict | 61.7% | 71.6% | 59.7% |
     | Our method | 5k dict | 62.4% | 74.2% | 61.6% |
     | Our method | 25 dict | 62.6% | 74.9% | 61.2% |
     | Our method | num. | 62.8% | 73.9% | 60.4% |
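Likewise, the crosslingual word similarity scores compare the similarity of a mapped source word and a target word against gold human ratings. A brief sketch under assumed names; the exact correlation variant and dataset handling are assumptions, not details taken from the slides:

```python
import numpy as np
from scipy.stats import spearmanr

def crosslingual_similarity_score(X, Z, W, pairs, gold_scores):
    """pairs: list of (src_index, trg_index) word pairs from a crosslingual similarity
    dataset; gold_scores: the corresponding human similarity ratings."""
    sims = []
    for i, j in pairs:
        a, b = X[i] @ W, Z[j]
        sims.append(float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine
    corr, _ = spearmanr(sims, gold_scores)      # rank correlation with the ratings
    return corr
```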
  83. 83. Why does it work?
  84. 84. Why does it work? Recall the loop: monolingual embeddings + dictionary → mapping → induced dictionary → …
  85. 85. Why does it work? The seed dictionary is small, the induced dictionary is large
  87. 87. Why does it work? The seed dictionary has no errors, the induced dictionary has errors
  90. 90. Why does it work? So is each new dictionary better, or worse?
  93. 93. Why does it work? And the next one: even better, or even worse?
  95. 95. Why does it work? Look at the mapped source embeddings 𝑋𝑊 next to the target embeddings 𝑍
  96. 96. Why does it work? Implicit objective: $W^* = \arg\max_{W} \sum_i \max_j \left( X_{i*} W \cdot Z_{j*} \right)$ s.t. $W W^T = W^T W = I$
  97. 97. Why does it work? This implicit objective is independent from the seed dictionary!
  116. 116. Why does it work? If the objective is independent from the seed dictionary, why do we need one at all?
  117. 117. Why does it work? To avoid poor local optima!
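One way to read the argument on these slides: the self-learning loop behaves like alternating ascent on the implicit objective. With W fixed, the best dictionary simply pairs each source word with its nearest mapped neighbour; with the dictionary fixed, the orthogonal Procrustes step maximizes the summed dot products. A minimal numpy sketch of evaluating the implicit objective for a given mapping (names are illustrative):

```python
import numpy as np

def implicit_objective(X, Z, W):
    """sum_i max_j (X[i] @ W) . Z[j]: how well each mapped source word is covered
    by its single best target neighbour, for a given orthogonal mapping W."""
    sims = (X @ W) @ Z.T                   # (n_src, n_trg) similarities
    return float(sims.max(axis=1).sum())   # best neighbour per source word, then sum
```

Under this reading the seed dictionary only fixes the starting point of the ascent, which is exactly the slides' point: it is not part of the objective, but it is needed to steer the optimization away from poor local optima.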
  119. 119. Conclusions
  120. 120. Conclusions • Simple self-learning method to train bilingual embedding mappings
  121. 121. Conclusions • Simple self-learning method to train bilingual embedding mappings • High quality results with almost no supervision (25 words, numerals)
  122. 122. Conclusions • Simple self-learning method to train bilingual embedding mappings • High quality results with almost no supervision (25 words, numerals) • Implicit optimization objective independent from seed dictionary
  123. 123. Conclusions • Simple self-learning method to train bilingual embedding mappings • High quality results with almost no supervision (25 words, numerals) • Implicit optimization objective independent from seed dictionary • Seed dictionary necessary to avoid poor local optima
  124. 124. Conclusions • Simple self-learning method to train bilingual embedding mappings • High quality results with almost no supervision (25 words, numerals) • Implicit optimization objective independent from seed dictionary • Seed dictionary necessary to avoid poor local optima • Future work: fully unsupervised training
  125. 125. One more thing…
  126. 126. One more thing… > git clone https://github.com/artetxem/vecmap.git
  128. 128. One more thing… > python3 vecmap/map_embeddings.py --self_learning --numerals SRC_INPUT.EMB TRG_INPUT.EMB SRC_OUTPUT.EMB TRG_OUTPUT.EMB
  133. 133. One more thing… > vecmap/reproduce_acl2017.sh
  134. 134. One more thing… output of vecmap/reproduce_acl2017.sh:
       ENGLISH-ITALIAN
       5,000 WORD DICTIONARY
       - Mikolov et al. (2013a) | Translation: 34.93% MWS353: 62.66%
       - Xing et al. (2015) | Translation: 36.87% MWS353: 61.41%
       - Zhang et al. (2016) | Translation: 36.73% MWS353: 61.62%
       - Artetxe et al. (2016) | Translation: 39.27% MWS353: 61.74%
       - Proposed method | Translation: 39.67% MWS353: 62.35%
       25 WORD DICTIONARY
       - Mikolov et al. (2013a) | Translation: 0.00% MWS353: -6.42%
       - Xing et al. (2015) | Translation: 0.00% MWS353: 19.49%
       - Zhang et al. (2016) | Translation: 0.07% MWS353: 15.52%
       - Artetxe et al. (2016) | Translation: 0.07% MWS353: 17.45%
       - Proposed method | Translation: 37.27% MWS353: 62.64%
       NUMERAL DICTIONARY
       - Mikolov et al. (2013a) | Translation: 0.00% MWS353: 28.75%
       - Xing et al. (2015) | Translation: 0.13% MWS353: 27.75%
       - Zhang et al. (2016) | Translation: 0.27% MWS353: 27.38%
       - Artetxe et al. (2016) | Translation: 0.40% MWS353: 24.85%
       - Proposed method | Translation: 39.40% MWS353: 62.82%
  135. 135. Thank you! https://github.com/artetxem/vecmap
