This document presents a self-learning method for learning bilingual word embeddings from a small bilingual dictionary: starting from an initial seed dictionary, it maps monolingual word embeddings of two languages into a shared space and iteratively refines both the mapping and the dictionary. The method is evaluated on word translation induction between English-Italian, English-German, and English-Finnish using seed dictionaries of various sizes. Results are competitive with previous work even with seed dictionaries of only 25 word pairs or a list of numerals, demonstrating that bilingual embeddings can be learned with almost no bilingual data.
Mikel Artetxe - 2017 - Learning bilingual word embeddings with (almost) no bilingual data
1. Learning bilingual word embeddings with (almost) no bilingual data
Mikel Artetxe, Gorka Labaka, Eneko Agirre
IXA NLP group – University of the Basque Country (UPV/EHU)
8–18. Who cares?
Word embeddings are useful for:
- inherently crosslingual tasks
- crosslingual transfer learning
Both need a bilingual signal for training.
Previous work:
- parallel corpora
- comparable corpora
- (big) dictionaries
This talk:
- 25 word dictionary
- numerals (1, 2, 3…)
51–53. Bilingual embedding mappings
Proposed self-learning method: starting from monolingual embeddings and a seed dictionary, learn a mapping, use it to induce a new dictionary, and iterate.
- based on the mapping method of Artetxe et al. (2016)
- formalization and implementation details in the paper
Too good to be true?
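The loop on this slide is compact enough to sketch. Below is a minimal sketch in NumPy, assuming `X` and `Z` are length-normalized monolingual embedding matrices and `seed` is a list of (source index, target index) pairs; the function names and the fixed iteration count are illustrative (the paper iterates until convergence), but the mapping step is the orthogonal Procrustes solution of Artetxe et al. (2016).

```python
import numpy as np

def learn_mapping(X, Z, pairs):
    # Mapping step (Artetxe et al., 2016): the orthogonal W that best aligns
    # the dictionary pairs, obtained in closed form from an SVD.
    src = np.array([s for s, _ in pairs])
    trg = np.array([t for _, t in pairs])
    U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
    return U @ Vt

def induce_dictionary(X, Z, W):
    # Dictionary induction step: pair each mapped source word with its
    # nearest target word (dot product = cosine on normalized rows).
    sims = (X @ W) @ Z.T
    return list(enumerate(sims.argmax(axis=1)))

def self_learn(X, Z, seed, iterations=10):
    pairs = seed
    for _ in range(iterations):
        W = learn_mapping(X, Z, pairs)       # 1) map with current dictionary
        pairs = induce_dictionary(X, Z, W)   # 2) re-induce the dictionary
    return W
```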
58–72. Experiments: word translation induction
• Dataset by Dinu et al. (2015) extended to German and Finnish
  ⇒ Monolingual embeddings (CBOW + negative sampling)
  ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals
  ⇒ Test dictionary: 1,500 word pairs

                         English-Italian         English-German          English-Finnish
                         5,000   25      num.    5,000   25      num.    5,000   25      num.
Mikolov et al. (2013a)   34.93%  0.00%   0.00%   35.00%  0.00%   0.07%   25.91%  0.00%   0.00%
Xing et al. (2015)       36.87%  0.00%   0.13%   41.27%  0.07%   0.53%   28.23%  0.07%   0.56%
Zhang et al. (2016)      36.73%  0.07%   0.27%   40.80%  0.13%   0.87%   28.16%  0.14%   0.42%
Artetxe et al. (2016)    39.27%  0.07%   0.40%   41.87%  0.13%   0.73%   30.62%  0.21%   0.77%
Our method               39.67%  37.27%  39.40%  40.87%  39.60%  40.27%  28.72%  28.16%  26.47%
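For context, the accuracies above come from simple nearest-neighbor retrieval. A minimal sketch of that evaluation, under the same assumptions as the earlier snippet and simplified to a single gold translation per source word (real test dictionaries may list several):

```python
def translation_accuracy(X, Z, W, test_pairs):
    # Word translation induction: translate each test source word to its
    # nearest target neighbor and score precision@1 against the gold pairs.
    sims = (X @ W) @ Z.T
    hits = sum(sims[s].argmax() == t for s, t in test_pairs)
    return hits / len(test_pairs)
```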
73–82. Experiments: crosslingual word similarity
• Dataset by Dinu et al. (2015) extended to German and Finnish
  ⇒ Monolingual embeddings (CBOW + negative sampling)
  ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals

                                     EN-IT           EN-DE
                         Bi. data    WS      RG      WS
Luong et al. (2015)      Europarl    33.1%   33.5%   35.6%
Mikolov et al. (2013a)   5k dict     62.7%   64.3%   52.8%
Xing et al. (2015)       5k dict     61.4%   70.0%   59.5%
Zhang et al. (2016)      5k dict     61.6%   70.4%   59.6%
Artetxe et al. (2016)    5k dict     61.7%   71.6%   59.7%
Our method               5k dict     62.4%   74.2%   61.6%
                         25 dict     62.6%   74.9%   61.2%
                         num.        62.8%   73.9%   60.4%
84–94. Why does it work?
Recall the loop: monolingual embeddings + dictionary → mapping → induced dictionary → new mapping → …
- the seed dictionary is small, but contains no errors
- each induced dictionary is large, but contains errors
So is each new mapping better or worse than the last? And the next dictionary even better, or even worse?
96–118. Why does it work?
The self-learning loop implicitly optimizes a single objective over the source embeddings $X$, the orthogonal mapping $W$, and the target embeddings $Z$:

$$W^* = \underset{W}{\arg\max} \sum_i \max_j \,(X_{i*} W) \cdot Z_{j*} \quad \text{s.t.} \quad W W^T = W^T W = I$$

Implicit objective: independent from the seed dictionary!
So why do we need a seed dictionary at all? To avoid poor local optima.
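To make the independence claim concrete, here is a minimal sketch, under the same assumptions as the earlier snippets, of evaluating this implicit objective for a given mapping; the seed dictionary appears nowhere in it and only steers the optimization away from poor local optima.

```python
def implicit_objective(X, Z, W):
    # sum_i max_j (X_{i*} W) . Z_{j*} for row-normalized X and Z:
    # each mapped source word's similarity to its best target match.
    sims = (X @ W) @ Z.T
    return sims.max(axis=1).sum()
```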
121–124. Conclusions
• Simple self-learning method to train bilingual embedding mappings
• High-quality results with almost no supervision (25 words, numerals)
• Implicit optimization objective independent from the seed dictionary
• Seed dictionary necessary to avoid poor local optima
• Future work: fully unsupervised training