How Google Converted
Language Translation Into a
Problem of Vector Space Mathematics
Key Idea
[Diagram: Vector Space of Language A mapped to Vector Space of Language B]
• Represent languages as vector spaces
• Find the linear transformation that maps one to the
other
Google Translate
• Statistical Machine Translation (SMT)
• A machine translation paradigm where translations are
generated on the basis of statistical models whose
parameters are derived from the analysis of
bilingual text corpora. --Wikipedia
Dictionaries & Phrase Tables
Pineapple Shrimp ↔ 菠萝虾
Lemon Shrimp ↔ 柠檬虾
Coconut Shrimp ↔ 椰子虾
Pepper Shrimp ↔ 胡椒虾
[Diagram: Parallel Corpora; Dictionaries & Phrase Tables; Model of Language Structure → Training → Translation Model]
Vector Space – why do we need it?
Problems…
• Creating parallel corpora
takes human effort
• Parallel corpora are scarce
for some language pairs
• Translation quality is
language-dependent
New Approach
• Automates the process of
generating and expanding
dictionaries and phrase
tables
• Makes few assumptions
about the languages; works
for any language pair
How does it work?
• Step 1 (Construct Language Spaces)
• Build monolingual models of languages using large amounts
of monolingual texts
• Step 2 (Find a Translation Matrix)
• Learn a linear transformation between the vector spaces of
languages using a small bilingual dictionary
Step 1: How to Represent Languages?
• Simple neural network architectures that aim to
predict the neighbors of a word
• Continuous Bag-of-Words (CBOW)
• Skip-gram (SG)
• Represent languages as vector spaces using the
relationship between words
CBOW vs. Skip-gram
CBOW
• Predicts current word based
on the context
Skip-gram
• Predicts the context based
on current word
• E.g. “I hit the tennis ball”
- Trigrams: “I hit the”, “hit the tennis”, “the tennis ball”
- Skip-gram: “hit the ball” (skipping “tennis”)
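The difference between the two objectives can be sketched by listing the training pairs each one would draw from the example sentence. This is a minimal illustration only: the function names `cbow_pairs` and `skipgram_pairs` and the window size of 1 are assumptions made here, not part of the original slides.

```python
def cbow_pairs(tokens, window=1):
    """CBOW view: (context words, target word). The model predicts the target."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


def skipgram_pairs(tokens, window=1):
    """Skip-gram view: (center word, one context word). The model predicts the context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "I hit the tennis ball".split()

for context, target in cbow_pairs(tokens):
    print("CBOW      ", context, "->", target)

for center, context_word in skipgram_pairs(tokens):
    print("Skip-gram ", center, "->", context_word)
```

Both functions walk the same sliding window; only the direction of prediction differs.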
Some Great Results…
• Vectors of similar words are close in the vector space
• Capture semantic information and concept relation
• vec(“king”) – vec(“man”) + vec(“woman”) ≈ vec(“queen”)
• vec(“Madrid”) – vec(“Spain”) + vec(“France”) ≈ vec(“Paris”)
• Can be trained on a large corpus in a short time due
to low computational complexity
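The analogy arithmetic can be sketched with a handful of hand-made vectors. The 3-dimensional vectors below are hypothetical toy values chosen for illustration (real embeddings are learned and have hundreds of dimensions), and the helper `analogy` is a name introduced here.

```python
import numpy as np

# Hypothetical toy vectors; the axes loosely mean (royalty, male, female).
vectors = {
    "king":   np.array([1.0, 1.0, 0.0]),
    "queen":  np.array([1.0, 0.0, 1.0]),
    "man":    np.array([0.0, 1.0, 0.0]),
    "woman":  np.array([0.0, 0.0, 1.0]),
    "prince": np.array([0.9, 1.0, 0.1]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a, b, c, vectors):
    """Word whose vector is closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))


result = analogy("king", "man", "woman", vectors)
print(result)  # queen
```

The query words are excluded from the candidates, and the answer is the nearest remaining vector rather than an exact match.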
Step 2: Why does it work?
• All languages have words that describe a similar set
of ideas; words are used in similar ways
• E.g. “A cat is an animal that is smaller than a dog.”
“猫是一种比狗小的动物”
• Strong similarities of geometric arrangements
between different language spaces
Step 2: Translation Matrix
• Given a small bilingual dictionary
• u_i ∈ Language Space A
• v_i ∈ Language Space B
• {u_i, v_i} ∈ Dictionary of (A, B)
• Learn a translation matrix W s.t.
• W u_i ≈ v_i
• Works for words that are not in the dictionary
• Automatically expands the dictionary
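A minimal sketch of learning W from a small seed dictionary and applying it to a word outside that dictionary. Everything here is synthetic and assumed for illustration: the word lists, the random vectors, and the premise that space B is an exact linear image of space A. The closed-form least-squares solve is a choice made for this sketch, not necessarily the optimizer used in the original work.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

words_a = ["one", "two", "three", "four", "five", "six", "seven", "eight"]
words_b = ["uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho"]

# Synthetic assumption: language B vectors are a linear map of language A vectors.
true_map = rng.normal(size=(dim, dim))
U = rng.normal(size=(len(words_a), dim))  # rows are u_i (language space A)
V = U @ true_map.T                        # rows are v_i (language space B)

# Seed dictionary: only the first six word pairs are known.
seed = slice(0, 6)

# Minimise sum_i ||W u_i - v_i||^2, i.e. solve U_seed @ W.T ≈ V_seed.
W_T, *_ = np.linalg.lstsq(U[seed], V[seed], rcond=None)
W = W_T.T


def translate(word):
    """Map a language-A word into space B and return the nearest B word by cosine."""
    mapped = W @ U[words_a.index(word)]
    sims = (V @ mapped) / (np.linalg.norm(V, axis=1) * np.linalg.norm(mapped))
    return words_b[int(np.argmax(sims))]


print(translate("eight"))  # "eight" was not in the seed dictionary
```

With real embeddings the relationship is only approximately linear, so the mapped vector lands near, not on, the correct translation, and the top few nearest neighbours are returned as candidates.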
Performance And Applications
• 90% precision@5 between English and Spanish
• Expand and refine existing dictionaries
• Correct errors in the English-Czech dictionary
• Improve translation quality for distant language pairs
• English-Vietnamese
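For reference, precision@k counts a source word as correct when its reference translation appears among the top k candidates the system proposes. The sketch below computes it on made-up data; the candidate lists, the gold dictionary, and the function name `precision_at_k` are hypothetical.

```python
def precision_at_k(ranked_candidates, gold, k=5):
    """Fraction of source words whose gold translation is among the top-k candidates."""
    hits = sum(
        1
        for word, candidates in ranked_candidates.items()
        if gold[word] in candidates[:k]
    )
    return hits / len(ranked_candidates)


# Hypothetical system output: candidates are ordered from best to worst.
ranked = {
    "cat":   ["gato", "perro", "felino"],
    "dog":   ["gato", "lobo", "perro"],
    "house": ["hogar", "edificio", "piso"],
    "water": ["agua", "río", "mar"],
}
gold = {"cat": "gato", "dog": "perro", "house": "casa", "water": "agua"}

p_at_1 = precision_at_k(ranked, gold, k=1)
p_at_3 = precision_at_k(ranked, gold, k=3)
print(p_at_1, p_at_3)  # 0.5 0.75
```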
Comments
• A step forward in multilingual communication
• Still a long way to go…
• Sentence structure
• Precision and in-context translation
• Idioms
References
• Haghighi, Aria, et al. "Learning Bilingual Lexicons from Monolingual Corpora." ACL.
Vol. 2008. 2008.
• Guthrie, David, et al. "A closer look at skip-gram modelling." Proceedings of the 5th
international Conference on Language Resources and Evaluation (LREC-2006).
2006.
• Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among
languages for machine translation." arXiv preprint arXiv:1309.4168 (2013).
