
Applying Machine Learning - Abdessamad Echihabi at SDL Connect 16


Abdessamad Echihabi, VP Research and Product Development at SDL presents at SDL Connect, Palo Alto, November 2016.

Published in: Software

  1. Applying Machine Learning. 11/17/2016. Samad Echihabi
  2. Machine Learning (diagram labels: Machine, data, model, input, output)
  3. Machine Learning (diagram labels: Machine, data, model, input, output)
  4. Machine Learning: French to English Translation (diagram labels: data, model)
     Input: Méditerranée: 3200 personnes secourues en cinq jours
     Output: Mediterranean: 3200 people rescued in five days
  5. Machine Translation Models: Language Model, Translation Model
  6. MT: Translation Model
     Source / Target / Good translation?
     • bonjour / hello / ✔
     • bonjour / blue / ✗
     • bonjour / morning / ~
     • bonjour / good morning / ✔
     • bonjour / hi / ✔
  7. MT: Language Model
     Target / Good language?
     • Be the change that you wish to see in the world. ✔
     • Be the world that you wish to see in the change. ✗
     • The be change which you wish to see on the world. ✗✗
     • Be that the you world to in wish see change . the ✗✗✗✗✗
  8. MT: Training
     Parallel data (French and English) → Statistical Analysis → Translation Model, P(s/t)
     Monolingual data (English) → Statistical Analysis → Language Model, P(t)
     Translation Model:
     • la → the 80%
     • la → a 12%
     • la → (no word) 8%
     • capitale → capital 70%
     • capitale → death 30%
     • de → of 53%
     • de → from 47%
     • france → france 100%
     • est → is 75%
     • est → was 25%
     • paris → paris 100%
     Language Model:
     • the death of 54%
     • the capital of 34%
     • a capital of 11%
     • capital of france 41%
     • capital from france 9%
     • of france is 45%
     • of the france 2%
     • france is paris 23%
     • france was paris 22%
  9. MT: Decoding
     Input: la capitale de la france est paris
     Statistical Search over the Translation Model and Language Model (same tables as on slide 8)
     Translation / Score:
     • the capital of france is paris 94%
     • capital of france is paris 71%
     • a capital of france is paris 65%
     • ...
     • a death from france was paris 3%
  10. SMT Models, Adaptive Models, Neural Models
     • Translation Model P(s/t)
     • Language Model P(t)
     • Distortion
     • Alignment
     • Phrase
     • POS
     • Syntactic Translation
     • Syntactic Language
     • Reordering
     • Lexicalized Reordering
     • Preordering
     • Word Deletion
     • Lexicalized Smoothing
     • Capitalization
     • Morphology
     • Transliteration
     • Semantic
     • Informal Models
     • Social Media Components
  11. Applying Machine Learning: Use Cases
  12. Social Media Translation
  13. Character Repetition
  14. Spelling Errors
  15. Dialect
  16. Morphology
  17. Romanization
  18. Metadata
  19. Social Media Translation Challenges (+62% Improvement)
     • Normalization: أااااااحسن → احسن; الخلييييييج → الخليج
     • Spelling Correction: نزيفه, with candidates نظيفة, وظيفة, نظيف, نزيف, نزيفه
     • Social Metadata: #كرة_القدم → #soccer
     • Morphological Segmentation: المرفهين → ال+ مرفه +ين
     • Deromanization: bessa7a wel3afya habibi → بالصحة والعافية حبيبي (also shown: بساحة habibi)
  20. Social Media Translation: Source vs. Generic MT
     Source:
     • la2a hia katir fi lakhbar.
     • ma 3ajbanish kida. Lazim t3'iyyer l3ounouane
     • Enty habla ?
     • Kalemni lama t3raf ezay tebatal teshtemni
     • 3andy soda3 fi rassi... 5oshy namy badal chat. a7san lik Ah sa7
     Generic MT:
     • La2a hia katir Fi lakhbar.
     • Ma 3ajbanish kida. lazim T3 (iyyer L3ounouane
     • enty habla?
     • kalemni Lama T3RAF ezay tebatal teshtemni
     • 3Andy soda3 Fi rassi ... 5oshy namy badal Chat. A7San lik Ah SA7
  21. Social Media Translation: Source vs. Social Media MT
     Source:
     • la2a hia katir fi lakhbar.
     • ma 3ajbanish kida. Lazim t3'iyyer l3ounouane
     • Enty habla ?
     • Kalemni lama t3raf ezay tebatal teshtemni
     • 3andy soda3 fi rassi... 5oshy namy badal chat. a7san lik Ah sa7
     Social Media MT:
     • No, it is very much in the news.
     • I don't like this. We must change the title
     • Are you an idiot?
     • Talk to me when you know how to stop insulting me
     • I have a headache in my head. Go to sleep, instead of chat. It is better for you, Yes, sa7
  22. Broadcast News Translations
  23. Broadcast News Translation: Audio Channels, Video Channels → Speech Recognition → Machine Translation → Distillation → Actionable Information
  24. Broadcast News Translation
     Source: Reçu mardi à Varsovie par Bronislaw Komorowski, Barack Obama a participé aux cérémonies marquant le vingt-cinquième anniversaire des premières élections démocratiques en Pologne.
     MT output: Received Tuesday in Warsaw by Bronislaw Komorowski, Barack Obama has participated in ceremonies marking the twenty-fifth anniversary of the first democratic elections in Poland ✗
  25. Travel Reviews
  26. Travel User Reviews Translation: Automatic Quality Prediction labels each translated user review as a good translation (published) or a bad translation (post-edited)
  27. Post-Editing Machine Translation
  28. Post-Editing Machine Translation
  29. Post-Editing Adaptive Machine Translation
  30. Post-Editing Adaptive Machine Translation
  31. Applying Machine Learning: Volume, Quality, Data, Domain, Models, Delivery, Security, Speed, Privacy, Evaluation, Integration, Adaptation
  32. QUESTIONS & ANSWERS
  33. Copyright © 2008-2017 SDL plc. All rights reserved. All company names, brand names, trademarks, service marks, images and logos are the property of their respective owners. This presentation and its content are SDL confidential unless otherwise specified, and may not be copied, used or distributed except as authorised by SDL. Software and Services for Human Understanding
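
The training and decoding slides (8 and 9) describe scoring candidate translations with a translation model P(s/t) and a language model P(t). The following is a minimal sketch of that idea using the toy probabilities from slide 8. It is not SDL's decoder. The function names, the word-by-word monotone search, the reading of the language model entries as trigram scores, the `est → is/was` orientation, and the `UNSEEN` floor for trigrams missing from the table are all assumptions made for illustration, so the scores it prints will not reproduce the percentages on slide 9.

```python
import math
from itertools import product

# Toy translation model from slide 8: source word -> {target word: probability}.
# The empty string stands for "la" producing no target word.
TRANSLATION_MODEL = {
    "la": {"the": 0.80, "a": 0.12, "": 0.08},
    "capitale": {"capital": 0.70, "death": 0.30},
    "de": {"of": 0.53, "from": 0.47},
    "france": {"france": 1.00},
    "est": {"is": 0.75, "was": 0.25},
    "paris": {"paris": 1.00},
}

# Toy language model from slide 8, read here as trigram scores (assumption).
LANGUAGE_MODEL = {
    ("the", "death", "of"): 0.54,
    ("the", "capital", "of"): 0.34,
    ("a", "capital", "of"): 0.11,
    ("capital", "of", "france"): 0.41,
    ("capital", "from", "france"): 0.09,
    ("of", "france", "is"): 0.45,
    ("of", "the", "france"): 0.02,
    ("france", "is", "paris"): 0.23,
    ("france", "was", "paris"): 0.22,
}

UNSEEN = 1e-3  # hypothetical floor for trigrams not listed on the slide


def lm_logprob(words):
    """Sum of log scores over every trigram in the candidate sentence."""
    return sum(
        math.log(LANGUAGE_MODEL.get(tuple(words[i:i + 3]), UNSEEN))
        for i in range(len(words) - 2)
    )


def decode(source, top_n=3):
    """Enumerate every word-by-word candidate and rank by TM + LM log score."""
    options = [list(TRANSLATION_MODEL[w].items()) for w in source.split()]
    scored = []
    for choice in product(*options):
        tm_logprob = sum(math.log(p) for _, p in choice)
        words = [w for w, _ in choice if w]  # drop deleted words
        scored.append((tm_logprob + lm_logprob(words), " ".join(words)))
    scored.sort(reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for score, text in decode("la capitale de la france est paris"):
        print(f"{score:8.3f}  {text}")
```

Exhaustive enumeration is only workable for a toy example like this one (72 candidates). A production decoder searches the space heuristically and also handles phrases and reordering, which this sketch leaves out.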

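Slides 13 and 19 list character repetition as one of the normalization problems in social media text, for example a word written with a letter stretched out many times. A rough sketch of that single step, under stated assumptions, is below. The talk is about learned models, whereas this is a plain regular-expression rule; the function name `collapse_repeats` and the `min_run` threshold are hypothetical and not taken from the slides.

```python
import re


def collapse_repeats(text, min_run=3):
    """Collapse any character repeated min_run or more times into one copy.

    Runs shorter than min_run are left alone, so ordinary double letters
    survive. Words that legitimately contain a long run would be damaged,
    which is one reason a rule like this is only a first approximation.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    stretched = "الخل" + "ي" * 6 + "ج"  # a stretched spelling of الخليج
    print(collapse_repeats(stretched))
    print(collapse_repeats("yesss"))
```

The rule works on any script because it compares characters rather than looking them up in an alphabet, but it cannot tell emphasis from a genuine spelling, so its output would normally feed a spelling-correction step rather than replace one.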