Scientists meet Entrepreneurs - AI & Machine Learning, Mark Fishel, Institute of Computer Science, University of Tartu

Published in: Data & Analytics

  1. 1. Natural organic non-GMO bio-degradable eco-friendly Language Processing
     or NLP yesterday, today and tomorrow
     Mark Fishel, TartuNLP
     MoMo Estonia / AI & ML, January 14, 2019
  2. 2. Natural organic non-GMO bio-degradable eco-friendly Language Processing
     or NLP yesterday, today and tomorrow
     or NLP today, tomorrow and the day after
     Mark Fishel, TartuNLP
     MoMo Estonia / AI & ML, January 14, 2019
  3. 3. AI NLP
  4. 4. NLP
     ● end-user applications
       ○ translation (neurotolge.ee)
       ○ text↔speech (neurokone.ee)
       ○ text mining, information extraction (texta.ee)
       ○ chat bots
       ○ world domination, destruction of humanity
       ○ etc.
     ● components
     ● analysis, linguistics
     ● etc.
  5. 5. Why?
  6. 6. Why?
     ● NLP makes mistakes!
     ● in practice: semi-automation, post-editing, etc.
  7. 7. 1. Step-by-step NLP
  8. 8. NLP before: step-by-step
     ● solve separate steps / components
       ○ via ML, rules, etc.
       ○ one by one
     ● put them in a pipeline
       ○ for that we have to (think we) understand how it works
     ● …
     ● profit!
  9. 9. Statistical Translation
     ET: ? LV: Vai tev ir labāka ideja?
  10. 10. Statistical Translation
      ET: ? LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
  19. 19. Statistical Translation
      ET: Kas LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
  20. 20. Statistical Translation
      ET: Kas sul LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
  21. 21. Statistical Translation
      ET: Kas sul on LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
  22. 22. Statistical Translation
      ET: Kas sul on parem idee LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
  23. 23. Statistical Translation
      ET: Kas sul on parem idee? LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
  24. 24. Statistical Translation
      Actual translation:
      ● segment input
      ● translate pieces
      ● reorder
      ● put in context
      ● …
      ET: Kas sul on parem idee? LV: Vai tev ir labāka ideja?
      ET: Mul on parem idee. LV: Man ir labāka ideja.
      ET: Kas sul on tõlketekste? LV: Vai tev ir pārtulkotu tekstu?
      ET: Sul peaks olema palju tõlketekste. LV: Tev jābūt daudz pārtulkotu tekstu.
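The steps listed on slide 24 (segment, translate pieces, reorder) can be sketched as a toy phrase-based translator. This is only an illustration under stated assumptions: the phrase table is hand-copied from the aligned sentence pairs on the slides, whereas a real statistical system would learn phrase pairs and their probabilities from a large parallel corpus, and the names `PHRASE_TABLE`, `segment` and `translate` are hypothetical. Reordering and context handling are left out because word order happens to match in this example.

```python
# Toy phrase table (Latvian -> Estonian), read off the slides' sentence pairs.
# A real system would learn these pairs and their probabilities from data.
PHRASE_TABLE = {
    ("vai", "tev", "ir"): "kas sul on",
    ("man", "ir"): "mul on",
    ("labāka", "ideja"): "parem idee",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)


def segment(tokens):
    """Greedy longest-match segmentation into known phrases."""
    i, phrases = 0, []
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            # Fall back to a single word when no known phrase starts here.
            if chunk in PHRASE_TABLE or n == 1:
                phrases.append(chunk)
                i += n
                break
    return phrases


def translate(sentence):
    # Split off final punctuation so it does not block phrase lookup.
    if sentence and sentence[-1] in ".?!":
        body, end = sentence[:-1], sentence[-1]
    else:
        body, end = sentence, ""
    tokens = body.lower().split()
    # Unknown single words are copied through unchanged.
    pieces = [PHRASE_TABLE.get(p, " ".join(p)) for p in segment(tokens)]
    # A reordering step would go here; it is skipped in this sketch.
    text = " ".join(pieces)
    return text[:1].upper() + text[1:] + end


print(translate("Vai tev ir labāka ideja?"))  # Kas sul on parem idee?
```

The greedy longest-match segmentation is a simplification chosen for brevity; statistical systems typically score many alternative segmentations and orderings instead of committing to one.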
  25. 25. Text-to-speech
      1. text to phonemes, e.g. through → [θru], reason → ['rizən]
      2. pronunciation for phonemes (or pairs of phonemes): e.g. θr →
      3. “glue” pieces together → speech
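The three text-to-speech steps on slide 25 can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the lexicon holds only the two words shown on the slide, each "unit" is a text label standing in for a recorded audio snippet, and all function names are hypothetical.

```python
# Step 1 data: pronunciation lexicon (the two entries shown on the slide).
LEXICON = {
    "through": ["θ", "r", "u"],
    "reason": ["r", "i", "z", "ə", "n"],
}


def text_to_phonemes(text):
    """Step 1: look up each word's phonemes in the lexicon."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])  # raises KeyError for unknown words
    return phonemes


def phonemes_to_units(phonemes):
    """Step 2: pick one unit per pair of neighbouring phonemes.

    A real system would fetch a recorded audio snippet for each pair;
    here a label such as "θ-r" stands in for that snippet.
    """
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]


def glue(units):
    """Step 3: join the units (a real system would concatenate and smooth audio)."""
    return " + ".join(units)


print(glue(phonemes_to_units(text_to_phonemes("through"))))  # θ-r + r-u
```

Each stage is solved separately and then chained, which is the step-by-step design the earlier slides describe.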
  26. 26. 2. End-to-end NLP/ML
  27. 27. NLP now: end-to-end
      ● gather input/output examples for the end-user task
        ○ (sentence text, speech)
        ○ (Estonian sentence, English sentence)
      ● teach end-to-end deep neural black magic to go from input to output
        ○ ignore how we think it should be done
      ● …
      ● profit!
  28. 28. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      Thank you very …?
  29. 29. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      Thank you very much. Would you like tea or …?
  30. 30. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      Thank you very much. Would you like tea or coffee? Dear …?
  31. 31. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      Thank you very much. Would you like tea or coffee? Dear (ladies and gentlemen / mom / …)
  32. 32. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → …
  33. 33. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad …
  34. 34. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid …
  35. 35. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid kaasaegset …
  36. 36. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit …
  37. 37. Neural Translation
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit KÕIK.
  38. 38. Neural Translation
      p(They used a state-of-the-art approach, nad...) = neural_estimator(x, y) = { kasutasid: 0.67, rakendasid: 0.21, kasutavad: 0.04, … }
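The autoregressive loop from the preceding slides can be sketched as greedy decoding around `neural_estimator(x, y)`. This is a sketch, not the speaker's system: the estimator is replaced by a hypothetical lookup table, only the row for the partial output "nad" uses the slide's numbers, every other probability and alternative word is made up for illustration, and KÕIK is assumed to act as the end-of-output marker.

```python
END = "KÕIK"  # assumed end-of-output marker, taken from the slides

# Hypothetical stand-in for the neural estimator: a lookup table from the
# partial output to a distribution over the next word. Only the ("nad",)
# row uses the slide's numbers; every other row is invented.
TOY_MODEL = {
    (): {"nad": 0.9, "nemad": 0.1},
    ("nad",): {"kasutasid": 0.67, "rakendasid": 0.21, "kasutavad": 0.04},
    ("nad", "kasutasid"): {"kaasaegset": 0.8, "uut": 0.2},
    ("nad", "kasutasid", "kaasaegset"): {"meetodit": 0.7, "lähenemist": 0.3},
    ("nad", "kasutasid", "kaasaegset", "meetodit"): {END: 1.0},
}


def neural_estimator(x, y):
    """Return p(next word | input x, partial output y) as a dict."""
    # The toy table ignores x; a real model conditions on the input as well.
    return TOY_MODEL[tuple(y)]


def greedy_decode(x, max_len=20):
    """Repeatedly append the most probable next word until END appears."""
    y = []
    for _ in range(max_len):
        dist = neural_estimator(x, y)
        word = max(dist, key=dist.get)  # most probable next word
        if word == END:
            break
        y.append(word)
    return " ".join(y)


print(greedy_decode("They used a state-of-the-art approach"))
# nad kasutasid kaasaegset meetodit
```

Picking the single most probable word at each step is the simplest choice; decoders often keep several candidate continuations (beam search) instead.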
  39. 39. End-to-end NLP
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit KÕIK.
      Speech↔text: same/similar end-to-end approach
  40. 40. End-to-end NLP
      Autoregressive:
      ● predict output based on (1) input and (2) already generated partial output
      They used a state-of-the-art approach → nad kasutasid kaasaegset meetodit KÕIK.
      Speech↔text: same/similar end-to-end approach
      NB: needs lots of explicit examples (data)
  41. 41. 3. NLP/ML with no data
  42. 42. NLP/ML with no explicit data
      ● explicit data is expensive and wasteful
      ● what to do for tasks without it?
  43. 43. Unsupervised Translation https://aclweb.org/anthology/D18-1549.pdf
  44. 44. Unsupervised Translation
      Learn from:
      A. Tere! Minu nimi on Juhan. Kui ma eelmisel korral sellest pildist olen…..
      B. We must address this question as soon as possible. Why have we not…..
      Task: Translate between English and Estonian without a single translation example!
      https://aclweb.org/anthology/D18-1549.pdf
  45. 45. Unsupervised Translation
      A. Tere! Minu nimi on Juhan. Kui ma eelmisel korral sellest pildist olen…..
      B. We must address this question as soon as possible. Why have we not…..
      Or: translate dog barks / kid speech
  46. 46. Zero-shot learning
      Languages: Estonian, English, Latvian, Swedish
      https://arxiv.org/abs/1611.04558
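The paper linked on this slide (arXiv 1611.04558) trains a single model on several language pairs and signals the desired output language with a token prepended to the source sentence. The sketch below shows only that data preparation; no model is trained. Which directions appear in training is an assumption made for illustration, the sentence pairs are illustrative, and the helper names are hypothetical.

```python
def tag(source_sentence, target_lang):
    """Prepend a target-language token such as <2en> to the source sentence."""
    return f"<2{target_lang}> {source_sentence}"


# Assumed training directions: every pair goes through English.
TRAIN_DIRECTIONS = {("et", "en"), ("en", "et"), ("lv", "en"), ("en", "lv")}

# (tagged source, target) examples as the multilingual model would see them.
training_examples = [
    (tag("Mul on parem idee.", "en"), "I have a better idea."),
    (tag("I have a better idea.", "lv"), "Man ir labāka ideja."),
]


def is_zero_shot(src_lang, tgt_lang):
    """True if this translation direction never occurred in training."""
    return (src_lang, tgt_lang) not in TRAIN_DIRECTIONS


# At test time the same tagging requests a direction the model never saw.
request = tag("Mul on parem idee.", "lv")
print(request, is_zero_shot("et", "lv"))  # <2lv> Mul on parem idee. True
```

Because the target language is just another input token, asking for an unseen direction needs no change to the model itself, only a different tag.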
  48. 48. Zero-shot NLP demo
      ● style transfer
        ○ “that’s weird” → “that is strange”
      ● correcting errors
        ○ “i biggest your fan” → “I am your biggest fan”
      click
  49. 49. Message to take home
      ● “data + task understanding” is stable
      ● “data + end-to-end neural networks” is cool and promising
      ● “no data, thing still works” is sexy!
  50. 50. Thanks! neurotolge.ee neurokone.ee livesubs.ee nlp.cs.ut.ee
