Language Use And Preservation Online
TEDx presentation on the latest advances in mainstream language technology and how this affects "minor" languages.


Published in: Technology
  • Transcript

    • 1. Language use and preservation online Tadej Gregorčič
    • 2. “Minor” languages • 6,912+ languages altogether • 3,500 spoken by 0.2% of the world’s speakers • 40% endangered • Only 600 non-extinct within 100 years?
    • 3. Endangered languages
    • 4. Internet • 90% of content in just 12 languages • How big an issue is extinction? • Language transformation vs. transformation of old media (TV, newspapers, radio) • Unicode - first major breakthrough
    • 5. Slovenian (my language) • Roughly 2 million speakers • More speakers than 96% of languages • Official EU language - enforcement policies • Endangerment?
    • 6. Use of foreign words in scientific text where appropriate Slovenian counterparts exist.
    • 7. Preservation of language
    • 8. The Rosetta Project • http://rosettaproject.org/ • Publicly accessible digital library • Aiming to preserve information about eventually all human languages
    • 9. Preservation of knowledge contained in a language • Smithsonian Institution • Rosetta Project • UNESCO • Revitalization (non-extinct) • Resurrection (extinct) • Only known successful example: Hebrew
    • 10. Keeping use of a language viable/economical • Consistent use • Dictionaries, tools • Translation tools • Advanced language software (TTS, SR)
    • 11. Language technologies • Machine translation • Speech synthesis • Speech recognition • ... • Advance in one field accelerates advances in others through increased feasibility
    • 12. Language technologies • Machine translation • Speech synthesis • Speech recognition • ... • Advance in one field accelerates advances in others through increased feasibility
    • 13. 2005 • Systran (fr.) • Yahoo!, Altavista Babelfish • Google • Rule based + statistical approach
    • 14. Live translation • Done in 2005 as Ethnocon project (presented at MS Imagine Cup) • Speech recognition (language 1) • Text machine translation (Systran API) • Speech synthesis (language 2) • MT quality poor
    • 15. 2006+ • Google Translate Systran • Google obtained United Nations parallel corpora • Words = data, grammar = code • Purely statistical approach (a huge amount of data, code )
    • 16. Parallel corpus • evrokorpus.gov.si • Translation memory (Trados etc.) • TM from governmental institutions • Open TM projects • ...
    • 17. Parallel corpus • evrokorpus.gov.si • Translation memory (Trados etc.) • TM from governmental institutions • Open TM projects • Example: the Bible
    • 18. Google Translate
    • 19. Crowdsourcing • It works (Wikipedia) • An incorrect translation is a natural motivator • Relatively fast improvement of data • But: unprofessional
    • 20. June, 2009
    • 21. Google Translator Toolkit • June 2009 (200+ languages in October) • “Open Trados” • Global parallel TM • Google TT + Google Translate • 345 languages, 10,664 language pairs
    • 22. Google Translator Toolkit • Incentive for professionals: productivity • Motivated to contribute to global TM • GT pre-translates text with • Huge parallel corpora • Professional translation!
    • 23. Professional translations are fed into the crowdsourced Google Translate parallel corpora. Like Wikipedia with professional editors. Huge quality gains over time if Google Translator Toolkit takes off.
    • 24. Results today:
    • 25. Automatic subtitling (think hearing impaired users)
    • 26. Results soon:
    • 27. AR, “augmented reality”
    • 28. November 2009 Thank you! Tadej Gregorčič Software developer, entrepreneur and amateur linguist twitter.com/tadej linkedin.com/in/tadejgregorcic www.facebook.com/tadej
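
The translation memory (TM) idea in slides 16, 17, 21 and 22 can be illustrated with a minimal sketch: a TM stores previously translated segments, and a new segment is pre-translated by reusing the closest stored match. This is not how Trados or Google Translator Toolkit work internally; the sample data, the `best_match` function and the 0.7 threshold are all assumptions chosen for illustration, and the similarity measure is simply Python's standard `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: (English source, Slovenian target).
# Real TMs hold many thousands of professionally translated segments.
TRANSLATION_MEMORY = [
    ("Good morning", "Dobro jutro"),
    ("Thank you very much", "Najlepša hvala"),
    ("Where is the station?", "Kje je postaja?"),
]


def best_match(segment, memory, threshold=0.7):
    """Return (source, target, score) for the closest stored segment.

    Returns None when no stored segment reaches the similarity threshold.
    The threshold value is an assumption, not taken from any real tool.
    """
    best = None
    for source, target in memory:
        # Character-level similarity in [0, 1], ignoring case.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


if __name__ == "__main__":
    print(best_match("Good morning!", TRANSLATION_MEMORY))
    print(best_match("zzz qqq", TRANSLATION_MEMORY))
```

A near match such as "Good morning!" reuses the stored translation, while an unrelated segment returns `None` and would fall through to machine translation or a human translator. Each confirmed translation added back to the memory improves later matches, which is the feedback loop slide 23 describes.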
