Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013

Presenter: Rahzeb Choudhury (TAUS)

This presentation is part of the TaaS project, funded by the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.

  1. 1. Industry-Scale Crowdsourcing of Data & Terminology Rahzeb Choudhury, TAUS
  2. 2. TAUS Mission: Our mission is to increase the size and significance of the translation industry to help the world communicate better. Sharing Data & Knowledge: …on an industry-level in an open and transparent landscape brings us all to a higher level of competence.
  3. 3. Where We Stand. Together We Know Better. We Know More.
  4. 4. Four Focus Areas. Technology. Data. Translation as a Utility. Metrics. Interoperability. This slide may not be used or copied without permission from TAUS.
  5. 5. Members
  6. 6. Global Members
  7. 7. Academic, NGO & Government Members
  8. 8. Large Corporate Members
  9. 9. Small Corporate Members
  10. 10. Agency Members
  11. 11. Terminology
  12. 12. Importance of Terminology Work (chart). Categories: Very important, Quite important, Less important, Not important. Values shown: 14.8%, 1.8%, 43.5%, 39.9%. Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
  13. 13. Information Sources Source: TaaS User Needs Survey, 2012. 1735 responses (approx 40% technical writers, 30% translators, plus others)
  14. 14. Information Sources Source: TaaS User Needs Survey, 2012. 1735 responses (approx 40% technical writers, 30% translators, plus others)
  15. 15. Information Sources Source: TaaS User Needs Survey, 2012. 1735 responses (approx 40% technical writers, 30% translators, plus others)
  16. 16. Information Sources Source: TaaS User Needs Survey, 2012. 1735 responses (approx 40% technical writers, 30% translators, plus others)
  17. 17. Information Sources Source: TaaS User Needs Survey, 2012. 1735 responses (approx 40% technical writers, 30% translators, plus others)
  18. 18. Main Problems (chart). Categories: Lack of resources/Insufficient terminology management; Poor quality/Up-to-dateness; Lack of information; Lack of convincing verification/Misleading information online; Rest. Values shown: 9.4%, 20.6%, 12.2%, 36.0%, 11.5%, 10.3%. Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
  19. 19. Too many sources. Takes too much time. Effort is duplicated. Results questionable.
  20. 20. …Centralization…
  21. 21. Owned Shared Web
  22. 22. Machine Translation
  23. 23. Data and Quality (chart). Axes: MT Quality, Amount of Data. Labels: In-domain Data, Algorithms, More data.
  24. 24. Owned Shared Web
  25. 25. Lack of access. Copyright. Takes too much time. Effort is duplicated. Quality questionable.
  26. 26. …Centralization…
  27. 27. Central Source of In-domain Data. Owned. Shared. Web – to come in 2014.
  28. 28. Terminology and Machine Translation
  29. 29. Data and Quality (chart). Axes: MT Quality, Amount of Data. Labels: Usage/Feedback Data, …Terminology!, In-domain Data, Algorithms, More data.
  30. 30. …Centralization…
  31. 31. TAUS Mission: Our mission is to increase the size and significance of the translation industry to help the world communicate better. Sharing Data & Knowledge: …on an industry-level in an open and transparent landscape brings us all to a higher level of competence.
  32. 32. Central Sources of Data and Terminology. For language workers, CAT Tools & MT Systems: Own Data – Private Vault; Shared Data – In-domain data; Web Data – Data Collector; Own Terms – Build Own Collections; Shared Terms – In-domain terms; Web Terms – Term Collector. But what about the crowd?
  33. 33. Main Problems (chart). Categories: Lack of resources/Insufficient terminology management; Poor quality/Up-to-dateness; Lack of information; Lack of convincing verification/Misleading information online; Rest. Values shown: 9.4%, 20.6%, 12.2%, 36.0%, 11.5%, 10.3%. Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)
  34. 34. Central Sourcing of Data and Terminology. But what about the crowd? The crowd must source! The crowd must verify! Web Data – Data Collector. Web Terms – Term Collector.
  35. 35. Too many sources. Takes time. Effort is duplicated. Results questionable. Unless the crowd helps to source and verify… we maintain the status quo.
  36. 36. Register and engage: demo.taas-project.eu
  37. 37. Thank you. Contact: rahzeb@taus.net This slide may not be used or copied without permission from TAUS
