Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013

Presenter: Rahzeb Choudhury (TAUS)

This presentation is part of the TaaS project, funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 296312.

  1. Industry-Scale Crowdsourcing of Data & Terminology. Rahzeb Choudhury, TAUS
  2. TAUS Mission. Our mission is to increase the size and significance of the translation industry to help the world communicate better. Data & Knowledge Sharing on an industry level in an open and transparent landscape brings us all to a higher level of competence.
  3. Where We Stand: Together We Know Better, We Know More.
  4. Four Focus Areas: Technology, Data, Translation as a Utility, Metrics; Interoperability. (This slide may not be used or copied without permission from TAUS.)
  5. Members
  6. Global Members
  7. Academic, NGO & Government Members
  8. Large Corporate Members
  9. Small Corporate Members
  10. Agency Members
  11. Terminology
  12. Importance of Terminology Work. Pie chart with four categories (Very important, Quite important, Less important, Not important); the shares shown are 43.5%, 39.9%, 14.8% and 1.8%. Source: TaaS User Needs Survey, 2012; 1,735 responses (approx. 40% technical writers, 30% translators, plus others).
  13-17. Information Sources. Source: TaaS User Needs Survey, 2012; 1,735 responses (approx. 40% technical writers, 30% translators, plus others).
  18. Main Problems. Categories shown: Lack of resources/Insufficient terminology management; Poor quality/Up-to-dateness; Lack of information; Lack of convincing verification/Misleading information online; Rest. Shares shown: 9.4%, 20.6%, 12.2%, 36.0%, 11.5%, 10.3%. Source: TaaS User Needs Survey, 2012; 1,735 responses (approx. 40% technical writers, 30% translators, plus others).
  19. Too many sources. Takes too much time. Effort is duplicated. Results questionable.
  20. …Centralization…
  21. Owned, Shared, Web
  22. Machine Translation
  23. Data and Quality. Diagram relating MT Quality to Amount of Data, with labels In-domain Data, Algorithms and More data.
  24. Owned, Shared, Web
  25. Lack of access. Copyright. Takes too much time. Effort is duplicated. Quality questionable.
  26. …Centralization…
  27. Central Source of In-domain Data: Owned, Shared, Web (to come in 2014)
  28. Terminology and Machine Translation
  29. Data and Quality. Diagram relating MT Quality to Amount of Data, with labels Usage/Feedback Data, …Terminology!, In-domain Data, Algorithms and More data.
  30. …Centralization…
  31. TAUS Mission. Our mission is to increase the size and significance of the translation industry to help the world communicate better. Data & Knowledge Sharing on an industry level in an open and transparent landscape brings us all to a higher level of competence.
  32. Central Sources of Data and Terminology for language workers, CAT tools & MT systems: Own Data (Private Vault); Shared Data (in-domain data); Web Data (Data Collector); Own Terms (build own collections); Shared Terms (in-domain terms); Web Terms (Term Collector). But what about the crowd?
  33. Main Problems. Categories shown: Lack of resources/Insufficient terminology management; Poor quality/Up-to-dateness; Lack of information; Lack of convincing verification/Misleading information online; Rest. Shares shown: 9.4%, 20.6%, 12.2%, 36.0%, 11.5%, 10.3%. Source: TaaS User Needs Survey, 2012; 1,735 responses (approx. 40% technical writers, 30% translators, plus others).
  34. Central Sourcing of Data and Terminology. But what about the crowd? The crowd must source! The crowd must verify! Web Data: Data Collector. Web Terms: Term Collector.
  35. Too many sources. Takes time. Effort is duplicated. Results questionable. Unless the crowd helps to source and verify, we maintain the status quo.
  36. Register and engage: demo.taas-project.eu
  37. Thank you. Contact: rahzeb@taus.net