Good Applications of Bad Machine Translation



Presented at the 6th Language Technology Conference, Cordoba, Argentina, April 2009

Published in: Technology, Business

  1. Good Applications of Bad Machine Translation
     Bob Donaldson, VP Strategy
  2. Promise of MT or eMpTy Promises?
     Anything worth doing is worth doing poorly.
  3. A Few Well-known Facts
     • Web content is growing 70% or more year over year
     • Almost 65% of all Internet users (more than 800 million) do not speak English!
     • Today, English speakers are less than 36% of Internet users, a share that may decline to 15% by 2010
     • More than 70% of digital messages and documents are “born” in languages other than English
     • Non-English-speaking Internet users are increasing by over 140 million per year
     • Emerging shortage of human translators relative to the vast and growing demand for digital content
  4. Tipping Point for Machine Translation?
     Before:
     • Machine Translation success is always “a few years away”
     • Actual MT output requires substantial rework, with very little overall cost/time savings
     • Incremental improvements through technology
       • Translation Memory
       • Translation Workflow Management
     After (Are we there yet?):
     • Statistical MT is showing a 4:1 productivity improvement in some narrow domains of application
     • Integration of MT into high-volume workflows
     • Expansion of addressable market for translation services
  5. The Famous Triangle
     [Diagram labels: Currently translated by humans; MT + Post-Edit Improvements; MT Extensions – Reduced Quality]
  6. The Famous Triangle
     [Diagram labels: Currently translated by humans; MT + Post-Edit Improvements; Eventual MT success in narrow domain]
  7. Intel Experience*
     [Chart: LAR Spanish MT visits growth since January ’08]
     • Using SMT for “raw” translation of customer support site
     • MT increased Spanish content from 12% to 100%
     • Stopped 95% of human translation
     • Project cycle went from 2-3 weeks to every 24 hours
     • Survey shows MT solves 40% of customers’ questions/problems vs. 41% for human translation
     • Developing MT systems for more languages
     * Source: Will Burgett, Global Support Summit, 2008
  8. Microsoft Experience*
     Users see human-edited MT translations:
     • Increased efficiency (and cost savings)
     • Used when accuracy is critical
       • E.g., MT output is post-edited by localizers during translation of documentation and software strings
     Users see unedited MT translations:
     • Enables translation of material that otherwise goes untranslated
     • Used when some errors can be tolerated
       • E.g., when the alternative is nothing at all: the CSS knowledge base
       • Users are motivated!
     * Source: Rich Kaplan, Localization World, Seattle, 2007
  9. Microsoft Experience*
     • Ranges from 5% to 25% savings in translation time/cost
     • Depends on division of labor, post-editing quality guidelines, translator training, and vendor
     * Source: Rich Kaplan, Localization World, Seattle, 2007
  10. What about small volumes?
      • Google?
      • MT Vendors?
      • SaaS?
      But what about quality?
      • Cost/Quality Correlation*
      • Assumes Fully Trained SMT
      • Quality of “Raw” MT is Suspect (at best)
      [Chart: quality of translation by workflow; labels: SMT, TM, Post-Edit, HT Review]
      * Source: Kirti Vashee, TAUS Workshop, Beijing, 2007
  11. So … What good is Bad Machine Translation?
      • Part of an integrated use-case in research and discovery applications …
      • Searching is as much art as science
      • Primary goal is to establish relevance
      • May also include identifying absence of a particular topic or term
      • Examples:
        • Patent Search
        • Litigation Support
  12. Translation Services Marketplace Taxonomy
      • Customer as content consumer (e.g. individual researchers)
        • Traditional “Translation Agency” target
        • Multiple sources translated individually
        • Little motivation for adopting TM or MT
      • Customer as content creator (e.g. Ford, Oracle, etc.)
        • “Localization Company” target
        • Large body of source material, often under version control
        • Opportunities for controlled authorship, terminology management, TM, etc. to control cost
        • Emerging opportunity for MT
      • Customer as content aggregator (e.g. Google, LexisNexis)
        • Value is in centralized search/selection support (information retrieval)
        • Economies of scale to meet needs of content consumer
        • Requires MT at some level to be viable
  13. MT + Search + HT = Cost-Effective Solution
      [Chart: quality of translation by workflow; labels: SMT, TM, Post-Edit, HT Review]
      Entire Corpus Translated for Human Reader:
      • Total Cost Proportionate to Quality & Volume
      • Assumes Fully Trained SMT
      Rough MT of Entire Corpus Translated for Index/Search:
      • Minimal Training SMT plus HT*
      • Lower Overall Cost plus Highest Quality
      * On Demand
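      The cost comparison on this slide can be put into rough arithmetic. The sketch below is only an illustration: every unit cost and volume in it is a made-up placeholder, not a figure from the presentation, and the function names are hypothetical. It contrasts translating the whole corpus to human-reader quality with rough MT of the whole corpus plus human translation of only the documents a searcher actually orders.

```python
def full_corpus_cost(num_docs, cost_per_doc):
    """Cost of translating every document to human-reader quality."""
    return num_docs * cost_per_doc


def search_first_cost(num_docs, mt_cost_per_doc, docs_ordered, ht_cost_per_doc):
    """Cost of rough MT for the whole corpus plus on-demand HT for ordered docs."""
    rough_mt = num_docs * mt_cost_per_doc
    on_demand_ht = docs_ordered * ht_cost_per_doc
    return rough_mt + on_demand_ht


# All numbers below are hypothetical placeholders, in arbitrary currency units.
NUM_DOCS = 100_000
POST_EDITED_COST_PER_DOC = 50   # assumed cost of MT + post-edit per document
ROUGH_MT_COST_PER_DOC = 1       # assumed cost of unedited MT per document
HT_COST_PER_DOC = 200           # assumed cost of quick-turn human translation
DOCS_ORDERED = 1_000            # assumed number of documents readers request

everything = full_corpus_cost(NUM_DOCS, POST_EDITED_COST_PER_DOC)
on_demand = search_first_cost(
    NUM_DOCS, ROUGH_MT_COST_PER_DOC, DOCS_ORDERED, HT_COST_PER_DOC
)
print(f"translate everything: {everything}")
print(f"rough MT + on-demand HT: {on_demand}")
```

      Under these assumed numbers the search-first approach is cheaper, and the documents that are read get full human translation. The advantage shrinks as the number of ordered documents approaches the size of the corpus.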
  14. Data Aggregation Perspective
      • Rough MT … may not be “human ready”
      • “On-demand” human translation
      • Analytics to support “triage”
  15. Just in Time Translation
      Manufacturing Model:
      • Product Design & Prototyping
      • Publish Product Catalog
      • Build to Order
      • No inventory
      Translation Analog:
      • Train/Configure MT System
      • Integrate with Retrieval System
      • Translate to Order
      • Free inventory!
      • Iterative Improvement Loop
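      The translate-to-order idea can be sketched in a few lines. This is a toy illustration, not the system described in the talk: the class name, the glossary-based `toy_mt`, and `toy_ht` are hypothetical stand-ins for a real MT engine and a human translation service. Every document gets rough MT at ingest so it can be searched, and human translation runs only when a reader orders a specific document.

```python
class TranslateToOrder:
    """Rough MT for everything up front; human translation only when ordered."""

    def __init__(self, machine_translate, human_translate):
        self.machine_translate = machine_translate
        self.human_translate = human_translate
        self.sources = {}    # doc_id -> original text
        self.rough = {}      # doc_id -> rough MT, used only for search
        self.polished = {}   # doc_id -> human translation, filled on demand

    def ingest(self, doc_id, source_text):
        """Store the source and index its rough machine translation."""
        self.sources[doc_id] = source_text
        self.rough[doc_id] = self.machine_translate(source_text)

    def search(self, term):
        """Return IDs of documents whose rough MT contains the term."""
        term = term.lower()
        return [d for d, text in self.rough.items() if term in text.lower()]

    def order(self, doc_id):
        """Return a human translation, producing it on first request only."""
        if doc_id not in self.polished:
            self.polished[doc_id] = self.human_translate(self.sources[doc_id])
        return self.polished[doc_id]


# Hypothetical stand-ins for the MT engine and the human translator.
GLOSSARY = {"brevet": "patent", "moteur": "engine", "contrat": "contract"}
ht_calls = []


def toy_mt(text):
    # Word-by-word glossary lookup; unknown words pass through untranslated.
    return " ".join(GLOSSARY.get(word, word) for word in text.lower().split())


def toy_ht(text):
    ht_calls.append(text)  # record each (expensive) human translation request
    return "[human] " + toy_mt(text)


jit = TranslateToOrder(toy_mt, toy_ht)
jit.ingest("doc-1", "Brevet pour un moteur")
jit.ingest("doc-2", "Contrat de vente")

hits = jit.search("patent")
first = jit.order(hits[0])
again = jit.order(hits[0])  # served from the store, no second HT request
print(hits, len(ht_calls))
```

      The rough translation here is poor ("patent pour un engine"), yet it is enough for the search to find the right document, which is the point of the slide.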
  16. E.g.: Unified Legal Analysis Environment
      [Diagram labels: English Document Set; French Document Set; Translation Process: ~1% HT, ~99% MT; Translated Document Set; Keyword Translation; IPX Document Profiling Process; Document-Specific Concept Profiles; IPX NLP Process; Basic Priority Scoring Process; Unified Document Set; Unified Correlation Matrices; Paralegal Analysis]
  17. E.g.: Chinese Patent Search Pilot
      • Matrixware Information Services
        • Information Retrieval Specialists
        • Targeting Individual Knowledge Workers
      • Asia Online
        • Statistical MT Experts
        • Custom Domain Development
        • Real-time Feedback for MT Improvement
      • McElroy Translation
        • Training & Tuning Set Development
        • MT Quality Assessment Services
        • Ongoing MT Quality Improvement Services
        • Quick-turn Human Translation “On Demand”
  18. Project Goals & Status
      • Goal: Proof of Concept
        • Validate “searchability” of patent database
          • High recall (finding what is there)
          • Acceptable precision (eliminating “noise”)
        • Validate rate of quality improvement
          • Utilizing Asia Online interface
          • Filling technical vocabulary gaps
        • Validate customer acceptance
      • Status
        • SMT training to be complete this month
        • 90-day quality improvement cycle to follow
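      Recall and precision, the two "searchability" measures named on this slide, have standard definitions: recall is the share of relevant documents that the search returns, and precision is the share of returned documents that are relevant. A minimal sketch follows; the patent IDs are invented for illustration and are not pilot data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the search actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 relevant patents exist; the search returns 8 documents,
# including all 4 relevant ones plus 4 that are "noise".
relevant = {"CN-001", "CN-002", "CN-003", "CN-004"}
retrieved = ["CN-001", "CN-002", "CN-003", "CN-004",
             "CN-101", "CN-102", "CN-103", "CN-104"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

      In this made-up case recall is 1.0 and precision is 0.5: nothing relevant is missed, and half of what the searcher must review is noise. That matches the slide's ordering of priorities, high recall first and acceptable precision second.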
  19. Contact Details
      Bob Donaldson
      VP Strategy
      McElroy Translation Company
      910 West Avenue
      Austin, TX 78701
      +1 (512) 472-6753
      [email_address]