TAUS MT SHOWCASE, A Small LSP’s Guide to Commercialized Open Source SMT, Tom Hoar, Precision Translation Tools, 10 April 2013


This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.

MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.

For the latest updates, follow us on Twitter - #MosesCore

Published in: Technology, Business

  1. TAUS Machine Translation Showcase
     A Small LSP’s Guide To Commercialized Open Source SMT
     15:30 – 15:50, Wednesday, 10 April 2013
     Tom Hoar, Precision Translation Tools
  2. A Small LSP’s Guide To Commercialized Open Source SMT
     From 28 years of corpus exploitation
     Tom Hoar, Precision Translation Tools
  3. Agenda
     ● Introduction
     ● Who is PTTools?
     ● Fundamental Assumptions
     ● Models and Proportions
     ● SMT Statistical Models
     ● New Perspective
     ● Acknowledgements
     12 April 2013   2012 © Precision Translation Tools Co., Ltd.
  4. Origin of MT?
     ● “… the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
     ● March 4, 1947
     ● From: Warren Weaver, Mathematician, Rockefeller
     ● To: Norbert Wiener, Professor of Mathematics, MIT
  5. Origin of Pessimism?
     ● “… as to the problem of mechanical translation, I frankly am afraid the boundaries of words in different languages are too vague and the emotional and international connotations are too extensive to make any quasi mechanical translation scheme very hopeful.”
     ● April 30, 1947 (56 days later)
     ● Norbert Wiener, Professor of Mathematics, MIT
  6. Sharing An Experience
     ● ESL/EFL student:
       – “What does wanton mean?”
     ● Teacher:
       – “Where did you see it?”
       – “How was it used?”
     ● Despite this, students learn that meaning comes from vocabulary, spelling, grammar, syntax
  7. Working With “Meaning”
     ● CONTEXT + CONTENT = MEANING
     ● Context: the container
       – i.e. domain, subject, usage, purpose, culture
     ● Content: anything in the container
       – i.e. vocabulary, spelling, grammar, syntax, punctuation, style
  8. The bird swam to its nest.
     ● ESL/EFL students: “The meaning is wrong.”
     ● Teacher: “Vocabulary, spelling, grammar, syntax, punctuation are all correct. Why is the meaning wrong?”
       – Students are confused
     ● Homework: Fix the meaning without changing the contents.
  9. Context Is Determinative
     ● Possible solution:
       – The bird is a duck, or a swan, goose, penguin, cormorant, etc.
     ● Lesson?
       – Change the container, change the meaning
       – Machines can’t search for a greater context
         ● Only humans can
     ● How often do we look beyond the obvious?
  11. Disclaimer
      ● Speaker does not have a PhD
      ● Results from the School of Hard Knocks, Faculty of Scientific Repetition
      ● Only affiliation with the Moses team is as a user
  12. Precision Translation Tools
      ● Software publisher
        – Founded in Feb 2010, Bangkok, Thailand
        – Not a translation services provider
        – Software, training and support
      ● “Do” Machine Translation
      ● “Do” Moses Yourself Community Edition (free)
      ● Senior managers have over 75 years serving translation professionals and user documentation
  13. Customers
      ● Current
        – ~300 customers/users
        – 30 countries
      ● Target
        – Small & medium LSPs (2-20 persons)
        – Translators
      ● Accomplishments
        – First Maori-English SMT system
        – First English-Khmer SMT system
  14. Mission
      ● Make statistical machine translation tools available to everyone with
        – Open source foundation
        – Simplified usability
        – User education and training
        – Autonomous ecosystems
        – Intellectual property protection
  16. 7 Fundamental Assumptions
      ● These are essential if SMT is to work.
      ● They cannot be proven.
      ● They can only be observed through the success or failure of an SMT system.
  17. SMT Assumption 1
      ● Most of the time, most authors create content with appropriate
        – Vocabulary
        – Spelling
        – Grammar
        – Syntax
        – Punctuation
        – Style
  18. SMT Assumption 2
      ● Most of the time, most translators create translations with appropriate
        – Vocabulary
        – Spelling
        – Grammar
        – Syntax
        – Punctuation
        – Style
  19. SMT Assumption 3
      ● In large collections of original content, fragments repeat proportionately to their occurrences in the real world
        green birds fly quickly
        red birds fly to the nest
        white birds swim across the pond
        yellow birds eat sunflower seeds
        black birds eat yellow corn
        white birds swim gracefully
        black birds hover over the nest
        pink birds stand on one leg
        pink birds eat orange shrimp
        grey birds stand in the nest
  20. SMT Assumption 4
      ● In large collections of translations of original content, the translations mirror the repetitions in the original content
        los pájaros verdes vuelan rápidamente
        los pájaros rojos vuelan al nido
        los pájaros blancos nadan en el estanque
        los pájaros amarillos comen semillas de girasol
        los pájaros negros comen maíz amarillo
        los pájaros blancos nadan con gracia
        los pájaros negros se ciernen sobre el nido
        los pájaros rosados se aguantan sobre una sola pierna
        los pájaros rosados comen camarones naranjas
        los pájaros grises están en el nido
  21. SMT Assumptions 5 & 6
      ● Repetitions in past “original content” will repeat in future content in the same proportions.
      ● Mirrored repetitions in past translations of “original content” will repeat in future content in the same proportions.
  22. SMT Assumption 7
      ● “Exceptions” are exceptions because they don’t follow normative rules.
        – If there’s a rule for a so-called exception, it is a rule, not an exception.
        – “Exceptions” occur less frequently than “norms.” Therefore, they do not significantly impact the proportions or frequency of repetitions in the large collections.
  24. Machine Learning
      ● Borrow content from a library
      ● Study the content
      ● Retain residual knowledge in memory
      ● Return the content to the library
      ● Organize and optimize the knowledge
      ● Recall and use the residual knowledge to predict future events
  25. Statistical Machine Translation
      ● Artificial Intelligence
      ● Study = Train
      ● Memory = Tables
      ● Optimize = Tune
      ● Predict = Translate
      Diagram: SMT Model = Configuration + Translation Model (Reordering Table, Phrase Table) + Language Model
  26. What is a model? [image not available]
  27. What is a model?
  28. What is a model?
      ● A representation of an original that maintains the original’s proportions, likeness, etc.
      ● A working model replicates or emulates the functions of the original
      ● A statistical model is a working model
        – Uses statistical data to “do” something
        – Statistical data = numbers about the past
        – “Do” something = predict the future
  32. Examples of Statistical Models
      ● Financial models predict account balances
      ● Weather models predict hurricanes
      ● Traffic models predict traffic jams
      ● SMT models predict translations
  33. Proportions Matter
      ● Barbie
      ● Height 60"
      ● Weight 100 lbs.
      ● Size 4
      ● 39" x 21" x 33"
      ● Distorted likeness
      ● >15% of segments in EuroParl are parliamentary protocol
  35. SMT Statistical Model
      1. Make SMT model from “original content”
      2. Use SMT model to translate new content (predict translations) without the “original content”
      Diagram: SMT Model = Configuration + Translation Model (Reordering Table, Phrase Table) + Language Model
  36. Train Translation Model
      ● domt train-tm (train-model.perl)
      ● Count frequencies of sentence fragment pairs
      ● One or more tables
        – Can reach 15 GB each
      Original Content
        los pájaros verdes vuelan rápidamente | green birds fly quickly
        los pájaros rojos vuelan al nido | red birds fly to the nest
        los pájaros blancos nadan en el estanque | white birds swim across the pond
        los pájaros amarillos comen semillas de girasol | yellow birds eat sunflower seeds
        los pájaros negros comen maíz amarillo | black birds eat yellow corn
        los pájaros blancos nadan con gracia | white birds swim gracefully
        los pájaros negros se ciernen sobre el nido | black birds hover over the nest
        los pájaros rosados se aguantan sobre una sola pierna | pink birds stand on one leg
        los pájaros rosados comen camarones naranjas | pink birds eat orange shrimp
        los pájaros grises están en el nido | grey birds stand in the nest
      Phrase Table
        Source language (stimulus) | Target language (response) | Probability
        los pájaros | birds | 50%
        los | birds | 50%
        negros | black | 50%
        pájaros negros | black | 50%
        los pájaros negros | black birds | 100%
        los pájaros negros comen | black birds eat | 100%
        los pájaros negros comen maíz | black birds eat yellow | 100%
        los pájaros negros comen maíz amarillo | black birds eat yellow corn | 100%
        pájaros verdes | green | 50%
        verdes | green | 50%
        los pájaros verdes | green birds | 100%
        los pájaros verdes vuelan | green birds fly | 100%
        los pájaros verdes vuelan rápidamente | green birds fly quickly | 100%
        grises | grey | 50%
        pájaros grises | grey | 50%
        los pájaros grises | grey birds | 100%
        los pájaros grises están | grey birds stand | 100%
        los pájaros grises están en | grey birds stand in | 100%
        los pájaros grises están en el | grey birds stand in the | 100%
        los pájaros grises están en el nido | grey birds stand in the nest | 100%
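The counting step on slide 36 can be illustrated with a minimal Python sketch. It assumes the fragment pairs have already been extracted and aligned; a real training run performs word alignment and phrase extraction first, which is not shown here. The function name `phrase_table` and the hand-listed pairs are hypothetical, and the estimate is plain relative frequency: P(target | source) = count(source, target) / count(source).

```python
from collections import Counter, defaultdict

def phrase_table(pairs):
    """Relative-frequency estimate of P(target | source) from fragment pairs."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return dict(table)

# Hypothetical, hand-aligned fragment pairs (not the output of a real aligner).
pairs = [
    ("los pájaros negros", "black birds"),
    ("los pájaros negros", "black birds"),
    ("los pájaros", "birds"),
    ("los pájaros", "the birds"),
    ("nadan", "swim"),
]

table = phrase_table(pairs)
for src, targets in table.items():
    for tgt, prob in targets.items():
        print(f"{src} -> {tgt}: {prob:.0%}")
```

A source fragment that was always paired with the same target gets 100%; one paired with two different targets an equal number of times gets 50% each, which is the pattern visible in the slide's table.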
  37. Train Language Model
      ● domt train-lm (build-lm.sh)
      ● Count frequencies of sentence fragments in target language
      ● One or more tables
        – Can reach 25 GB each
      Target Content
        green birds fly quickly
        red birds fly to the nest
        white birds swim across the pond
        yellow birds eat sunflower seeds
        black birds eat yellow corn
        white birds swim gracefully
        black birds hover over the nest
        pink birds stand on one leg
        pink birds eat orange shrimp
        grey birds stand in the nest
      Language Model
        2-grams:
          -1.30713 <s> green
          -0.265492 green birds
          -0.850518 birds fly
          -0.677087 birds eat
        3-grams:
          -0.112767 <s> green birds
          -0.421503 birds fly quickly
          -0.592076 birds eat yellow
        4-grams:
          -0.10498 <s> green birds fly
          -0.0527335 birds fly quickly </s>
          -0.0570311 birds eat orange shrimp
        5-grams:
          -0.0732878 <s> green birds fly quickly
          -0.0274306 birds fly to the nest
          -0.0474597 birds swim across the pond
          -0.0255669 birds eat yellow corn </s>
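The idea of counting target-language fragments can be sketched in a few lines of Python. This is a minimal, unsmoothed illustration using the ten English sentences from the slides; the helper names `ngram_counts` and `bigram_logprob` are hypothetical. The figures on slide 37 come from a smoothed model, so the numbers printed here are not expected to match them.

```python
import math
from collections import Counter

TARGET_CONTENT = [
    "green birds fly quickly",
    "red birds fly to the nest",
    "white birds swim across the pond",
    "yellow birds eat sunflower seeds",
    "black birds eat yellow corn",
    "white birds swim gracefully",
    "black birds hover over the nest",
    "pink birds stand on one leg",
    "pink birds eat orange shrimp",
    "grey birds stand in the nest",
]

def ngram_counts(sentences, n):
    """Count every n-gram, with <s> and </s> marking sentence boundaries."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

unigrams = ngram_counts(TARGET_CONTENT, 1)
bigrams = ngram_counts(TARGET_CONTENT, 2)

def bigram_logprob(prev, word):
    """Unsmoothed log10 P(word | prev); raises ValueError for unseen bigrams."""
    return math.log10(bigrams[(prev, word)] / unigrams[(prev,)])

print(bigrams[("birds", "eat")], "of", unigrams[("birds",)], "'birds' are followed by 'eat'")
print("log10 P(fly | birds) =", bigram_logprob("birds", "fly"))
```

Real toolkits add smoothing and back-off weights so that unseen fragments receive a small probability instead of failing.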
  38. Tune SMT Model
      ● domt train-mert (mert-moses.pl)
      ● Creates optimal settings for the components to work together
      ● Configuration file defines paths to files and stores optimal settings
      Configuration file:
        [ttable-file]
        0 0 5 ${path}/phrase-table.gz
        [distortion-file]
        0-0 msd-bidirectional-fe 6 ${path}/reordering-table.gz
        [lmodel-file]
        0 0 3 ${path}/irstlm_arpa.en.gz
        [weight-t]
        0.169891
        0.0856206
        -0.0664389
        0.0489578
        0.0018491
        [ttable-limit]
        20
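One way to picture what the tuned settings do: the decoder combines the component scores as a weighted sum, and tuning searches for the weights. The Python sketch below only illustrates that weighted sum. The weights are copied from the `[weight-t]` values on slide 38, while the feature values, the candidate names and the function `weighted_score` are hypothetical and are not taken from mert-moses.pl.

```python
def weighted_score(weights, features):
    """Weighted sum of feature scores for one candidate translation."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature")
    return sum(w * h for w, h in zip(weights, features))

# Weights as listed under [weight-t] on slide 38.
weight_t = [0.169891, 0.0856206, -0.0664389, 0.0489578, 0.0018491]

# Hypothetical feature values (for example, log-probabilities) for two candidates.
candidate_a = [-1.2, -0.8, -2.0, -1.5, 1.0]
candidate_b = [-3.0, -2.5, -2.0, -4.0, 1.0]

score_a = weighted_score(weight_t, candidate_a)
score_b = weighted_score(weight_t, candidate_b)
print("candidate A:", round(score_a, 4))
print("candidate B:", round(score_b, 4))
```

Changing a weight changes how much the corresponding component influences the final ranking, which is why the tuned values are stored in the configuration file alongside the table paths.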
  40. SMT Model In Use
      ● Step 1: domt translate (moses -f config)
      ● Translation model creates thousands of possible sentences
      Input: los pájaros negros nadan con gracia
        1 green birds swim gracefully
        2 red birds swim gracefully
        3 black birds swim gracefully
        4 yellow birds swim gracefully
        5 birds yellow fly green corn
        6 red corn eats white pond
        ...
        10,000 pink birds swim gracefully
  41. SMT Model In Use
      ● Step 2: Language model scores each possible sentence
        1 green birds swim gracefully 0.38
        2 red birds swim gracefully 0.32
        3 black birds swim gracefully 0.84
        4 yellow birds swim gracefully 0.74
        5 birds yellow fly green corn 0.07
        6 red corn eats white pond 0.02
        …
        10,000 pink birds swim gracefully 0.57
  42. SMT Model In Use
      ● Step 3: The highest score is the most probable and is selected as the translation
        3 black birds swim gracefully 0.84
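Steps 2 and 3 amount to scoring every candidate and keeping the best one. A minimal Python sketch, using the candidate sentences and scores shown on slide 41 (the thousands of elided candidates are omitted, and the variable names are hypothetical). This mirrors the simplified view in the slides; a real decoder combines several component scores while it searches.

```python
# Candidate translations and their language-model scores from slide 41.
candidates = {
    "green birds swim gracefully": 0.38,
    "red birds swim gracefully": 0.32,
    "black birds swim gracefully": 0.84,
    "yellow birds swim gracefully": 0.74,
    "birds yellow fly green corn": 0.07,
    "red corn eats white pond": 0.02,
    "pink birds swim gracefully": 0.57,
}

# Step 3: select the candidate with the highest score.
best = max(candidates, key=candidates.get)
print(best, candidates[best])
```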
  43. Is This Familiar?
      ● You have a difficult sentence to translate
      ● Despite your training and skills, you create 4 or 5 possible translations with different words and word orders.
      ● You struggle
        – Which one is “right?”
        – Which is the “best?”
      ● You have to pick one or you don’t get paid.
  44. What Drives You?
      ● How do you make your decision when all these things are equally “right”?
        – Meaning
        – Grammar
        – Syntax
        – Etc.
      ● You have to pick one or you don’t get paid.
  45. Feeling and Familiarity
      ● The one that feels familiar
        – Familiarity comes from frequency
      ● SMT emulates this process
        – SMT can generate 10,000-20,000 possibilities. Computers are good at that; people aren’t.
        – SMT calculates the probabilities for each one. Computers aren’t good at feelings.
  46. Stimulus
      ● “los pájaros negros nadan con gracia”
      ● English possibilities generated
        – green birds swim gracefully
        – red birds swim gracefully
        – black birds swim gracefully
        – yellow birds swim gracefully
        – pink birds swim gracefully
  47. Human Response
      ● “black birds swim gracefully”
        – I’m familiar with swans as black birds that swim gracefully.
        – I’m familiar with yellow and pink birds that swim, but they don’t swim gracefully.
        – I’m not familiar with green or red birds that swim at all.
  48. SMT Response
      ● “black birds swim gracefully”
        – All tokens are familiar because they’re in the tables.
        – The fragment “black birds swim” is the most familiar because it occurs most frequently; therefore it scores highest.
        – The sentence scored highest because its fragments are in the language model more frequently.
  50. Initial Challenges
      ● Requires millions of pairs
      ● Requires expensive, powerful hardware
      ● Lacks trained user base
      ● Faces hostile target users
      ● Faces criticism from experts
      ● Lacks professional features
  51. Market Response
      ● Private SaaS Portals
        – Asia Online
        – SDL (1)
        – Safaba
        – LetsMT!
        – Tauyou
        – Firma8
        – KantanMT
        – SmartMATE
        – Straker Translations
        – Cloudwords
        – AVB Translations
        – Lingo24
        – MemSource
        – Translated.net
        – Trusted Translations
        – XTM International
      ● Integrators & Consultants
        – CrossLang
        – Digital Silk Road
        – PangeaMT
        – Asia Online
        – Safaba
        – SDL (1)
        – IBM
        – Systran (2)
        – LexWorks (2)
        – Prompsit Language Engineering (3)
      ● Software Publishers
        – Systran (2)
        – ProMT (3)
        – Precision Translation Tools
      Notes:
        (1) = LanguageWeaver, not Open Source
        (2) = SYSTRAN Server, RbMT with Moses
        (3) = RbMT & SMT options
  52. Learned Challenges
      ● Customizing models requires possession and control of TMs
        – Users don’t entrust TMs to portals
        – Perception they’re subsidizing competitors
      ● Portals must continuously create models
        – Overhead for each new model
        – No portal has talent for every language
        – Revert to customers’ talents
  53. Updated Challenges
      ● Requires millions of pairs
      ● Requires expensive, powerful hardware
      ● Lacks trained user bases
      ● Faces hostile, untrained target users
      ● Faces criticism from experts
      ● Lacks professional features
      ● “Trusted 3rd parties” don’t exist
      ● Continual need for new models
  54. Productivity As Quality
      ● Customers want quality
        – Can’t define it for computers to test for it
      ● All automated quality scoring systems require human reference translations
      ● 100% match = raw SMT is identical to independent human translations, not post-edited translations
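The “100% match” measure on slide 54 can be computed directly: count the segments where raw SMT output is identical to an independent human translation. A minimal Python sketch, with a hypothetical function name `exact_match_rate` and made-up segments; ignoring surrounding whitespace is an assumption, not something the slides specify.

```python
def exact_match_rate(mt_output, references):
    """Fraction of segments where raw MT output equals the human reference."""
    if len(mt_output) != len(references):
        raise ValueError("need one reference per MT segment")
    if not mt_output:
        return 0.0
    matches = sum(
        1 for mt, ref in zip(mt_output, references) if mt.strip() == ref.strip()
    )
    return matches / len(mt_output)

# Hypothetical segments: raw SMT output vs. independent human translations.
mt_output = [
    "black birds swim gracefully",
    "red birds fly to the nest",
    "pink birds stand on a leg",
    "grey birds are in the nest",
    "white birds swim in the pond",
]
references = [
    "black birds swim gracefully",
    "red birds fly to the nest",
    "pink birds stand on one leg",
    "grey birds stand in the nest",
    "white birds swim across the pond",
]

rate = exact_match_rate(mt_output, references)
print(f"100% matches: {rate:.0%}")
```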
  55. 2012 Serendipitous Discovery
      ● Don’t need millions of sentence pairs within a constrained domain
      ● PTTools customers with 130K to 300K segments achieve 100% matches on 20-40% of SMT output
      ● LetsMT! reports that similar corpus sizes produce 20% productivity gains
      ● Tauyou reports that as few as 50K segments result in customer satisfaction with productivity gains
  56. Productivity As Quality
      ● Where does productivity begin?
      ● How many 100% matches make productivity gain inevitable?
      Quality vs. Productivity | <100% Match (Post-editing) | 100% Match (Productivity) | Annual TCO | Preparation Time
      RbMT | 90 – 95% | 5% – 10% | $150,000 | 2 – 3 weeks
      SMT Pre 2007 | > 99% | < 1% | $10,000 | 1 – 3 weeks
      SMT 2007 to 2008 | > 99% | < 1% | $6,000 | 5 – 14 days
      SMT 2009 to 2011 | 90% – 95% | 5% – 10% | $1,500 | 2 – 7 days
      SMT 2012 | *60% – 80% | *20% – 40% | $1,200 | 6 – 48 hours
      * actual customer experience
  57. Adjusted Challenges (earlier challenge → adjusted view)
      ● Requires millions of pairs → 150,000 to 300,000 can work fine
      ● Requires expensive, powerful hardware → Less than professional graphic arts
      ● Lacks trained user bases → Professionals pay for training courses
      ● Faces hostile, untrained target users → Attitudes are proportionate to benefits
      ● Faces criticism from experts → Early experts liquidate
      ● Lacks professional features → New versions add new features
      ● “Trusted 3rd parties” don’t exist
      ● Continual need for new models
  58. Market Response Revisited
      ● Portals, Full Service, Experts
        – Perpetuate perception of complexity
        – Control models created with free technology
        – Protect investments
      ● If juke boxes and radio stations preceded phonographs, what would today’s music industry sell?
        – (a) CD’s
        – (b) pay-per-play MP3s and digital radio?
  60. Acknowledgements
      ● Precision Translation Tools DoMT®
      ● Prompsit Language Engineering
      ● Tauyou
      ● Safaba Translation Solutions
      ● LetsMT! by Tilde
      ● Digital Silk Road
      ● PangeaMT by Pangeanic
      ● CrossLang
      ● KantanMT
      ● Lingo24
