TAUS Scotland Asia Online Technology Platform V1


TAUS Edinburgh Conf Presentation

  1. 1. Translation Technology Platform
        Kirti Vashee, VP Sales, Asia Online
        Kirti.vashee@asiaonline.net
  2. 2. The Consumer Market (Large Buyer & Publisher Perspective)
        • Revolutionize the Internet experience for non-English speakers in Asia
        • Provide 1 billion+ local-language pages online using mostly translated open license content, combined with compelling portal and social networking style services in Thailand, Indonesia, India, Malaysia, Philippines, Vietnam and China, Japan & Korea
        The Enterprise Market (Translation Tools Vendor Perspective)
        • Revolutionize the enterprise translation process with a comprehensive, continuous learning SMT platform
        • SaaS environment that allows data cleaning and preparation, develops SMT engines on demand and enables ongoing comprehensive post editing and correction to continuously improve engines
        Copyright © 2008, All Rights Reserved
  3. 3. • The only SMT technology provider that is also a major user of ALT technology on one of the largest translation projects in the world: English Wikipedia (1B+ words) into 11 Asian languages using SMT and crowdsourcing
        • The translation tools and technology platform used to accomplish this is also being made available as a SaaS product for the enterprise translation market
  4. 4. • Battlefield of words
        • Fusion with customer support
        • Continuous translation
        • Community translation
        • Industry-shared language data
        • Massive online collaboration
        • Translation automation
  5. 5. [Diagram labels: Interactive Support (Email, Instant Messaging, Voice Support); Knowledge Base; Knowledge Base Data; User Manuals; User Generated Content; Blogs; Documentation; User Manual; Interactive Support]
        • Web 2.0 is much more interactive and dynamic
        • Globalization will be further driven by internet penetration into Asia
        • Word-of-mouth marketing is gaining prominence all over the world
        • Unstructured content in blogs and review sites is becoming critical
        • The dialogue with the global customer needs to be more interactive
  6. 6. Continuous Improvement
        [Diagram labels: HDSMT Engines; Sales / Marketing; Blogs; CRM; Product Management; Biz Intelligence; Human Resources; Content Management; ECM; BPM; The Global Customer; Email; Customer Support; IM]
        • Highly adaptive, human-driven process for continuous output quality improvement in SMT engines and translation automation
        • Intensive collaboration with human translators to raise the quality of SMT
        • Integration with content creation and content refinement tools to enhance speed and improve business process management
        • Continued evolution in standards to facilitate sharing linguistic assets
  7. 7. • Comprehensive SaaS Platform that facilitates the translation and continued refinement of any large high value translatable corpus using HDSMT
        • Existing Feature Set
          – Data Cleaning & Preparation Tools
          – On Demand SMT engine development
          – Support for both user created and online dictionaries and glossaries
          – Ability to pool data for greater leverage
          – Multiple level domain support
          – Seamless integration with collaborative post-editing environment
          – Real time updates of translated assets
          – Web Services based APIs for integration
        • System and process foundation for managed online community collaboration
  8. 8. Data Management
        • Bilingual Data Preparation & Cleaning
        • Bilingual Data Normalization & Optimization
        • Source Cleanup and Preparation
        • Grammar and Spelling validation
        • Monolingual Data Extraction & Analysis
        SMT Engine
        • SMT System Training & Development
        • Monolingual Data Training
        • Ongoing Corpus Refinement and Tuning
        • Analysis and Evaluation of Ngrams
        Output Proofing & Editing
        • Error Pattern Identification & Correction
        • Automated error correction tools
        • Continuing Cycle of Exception Identification and Correction
        • Development of small sets of new data to correct errors
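
The "Analysis and Evaluation of Ngrams" step on this slide is not described further. As a minimal sketch of what n-gram analysis over a monolingual corpus can look like, the following counts word n-grams in whitespace-tokenized sentences. The function name `ngram_counts` and the sample sentences are illustrative assumptions, not part of the Asia Online platform.

```python
from collections import Counter


def ngram_counts(sentences, n=2):
    """Count word n-grams across whitespace-tokenized, lowercased sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a window of n tokens over the sentence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Illustrative sample corpus (not from the presentation).
corpus = [
    "the engine translates the text",
    "the engine learns from corrections",
]
bigrams = ngram_counts(corpus, n=2)
print(bigrams.most_common(1))
```

Frequent n-grams point to phrasing the engine sees often, while rare or unexpected ones can flag noise worth reviewing before training.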
  10. 10. • Data Cleaning Utilities to normalize and standardize data prior to consolidation to provide maximum leverage
          • Recent study for TAUS proves conclusively that sharing clean data provides leverage
            – Smaller amount of clean data can produce better results than datasets even 2X larger
            – Consistent Terminology matters and provides real leverage
            – Data optimized for TM Tools can be “dirty data” for SMT
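
The slide does not say what the data cleaning utilities do internally. As a hedged sketch of the general idea (normalize and standardize before consolidating), the following normalizes bilingual segment pairs and drops empty and duplicate entries. The function names and sample pairs are assumptions made for illustration.

```python
import re
import unicodedata


def normalize_segment(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_pairs(pairs):
    """Normalize (source, target) pairs, dropping empty and duplicate pairs."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        pair = (normalize_segment(source), normalize_segment(target))
        if not pair[0] or not pair[1]:
            continue  # one side is empty after normalization
        if pair in seen:
            continue  # duplicate once whitespace differences are removed
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


# Illustrative sample data (not from the presentation).
raw = [
    ("Hello  world ", "Halo dunia"),
    ("Hello world", "Halo  dunia"),
    ("", "kosong"),
]
print(clean_pairs(raw))
```

Pairs that differ only in spacing collapse into one entry, which is one simple way pooled data from several sources can shrink while becoming more consistent.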
  11. 11. • Initial System put into production
          • Trained Internal Experts begin initial clean up and correction process
          • Expert Users also allowed to make changes
          • Community: All users allowed to suggest changes which go through vetting process
          • Changes are collected and added to initial corpus to drive continuous retraining
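
The correction workflow on this slide can be sketched as a small data structure: expert changes go straight into the material collected for retraining, while suggestions from other users wait in a vetting queue. This is only an illustration under stated assumptions. The class names, the role strings "expert" and "user", and the `approve` callback are hypothetical, and the slides do not describe how vetting actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    source: str
    corrected_target: str
    author_role: str  # "expert" or "user" (hypothetical role names)


@dataclass
class FeedbackLoop:
    retraining_corpus: list = field(default_factory=list)
    vetting_queue: list = field(default_factory=list)

    def submit(self, correction):
        """Accept expert changes directly; queue everything else for vetting."""
        if correction.author_role == "expert":
            self.retraining_corpus.append(correction)
        else:
            self.vetting_queue.append(correction)

    def vet(self, approve):
        """Move queued suggestions that pass `approve` into the corpus; drop the rest."""
        for correction in self.vetting_queue:
            if approve(correction):
                self.retraining_corpus.append(correction)
        self.vetting_queue.clear()


loop = FeedbackLoop()
loop.submit(Correction("good morning", "selamat pagi", "expert"))
loop.submit(Correction("thank you", "terima kasih", "user"))
loop.vet(lambda c: len(c.corrected_target) > 0)  # placeholder vetting rule
print(len(loop.retraining_corpus), len(loop.vetting_queue))
```

The collected corrections would then be added to the initial corpus for the next retraining run, which is the step the slide calls continuous retraining.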
  12. 12. [Chart labels: Initial System; Targeted Corrections of Bad Learning; Spelling & Terminology Correct; Mistranslation; Syntax/Grammar; Terminology; Spelling; Punctuation]
          Human Feedback can raise the raw output to previously unseen quality levels
  15. 15. Information Requests
          • GetAccountInformation
          • GetAccountUsageHistory
          • GetAvailableDomainCombinationsForLanguagePair
          • GetAvailableDomainsForLanguagePair
          • GetAvailableLanguagePairs
          • GetCustomDomainsForLanguagePair
          Data Storage
          • CreateDataset
          • DeleteDataset
          • DeleteDataFromDataset
          • DownloadDataset
          • DownloadDatasetItem
          • GetDatasetList
          • GetDatasetItemList
          • LinkDataToDataset
          • MergeDatasets
          • UploadData
          • UploadGlossary
          • UploadImage
          • UploadLanguageModel
          • UploadMonolingualText
          • UploadOCRPageLayout
          • UploadPhrasePairs
          • UploadTranslationMemory
          • UploadZIP
          Data Training
          • CancelTrainingJob
          • GetTrainingJobList
          • GetTrainingJobStatus
          • SubmitDatasetForTraining
          Data Preparation
          • CleanText
          • ExtractText
          • NormalizeText
          • OCRImage
          • ParagraphAlignLanguagePairText
          • SentenceAlignLanguagePairText
          • SentenceSegmentText
          • SpellCheckText
          • WordSegmentText
          Translation
          • CancelTranslationJob
          • GetTranslationJobList
          • GetTranslationJobStatus
          • SubmitDatasetForTranslation
          • SubmitSinglePhraseForTranslation
          Request parameters
          • sUsername (String): The username of the person making the request.
          • sPassword (String): The password of the person making the request.
          • iAccountNo (Integer): The account number that this request is associated with.
          • iDepartmentNo (Integer): The department number that this request is associated with.
          • iLanguagePairCode (Integer): The code for the language pair that is being looked up.
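
The slide lists operation names and a handful of typed parameters, but not the endpoint or wire format. The sketch below only shows how a client might check an operation name and parameter types before sending anything. The operation and parameter names come from the slide. The `build_request` helper, the returned dict layout, the sample values, and the assumption that the four account parameters are required on every call are all hypothetical.

```python
# Subset of the operation names listed on the slide, grouped by its headings.
OPERATIONS = {
    "Information Requests": [
        "GetAccountInformation",
        "GetAvailableLanguagePairs",
        "GetAvailableDomainsForLanguagePair",
    ],
    "Translation": [
        "SubmitSinglePhraseForTranslation",
        "GetTranslationJobStatus",
    ],
}

# Parameter names and types from the slide. Treating these four as required
# on every call is an assumption made for this sketch.
REQUIRED_PARAMS = {
    "sUsername": str,
    "sPassword": str,
    "iAccountNo": int,
    "iDepartmentNo": int,
}


def build_request(operation, **params):
    """Validate an operation name and its parameters, then return a plain dict.

    The returned dict layout is hypothetical; the real wire format is not
    described in the slides.
    """
    known = {name for names in OPERATIONS.values() for name in names}
    if operation not in known:
        raise ValueError(f"Unknown operation: {operation}")
    for name, expected_type in REQUIRED_PARAMS.items():
        if name not in params:
            raise ValueError(f"Missing parameter: {name}")
        if not isinstance(params[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return {"operation": operation, "parameters": dict(params)}


request = build_request(
    "GetAvailableDomainsForLanguagePair",
    sUsername="demo-user",
    sPassword="demo-password",
    iAccountNo=1001,
    iDepartmentNo=1,
    iLanguagePairCode=42,  # hypothetical language pair code
)
print(request["operation"])
```

Checking names and types on the client side keeps malformed calls from reaching the service, whatever transport the actual Web Services API uses.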
  21. 21. [Workflow diagram; participants: Publishers, User, Translation Systems, Asia Online SaaS Portal, Human Proof Readers, Social Networks / Community Leverage]
          • Provide existing human translated content for training language engines
          • ASP Translation service for translation of new material
          • Original Content translated to local language
          • Translations are proof read via ASP proof reading system
          • Translated content proof read using community principles and paid proof readers using Asia Online proof reading system
          • Proof reading still required whether human or machine translation
          • New translations sent back to publisher
          • Translated content made available to users
          • User accesses online content in local language
          • Constant Improvement
  22. 22. • Integrated data cleaning, data preparation, SMT systems development and post-editing environment
          • Comprehensive proof-reading and post-editing environment that is integrated with core SMT engines to enable instant updates
          • Greater transparency of many key SMT building blocks to enable users to see and modify what the system has learnt resulting in greater control and better systems (Greater Control & Better systems)
          • A richer and deeper taxonomy for domains to ensure the best quality (Better systems)
          • Incremental additions of new training data to any existing system to enable rapid updates (Faster updates)
          • Easy handling of terminology, glossaries, dictionaries
  23. 23. Kirti Vashee, VP Sales, Asia Online
          kirti.vashee@asiaonline.net