SlideShare a Scribd company logo
1 of 22
Download to read offline
Wednesday,	
  4	
  June	
  
Beyond	
  Data:	
  	
  
Delivering	
  Machine	
  Transla;on	
  with	
  Subject	
  Ma@er	
  
Exper;se	
  
John	
  Tinsley,	
  Iconic	
  Transla1on	
  Machines	
  
TAUS	
  Machine	
  Transla;on	
  Showcase	
  2014	
  
Dublin	
  (Ireland)	
  
The	
  research	
  within	
  the	
  project	
  MosesCore	
  leading	
  to	
  these	
  results	
  has	
  received	
  funding	
  from	
  the	
  European	
  Union	
  7th	
  Framework	
  Programme,	
  grant	
  agreement	
  no	
  288487	
  
Beyond Data
Delivering Machine Translation with
Subject Matter Expertise
John Tinsley
Director / Co-Founder
TAUS MT Showcase. 4th June 2014, Dublin
We provide Machine Translation solutions
with Subject Matter Expertise
We do this using Linguistic Engineering
An “ensemble” MT architecture
The world’s first and only patent specific MT
system that’s ready to go
Data Engineering
What is Linguistic Engineering?
Pre-processing Post-processing
Input Output
Training Data
Patents: an MT nightmare
L is an organic group selected from -CH2-
(OCH2CH2)n-, -CO-NR'-, with R'=H or
C1-C4 alkyl group; n=0-8; Y=F, CF3 …
maximum stress of 1.2 to 3.5 N/mm<2>
and a maximum elongation of 700 to
1,300% at 0[deg.] C.
Long Sentences
Technical constructions
Largest single document: 249,322 words
Longest Sentence: 1,417 words
Data Engineering
What is Linguistic Engineering?
Pre-processing Post-processing
Input Output
Training Data
Data Engineering + Linguistic Engineering
An “ensemble” architecture
Chinese pre-ordering
rules
Statistical
Post-editing
Input
Output
Training Data
Spanish med-device
entity recognizer
Multi-output
Combination
Korean pharma
tokenizer
Patent input
classifier
Client TM/terminology (optional)
Japanese script
normalisation
German
Compounding rules
Moses
RBMT
Moses
Moses
If you don’t understand it, you can’t translate it
MT with Subject Matter Expertise
“Allopurinol-induced serious cutaneous adverse
reactions (SCAR), including Steven Johnson’s syndrome
(SJS) and toxic epidermal necrolysis (TEN), are
associated with a genetic marker, the HLA-B*5801
allele.”!
“IPTranslator is perfect for someone who needs to search [patents]
across multiple languages and with is useful in the case of both
patentability and infringement searches.”
– Aalt van de Kuilen, Global Head of Patent Information, Abbott
Machine Translation for Patents
What is the value for users?
Specialist solutions deliver more useable outcomes for the user
Post-editing
For information purposes
Multilingual search
Increased productivity
Extract more meaning
Retrieve more relevant results
=
=
=
De-risking the machine translation proposition
What is the value for users?
+ Data
+ Time
+ €€€
= ???
+ No data needed
+ Systems are ready to go
+ No upfront cost
= Evaluate immediately
Our PrerequisitesTypical Prerequisites
Customisation. Refinement.
» Incorporation of user feedback
» Incremental training with post-edits
» Tuning for specific input types
Iconic in practice
client case
study
Iconic in practice
Iconic had a domain-specific MT solution for that industry
Machine Translation technology for the legal industry
Business Need
Iconic in practice
Delivered immediately and initial results were positive
Translation samples required for initial evaluation
Process (1)
Iconic in practice
“The complexities and unforeseen but inevitable surprises of MT integration
in large scale production processes were handled both competently and
efficiently.”
Integrate Iconic with GlobalSight for productivity pilot
Process (2)
Iconic in practice
>20% productivity increase for translator post-editing Iconic output
“Iconic delivered measurable productivity gains from the outset”
Performance
Iconic in practice
•  Ongoing improvement through feedback from translators
•  Ongoing improvement through the incorporation of post-edits
•  More than 4 million words translated to date for Asian languages
•  Periodic roll-out of new languages over time
Looking forward
Need: short-term solution to provide on-demand translation through
a web search interface
Iconic in practice
Process: integrate directly through Iconic API and evaluate quality
and throughput concurrently
Outcomes: in 5 months of production for English-Portuguese alone,
we processed:
•  15,526 translation requests
•  14,606,374 words
All content is not created equal
We cannot afford to be dogmatic when it
comes to MT
Domain specific MT is about more than just
data
Know your subject matter!
Take home messages…
Thank You!
john@iptranslator.com
@IconicTrans

More Related Content

Similar to TAUS MT Showcase, Beyond Data, John Tinsley, Iconic Translation Machines

Beyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter ExpertiseBeyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter ExpertiseIconic Translation Machines
 
From the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT ResearchFrom the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT ResearchIconic Translation Machines
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum VitaeAndy Nisbet
 
CloudLightning - Project and Architecture Overview
CloudLightning - Project and Architecture OverviewCloudLightning - Project and Architecture Overview
CloudLightning - Project and Architecture OverviewCloudLightning
 
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...Dr. Haxel Consult
 
MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)
MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)
MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)TAUS - The Language Data Network
 
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGeetha982072
 
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...Iconic Translation Machines
 
TechExpert: Service and Project for Business
TechExpert: Service and Project for BusinessTechExpert: Service and Project for Business
TechExpert: Service and Project for BusinessTechExpert
 
Technip Multidomain MDM Journey
Technip Multidomain MDM JourneyTechnip Multidomain MDM Journey
Technip Multidomain MDM JourneyOrchestra Networks
 
Stanley Wong - CV4
Stanley Wong - CV4Stanley Wong - CV4
Stanley Wong - CV4Stanley Wong
 
L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS
 
AVEVA ENGAGE 2019 Malmø - ASTICON presentation
AVEVA ENGAGE 2019 Malmø - ASTICON presentationAVEVA ENGAGE 2019 Malmø - ASTICON presentation
AVEVA ENGAGE 2019 Malmø - ASTICON presentationArne Svendsen
 
DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...
DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...
DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...SpagoWorld
 
Lace presentation at world manufacturing forum 2014 wshop on learning gamific...
Lace presentation at world manufacturing forum 2014 wshop on learning gamific...Lace presentation at world manufacturing forum 2014 wshop on learning gamific...
Lace presentation at world manufacturing forum 2014 wshop on learning gamific...Fabrizio Cardinali
 
Mohamed thalha senior mes consultant Resume
Mohamed thalha  senior mes consultant ResumeMohamed thalha  senior mes consultant Resume
Mohamed thalha senior mes consultant ResumeMohamed Thalha
 
UK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing ParticipationUK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing ParticipationNeil Chue Hong
 
Joaquin Pe Fagundo | Technology Transfer Impact
Joaquin Pe Fagundo | Technology Transfer ImpactJoaquin Pe Fagundo | Technology Transfer Impact
Joaquin Pe Fagundo | Technology Transfer ImpactJoaquin Pe Fagundo
 

Similar to TAUS MT Showcase, Beyond Data, John Tinsley, Iconic Translation Machines (20)

Beyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter ExpertiseBeyond Data: Delivering Machine Translation with Subject Matter Expertise
Beyond Data: Delivering Machine Translation with Subject Matter Expertise
 
From the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT ResearchFrom the Lab to the Market: Commercialising MT Research
From the Lab to the Market: Commercialising MT Research
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum Vitae
 
IBM Think Milano
IBM Think MilanoIBM Think Milano
IBM Think Milano
 
CloudLightning - Project and Architecture Overview
CloudLightning - Project and Architecture OverviewCloudLightning - Project and Architecture Overview
CloudLightning - Project and Architecture Overview
 
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
Utilising Open Source and Communities to Drive Innovation in a Cost-Effective...
 
MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)
MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)
MMT – Modern, Next Generation Machine Translation, Achim Ruopp (TAUS)
 
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
 
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
Data and Linguistics: Delivering Machine Translation with Subject Matter Expe...
 
TechExpert: Service and Project for Business
TechExpert: Service and Project for BusinessTechExpert: Service and Project for Business
TechExpert: Service and Project for Business
 
Technip Multidomain MDM Journey
Technip Multidomain MDM JourneyTechnip Multidomain MDM Journey
Technip Multidomain MDM Journey
 
Stanley Wong - CV4
Stanley Wong - CV4Stanley Wong - CV4
Stanley Wong - CV4
 
L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)
 
AVEVA ENGAGE 2019 Malmø - ASTICON presentation
AVEVA ENGAGE 2019 Malmø - ASTICON presentationAVEVA ENGAGE 2019 Malmø - ASTICON presentation
AVEVA ENGAGE 2019 Malmø - ASTICON presentation
 
DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...
DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...
DrupalDay 2014 - Ecology of value and DRUPAL@Engineering: the experience of a...
 
Lace presentation at world manufacturing forum 2014 wshop on learning gamific...
Lace presentation at world manufacturing forum 2014 wshop on learning gamific...Lace presentation at world manufacturing forum 2014 wshop on learning gamific...
Lace presentation at world manufacturing forum 2014 wshop on learning gamific...
 
CETRI profile_2015_Q4
CETRI profile_2015_Q4CETRI profile_2015_Q4
CETRI profile_2015_Q4
 
Mohamed thalha senior mes consultant Resume
Mohamed thalha  senior mes consultant ResumeMohamed thalha  senior mes consultant Resume
Mohamed thalha senior mes consultant Resume
 
UK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing ParticipationUK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing Participation
 
Joaquin Pe Fagundo | Technology Transfer Impact
Joaquin Pe Fagundo | Technology Transfer ImpactJoaquin Pe Fagundo | Technology Transfer Impact
Joaquin Pe Fagundo | Technology Transfer Impact
 

More from TAUS - The Language Data Network

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS - The Language Data Network
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS - The Language Data Network
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...TAUS - The Language Data Network
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)TAUS - The Language Data Network
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...TAUS - The Language Data Network
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...TAUS - The Language Data Network
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...TAUS - The Language Data Network
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...TAUS - The Language Data Network
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...TAUS - The Language Data Network
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...TAUS - The Language Data Network
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)TAUS - The Language Data Network
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...TAUS - The Language Data Network
 

More from TAUS - The Language Data Network (20)

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
 
Farmer Lv (TrueTran)
Farmer Lv (TrueTran)Farmer Lv (TrueTran)
Farmer Lv (TrueTran)
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

TAUS MT Showcase, Beyond Data, John Tinsley, Iconic Translation Machines

  • 1. Wednesday,  4  June   Beyond  Data:     Delivering  Machine  Transla;on  with  Subject  Ma@er   Exper;se   John  Tinsley,  Iconic  Transla1on  Machines   TAUS  Machine  Transla;on  Showcase  2014   Dublin  (Ireland)   The  research  within  the  project  MosesCore  leading  to  these  results  has  received  funding  from  the  European  Union  7th  Framework  Programme,  grant  agreement  no  288487  
  • 2. Beyond Data Delivering Machine Translation with Subject Matter Expertise John Tinsley Director / Co-Founder TAUS MT Showcase. 4th June 2014, Dublin
  • 3. We provide Machine Translation solutions with Subject Matter Expertise
  • 4. We do this using Linguistic Engineering
  • 5. An “ensemble” MT architecture
  • 6. The world’s first and only patent specific MT system that’s ready to go
  • 7. Data Engineering What is Linguistic Engineering? Pre-processing Post-processing Input Output Training Data
  • 8. Patents: an MT nightmare L is an organic group selected from -CH2- (OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3 … maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C. Long Sentences Technical constructions Largest single document: 249,322 words Longest Sentence: 1,417 words
  • 9. Data Engineering What is Linguistic Engineering? Pre-processing Post-processing Input Output Training Data
  • 10. Data Engineering + Linguistic Engineering An “ensemble” architecture Chinese pre-ordering rules Statistical Post-editing Input Output Training Data Spanish med-device entity recognizer Multi-output Combination Korean pharma tokenizer Patent input classifier Client TM/terminology (optional) Japanese script normalisation German Compounding rules Moses RBMT Moses Moses
  • 11. If you don’t understand it, you can’t translate it MT with Subject Matter Expertise “Allopurinol-induced serious cutaneous adverse reactions (SCAR), including Steven Johnson’s syndrome (SJS) and toxic epidermal necrolysis (TEN), are associated with a genetic marker, the HLA-B*5801 allele.”! “IPTranslator is perfect for someone who needs to search [patents] across multiple languages and with is useful in the case of both patentability and infringement searches.” – Aalt van de Kuilen, Global Head of Patent Information, Abbott Machine Translation for Patents
  • 12. What is the value for users? Specialist solutions deliver more useable outcomes for the user Post-editing For information purposes Multilingual search Increased productivity Extract more meaning Retrieve more relevant results = = =
  • 13. De-risking the machine translation proposition What is the value for users? + Data + Time + €€€ = ??? + No data needed + Systems are ready to go + No upfront cost = Evaluate immediately Our PrerequisitesTypical Prerequisites Customisation. Refinement. » Incorporation of user feedback » Incremental training with post-edits » Tuning for specific input types
  • 15. Iconic in practice Iconic had a domain-specific MT solution for that industry Machine Translation technology for the legal industry Business Need
  • 16. Iconic in practice Delivered immediately and initial results were positive Translation samples required for initial evaluation Process (1)
  • 17. Iconic in practice “The complexities and unforeseen but inevitable surprises of MT integration in large scale production processes were handled both competently and efficiently.” Integrate Iconic with GlobalSight for productivity pilot Process (2)
  • 18. Iconic in practice >20% productivity increase for translator post-editing Iconic output “Iconic delivered measurable productivity gains from the outset” Performance
  • 19. Iconic in practice •  Ongoing improvement through feedback from translators •  Ongoing improvement through the incorporation of post-edits •  More than 4 million words translated to date for Asian languages •  Periodic roll-out of new languages over time Looking forward
  • 20. Need: short-term solution to provide on-demand translation through a web search interface Iconic in practice Process: integrate directly through Iconic API and evaluate quality and throughput concurrently Outcomes: in 5 months of production for English-Portuguese alone, we processed: •  15,526 translation requests •  14,606,374 words
  • 21. All content is not created equal We cannot afford to be dogmatic when it comes to MT Domain specific MT is about more than just data Know your subject matter! Take home messages…