Copyright © 2014, Asia Online Pte Ltd 
High volume, High Quality Patent Translation across Multiple Domains 
Dion Wiggins Chief Executive Officer dion.wiggins@asiaonline.net
•Language Studio™ is a language processing platform, not just a translation tool 
•We currently support 534 language pairs 
•Our very first customer was LexisNexis Univentio in 2008 
–Our first commercial engine was translating Japanese patents into English 
•Not all customers are in the patent space, but patents are the most complex content that we have ever encountered
•Collectively our customers are translating more than 2 billion words per day 
•One single customer is translating more than 1 billion words a day of patent content 
•The highest throughput a customer (a government) has required to date is 600 million words per minute
–Yes, we can support this volume if you can provide the hardware – approx. 25K CPU cores 
–Currently being designed and architected ahead of deployment
•Equivalent of 20 million four-drawer filing cabinets filled with text.
•The volume of data is expected to grow twentyfold by 2020.
A method of distilling a polymerizable vinyl compound selected from the group consisting of acrolein, methacrolein, acrylic acid, methacrylic acid, hydroxyethyl acrylate, hydroxyethyl methacrylate, hydroxypropyl acrylate, hydroxypropyl methacrylate, glycidyl acrylate and glycidyl methacrylate, the method comprising distilling the polymerizable vinyl compound in the presence of a polymerization inhibitor using a distillation tower having perforated trays without downcomers and wherein the temperature of the inner wall of the tower is maintained at a temperature sufficient to prevent the condensation of the vapor being distilled, whereby the polymerizable vinyl compound is distilled without the formation of polymer.
Translate 13 million historical patents from Japanese to English and also translate all new Japanese patents going forward. Follow this with the same task in many other languages. 
It would take a human translator 152,257 years to translate all existing Japanese patents into English and would cost US$ 40 billion.
Quality requires an understanding of the data 
There is no exception to this rule
•Structured XML 
–Header 
•Language 
•IPC 
•… 
–Sections 
•Title 
•Claim 
•Abstract 
•Description
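A minimal sketch of splitting such a document into header metadata and sections, in Python. The element names (`patent`, `header`, `language`, `ipc`, `sections`, ...) are hypothetical; each patent office uses its own schema, so a real parser would likely need one mapping per source format.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; real USPTO, EPO and WIPO schemas differ from this.
SAMPLE = """<patent>
  <header>
    <language>ja</language>
    <ipc>C07C 57/04</ipc>
    <ipc>B01D 3/14</ipc>
  </header>
  <sections>
    <title>重合性ビニル化合物の蒸留方法</title>
    <claim>...</claim>
    <abstract>...</abstract>
    <description>...</description>
  </sections>
</patent>"""


def parse_patent(xml_text: str) -> dict:
    """Split a patent document into header metadata and per-section text."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    sections: dict[str, list[str]] = {}
    for element in root.find("sections"):
        # itertext() also collects text nested inside inline markup
        text = "".join(element.itertext()).strip()
        sections.setdefault(element.tag, []).append(text)
    return {
        "language": header.findtext("language"),
        "ipc": [e.text.strip() for e in header.findall("ipc")],
        "sections": sections,
    }


doc = parse_patent(SAMPLE)
print(doc["language"], doc["ipc"], list(doc["sections"]))
```

Keeping sections as lists allows documents with several claim or description elements.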
•Writing Style Changes 
–Between domains of knowledge 
–Between sections of the patent document 
•Multiple Classes of Data
–Formulas
•Detection
•Transformation
•Protection
–Reference Numbers
•Break the fluency of the translation
•Not part of the text; they are metadata
–Numbers + Units 
–Dates 
–Patent Numbers
•Content Formatting 
–Broken sentences 
–Wrong encoding 
–OCR 
•Data in different formats
–USPTO, EPO, WP and many others have their own formats 
–Changes in format in different offices 
•Quality of Learning Data 
–Spelling errors 
–Poor quality human translations 
–Words glued together 
–OCR 
•We were told the data had not been OCRed, but…
•Gaps in Data 
–Many terms are not in the learning data 
•Tricks By Authors 
–Changing writing script
•e.g. switching to Katakana when there is a perfectly good Kanji term
•Bilingual Data 
–Matching patent documents between various patent office formats 
–Matching sentences 
–Removing poor quality translations 
–Fixing “broken data”
•Sentence Length 
–The longest patent sentence we have seen so far is 4,500 words long
•Throughput Requirements 
–Front File 
•Translated and published within X hours of being published by patent office Y
–Back File
•All patents going back to X, translated within 3 months
–This amounts to millions of documents
•Unique Customization and Quality Improvement Plan 
•Clean Data Strategy 
•One Engine, Multiple Writing Styles 
–Writing Styles By 
•Content Domain 
•Document Section 
–Sentence by sentence domain switching 
•Hybrid – Rules + Syntax + Statistics 
•Multiple Translations 
–Only the best will do 
•Ongoing Improvement 
–Driven by Quality and Measurement
[Figure: engine customization workflow. Stages shown: Original Translation Sources, Language Pair Foundation Data, Domain Foundation Data, Data Collections, Data Cleaning, Data Preparation, Training, Diagnostics and Fine Tuning, Translate, Quality Assurance.]
Asia Online Foundation Data (Language Pair Foundation + Domain Foundation) + Sub-Domain Specific Data (Client Data + Manufactured Data) = Custom Engine
•Definition 
–Domain 
–Target Audience 
–Preferred Writing Style 
–Glossaries, Non-Translatable Terms, Preferred Capitalization 
–Special Formatting Requirements 
–Quality Requirements 
•Data Gathering (provided by client and gathered from third parties)
–Source data in domain
–Bilingual data to support domain
–Monolingual data to support domain
•Data Analysis
–Gap analysis
–High frequency terms
–Term extraction
•Data Generation
–Supporting grammar structures
–Source Data Analysis
•Cleaning of Data
•Tuning and Test Set Preparation
•Diagnostic Engine
–Fine tuning
•Data Preparation 
–Language ID 
–Encoding ID 
–Class Definition 
–Rule Definition 
–Writing Style Definition 
–Data Alignment 
–Data Cleaning & Repair 
–Gap Analysis 
–Word segmentation 
–De-compounding 
–Data Manufacturing 
–Spelling Correction 
–Domain detection 
–Syntax parsing 
–Reordering rules 
–Data structuring rules 
–Language Normalization 
–Term Normalization
•Engine Training 
–5 major categories 
•Leverage IPC 
•Override option for user to bypass IPC logic 
–4 writing styles 
•Title, Claim, Abstract, Description 
–20 different sub-engines 
•5 categories x 4 styles 
–Tuning/testing data for each of the 20 sub-engines 
–Integration of 20 sub-engines into a single engine
•Runtime Translation 
–Pre-Translation Corrections 
–Domain detection 
–Syntax parsing 
–Reordering rules 
–Data structuring rules 
–Statistical translation 
–Multi-candidate translations 
–Class extraction and processing
•There is no magic in MT; human effort is required.
•The quality of the output and its suitability for purpose are directly proportional to the amount of human effort.
•Without human direction, MT will cost more in the long term and is more likely to fail.
•Source 
–The entire body of data in the back file 
•Target 
–Every USPTO patent published from 1976 to the present
•Bilingual Data
–Matching documents from USPTO, EPO, etc.
•This is the actual format from one customer
Dirty Data SMT Model
•Data
–Gathered from as many sources as possible.
–Domain of knowledge does not matter.
–Data quality is not important.
–Data quantity is important.
•Theory
–Good data will be more statistically relevant.
Clean Data SMT Model
•Data
–Gathered from a small number of trusted quality sources.
–Domain of knowledge must match target.
–Data quality is very important.
–Data quantity is less important.
•Theory
–Bad or undesirable patterns cannot be learned if they don’t exist in the data.
English Source | Human Translation | Google Translation | Google Context
I went to the bank | Fui al banco | Fui al banco | Bank as in finance
I went to the bank to deposit money | Fui al banco para depositar dinero | Fui al banco a depositar el dinero | Bank as in finance
I went to the bank of the turn in my car | Fui en coche a la inclinación de la vuelta | Fui a la orilla de la vuelta en mi coche | Bank as in river bank
I put my car into the bank of the turn | Puse mi coche en la inclinación de la vuelta. | Pongo mi coche en el banco de la vuelta | Bank as in finance
I swam to the bank of the river | Nadé en la orilla del río | Nadé hasta la orilla del río | Bank as in river bank
I banked my money | Deposité mi dinero | Yo depositado mi dinero | Banked as in finance
I banked my car into the turn | Incliné mi coche en la vuelta | Yo depositado mi coche en la vuelta | Banked as in finance
I banked my plane into a steep dive | Incliné mi avión en para una zambullida. | Yo depositado en mi avión en picada | Banked as in finance
Issue: The above examples show that Google is biased towards the banking and finance domain.
Cause: There is much more multilingual banking and finance data available to learn from than there is aeronautical or water sports data.
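The cause can be illustrated with a toy model that picks a translation purely by frequency. The observation counts below are invented for illustration, not measured from Google or any real system; they only show how a finance-heavy corpus makes the finance sense win unless the data is restricted to the matching domain.

```python
from collections import Counter

# Hypothetical aligned observations of the English word "bank": (domain, Spanish translation).
# The counts are invented to mimic a corpus dominated by finance text.
OBSERVATIONS = (
    [("finance", "banco")] * 90
    + [("river", "orilla")] * 6
    + [("driving", "inclinación")] * 4
)


def most_frequent_translation(observations, domain=None):
    """Pick the most frequent translation, optionally restricted to one domain."""
    counts = Counter(t for d, t in observations if domain is None or d == domain)
    return counts.most_common(1)[0][0]


print(most_frequent_translation(OBSERVATIONS))           # no domain filter: finance sense wins
print(most_frequent_translation(OBSERVATIONS, "river"))  # domain-matched data only
```

This is the intuition behind the clean data model: if the learning data matches the target domain, the undesirable sense is never the majority.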
[Figure: Dirty Data SMT Baseline compared with the Language Studio™ Clean Data SMT Foundation. In both, Client Data is added for Initial Customization, followed by further Improvement; Manufactured Data is also added on the clean data foundation. Labels: “20% Required for Noticeable Improvement” and “< 0.1%”.]
•Language Studio™ provides tools and processes for normalization of terminology 
•Benefits include cost reductions, faster deliverables, higher customer satisfaction and happier post-editors
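As a sketch of what terminology normalization can involve, the function below rewrites variant terms to a client's preferred term. The glossary entries and the `build_normalizer` helper are hypothetical; this is not the Language Studio™ tooling itself.

```python
import re


def build_normalizer(glossary: dict[str, str]):
    """Return a function that rewrites variant terms to the preferred term."""
    # Longest variants first so multi-word terms win over shorter overlapping ones
    variants = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b", re.IGNORECASE)
    preferred = {v.lower(): term for v, term in glossary.items()}
    return lambda text: pattern.sub(lambda m: preferred[m.group(0).lower()], text)


# Hypothetical client glossary: variant -> preferred term
normalize = build_normalizer({
    "lithium ion battery": "lithium-ion battery",
    "Li-ion battery": "lithium-ion battery",
    "accumulator": "battery",
})
print(normalize("The Li-ion battery and the Lithium ion battery"))
```

Applying the same mapping to training data and to output keeps one term per concept, which is what reduces post-editing.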
Translation quality can be greatly improved by performing three similar but different cross-references of data.
1. All Source Data to be Translated vs. Bilingual Data
–Goal: Identify words in the source data to be translated that are not in the bilingual data.
–Benefit: Ensures all words in the data to be translated are known and will be translated correctly.
–Action: Human translate or locate word lists from industry sources and directories and add them to the bilingual data.
2. Monolingual Target Language Data vs. Bilingual Data
–Goal: Identify words in the monolingual target language data that are not in the bilingual data.
–Benefit: Ensures all words in the monolingual target language data are known, so that data to be translated in future, but not yet seen, will be translated better.
–Action: Human translate or locate word lists from industry sources and directories and add them to the bilingual data.
3. Bilingual Data vs. Monolingual Target Language Data
–Goal: Identify words in the bilingual data that are missing or low frequency in the monolingual target language data.
–Benefit: Ensures that there is enough grammatical representation of the words, phrases and terminology in the monolingual target language data, which delivers greater fluency in translation output.
–Action: Generate monolingual target language data using the Language Studio™ Pro Crawl and Generate Tools and add it to the monolingual data.
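The three cross-references can be approximated with word-frequency set operations. This is a simplified sketch under stated assumptions: whitespace tokenization, a hypothetical `min_freq` threshold for "low frequency", and results ordered by frequency so the most impactful gaps can be handled first.

```python
from collections import Counter


def words(lines):
    # Whitespace tokenization; Japanese or Chinese text would need word segmentation first.
    for line in lines:
        yield from line.lower().split()


def cross_reference(to_translate, bilingual_src, bilingual_tgt, monolingual_tgt, min_freq=2):
    bi_src = set(words(bilingual_src))
    bi_tgt = Counter(words(bilingual_tgt))
    mono = Counter(words(monolingual_tgt))
    # 1. Source words to translate that the bilingual data does not cover
    unknown_source = Counter(w for w in words(to_translate) if w not in bi_src).most_common()
    # 2. Monolingual target words absent from the target side of the bilingual data
    unknown_target = Counter(w for w in words(monolingual_tgt) if w not in bi_tgt).most_common()
    # 3. Bilingual target words missing or rare in the monolingual data
    thin_coverage = sorted(w for w in bi_tgt if mono[w] < min_freq)
    return unknown_source, unknown_target, thin_coverage


gaps = cross_reference(
    to_translate=["the valve opens", "the valve closes"],
    bilingual_src=["the valve opens"],
    bilingual_tgt=["la válvula se abre"],
    monolingual_tgt=["la válvula se cierra", "la puerta se abre"],
)
for label, result in zip(("1", "2", "3"), gaps):
    print(label, result)
```

Lists 1 and 2 feed the human translation or word-list step; list 3 feeds monolingual data generation.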
Gruppenmasterdatenverarbeitungsvorrichtungssynchronisationsinformation 
Leistungswirkungsgradindexmarkierungsberechnungseinrichtung 
Schwenkmotorbetriebsdrehmomentbegrenzungswertberechnungsschritt 
Differenzialmechanismusumschaltbedingungsänderungseinrichtung 
Kraftstoffverbrauchsratenprioritätsmodusauswahlschalter 
Reproduktionsunmöglichkeitsgegenmaßnahmeneinrichtung 
Telefonbuchdatenübertragungsprotokollverbindungsabschnitts 
Bezugspunktsolldrehungsgeschwindigkeitsfestlegungsabschnitt 
Höhenstandsaufnahmedifferenzdrucksondenresonanzverstimmung 
Maschinenrotationspumpenkapazitätsbefehlwandlungsabschnitt 
Brennkraftmaschinenausgangsdrehmomenterfassungseinrichtung 
Telefonbuchdatenübertragungsprotokollverbindungsabschnitt 
übermaßwankwinkelauftrittstendenzbeurteilungseinrichtung 
Unterstützungsdrehmomentbegrenzungswertberechnungsschritt 
Personenwahrscheinlichkeitsberechnungsverarbeitungsroutine 
Positionsaktualisierungsinformationsübertragungszeitpunkt 
Automatikgetriebehydraulikfluidtemperaturerfassungseinheit 
Leistungswirkungsgradindexmarkierungsberechungseinrichtung 
Octadecylaminodimethyltrimethoxysilylpropylammoniumchlorid 
Katalysatorverschlechterungsbeurteilungseinrichtung 
Kraftstoffverbrauchsprioritätsmodusauswahlschalter
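Compounds like these are rarely seen as whole words in the learning data, so de-compounding splits them into known parts. A minimal dictionary-based sketch, assuming a hypothetical lexicon and a small illustrative set of German linking elements; it prefers the split with the fewest parts, whereas a production splitter would also weigh parts by corpus frequency.

```python
from functools import lru_cache
from typing import Optional

LINKERS = ("s", "es", "n", "en")  # common German linking elements; an illustrative subset


def decompound(word: str, lexicon: set[str]) -> Optional[list[str]]:
    """Split a compound into the fewest lexicon words, allowing linking elements between parts."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def best_split(i: int) -> Optional[tuple[str, ...]]:
        if i == len(w):
            return ()
        best = None
        for j in range(len(w), i, -1):  # try longer parts first
            part = w[i:j]
            if part not in lexicon:
                continue
            # Continue directly after the part, or after a linking element
            for k in [j] + [j + len(link) for link in LINKERS if w.startswith(link, j)]:
                rest = best_split(k)
                if rest is not None and (best is None or len(rest) + 1 < len(best)):
                    best = (part,) + rest
        return best

    result = best_split(0)
    return list(result) if result is not None else None


# Hypothetical lexicon; "kraft" and "stoff" are included to show the fewest-parts preference.
LEXICON = {"kraft", "stoff", "kraftstoff", "verbrauch", "priorität", "modus", "auswahl", "schalter"}
print(decompound("Kraftstoffverbrauchsprioritätsmodusauswahlschalter", LEXICON))
```

Unsplittable words return `None` so they can be routed to gap analysis instead.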
•Generic MT from Google, Bing, etc. offers unknown productivity gains, and sometimes productivity losses, due to lack of control.
•Competitors offer productivity gains of < 20-40% due to their domain-centric, “dirty data SMT” customization model.
•Language Studio™:
–Targets productivity gains of 150-300%+ with a granular sub-domain “clean data SMT” approach.
–Provides complete control of writing style and terminology, mapped to the target audience, reducing editing effort.
[Figure: example engine hierarchy. Language Pair EN-ES → Top-Level Domain Automotive → Client (Honda, Toyota) → Product (Cars, Motorbikes) → Target Audience / Purpose (Marketing, Service Reports, User Manuals, Engineering Service Manuals).]
Customization Level | Typical Productivity Gain
Generic (Google/Bing Quality Level) | ????
Domain (Typical Competitor Quality Level) | < 20-40%
Client | 50%+
Product | 90%+
Target Audience / Purpose | 150-300%+
Translated text can be stylized based on the style of the monolingual data.
[Figure: ES-EN bilingual data (millions of sentence pairs) supplies the possible vocabulary; EN monolingual data supplies the writing style and grammar. Monolingual examples: newspaper articles / business news (The Economist, New York Times, Forbes) yield text written in the style of business news; children’s books (Harry Potter, Rupert the Bear, Famous Five) yield text written in the style of children’s books.]
Spanish Original Before Translation: 
Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos. 
Business News After Translation: 
Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents. 
Children’s Books After Translation: 
A lot of care was taken to not upset others when organizing the meeting between the two long time enemies.
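One plausible mechanism for this kind of stylization is a language model trained on monolingual data of the desired style, used to rank candidate outputs. The toy corpora, candidates and add-one smoothed bigram model below are illustrative assumptions, not Language Studio™ internals.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Add-one smoothed bigram model; returns an average per-token log-probability scorer."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    vocab_size = len(set(contexts) | {"</s>"})

    def score(sentence):
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        pairs = list(zip(toks, toks[1:]))
        total = sum(math.log((bigrams[p] + 1) / (contexts[p[0]] + vocab_size)) for p in pairs)
        return total / len(pairs)  # length-normalized so short outputs are not favored

    return score


# Hypothetical monolingual target-language samples for two styles
business_news = train_bigram_lm([
    "significant political maneuvering was required",
    "the talks required cautious maneuvering",
    "opponents met to facilitate a rendezvous",
])
childrens_books = train_bigram_lm([
    "the two friends had a meeting",
    "a lot of care was taken",
    "the old enemies had a meeting",
])

candidates = ["cautious political maneuvering was required", "a lot of care was taken"]
print(max(candidates, key=business_news))    # business-news style wins
print(max(candidates, key=childrens_books))  # children's-book style wins
```

The bilingual data stays the same in both cases; only the monolingual model changes which phrasing is preferred.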
•5 different main categories 
–Tests were performed on more granular categories, but the improvement did not justify the effort
–Categories are automatically detected using the IPC data
•IPC codes within given ranges are mapped to one of the 5 categories
•4 writing styles are determined by the XML identifiers for the Title, Claims, Abstract and Description sections.
•Language Studio is configured to recognize a sentence header and change style for every sentence based on the header. 
•This permits 20 writing styles within a single engine. 
–Changes the use of bilingual and monolingual data as required per style
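A sketch of the routing logic, assuming a hypothetical grouping of the eight IPC sections (A to H) into five categories and a hypothetical `[category|style]` sentence header; the actual IPC ranges and header format are not given in the presentation.

```python
from itertools import product
from typing import Optional

# Hypothetical grouping of the 8 IPC sections into 5 categories.
IPC_SECTION_TO_CATEGORY = {
    "A": "life_sciences",  # Human necessities
    "B": "mechanical",     # Performing operations; transporting
    "C": "chemistry",      # Chemistry; metallurgy
    "D": "chemistry",      # Textiles; paper
    "E": "mechanical",     # Fixed constructions
    "F": "mechanical",     # Mechanical engineering; lighting; heating; weapons; blasting
    "G": "physics",        # Physics
    "H": "electricity",    # Electricity
}
CATEGORIES = sorted(set(IPC_SECTION_TO_CATEGORY.values()))

# Hypothetical XML section tags mapped to the 4 writing styles
SECTION_TO_STYLE = {"title": "title", "claim": "claim", "abstract": "abstract", "description": "description"}


def sub_engine_key(ipc_codes: list[str], section_tag: str, override: Optional[str] = None) -> tuple[str, str]:
    """Choose (category, style); a user override bypasses the IPC logic."""
    category = override or IPC_SECTION_TO_CATEGORY[ipc_codes[0][0].upper()]
    return category, SECTION_TO_STYLE[section_tag]


def with_header(sentence: str, key: tuple[str, str]) -> str:
    """Prefix a sentence with a hypothetical header the engine switches style on."""
    return f"[{key[0]}|{key[1]}] {sentence}"


ALL_SUB_ENGINES = list(product(CATEGORIES, SECTION_TO_STYLE.values()))
key = sub_engine_key(["C07C 57/04"], "claim")
print(len(ALL_SUB_ENGINES), with_header("A method of distilling ...", key))
```

Because the header travels with each sentence, a single engine can switch among all 5 × 4 = 20 styles within one document.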
•Pre-Processing Rules
–Sentence Segmentation
–Word Segmentation
–Phrase Reordering
–Dates and Numbers
–Patterns, Formulas etc.
–Pre-Normalization
–Spell Checking
–Custom Runtime Glossary
–Pre-Formatting
•Post-Processing Rules
–Capitalization
–Post-Formatting
–Grammar Checking
–Post-Normalization
–XML Tag Reinsertion
–Currency Conversion
–Cross Referencing
–Other custom post processing
•Hybrid Rules and SMT Engine Model
–EN → Pre-Processing Rules → Statistical Machine Translation → Post-Processing Rules → ES
•Hybrid Rules and Corrective Statistical Engine Model
–EN → Translation Rules → Statistical Correction of Rules Errors (Statistical Smoothing, “Automated Post Editing”) → ES
–This is more of a Band-Aid approach, as the core MT is still a traditional rules-based MT engine.
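The hybrid rules and SMT model is, structurally, function composition: each rule takes text and returns text. In this sketch the statistical core is replaced by a hypothetical word-for-word lookup stub, so it shows the pipeline structure only, not real translation (there is no reordering).

```python
import re
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]


def pre_normalize(text: str) -> str:
    """Pre-processing rule: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def make_smt_stub(phrase_table: dict[str, str]) -> Stage:
    """Hypothetical stand-in for the statistical core: word-for-word lookup, unknown words passed through."""
    def translate(text: str) -> str:
        return " ".join(phrase_table.get(tok.lower(), tok) for tok in text.split())
    return translate


def post_capitalize(text: str) -> str:
    """Post-processing rule: capitalize the first character."""
    return text[:1].upper() + text[1:]


def run_pipeline(text: str, stages: list[Stage]) -> str:
    """Feed the output of each stage into the next."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


pipeline = [pre_normalize, make_smt_stub({"the": "la", "valve": "válvula", "opens": "se abre"}), post_capitalize]
print(run_pipeline("  The   valve opens ", pipeline))
```

Keeping each rule as an independent stage makes it possible to add, remove or reorder rules per engine without touching the statistical core.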
•Problem 
–Reference numbers break translation fluency 
•Solution 
–Use JavaScript rules 
–Remove reference numbers before translation, recording each one's original position
–Track where the word associated with each reference number moves to, and reinsert the number after that word once translation is complete
Example of a sentence with reference numbers: However, malware on electronic device 103 must still make requests of resource 106 if it is to carry out malicious activities.
Example of word-tracking output for the sentence "Apartments are in very good condition, well equipped and furnished to a very good standard." translated into Spanish, with each target phrase annotated with its source word span and word alignments:
los apartamentos están en |0-2,0, 0=0 0=1 1=2 2=3 | muy buenas condiciones |3-5,0, 0=0 1=1 2=2 | , |6-6,0, | bien equipados y amueblados |7-10,0, 0=0 1=1 2=2 3=3 | a un nivel muy bueno |11-15,0, 0=0 1=1 2=3 3=4 4=2 | . |16-16,0, |
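A minimal sketch of the remove, track and reinsert cycle, parsing the alignment trace shown above. It assumes the trace format is `target phrase |srcStart-srcEnd,n, s=t ...|`, with `s=t` pairs giving source and target word offsets within each phrase (this is consistent with the example but inferred, not documented). It treats any all-digit token as a reference number, which is a heuristic, and the reference number 7 on "standard" is hypothetical.

```python
import re

REF_NUM = re.compile(r"^\d+[a-z]?$")  # e.g. "103", "106a" (heuristic)
SEGMENT = re.compile(r"([^|]*)\|(\d+)-(\d+),\d+,([^|]*)\|")


def strip_reference_numbers(tokens):
    """Remove reference numbers, remembering the index of the word each one follows."""
    kept, refs = [], {}
    for tok in tokens:
        if REF_NUM.match(tok) and kept:
            refs.setdefault(len(kept) - 1, []).append(tok)
        else:
            kept.append(tok)
    return kept, refs


def parse_trace(trace):
    """Return target tokens and a map from source index to aligned target indices."""
    target, alignment = [], {}
    for phrase, src_start, _src_end, links in SEGMENT.findall(trace):
        offset = len(target)
        for link in links.split():
            s, t = map(int, link.split("="))
            alignment.setdefault(int(src_start) + s, []).append(offset + t)
        target.extend(phrase.split())
    return target, alignment


def reinsert(trace, refs):
    target, alignment = parse_trace(trace)
    inserts = {}
    for src_idx, numbers in refs.items():
        # After the last target word aligned to the source word; unaligned words fall back to the end
        pos = max(alignment.get(src_idx, [len(target) - 1]))
        inserts.setdefault(pos, []).extend(numbers)
    out = []
    for i, tok in enumerate(target):
        out.append(tok)
        out.extend(inserts.get(i, []))
    return " ".join(out)


TRACE = ("los apartamentos están en |0-2,0, 0=0 0=1 1=2 2=3 | muy buenas condiciones |3-5,0, 0=0 1=1 2=2 | "
         ", |6-6,0, | bien equipados y amueblados |7-10,0, 0=0 1=1 2=2 3=3 | "
         "a un nivel muy bueno |11-15,0, 0=0 1=1 2=3 3=4 4=2 | . |16-16,0, |")
SOURCE = "Apartments are in very good condition , well equipped and furnished to a very good standard 7 ."
kept, refs = strip_reference_numbers(SOURCE.split())
print(reinsert(TRACE, refs))
```

The number follows "nivel" even though "standard" moved from last to first position in its phrase. Detokenization (spacing around punctuation) is out of scope here.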
•Problem: 
–Data elements with an unlimited number of possible values, or highly variable values, which statistics will not handle well
•Solution 
–Use JavaScript rules 
–Associate the data element with its class and store the data on a Session object
–Substitute the data element with the class identifier
–Translate with the class identifier; all data of the class will be treated the same
–After translation, merge the data element back in place of the class identifier using word-tracking information
Before: The above-identified U.S. patent application Ser. No. 13/155,881, filed Jun. 8, 2011 provides further details of searching by image.
After class substitution: The above-identified @PATENTNOPREFIX@ @PATENTNO@, filed @DATE@ provides further details of searching by image.
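A sketch of class substitution and restoration for the example above, using Python regular expressions. The patterns are assumptions written to match this one sentence, a plain dict stands in for the Session object, and values are restored in order of appearance, which is a simplification of the word-tracking merge described above.

```python
import re

# Hypothetical class definitions; ordered so the prefix is replaced before the number.
CLASSES = [
    ("@PATENTNOPREFIX@", re.compile(r"U\.S\. patent application Ser\. No\.")),
    ("@PATENTNO@", re.compile(r"\b\d{2}/\d{3},\d{3}\b")),
    ("@DATE@", re.compile(r"\b(?:Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\. \d{1,2}, \d{4}\b")),
]


def substitute_classes(text):
    """Replace each data element with its class identifier; keep the values in a session dict."""
    session = {}
    for class_id, pattern in CLASSES:
        session[class_id] = pattern.findall(text)
        text = pattern.sub(class_id, text)
    return text, session


def restore_classes(text, session):
    """Put stored values back in order of appearance (simplified stand-in for word tracking)."""
    for class_id, values in session.items():
        remaining = iter(values)
        text = re.sub(re.escape(class_id), lambda m: next(remaining, m.group(0)), text)
    return text


original = ("The above-identified U.S. patent application Ser. No. 13/155,881, "
            "filed Jun. 8, 2011 provides further details of searching by image.")
masked, session = substitute_classes(original)
print(masked)
print(restore_classes(masked, session) == original)
```

In a real translation the masked sentence would be translated first, and values such as the date might themselves be reformatted for the target language before merging.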
•Problem: 
–It is not always possible to predict which approach will deliver the best quality
•Solution:
–Apply multiple approaches and score them
•Language Studio supports multiple ordering and restructuring formats for a single segment of data. 
•Each can be evaluated independently using a number of scoring metrics and the best quality translation result returned 
–Scores for Segment Level Confidence, Language Model, Source Matching, TM Matching, Terminology Confidence
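A sketch of selecting the best of several candidates by a weighted combination of the scores listed above. The weights and the [0, 1] normalization of each score are hypothetical; the presentation does not say how the metrics are combined.

```python
from dataclasses import dataclass, field

# Hypothetical weights (summing to 1.0); not taken from Language Studio.
WEIGHTS = {
    "segment_confidence": 0.30,
    "language_model": 0.25,
    "source_matching": 0.20,
    "tm_matching": 0.15,
    "terminology_confidence": 0.10,
}


@dataclass
class Candidate:
    text: str
    scores: dict[str, float] = field(default_factory=dict)  # each score assumed normalized to [0, 1]


def combined_score(c: Candidate) -> float:
    """Weighted sum of the per-metric scores; missing metrics count as 0."""
    return sum(weight * c.scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def best_candidate(candidates: list[Candidate]) -> Candidate:
    return max(candidates, key=combined_score)


reordered = Candidate("reordered output", {name: 0.8 for name in WEIGHTS})
monotone = Candidate("monotone output", {"segment_confidence": 0.9, "language_model": 0.5,
                                         "source_matching": 0.6, "tm_matching": 0.6,
                                         "terminology_confidence": 0.6})
print(best_candidate([monotone, reordered]).text)
```

A weighted sum is only one option; a learned ranker over the same features would fit the same interface.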
1. Customize
Create a new custom engine using foundation data and your own language assets.
2. Measure
Measure the quality of the engine for rating and future improvement comparisons.
3. Improve
Provide corrective feedback to remove potential translation errors.
4. Manage
Manage translation projects while generating corrective data for quality improvement.
•Exception handling 
–Long sentences 
–Bad sentences 
–Bug bears 
•New Data 
–Integrate quickly as it is produced by various patent offices 
–Data produced regularly 
•Hire Specialists 
–People who understand the engine and know how to refine it, to work on data and rules
•Outsource Term Translation
–Find a specialist who can translate the terms identified by gap analysis
•Coined by Laura Rossi from LexisNexis 
–A nasty or bad word that should never be in the translation output 
•Previous solution 
–Find in the phrase table data 
•Remove 
•Re-binarize 
–Find in the training data 
•Remove 
–Very time-consuming
•Language Studio Solution
–Bad word list
–Can be updated at any time
–The translation engine's decoder ignores any data that contains a bad word
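A sketch of such a decode-time filter, assuming translation options are available as (source, target, score) entries. The bad word list is a plain set, so it can be edited at any time without retraining or re-binarizing. The options and the listed word are hypothetical.

```python
from typing import NamedTuple, Optional


class PhraseOption(NamedTuple):
    source: str
    target: str
    score: float


def usable_options(options, bad_words):
    """Drop any translation option whose target side contains a listed bad word."""
    return [o for o in options if not (set(o.target.lower().split()) & bad_words)]


def best_option(options, bad_words) -> Optional[PhraseOption]:
    usable = usable_options(options, bad_words)
    return max(usable, key=lambda o: o.score) if usable else None


bad_words = set()  # hypothetical list, editable at any time
options = [PhraseOption("banked", "depositado", 0.6), PhraseOption("banked", "incliné", 0.3)]
print(best_option(options, bad_words).target)
bad_words.add("depositado")
print(best_option(options, bad_words).target)
```

The underlying phrase table and training data are untouched; the list only changes what the decoder is allowed to choose.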
•Training data can often have gaps in coverage and an excess of data in other areas. 
•Gaps in coverage reduce translation quality. 
•Gaps can quickly be filled by post-editing the machine-translated output and submitting it back to the system for further learning.
•Many gaps can be filled with monolingual data alone.
•Further gaps can be identified and resolved by analyzing the text to be translated for high-frequency terms and unknown words.
•In some cases incorrect data may be statistically more relevant. Post-editing will raise the relevance of the correct grammar.
[Figure: Example of Training Data. Data volume per topic plotted against a Sufficient Data Threshold; gaps in topic coverage (data shortfall) are filled by post-edited feedback and generated data.]
More initial data provided for training results in greater vocabulary and grammatical coverage above the Sufficient Data Threshold, and less post-editing feedback is required.
•Document and Proximity Translations 
–All existing translation platforms translate at the sentence level only.
–By leveraging information in the document or in near proximity to the current sentence, higher quality translations are possible. 
•Immediate Quality Updates 
–Updates to engine quality within 60 minutes of making edits. 
–Updates to engine quality by learning automatically from external sources. 
•Improved Slavic language support 
–Generation of inflected forms 
–Deeper grammatical and syntactical analysis

Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data Integration
 
Data Centre Strategy Summit 2015 "Are you ready to embark on your Data Cent...
Data Centre Strategy Summit 2015   "Are you ready to embark on your Data Cent...Data Centre Strategy Summit 2015   "Are you ready to embark on your Data Cent...
Data Centre Strategy Summit 2015 "Are you ready to embark on your Data Cent...
 
Tegrity Captioning: Strategies for Deploying Accessible Lecture Capture Video
Tegrity Captioning: Strategies for Deploying Accessible Lecture Capture VideoTegrity Captioning: Strategies for Deploying Accessible Lecture Capture Video
Tegrity Captioning: Strategies for Deploying Accessible Lecture Capture Video
 
Leveraging Packaged Analytics when Implementing your ERP
Leveraging Packaged Analytics when Implementing your ERPLeveraging Packaged Analytics when Implementing your ERP
Leveraging Packaged Analytics when Implementing your ERP
 
Rediscover Software Development Edward Hieatt Web Summit 2014
Rediscover Software Development Edward Hieatt Web Summit 2014Rediscover Software Development Edward Hieatt Web Summit 2014
Rediscover Software Development Edward Hieatt Web Summit 2014
 
Superfast Business - The Connected Business
Superfast Business - The Connected BusinessSuperfast Business - The Connected Business
Superfast Business - The Connected Business
 
ASTQB washington-sept-2015
ASTQB washington-sept-2015ASTQB washington-sept-2015
ASTQB washington-sept-2015
 
Real-World RESTful Service Development Problems and Solutions
Real-World RESTful Service Development Problems and SolutionsReal-World RESTful Service Development Problems and Solutions
Real-World RESTful Service Development Problems and Solutions
 
Lyricnew ppt
Lyricnew pptLyricnew ppt
Lyricnew ppt
 
Webinar: Is Your Storage Ready for Disaster?
Webinar: Is Your Storage Ready for Disaster?Webinar: Is Your Storage Ready for Disaster?
Webinar: Is Your Storage Ready for Disaster?
 
Webinar: Preserve, Distribute and Deliver - M&E's Three Biggest Data Challenges
Webinar: Preserve, Distribute and Deliver - M&E's Three Biggest Data ChallengesWebinar: Preserve, Distribute and Deliver - M&E's Three Biggest Data Challenges
Webinar: Preserve, Distribute and Deliver - M&E's Three Biggest Data Challenges
 
Lyricnew ppt
Lyricnew pptLyricnew ppt
Lyricnew ppt
 

More from Dr. Haxel Consult

AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
Dr. Haxel Consult
 

More from Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Recently uploaded

Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
allensay1
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
daisycvs
 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
daisycvs
 
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan CytotecJual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
ZurliaSoop
 

Recently uploaded (20)

Arti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfArti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdf
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
 
New 2024 Cannabis Edibles Investor Pitch Deck Template
New 2024 Cannabis Edibles Investor Pitch Deck TemplateNew 2024 Cannabis Edibles Investor Pitch Deck Template
New 2024 Cannabis Edibles Investor Pitch Deck Template
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
 
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
 
WheelTug Short Pitch Deck 2024 | Byond Insights
WheelTug Short Pitch Deck 2024 | Byond InsightsWheelTug Short Pitch Deck 2024 | Byond Insights
WheelTug Short Pitch Deck 2024 | Byond Insights
 
HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
JAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR ESCORTS
JAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR  ESCORTSJAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR  ESCORTS
JAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR ESCORTS
 
joint cost.pptx COST ACCOUNTING Sixteenth Edition ...
joint cost.pptx  COST ACCOUNTING  Sixteenth Edition                          ...joint cost.pptx  COST ACCOUNTING  Sixteenth Edition                          ...
joint cost.pptx COST ACCOUNTING Sixteenth Edition ...
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptx
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
 
Putting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptxPutting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptx
 
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 MonthsSEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
 
Paradip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Paradip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGParadip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Paradip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxQSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
 
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan CytotecJual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
Jual Obat Aborsi ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan Cytotec
 

ICIC 2014 High volume, High Quality Patent Translation across Multiple Domains

  • 7. Translate 13 million historical patents from Japanese to English and also translate all new Japanese patents going forward. Follow this with the same task in many other languages. It would take a human translator 152,257 years to translate all existing Japanese patents into English and would cost US$40 billion.
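The two totals on slide 7 imply a cost per translator-year. The sketch below derives that figure from the slide's numbers. The daily output of 2,000 words and the 250 working days per year are assumptions for illustration only and do not come from the presentation.

```python
# Back-of-envelope check of the slide-7 figures.
YEARS = 152_257        # translator-years quoted on the slide
COST_USD = 40e9        # US$40 billion quoted on the slide
PATENTS = 13_000_000   # historical Japanese patents quoted on the slide

# Assumptions (not from the slide): one human translator's output.
WORDS_PER_DAY = 2_000
WORKING_DAYS = 250

def implied_figures(years, cost, patents, words_per_day, working_days):
    """Return (cost per translator-year, total words, words per patent, cost per word)."""
    cost_per_year = cost / years
    total_words = years * words_per_day * working_days
    return cost_per_year, total_words, total_words / patents, cost / total_words

cpy, words, wpp, cpw = implied_figures(YEARS, COST_USD, PATENTS, WORDS_PER_DAY, WORKING_DAYS)
print(f"cost per translator-year: ${cpy:,.0f}")
print(f"implied total words:      {words:,}")
print(f"implied words per patent: {wpp:,.0f}")
print(f"implied cost per word:    ${cpw:.3f}")
```

Under these assumptions, the slide's totals work out to roughly $263,000 per translator-year and about 5,900 words per patent. Change the assumed throughput and the derived per-patent and per-word figures change with it.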
  • 8. Quality requires an understanding of the data. There is no exception to this rule.
  • 9. Structured XML
      – Header: Language, IPC, …
      – Sections: Title, Claim, Abstract, Description
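Because each patent arrives as structured XML, the header fields and each section can be pulled out separately before translation. The sketch below does this with Python's standard `xml.etree.ElementTree`. The element names (`header`, `language`, `ipc`, `title`, `claims`, `abstract`, `description`) and the sample document are hypothetical simplifications: real USPTO, EPO and WIPO schemas use different names and far more structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout; real patent office schemas use other element names.
SAMPLE = """<patent>
  <header><language>ja</language><ipc>C07C 51/44</ipc></header>
  <title>Distillation method</title>
  <claims><claim>A method of distilling ...</claim><claim>The method of claim 1 ...</claim></claims>
  <abstract>A distillation tower ...</abstract>
  <description>The present invention relates to ...</description>
</patent>"""

SECTIONS = ("title", "claims", "abstract", "description")

def split_patent(xml_text):
    """Return (header fields, {section name: [text segments]}) for one document."""
    root = ET.fromstring(xml_text)
    header_node = root.find("header")
    header = {} if header_node is None else {
        child.tag: (child.text or "").strip() for child in header_node
    }
    sections = {}
    for name in SECTIONS:
        node = root.find(name)
        if node is not None:
            # itertext() also yields the text of nested elements such as each <claim>.
            sections[name] = [t.strip() for t in node.itertext() if t.strip()]
    return header, sections

header, sections = split_patent(SAMPLE)
print(header["ipc"], {name: len(parts) for name, parts in sections.items()})
```

Keeping the sections apart means each one can later be handled with the writing style for that section, and the IPC field can drive domain selection.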
  • 10.
    • Writing style changes
      – Between domains of knowledge
      – Between sections of the patent document
    • Multiple classes of data
      – Formulas: detection, transformation, protection
      – Reference numbers: they break the fluency of the translation and are metadata, not part of the text
      – Numbers + units
      – Dates
      – Patent numbers
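A common way to stop classes such as reference numbers, measurements and patent numbers from disrupting translation is to swap them for placeholders before MT and restore them afterwards. Below is a minimal sketch under that assumption. The three regular expressions are hypothetical simplifications (for example, only parenthesised reference numerals are covered), and the presentation does not describe how Language Studio™ itself does this.

```python
import re

# Hypothetical, simplified patterns for classes that should pass through MT untouched.
CLASS_PATTERNS = [
    ("PATNUM", re.compile(r"\b(?:US|EP|JP|WO)\s?\d{4,}(?:[A-Z]\d?)?\b")),
    ("MEASURE", re.compile(r"\b\d+(?:\.\d+)?\s?(?:mm|cm|nm|kg|mg|°C|%)(?!\w)")),
    ("REFNUM", re.compile(r"\(\d+[a-z]?\)")),
]

def protect(text):
    """Replace protected spans with placeholders; return (masked text, mapping)."""
    mapping = {}
    for label, pattern in CLASS_PATTERNS:
        def repl(match, label=label):
            key = f"__{label}{len(mapping)}__"
            mapping[key] = match.group(0)
            return key
        text = pattern.sub(repl, text)
    return text, mapping

def restore(text, mapping):
    """Put the original spans back after translation."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text

masked, mapping = protect("The tray (12) is kept at 95 °C, as described in EP1234567B1.")
print(masked)
```

The MT engine then sees only `__REFNUM2__`-style tokens. These tokens are easy to carry through unchanged, so sentence fluency is no longer broken by numerals.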
  • 11.
    • Content formatting
      – Broken sentences
      – Wrong encoding
      – OCR
    • Different data formats
      – USPTO, EPO, WP and many others have their own formats
      – Changes in format at different offices
    • Quality of learning data
      – Spelling errors
      – Poor-quality human translations
      – Words glued together
      – OCR: we were told the data had not been OCRed, but…
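One frequent "wrong encoding" defect is mojibake: UTF-8 bytes that were decoded as Latin-1 somewhere upstream. The sketch below shows one simple heuristic for repairing a single round of that error. It is an illustration only. It does not handle Windows-1252 variants or repeated corruption, which dedicated libraries such as ftfy are designed for.

```python
def repair_mojibake(text):
    """Undo one round of UTF-8 bytes mis-decoded as Latin-1, if that is what happened.

    Re-encode as Latin-1 and decode as UTF-8. Text that was never mis-decoded
    normally fails one of the two steps (or round-trips unchanged) and is returned as-is.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

broken = "distillation at 95 °C".encode("utf-8").decode("latin-1")
print(broken, "->", repair_mojibake(broken))
```

Correct text such as `"café"` or Japanese `"特許"` passes through unchanged. The first is not valid UTF-8 once re-encoded, and the second cannot be encoded as Latin-1.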
  • 12.
    • Gaps in data
      – Many terms are not in the learning data
    • Tricks by authors
      – Changing writing mechanism, e.g. switching to Katakana when there is a perfectly good Kanji term
    • Bilingual data
      – Matching patent documents between various patent office formats
      – Matching sentences
      – Removing poor-quality translations
      – Fixing "broken data"
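Removing poor-quality translations from bilingual data usually begins with cheap sanity checks on each sentence pair. The filter below is a minimal sketch of such checks: empty sides, untranslated copies, implausible length ratios, and numbers that do not survive translation. The thresholds are assumptions and would need tuning per language pair. Japanese, for example, is much shorter in characters than its English translation. Full-width digits should be normalized first so they compare equal.

```python
import re

DIGITS = re.compile(r"\d+")

def keep_pair(src, tgt, min_ratio=0.2, max_ratio=5.0):
    """Return True if a sentence pair passes simple, illustrative sanity checks."""
    src, tgt = src.strip(), tgt.strip()
    if not src or not tgt:
        return False   # one side missing: likely a broken alignment
    if src == tgt:
        return False   # copied rather than translated
    ratio = len(tgt) / len(src)
    if not min_ratio <= ratio <= max_ratio:
        return False   # lengths too different to be translations of each other
    # Numbers (reference numerals, values) should appear on both sides.
    return sorted(DIGITS.findall(src)) == sorted(DIGITS.findall(tgt))

print(keep_pair("塔(10)は加熱される。", "The tower (10) is heated."))
```

These checks are deliberately conservative. Pairs they reject can be routed for review instead of being discarded outright.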
  • 13.
    • Sentence length
      – The longest patent sentence we have seen so far is 4,500 words long
    • Throughput requirements
      – Front file: translated and published within X hours of being published by patent office Y
      – Back file: all patents going back to X, translated within 3 months; this is millions of documents
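Throughput requirements translate directly into hardware sizing. The government example earlier in the deck (600 million words per minute on approximately 25K CPU cores) implies about 400 words per second per core. The sketch below reuses that rate to size a back-file job. It assumes throughput scales linearly with cores, and it assumes an average of 5,000 words per patent, which is not a figure from the presentation.

```python
import math

def per_core_rate(words_per_minute, cores):
    """Words per second per core implied by an aggregate throughput figure."""
    return words_per_minute / 60 / cores

def cores_needed(total_words, window_seconds, words_per_core_second):
    """Minimum whole number of cores to translate total_words within the window."""
    return math.ceil(total_words / (window_seconds * words_per_core_second))

rate = per_core_rate(600e6, 25_000)        # figures from the government example
back_file_words = 13_000_000 * 5_000       # assumption: 5,000 words per patent
window = 90 * 86_400                       # "within 3 months", taken as 90 days
print(rate, cores_needed(back_file_words, window, rate))
```

Very long sentences, such as the 4,500-word example, tend to translate more slowly per word than average. A real sizing exercise would therefore measure the rate on the actual back file instead of relying on a single average.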
  • 15.
    • Unique customization and quality improvement plan
    • Clean data strategy
    • One engine, multiple writing styles
      – Writing styles by content domain and by document section
      – Sentence-by-sentence domain switching
    • Hybrid: rules + syntax + statistics
    • Multiple translations: only the best will do
    • Ongoing improvement, driven by quality and measurement
  • 17. [Diagram: customization workflow. Components: Original Translation Sources, Language Pair Foundation Data, Domain Foundation Data, Data Collections, Data Cleaning, Data Preparation, Training, Diagnostics and Fine Tuning, Translate, Quality Assurance.]
  • 18. Language Pair Foundation + Domain Foundation + Client Data = Custom Engine
    [Diagram labels: Asia Online Foundation Data; Sub-Domain Specific Data; Manufactured Data]
  • 19.
    • Definition
      – Domain
      – Target audience
      – Preferred writing style
      – Glossaries, non-translatable terms, preferred capitalization
      – Special formatting requirements
      – Quality requirements
    • Data gathering (provided by client and gathered from third parties)
      – Source data in domain
      – Bilingual data to support domain
      – Monolingual data to support domain
    • Data analysis
      – Gap analysis
      – High-frequency terms
      – Term extraction
    • Data generation
      – Supporting grammar structures
      – Source data analysis
    • Cleaning of data
    • Tuning and test set preparation
    • Diagnostic engine
      – Fine tuning
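The "high-frequency terms" and "term extraction" steps of data analysis can start from something as simple as counting word n-grams. The sketch below counts n-grams that neither begin nor end with a stopword. It is a crude frequency-based stand-in for real term extraction, which typically adds part-of-speech filters and association measures. The stopword list is a hypothetical example.

```python
import re
from collections import Counter

# Hypothetical stopword list, including some patent boilerplate words.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "are",
             "for", "with", "by", "said", "wherein"}

def candidate_terms(sentences, max_n=3, min_count=2):
    """Return (n-gram, count) pairs that occur at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z0-9\-]+", sentence.lower())
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue   # terms rarely start or end with a function word
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]

docs = ["The distillation tower has perforated trays.",
        "Perforated trays without downcomers are used in the distillation tower.",
        "A polymerization inhibitor is added."]
print(candidate_terms(docs))
```

The resulting list can be compared against the bilingual data to find high-frequency terms that the engine has never seen.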
  • 20.
    • Data preparation
      – Language ID
      – Encoding ID
      – Class definition
      – Rule definition
      – Writing style definition
      – Data alignment
      – Data cleaning & repair
      – Gap analysis
      – Word segmentation
      – De-compounding
      – Data manufacturing
      – Spelling correction
      – Domain detection
      – Syntax parsing
      – Reordering rules
      – Data structuring rules
      – Language normalization
      – Term normalization
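For Japanese source data, one concrete form of language normalization is Unicode NFKC normalization. NFKC folds full-width Latin letters and digits to ASCII and half-width katakana to standard katakana, so the same term is not learned under several encodings. The sketch below uses Python's standard `unicodedata.normalize`. The whitespace collapsing is an added convenience, not something the slide specifies.

```python
import unicodedata

def normalize_text(text):
    """Apply Unicode NFKC normalization, then collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())

print(normalize_text("ＡＢＣ１２３"), normalize_text("ｶﾀｶﾅ"))
```

NFKC is lossy for some characters that matter in formulas. For example, a superscript "²" becomes a plain "2". Formula spans are therefore best protected before this step is applied.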
  • 21.
    • Engine training
      – 5 major categories
        • Leverage IPC
        • Override option for user to bypass IPC logic
      – 4 writing styles: Title, Claim, Abstract, Description
      – 20 different sub-engines (5 categories × 4 styles)
      – Tuning/testing data for each of the 20 sub-engines
      – Integration of the 20 sub-engines into a single engine
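The 5 × 4 structure can be made concrete as a registry of sub-engine keys, with a routing function that honours the user override. This is a hypothetical sketch. The slides do not name the five categories, so `cat1` to `cat5` are placeholders, and the key format is an assumption.

```python
from itertools import product

# Placeholder category names: the presentation does not name the 5 categories.
CATEGORIES = ("cat1", "cat2", "cat3", "cat4", "cat5")
STYLES = ("title", "claim", "abstract", "description")

# One entry per (category, style) combination: 5 x 4 = 20 sub-engines.
SUB_ENGINES = {f"{c}/{s}": {"category": c, "style": s}
               for c, s in product(CATEGORIES, STYLES)}

def route(style, ipc_category, override=None):
    """Pick a sub-engine key; a user override replaces the IPC-derived category."""
    category = override or ipc_category
    key = f"{category}/{style}"
    if key not in SUB_ENGINES:
        raise KeyError(f"no sub-engine for {key}")
    return key

print(len(SUB_ENGINES), route("claim", "cat2"), route("claim", "cat2", override="cat5"))
```

Each key would carry its own tuning and test data, while all 20 sit behind a single engine interface.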
  • 22.
    • Runtime translation
      – Pre-translation corrections
      – Domain detection
      – Syntax parsing
      – Reordering rules
      – Data structuring rules
      – Statistical translation
      – Multi-candidate translations
      – Class extraction and processing
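The slides do not say how the best of several candidate translations is chosen. One well-known option is consensus selection, a simple form of minimum Bayes risk decoding: pick the candidate that agrees most, on average, with the other candidates. The sketch below implements that technique using the standard library's `difflib.SequenceMatcher` as a word-level similarity. It is a swapped-in illustration, not the method used by Language Studio™.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Word-level similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def pick_consensus(candidates):
    """Return the candidate with the highest average similarity to the others."""
    if not candidates:
        raise ValueError("no candidates")
    if len(candidates) == 1:
        return candidates[0]

    def avg_agreement(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(similarity(candidates[i], o) for o in others) / len(others)

    return candidates[max(range(len(candidates)), key=avg_agreement)]

cands = ["the tower is heated to prevent condensation",
         "the tower is heated so as to prevent condensation",
         "tower heat prevent condense"]
print(pick_consensus(cands))
```

Consensus selection penalizes outliers, such as the third candidate here. However, it cannot rescue a case where most candidates share the same error.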
  • 24.
    • There is no magic in MT; human effort is required.
    • The quality of the output and its suitability for purpose are directly proportional to the amount of human effort.
    • Without human direction, MT will cost more in the long term and is more likely to fail.
  • 25.
    • Source: the entire body of data in the back file
    • Target: every USPTO patent published from 1976 until the present
    • Bilingual data: matching USPTO, EPO, etc. documents
  • 26. This is the actual format from one customer.
  • 29.
    Dirty Data SMT Model
    • Data
      – Gathered from as many sources as possible
      – Domain of knowledge does not matter
      – Data quality is not important
      – Data quantity is important
    • Theory
      – Good data will be more statistically relevant
    Clean Data SMT Model
    • Data
      – Gathered from a small number of trusted, high-quality sources
      – Domain of knowledge must match the target
      – Data quality is very important
      – Data quantity is less important
    • Theory
      – Bad or undesirable patterns cannot be learned if they don't exist in the data
  • 30.
    English Source | Human Translation | Google Translation | Google Context
    I went to the bank | Fui al banco | Fui al banco | Bank as in finance
    I went to the bank to deposit money | Fui al banco para depositar dinero | Fui al banco a depositar el dinero | Bank as in finance
    I went to the bank of the turn in my car | Fui en coche a la inclinación de la vuelta | Fui a la orilla de la vuelta en mi coche | Bank as in river bank
    I put my car into the bank of the turn | Puse mi coche en la inclinación de la vuelta. | Pongo mi coche en el banco de la vuelta | Bank as in finance
    I swam to the bank of the river | Nadé en la orilla del río | Nadé hasta la orilla del río | Bank as in river bank
    I banked my money | Deposité mi dinero | Yo depositado mi dinero | Banked as in finance
    I banked my car into the turn | Incliné mi coche en la vuelta | Yo depositado mi coche en la vuelta | Banked as in finance
    I banked my plane into a steep dive | Incliné mi avión en para una zambullida. | Yo depositado en mi avión en picada | Banked as in finance
    Issue: The above examples show that Google is biased towards the banking and finance domain.
    Cause: There is much more multilingual banking and finance data available to learn from than there is aeronautical or water sports data.
  • 31. [Diagram: Dirty Data SMT Baseline + Client Data (Initial Customization), annotated "20% Required for Noticeable Improvement" and "Improvement < 0.1%"; versus Language Studio™ Clean Data SMT Foundation + Client Data (Initial Customization) + Manufactured Data.]
  • 33.
    • Language Studio™ provides tools and processes for normalization of terminology.
    • Benefits include cost reductions, faster deliverables, higher customer satisfaction and happier post-editors.
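At its simplest, terminology normalization maps non-preferred variants to a single preferred term. The sketch below does this with a case-insensitive, whole-word regular expression built from a glossary. The glossary entries are hypothetical examples. The sketch keeps a leading capital but does not attempt other case patterns.

```python
import re

# Hypothetical glossary: lower-case non-preferred variant -> preferred term.
GLOSSARY = {
    "distillation column": "distillation tower",
    "distilling column": "distillation tower",
    "polymerisation": "polymerization",
}

def normalize_terms(text, glossary=GLOSSARY):
    """Replace whole-word variants (case-insensitively) with the preferred term."""
    # Longest variants first, so a short variant cannot pre-empt a longer one.
    variants = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b",
                         re.IGNORECASE)

    def replace(match):
        found = match.group(0)
        preferred = glossary[found.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        return preferred[0].upper() + preferred[1:] if found[0].isupper() else preferred

    return pattern.sub(replace, text)

print(normalize_terms("The distillation column uses a polymerisation inhibitor."))
```

Applying the same glossary to training data and to output keeps the engine from learning several competing terms for one concept.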
  • 34. Translation quality can be greatly improved by performing three similar but different cross-references between all source data to be translated, the bilingual data, and the monolingual target language data:
    1. All source data to be translated vs. bilingual data
      – Goal: Identify words in the source data to be translated that are not in the bilingual data.
      – Benefit: Ensures all words in the data to be translated are known and will be translated correctly.
      – Action: Human translate, or locate word lists from industry sources and directories, and add them to the bilingual data.
    2. Monolingual target language data vs. bilingual data
      – Goal: Identify words in the monolingual target language data that are not in the bilingual data.
      – Benefit: Ensures all words in the monolingual target language data are known, so that data to be translated in the future, but not yet known, will be translated better.
      – Action: Human translate, or locate word lists from industry sources and directories, and add them to the bilingual data.
    3. Bilingual data vs. monolingual target language data
      – Goal: Identify words in the bilingual data that are missing or low frequency in the monolingual target language data.
      – Benefit: Ensures there is enough grammatical representation of the words, phrases and terminology in the monolingual target language data, which delivers greater fluency in translation output.
      – Action: Generate monolingual target language data using Language Studio™ Pro Crawl and Generate Tools and add it to the monolingual data.
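Each of the three cross-references on slide 34 reduces to a vocabulary comparison. The sketch below computes all three as sets of words. It makes simplifying assumptions: lower-cased whitespace tokenization, no punctuation handling, and an illustrative `min_freq` threshold for "low frequency". Languages without spaces, such as Japanese, would need word segmentation first.

```python
from collections import Counter

def vocab(sentences):
    """Lower-cased word counts (whitespace tokenization; a simplification)."""
    return Counter(w for s in sentences for w in s.lower().split())

def gap_analysis(source_to_translate, bilingual_pairs, mono_target, min_freq=2):
    """Return the three cross-references from slide 34 as sets of words."""
    bi_src = vocab(s for s, _ in bilingual_pairs)
    bi_tgt = vocab(t for _, t in bilingual_pairs)
    mono = vocab(mono_target)
    to_translate = vocab(source_to_translate)
    return {
        # 1. words to be translated that the bilingual data never shows
        "unknown_source": set(to_translate) - set(bi_src),
        # 2. target-language words the bilingual data cannot produce
        "unknown_target": set(mono) - set(bi_tgt),
        # 3. bilingual target words that are rare or absent in monolingual data
        "thin_target": {w for w in bi_tgt if mono[w] < min_freq},
    }

result = gap_analysis(["der turm hat böden", "der inhibitor"],
                      [("der turm", "the tower"), ("böden", "trays")],
                      ["the tower has trays", "the tower is heated", "the tower"])
print(result)
```

The first two sets are candidates for human translation or word lists. The third set points to target-language words that need more monolingual examples.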
  • 35.
    Gruppenmasterdatenverarbeitungsvorrichtungssynchronisationsinformation
    Leistungswirkungsgradindexmarkierungsberechnungseinrichtung
    Schwenkmotorbetriebsdrehmomentbegrenzungswertberechnungsschritt
    Differenzialmechanismusumschaltbedingungsänderungseinrichtung
    Kraftstoffverbrauchsratenprioritätsmodusauswahlschalter
    Reproduktionsunmöglichkeitsgegenmaßnahmeneinrichtung
    Telefonbuchdatenübertragungsprotokollverbindungsabschnitts
    Leistungswirkungsgradindexmarkierungsberechnungseinrichtung
    Bezugspunktsolldrehungsgeschwindigkeitsfestlegungsabschnitt
    Höhenstandsaufnahmedifferenzdrucksondenresonanzverstimmung
    Maschinenrotationspumpenkapazitätsbefehlwandlungsabschnitt
    Brennkraftmaschinenausgangsdrehmomenterfassungseinrichtung
    Telefonbuchdatenübertragungsprotokollverbindungsabschnitt
    übermaßwankwinkelauftrittstendenzbeurteilungseinrichtung
    Unterstützungsdrehmomentbegrenzungswertberechnungsschritt
    Personenwahrscheinlichkeitsberechnungsverarbeitungsroutine
    Positionsaktualisierungsinformationsübertragungszeitpunkt
    Automatikgetriebehydraulikfluidtemperaturerfassungseinheit
    Leistungswirkungsgradindexmarkierungsberechungseinrichtung
    Octadecylaminodimethyltrimethoxysilylpropylammoniumchlorid
    Katalysatorverschlechterungsbeurteilungseinrichtung
    Kraftstoffverbrauchsprioritätsmodusauswahlschalter
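Compounds like these are why the data preparation steps include de-compounding. Below is a minimal sketch of dictionary-based splitting. It uses dynamic programming to find the split into the fewest known words and optionally allows a linking "s" (Fugen-s) after each part. The lexicon is a hypothetical example. Real German de-compounding also has to handle other linking elements and ambiguous splits.

```python
def decompound(word, lexicon, linkers=("", "s")):
    """Split a compound into the fewest lexicon words, allowing a linking 's'.

    Returns None when no complete split exists.
    """
    w = word.lower()
    n = len(w)
    best = [None] * (n + 1)   # best[i]: fewest-part split of w[:i]
    best[0] = []
    for i in range(n):
        if best[i] is None:
            continue
        for j in range(i + 1, n + 1):
            piece = w[i:j]
            if piece not in lexicon:
                continue
            for link in linkers:
                k = j + len(link)
                if k <= n and w[j:k] == link:
                    candidate = best[i] + [piece]
                    if best[k] is None or len(candidate) < len(best[k]):
                        best[k] = candidate
    return best[n]

# Hypothetical lexicon covering two of the compounds listed on slide 35.
LEXICON = {"kraftstoff", "verbrauch", "priorität", "modus", "auswahl", "schalter",
           "katalysator", "verschlechterung", "beurteilung", "einrichtung"}

print(decompound("Kraftstoffverbrauchsprioritätsmodusauswahlschalter", LEXICON))
```

Once split, the parts can be matched against bilingual data that contains them individually, even if the full compound never appears.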
  • 37.
    • Generic MT from Google, Bing, etc. offers unknown productivity gains, and sometimes productivity loss, due to lack of control.
    • Competitors offer productivity gains of less than 20-40% due to their domain-centric, "dirty data SMT" customization model.
    • Language Studio™:
      – Targets productivity gains of 150-300%+ with a granular sub-domain "clean data SMT" approach.
      – Provides complete control of writing style and terminology, mapped to the target audience, reducing editing effort.
    [Diagram: customization hierarchy for EN-ES. Language Pair (EN-ES) → Top-Level Domain (Automotive) → Client (Honda, Toyota) → Product (Cars, Motorbikes) → Target Audience / Purpose (sub-domains such as Marketing, Service Reports, User Manuals, Engineering Service Manuals). Typical productivity gain by customization level: Generic (Google/Bing quality level): ????; Domain (typical competitor quality level): < 20-40%; Client: 50%+; Product: 90%+; Target Audience / Purpose: 150-300%+.]
  • 38. Translated text can be stylized based on the style of the monolingual data.
    [Diagram: the same ES→EN bilingual data (millions of sentence pairs) determines the possible vocabulary, while the monolingual data determines writing style and grammar. Monolingual business news (newspaper articles from The Economist, New York Times, Forbes) yields EN text written in the style of business news; monolingual children's books (Harry Potter, Rupert the Bear, Famous Five) yields EN text written in the style of children's books.]
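In statistical MT, the target-side language model is the component that pulls output towards the style of the monolingual data it was trained on. The sketch below illustrates the effect with a deliberately tiny stand-in. An add-one-smoothed unigram model is trained on style-specific sentences and used to rerank candidate outputs. Production SMT systems use much larger n-gram models scored during decoding, so treat this only as an illustration of the idea. The training sentences are made up.

```python
import math
from collections import Counter

class UnigramLM:
    """Add-one smoothed unigram model trained on monolingual text in one style."""

    def __init__(self, sentences):
        self.counts = Counter(w for s in sentences for w in s.lower().split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1   # +1 reserves probability mass for unseen words

    def score(self, sentence):
        """Average log-probability per word (higher = closer to the training style)."""
        words = sentence.lower().split()
        logp = sum(math.log((self.counts[w] + 1) / (self.total + self.vocab)) for w in words)
        return logp / max(len(words), 1)

def pick_by_style(candidates, lm):
    return max(candidates, key=lm.score)

business = UnigramLM(["significant political maneuvering was required",
                      "investors required significant assurances"])
children = UnigramLM(["a lot of care was taken", "the two friends had a lot of fun"])
cands = ["significant maneuvering was required", "a lot of care was taken"]
print(pick_by_style(cands, business), "|", pick_by_style(cands, children))
```

The bilingual data decides which candidates are possible. Swapping the monolingual model changes which of those candidates wins.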
  • 39. Spanish original before translation: Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos.
    –Business news style after translation: Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents.
    –Children’s books style after translation: A lot of care was taken to not upset others when organizing the meeting between the two long time enemies.
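The effect above comes from the monolingual (language model) data: the same bilingual vocabulary is steered toward whichever style the language model prefers. A minimal sketch under loud assumptions: it uses a toy add-one smoothed bigram model and reranks finished candidates, whereas an SMT decoder normally applies its language model during search; the tiny corpora, `BigramLM` and `pick_in_style` are hypothetical and not how Language Studio is built.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Toy add-one smoothed bigram language model trained on monolingual text."""

    def __init__(self, sentences: List[str]) -> None:
        self.history = Counter()  # how often each token starts a bigram
        self.bigrams = Counter()
        vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens[1:-1])
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 2  # plus </s> and an unknown-word slot

    def score(self, sentence: str) -> float:
        """Average log-probability per bigram (higher means more in-style)."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.history[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / (len(tokens) - 1)


def pick_in_style(candidates: List[str], style_lm: BigramLM) -> str:
    """Return the candidate translation the style's language model likes best."""
    return max(candidates, key=style_lm.score)


# Deliberately tiny, hypothetical monolingual corpora for each style.
business_lm = BigramLM([
    "significant amounts of cautious political maneuvering were required",
    "the two bitter historical opponents met",
])
childrens_lm = BigramLM([
    "a lot of care was taken",
    "the two long time enemies met",
])
candidates = [
    "significant amounts of cautious political maneuvering were required",
    "a lot of care was taken",
]
print(pick_in_style(candidates, business_lm))
print(pick_in_style(candidates, childrens_lm))
```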
  • 40. •5 main categories
      –Tests were performed on more granular categories, but they had little impact relative to the effort.
      –Categories are detected automatically using the IPC (International Patent Classification) data.
        •IPCs within various ranges are mapped into 1 of 5 categories.
    •4 writing styles, determined by the XML identifiers for the Title, Claims, Abstract and Description sections.
    •Language Studio is configured to recognize a sentence header and change style for every sentence based on that header.
    •This permits 20 writing styles (5 categories × 4 sections) within a single engine.
      –The use of bilingual and monolingual data changes as required per style.
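The category-times-section scheme can be sketched as a header that is prepended to every sentence before it reaches the engine. This is an illustration only: the slide does not give the IPC ranges, so the section-letter mapping below is hypothetical, the XML element names are assumed (they resemble those in common patent XML formats), and the `<category/style>` header syntax is invented for the example.

```python
from itertools import product

# Hypothetical mapping from the IPC section letter (A-H) to 5 categories;
# the real ranges used for the engine are not given in the slides.
IPC_SECTION_TO_CATEGORY = {
    "A": "life_sciences",
    "B": "mechanical",
    "C": "chemistry",
    "D": "mechanical",
    "E": "mechanical",
    "F": "mechanical",
    "G": "physics",
    "H": "electrical",
}

# Assumed XML element names for the 4 patent sections.
SECTION_TAG_TO_STYLE = {
    "invention-title": "title",
    "abstract": "abstract",
    "claims": "claims",
    "description": "description",
}

# Every category/section combination is one writing style.
ALL_STYLES = sorted(
    f"{category}/{style}"
    for category, style in product(
        set(IPC_SECTION_TO_CATEGORY.values()), SECTION_TAG_TO_STYLE.values()
    )
)


def add_style_header(sentence: str, ipc_code: str, section_tag: str) -> str:
    """Prefix a sentence with a (hypothetical) header naming its writing style."""
    category = IPC_SECTION_TO_CATEGORY[ipc_code.strip()[0].upper()]
    style = SECTION_TAG_TO_STYLE[section_tag]
    return f"<{category}/{style}> {sentence}"


print(len(ALL_STYLES))  # 5 categories x 4 sections = 20
print(add_style_header("A method of distilling a polymerizable vinyl compound.", "C07C", "claims"))
```

Because the style travels with each sentence, one engine can switch data weighting sentence by sentence instead of loading 20 separate engines.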
  • 42. [Figure: comparison of hybrid architectures.
    –Pre-processing rules: Sentence Segmentation, Word Segmentation, Phrase Reordering, Dates and Numbers, Patterns, Formulas etc., Pre-Normalization, Spell Checking, Custom Runtime Glossary, Pre-Formatting.
    –Post-processing rules: Capitalization, Post-Formatting, Grammar Checking, Post-Normalization, XML Tag Reinsertion, Currency Conversion, Cross Referencing, other custom post-processing.
    –Hybrid Rules and SMT Engine Model (EN to ES): pre-processing rules, then Statistical Machine Translation, then post-processing rules.
    –Hybrid Rules and Corrective Statistical Engine Model: translation rules, then statistical correction of rules errors (“Statistical Smoothing”, “Automated Post Editing”). This is more of a Band-Aid approach, as the core MT is still a traditional rules-based MT engine.]
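The Hybrid Rules and SMT model is essentially a chain: rules run before and after the statistical core. A minimal sketch of that chaining, assuming each rule is a plain string-to-string function; the dictionary lookup is a hypothetical stand-in for the SMT engine and the two rules are illustrative examples of "Pre-Normalization" and "Capitalization", not Language Studio's rules.

```python
import re
from typing import Callable, Dict, List

Stage = Callable[[str], str]


def collapse_whitespace(text: str) -> str:
    """Pre-processing rule (example of pre-normalization)."""
    return re.sub(r"\s+", " ", text).strip()


def capitalize_first(text: str) -> str:
    """Post-processing rule (example of capitalization)."""
    return text[:1].upper() + text[1:]


def make_toy_translator(table: Dict[str, str]) -> Stage:
    """Hypothetical stand-in for the statistical engine: word-by-word lookup."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


def run_pipeline(text: str, pre: List[Stage], engine: Stage, post: List[Stage]) -> str:
    """Apply pre-processing rules, the core engine, then post-processing rules."""
    for stage in pre:
        text = stage(text)
    text = engine(text)
    for stage in post:
        text = stage(text)
    return text


toy_engine = make_toy_translator({"the": "el", "engine": "motor", "starts": "arranca"})
print(run_pipeline("  the   engine starts ", [collapse_whitespace], toy_engine, [capitalize_first]))
# El motor arranca
```

Keeping every rule behind the same signature is what lets rules be added per customer or per domain without touching the statistical core.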
  • 43. •Problem
      –Reference numbers break translation fluency.
    •Solution
      –Use JavaScript rules.
      –Remove the reference numbers from the text to be translated, recording their original positions.
      –Track the movement of the word associated with each reference number and reinsert the number after translation.
    Example: However, malware on electronic device 103 must still make requests of resource 106 if it is to carry out malicious activities.
    Example: Apartments are in very good condition, well equipped and furnished to a very good standard.
    los apartamentos están en |0-2,0, 0=0 0=1 1=2 2=3 | muy buenas condiciones |3-5,0, 0=0 1=1 2=2 | , |6-6,0, | bien equipados y amueblados |7-10,0, 0=0 1=1 2=2 3=3 | a un nivel muy bueno |11-15,0, 0=0 1=1 2=3 3=4 4=2 | . |16-16,0, |
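The remove-track-reinsert steps can be sketched in Python (the slide uses JavaScript rules; this is a language swap for illustration). Assumptions, all ours: a reference number is a standalone digit token (optionally with one trailing letter) that follows the word it labels; in the trace, `|start-end,...|` gives the source word span of the preceding target phrase and `s=t` links source word `s` to target word `t` within that phrase. The slide does not document the format, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: reference numbers look like "103" or "106a" (years would match too).
REF_NUMBER = re.compile(r"\d+[a-z]?")


def strip_reference_numbers(sentence: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Remove reference numbers, recording the index of the word each follows."""
    kept: List[str] = []
    markers: List[Tuple[int, str]] = []
    for token in sentence.split():
        if kept and REF_NUMBER.fullmatch(token):
            markers.append((len(kept) - 1, token))
        else:
            kept.append(token)
    return " ".join(kept), markers


def parse_trace(trace: str) -> Tuple[List[str], Dict[int, List[int]]]:
    """Parse 'target words |start-end,...,s=t ...|' into target tokens and a map
    from source word index to target word indices (our reading of the format)."""
    parts = trace.split("|")
    target: List[str] = []
    alignment: Dict[int, List[int]] = defaultdict(list)
    for text, info in zip(parts[0::2], parts[1::2]):
        words = text.split()
        fields = info.split()
        src_start, src_end = (int(x) for x in fields[0].split(",")[0].split("-"))
        links = [tuple(int(x) for x in link.split("=")) for link in fields[1:]]
        if not links:  # no word links: tie every source word to every target word
            links = [(s, t) for s in range(src_end - src_start + 1) for t in range(len(words))]
        for s, t in links:
            alignment[src_start + s].append(len(target) + t)
        target.extend(words)
    return target, alignment


def reinsert(target: List[str], alignment: Dict[int, List[int]],
             markers: List[Tuple[int, str]]) -> str:
    """Put each number after the last target word aligned to its source word;
    numbers whose source word is unaligned are skipped in this sketch."""
    out = list(target)
    inserts = [(max(alignment[src]), ref) for src, ref in markers if alignment.get(src)]
    for position, ref in sorted(inserts, reverse=True):  # right to left keeps indices valid
        out.insert(position + 1, ref)
    return " ".join(out)


text, markers = strip_reference_numbers(
    "However, malware on electronic device 103 must still make requests of "
    "resource 106 if it is to carry out malicious activities.")
print(markers)  # [(4, '103'), (10, '106')]

TRACE = ("los apartamentos están en |0-2,0, 0=0 0=1 1=2 2=3 | muy buenas condiciones "
         "|3-5,0, 0=0 1=1 2=2 | , |6-6,0, | bien equipados y amueblados "
         "|7-10,0, 0=0 1=1 2=2 3=3 | a un nivel muy bueno |11-15,0, 0=0 1=1 2=3 3=4 4=2 | . |16-16,0, |")
target, alignment = parse_trace(TRACE)
# Hypothetical reference numbers after source words 5 ("condition") and 15 ("standard").
print(reinsert(target, alignment, [(5, "12"), (15, "34")]))
```

Note how "34" lands after "nivel", not at the end of the phrase: the alignment follows "standard" through the reordering. The sketch assumes the stripped sentence is tokenized exactly as the engine's source side.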
  • 44. •Problem:
      –Data elements with an unbounded or highly variable set of values, which statistics will not handle well
    •Solution
      –Use JavaScript rules
      –Associate the data element with a class and store the data on a Session object
      –Substitute the data element with the class identifier
      –Translate with the class, so that all data of the class is treated the same
      –After translation, merge the data element back into the class using word tracking information
    Before substitution: The above-identified U.S. patent application Ser. No. 13/155,881, filed Jun. 8, 2011 provides further details of searching by image.
    After substitution: The above-identified @PATENTNOPREFIX@ @PATENTNO@, filed @DATE@ provides further details of searching by image.
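A minimal Python sketch of the class substitution step, reproducing the slide's example. The placeholder names come from the slide; the regular expressions are hypothetical and deliberately narrow. For simplicity, values are restored in order of occurrence per class rather than with word tracking, which assumes the translation does not reorder two elements of the same class.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns for each class; the slides name the classes but not the rules.
CLASSES: List[Tuple[str, re.Pattern]] = [
    ("@PATENTNOPREFIX@", re.compile(r"(?:U\.S\. )?(?:patent )?application Ser\. No\.")),
    ("@PATENTNO@", re.compile(r"\b\d{2}/\d{3},\d{3}\b")),
    ("@DATE@", re.compile(
        r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)\.? \d{1,2}, \d{4}\b")),
]


def substitute_classes(text: str) -> Tuple[str, Dict[str, List[str]]]:
    """Replace data elements with class identifiers; the returned dict plays
    the role of the per-sentence Session object."""
    session: Dict[str, List[str]] = {}
    for placeholder, pattern in CLASSES:
        values = session.setdefault(placeholder, [])

        def stash(match: re.Match) -> str:
            values.append(match.group(0))  # remember the original value
            return placeholder

        text = pattern.sub(stash, text)
    return text, session


def restore_classes(text: str, session: Dict[str, List[str]]) -> str:
    """Merge stored values back into their placeholders, in order of occurrence."""
    for placeholder, values in session.items():
        remaining = iter(values)
        text = re.sub(re.escape(placeholder), lambda _m: next(remaining), text)
    return text


source = ("The above-identified U.S. patent application Ser. No. 13/155,881, "
          "filed Jun. 8, 2011 provides further details of searching by image.")
masked, session = substitute_classes(source)
print(masked)
print(restore_classes(masked, session) == source)  # True
```

In a real run the engine would translate `masked`, and the restored values might also need target-language formatting (dates in particular), which this sketch does not attempt.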
  • 49. •Problem:
      –Sometimes it is not possible to predict which approach will deliver the best quality.
    •Solution:
      –Perform multiple approaches and score them.
    •Language Studio supports multiple ordering and restructuring formats for a single segment of data.
    •Each can be evaluated independently using a number of scoring metrics, and the best-quality translation is returned.
      –Scores for Segment Level Confidence, Language Model, Source Matching, TM Matching and Terminology Confidence
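One simple way to pick among several candidate translations is a weighted sum of their scores. This sketch assumes each score is already normalized to [0, 1] and that a missing score counts as 0; the weights, the `Candidate` type and the example scores are hypothetical, since the slide names the score types but not how they are combined.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights; the real combination is not described in the slides.
WEIGHTS: Dict[str, float] = {
    "segment_confidence": 0.3,
    "language_model": 0.2,
    "source_matching": 0.2,
    "tm_matching": 0.2,
    "terminology_confidence": 0.1,
}


@dataclass
class Candidate:
    approach: str             # which ordering/restructuring produced this output
    translation: str
    scores: Dict[str, float]  # each score normalized to [0, 1]

    def combined(self) -> float:
        """Weighted sum of the scores; a missing score counts as 0."""
        return sum(weight * self.scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def best_candidate(candidates: List[Candidate]) -> Candidate:
    return max(candidates, key=Candidate.combined)


candidates = [
    Candidate("original order", "translation A",
              {"segment_confidence": 0.8, "language_model": 0.6, "source_matching": 0.7,
               "tm_matching": 0.5, "terminology_confidence": 0.9}),
    Candidate("reordered", "translation B",
              {"segment_confidence": 0.7, "language_model": 0.9, "source_matching": 0.8,
               "tm_matching": 0.6, "terminology_confidence": 0.4}),
]
print(best_candidate(candidates).approach)  # reordered (0.71 vs 0.69)
```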
  • 51. 1. Customize: Create a new custom engine using foundation data and your own language assets.
    2. Measure: Measure the quality of the engine for rating and for future improvement comparisons.
    3. Improve: Provide corrective feedback, removing the potential for translation errors.
    4. Manage: Manage translation projects while generating corrective data for quality improvement.
  • 52. •Exception handling
      –Long sentences
      –Bad sentences
      –Bugbears
    •New data
      –Integrate new data quickly as it is produced by the various patent offices
      –Data is produced regularly
    •Hire specialists
      –People who understand the engine and know how to refine it, to work on data and rules
    •Outsource term translation
      –Find a specialist who can translate the terms identified by gap analysis
  • 53. •“Bugbear”: a term coined by Laura Rossi of LexisNexis
      –A nasty or bad word that should never appear in the translation output
    •Previous solution
      –Find it in the phrase table data
        •Remove
        •Re-binarize
      –Find it in the training data
        •Remove
      –Very time consuming
    •Language Studio solution
      –Bad word list
      –Can be updated at any time
      –The translation engine decoder ignores any data that contains a bad word
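The key property of the bad word list is that it is applied when data is used, not when it is stored, so nothing has to be removed or re-binarized. A minimal sketch of that idea, assuming a hypothetical in-memory phrase table that maps a source phrase to (target, probability) options; `FilteredPhraseTable` and the example entries are illustrative, not the Language Studio decoder.

```python
from typing import Dict, Iterable, List, Set, Tuple

PhraseOptions = Dict[str, List[Tuple[str, float]]]


class FilteredPhraseTable:
    """Hypothetical phrase table whose lookups skip options containing a bad word.

    Filtering at lookup time means the bad word list can change at any time
    without editing or re-binarizing the stored table."""

    def __init__(self, options: PhraseOptions) -> None:
        self._options = options
        self._bad_words: Set[str] = set()

    def set_bad_words(self, words: Iterable[str]) -> None:
        self._bad_words = {word.lower() for word in words}

    def lookup(self, source_phrase: str) -> List[Tuple[str, float]]:
        return [
            (target, prob)
            for target, prob in self._options.get(source_phrase, [])
            if not any(word.lower() in self._bad_words for word in target.split())
        ]


table = FilteredPhraseTable({"la válvula": [("the valve", 0.7), ("the damn valve", 0.2)]})
print(table.lookup("la válvula"))  # both options
table.set_bad_words(["damn"])
print(table.lookup("la válvula"))  # [('the valve', 0.7)]
```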
  • 54. •Training data often has gaps in coverage in some areas and an excess of data in others.
    •Gaps in coverage reduce translation quality.
    •Gaps can be filled quickly by post-editing the machine-translated output and submitting the data back to the system for further learning.
    •Many gaps can be filled with monolingual data alone.
    •Further gaps can be identified and resolved by analyzing the text to be translated for high-frequency terms and unknown words.
    •In some cases, incorrect data may be statistically more relevant; post-editing raises the relevance of the correct grammar.
    [Figure: Gaps in Topic Coverage. Example of training data volume by topic against a Sufficient Data Threshold; the data shortfall below the threshold is filled by post-edited feedback and generated data. More initial training data results in greater vocabulary and grammatical coverage above the Sufficient Data Threshold, and less post-editing feedback is required.]
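The "unknown words" part of the gap analysis can be sketched as counting words in the text to be translated that never occur in the training data, most frequent first, so the highest-impact gaps go to term translation first. This assumes the training vocabulary is available as a set of lowercased words; `find_gaps`, `min_count` and the example data are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set, Tuple


def find_gaps(text: str, training_vocab: Set[str], min_count: int = 1) -> List[Tuple[str, int]]:
    """Return words in `text` absent from the training vocabulary,
    most frequent first (ties keep first-seen order)."""
    words = re.findall(r"\w+", text.lower())
    unknown = Counter(word for word in words if word not in training_vocab)
    return [(word, count) for word, count in unknown.most_common() if count >= min_count]


vocab = {"the", "valve", "is", "open", "and", "closed"}
to_translate = "The flange is open. The flange and gasket are closed."
print(find_gaps(to_translate, vocab))               # [('flange', 2), ('gasket', 1), ('are', 1)]
print(find_gaps(to_translate, vocab, min_count=2))  # [('flange', 2)]
```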
  • 55. •Document and proximity translations
      –All existing translation platforms translate at the sentence level only.
      –By leveraging information in the document, or in close proximity to the current sentence, higher-quality translations are possible.
    •Immediate quality updates
      –Updates to engine quality within 60 minutes of making edits
      –Updates to engine quality by learning automatically from external sources
    •Improved Slavic language support
      –Generation of inflected forms
      –Deeper grammatical and syntactic analysis