Delivered at Machine Translation Summit during a special workshop on MT for patent and scientific literature.
October 30th 2015
Miami, Florida.
In this talk, we describe how we adapted machine translation for patents to help a translation company improve their productivity.
Improving Translator Productivity with MT: A Patent Translation Case Study
1. Improving Translator Productivity with MT
a patent translation case study
John Tinsley
CEO and Co-founder
PSLT @ MT Summit. Miami. 30th October 2015
2. We provide Machine Translation
solutions with Subject Matter Expertise
MT solutions and services provider, specializing in
providing customised solutions with subject
matter expertise for specific technical sectors,
such as Patents/IP, life sciences, and financial.
5. [Architecture diagram: a patent input classifier routes input (plus optional client TM/terminology) through language- and domain-specific modules (Chinese pre-ordering rules, Spanish med-device entity recognizer, Korean pharma tokenizer, Japanese script normalisation, German compounding rules) into engines (Moses, RBMT), with statistical post-editing and multi-output combination producing the output, all built on training data.]
Domain Adaptation and Data Selection
• MML with Vocabulary Saturation
Filtering (VSF)
• Language and translation model
interpolation (linear/log linear)
• Terminology extraction using IR
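The MML data selection listed above rests on the Moore-Lewis cross-entropy-difference idea. A minimal sketch using unigram models for brevity (real systems use higher-order LMs and layer VSF on top); all corpora and names here are illustrative:

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Build a unigram probability model with add-one smoothing."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(lm, sentence):
    """Per-word cross-entropy of a sentence under the model."""
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / max(len(words), 1)

def moore_lewis_select(pool, in_domain, general, top_k):
    """Rank candidates by H_in(s) - H_gen(s); lower scores look
    more in-domain relative to the general corpus."""
    lm_in, lm_gen = unigram_lm(in_domain), unigram_lm(general)
    ranked = sorted(pool, key=lambda s: cross_entropy(lm_in, s)
                                        - cross_entropy(lm_gen, s))
    return ranked[:top_k]

in_domain = ["the claimed invention comprises a polymer layer",
             "said apparatus comprises a first electrode"]
general = ["the weather was nice today", "she went to the market"]
pool = ["the device comprises a second polymer electrode",
        "he walked to the market in nice weather"]
print(moore_lewis_select(pool, in_domain, general, top_k=1))
# → ['the device comprises a second polymer electrode']
```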
Hybrid is a misnomer
• Statistical MT
• Syntax-based methods
• Grammar rules
• Example-based templates
• On-the-fly system combination
• Hierarchical models
• Translation Memory Integration
• Syntactic pre/post-ordering
• Template-driven translation
Combining linguistics, statistics, and MT expertise
The Ensemble Architecture™
6. The Challenge of Patents
L is an organic group selected from -CH2-
(OCH2CH2)n-, -CO-NR'-, with R'=H or
C1-C4 alkyl group; n=0-8; Y=F, CF3 …
maximum stress of 1.2 to 3.5 N/mm<2>
and a maximum elongation of 700 to
1,300% at 0[deg.] C.
Long Sentences
Technical constructions
Largest single document: 249,322 words
Longest Sentence: 1,417 words
7. The Challenge of Patents
• Very long sentences as standard
• Grammatically incomplete, using nominal and telegraphic style (!)
• Passive forms are frequent
• Frequent use of subordinate clauses, participles, implicit constructs
• Inconsistent and incorrect spelling
• High use of neologisms
• Instances of synonymy and polysemy
• Spurious use of punctuation
Authoring guide for “to be translated” text: patents break almost all of the rules!
9. MT Application Areas
MT for Information Purposes:
• Development focuses on improving key information translation
• Terminology is important
• Evaluation driven by “usability”
MT for Post-editing Productivity:
• Development focuses on reducing edits required
• Feedback loop is crucial
• Evaluation through practical translation tasks
10. Lots of different ways to do evaluation
– automatic scores
• BLEU, METEOR, GTM, TER
– fluency, adequacy, comparative ranking
– task-based evaluation
• error analysis, post-edit productivity
Different metrics, different intelligence
– what does each type of metric tell us?
– which ones are usable at which stage of evaluation?
e.g. can we really use automatic scores to assess productivity?
e.g. does productivity delta really tell us how good the output is?
MT Evaluation – where do we start!?
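To make the automatic-score family concrete, here is a hedged sketch of smoothed sentence-level BLEU (add-one-smoothed modified n-gram precisions with a brevity penalty); production evaluation should use a standard implementation such as sacreBLEU rather than this toy:

```python
import math
from collections import Counter

def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU: geometric mean of add-one-smoothed
    modified n-gram precisions (n = 1..max_n), times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        # add-one smoothing keeps short segments from scoring zero
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(log_prec)

score = sentence_bleu("the device comprises a polymer layer",
                      "the apparatus comprises a polymer layer")
```

The point of the slide stands even in the toy version: a single corpus-level number like this hides everything about individual segments.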
11. Problem
Large Chinese to English patent translation project. Challenging
content and language
Question
What, if any, efficiencies can machine translation add to the workflow of RWS translators?
How we applied different types of MT evaluation at different stages in the process, at various go/no-go points, to help RWS assess whether MT is viable for this project
Client Case Study – RWS
- UK headquartered public company
- Founded 1958
- 9th largest LSP (CSA 2013 report)
- Leader in specialist IP translations
12. Can we improve our baseline engines through customisation?
Step 1: Baseline and Customisation
[Bar chart: BLEU and TER scores (0–0.8 scale) comparing the Iconic Baseline and Iconic Customised engines]
What next?
How good is the output relative to the task, i.e. post-editing?
- fluency/adequacy not going to tell us
- let’s start with segment level TER
- Huge improvement
- Intuitively, scores reflect well but don’t really say anything
- Let’s dig deeper
13. Translation Edit Rate: correlates well with practical evaluations
If we look deeper, what can we learn?
INTELLIGENCE
• Proportion of full matches (i.e. big savings)
• Proportion of close matches (i.e. faster than fuzzy matches)
• Proportion of poor matches
ACTIONABLE INFORMATION
• Type of sentence with high/low matches
• Weaknesses and gaps
• Segments to compare and analyse in translation memory
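The match bands above can be sketched with a simplified TER: word-level edit distance over reference length (true TER also counts block shifts as single edits). The bucket thresholds below are assumed for illustration, not RWS's actual bands:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (wa != wb))
    return dp[len(b)]

def ter(hypothesis, reference):
    """Simplified TER: edits / reference length (no block shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

def bucket(score):
    """Assumed, illustrative match bands."""
    if score == 0.0: return "full match"
    if score <= 0.3: return "close match"
    if score <= 0.7: return "usable"
    return "poor match"
```

Bucketing segment-level scores this way is what yields the distribution analysed in the next step.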
14. Step 2: Segment-level automatic analysis
[Chart: distribution of segment-level TER scores by segment length]
This represents a 24% potential productivity gain.
15. With MT experience and previous MT integration, productivity testing can be run in the production environment. In this case we used the TAUS Dynamic Quality Framework.
Step 3: Productivity testing
Productivity Test
17. Beware the variables!
• Translators: different experience, speed, perceptions of MT
– 24 translators: senior, staff, and interns
• Test sets: not representative; particularly difficult
– 2 test sets, comprising 5 documents, with cross-fold validation
• Environment and task: inexperience and unfamiliarity
– Training materials, videos, and “dummy” segments
Step 3: Productivity testing
18. Findings and Learnings
Overall average: 25% productivity gain
By Translator Profile:
- Experienced: 22%
- Staff: 23%
- Interns: 30%
By Test Set:
- Test set 1.1: 25%
- Test set 1.2: 35%
- Test set 2.1: 6%
- Test set 2.2: 35%
What it tells us:
- Correlates with TER
- Rollout with junior staff for more immediate impact on the bottom line?
- Don’t be overly concerned by outliers.
- Use data to facilitate source content profiling?
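The headline figures above reduce to a simple throughput calculation. Here is a minimal sketch, with purely illustrative words-per-minute numbers (not the RWS measurements):

```python
from statistics import mean

def productivity_gain(baseline_wpm, postedit_wpm):
    """Relative throughput gain of post-editing MT output
    over translating from scratch, in words per minute."""
    return (postedit_wpm - baseline_wpm) / baseline_wpm

def gains_by_group(records):
    """records: (group, baseline_wpm, postedit_wpm) tuples;
    returns the mean relative gain per translator group."""
    groups = {}
    for group, base, pe in records:
        groups.setdefault(group, []).append(productivity_gain(base, pe))
    return {g: mean(v) for g, v in groups.items()}

# Illustrative numbers only:
records = [("interns", 8.0, 10.4), ("staff", 10.0, 12.3)]
print({g: round(v, 2) for g, v in gains_by_group(records).items()})
# → {'interns': 0.3, 'staff': 0.23}
```

Grouping by translator profile, as in the slide, is what surfaces findings like interns benefiting most.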
19. Look out for anomalies
– segments with long timings (above average ratio words/minute)
– sentences that don’t change much from MT to post-edit
– segments with unusually short timings
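One way to operationalise the timing checks above is a simple outlier flag. A sketch with an assumed z-score threshold (IQR fences would work just as well):

```python
from statistics import mean, stdev

def flag_anomalies(timings, z=2.0):
    """timings: (segment_id, words_per_minute) pairs. Flags segments
    whose throughput sits more than z standard deviations from the
    mean: suspiciously fast may mean MT was accepted unread,
    suspiciously slow may mean a break or an unusually hard segment."""
    rates = [rate for _, rate in timings]
    mu, sigma = mean(rates), stdev(rates)
    return [sid for sid, rate in timings if abs(rate - mu) > z * sigma]

timings = [("s1", 10), ("s2", 11), ("s3", 9), ("s4", 10),
           ("s5", 10), ("s6", 11), ("s7", 9), ("s8", 50)]
print(flag_anomalies(timings))  # → ['s8']
```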
In this case, the next step is production roll-out to validate these
in the actual translator workflow over an extended period.
Warnings, Tips, and Next Steps
Now would be the right time to do fluency/adequacy if you need to
verify that post-editing is producing, at least, similar quality output
The idea here is that, as with any translation job, you want someone with expertise in the area. Same goes for MT. I’ll talk about how this affects how we TRAIN AND DEVELOP engines and also how we EVALUATE engines, and I’ll wrap up with a relatively big case study on how we helped one particular LSP bring on-board MT to improve translation post-editing productivity.
We are the MT partner of choice for some of the world’s largest translation companies, information providers, and government and enterprise organisations.
For Translation Companies: We help translation companies to translate more content, more accurately for faster project turnaround, resulting in significant cost savings and increased revenue.
For Enterprise Clients: We help enterprises to translate more content in less time, resulting in faster products to market and enhanced global reach.
For Information Providers: We help information providers to translate knowledge, literature and documentary information faster and more accurately, resulting in broader knowledge offerings and faster time to market.
Just to give you some background on the origins of the company…
The Pluto project was an EU FP7 project that began in 2010 with the goal of adapting existing MT technology to patent translation (high demand in Europe, with its many languages)
Went very well and led to the development of a service called IPTranslator.com, aimed at facilitating patent searchers through multilingual MT that was easily integrable across patent search tools like Espacenet, Patentscope, Patbase, Questel, and others…
After the project, given the technology in place, a company called Iconic was formed, which expanded the techniques and technology developed in IPTranslator to other sectors and markets such as information providers, enterprise companies, and LSPs.
While IPTranslator.com still exists today (albeit in a modified format), the web service itself is not a core business; rather, building on it to provide highly customer-adapted systems is the main focus of Iconic. Even so, a large portion of our business is in the area of patent translation.
Ok, so just as an introduction to the topic of DOMAIN ADAPTATION from our perspective, how do Iconic translation machines work?
Existing vendors or MT providers use the following process – if a client wants a machine translation system for a certain domain, say IT, they provide the vendor with training data and this gets churned through the various generic processes for each language required. The idea is that by pumping in data in the IT domain, an IT machine translation system comes out at the end. It’s true to a certain extent, but the reality is that the quality often doesn’t cut the mustard. The problem with the data-engineering approach is that you need A LOT of data, and many clients simply don’t have it.
As a consequence, there’s a complete reliance on the data. If the MT output doesn’t meet the end-user requirements, the only solution is to say “we need more training data”
This shortcoming really comes to the fore when we’re dealing with complex languages and content types, like patents…
ON-THE-FLY MODEL SELECTION. CLASSIFIER BASED. USER SELECTION (INTERFACE/REQUEST)
QUALITY vs CONVENIENCE in Commission Scenario…
NOT GOING TO GIVE AWAY TOO MANY TRADE SECRETS
Customised domain-specific MT
Grew from patent translations, expanding into technology that can be applied across technical areas
Mixture of statistical, rule-based, syntactic.
Ensemble architecture
Domain adaptation and data selection
MML with Vocabulary Saturation Filtering (VSF)
language/translation model interpolation (linear/log linear)
IR based term extraction (ask Hala)
Hybrid, what is hybrid? A misnomer.
SMT + rules? Rule-based + APE? Where does syntax fit in?
Our hybrid architecture uses what’s most appropriate for a particular language, domain AND style combination
Specifically
on the fly system combination
Hierarchical models
Template-driven, TM-integrated
Syntactic pre/post-ordering
And, given the current environment, a little look as to why this is particularly required for patents…
Sometimes it’s hard to tell whether the translation is bad or that’s simply how the original patent was written!
Using a certain set of configurations in the ensemble gives us IPTranslator, which is our suite of MT engines that have been specifically adapted for patents.
These serve as “ready to go” tools or as a basis for customisation…
So let’s look at how IPTranslator, and other types of MT are being applied in the industry…
There are a number of different ways in which MT is used in general.
Most commonly for our solutions, we’re looking at 2 main use cases: MT for info and MT for post-editing.
Examples of the things we offer in MT FOR INFO include development against specific evaluation criteria and fitting it into a particular end-user scenario – e.g. EDISCOVERY, MT FOR SEARCH, WEB STORE INTEGRATION
And, heading into our case studies, let’s look at the hardest part – evaluation…
Different metrics tell us different things, but perhaps more important is what the metrics don’t tell us
There are lots of them out there, you need to know which ones to use and when.
We’ve obviously got a lot of experience in this area given our background,
I’ll talk about how we collected this information through MT evaluations via a case study with RWS.
What I’ll focus on is WHAT MT evaluation we carried out and at what STAGES, to give us the information we needed to know
First step is can we improve our engines through customisation.
These automatic scores tell us CONCLUSIVELY. Yes.
But they don’t really tell us anything about QUALITY, or SUITABILITY for the TASK
We need to dig deeper on a segment level and for this, we use TER. WHY?
TER has correlated well with practical evaluations for us.
It gives us practical information which we can correlate with the bottom line
It also gives us practicable (actionable) information which we can use to improve MT and do further analysis
**If you do this over a variety of test documents like we did with RWS, where we used 10, you’ll get a sense of what the MT can bring**
For example, here we see FOR EACH SEGMENT, the TER range and how long the segments are within those ranges.
This allows us to do some calculations, which I won’t detail now (we can discuss them in the breakout session), but it resulted in a 24% gain
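The kind of calculation alluded to here might look like the following sketch, under an assumed effort model (post-editing effort proportional to segment TER, capped at from-scratch effort); this is not Iconic's actual formula, and the distribution is hypothetical:

```python
def potential_gain_from_ter(segments):
    """segments: (word_count, ter) pairs from the TER distribution.
    Assumed model: post-editing effort per segment ~ words * TER,
    capped at the effort of translating from scratch (TER >= 1)."""
    scratch = sum(words for words, _ in segments)
    postedit = sum(words * min(t, 1.0) for words, t in segments)
    return (scratch - postedit) / scratch

# Hypothetical distribution of (segment length, TER):
segments = [(20, 0.0), (15, 0.2), (30, 0.5), (10, 1.2)]
print(f"{potential_gain_from_ter(segments):.0%}")  # → 63%
```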
Experience is crucial here. Lots of variables and things to look out for, like TRANSLATORS, TEST SETS, and the ENVIRONMENT, as I’m sure people here can attest to.
I won’t go into detail but here’s a high level look at what we did to try to find out different information.
We know the European landscape, the stakeholders, and the requirements
Our machine translation expertise is second to none in the commercial landscape, and we’re helping to drive machine translation adoption, and in so doing, taking concepts from the lab to the market
We specialise in collaboration – commercial, public sector, government, and research institution (so we’re well attuned to adapting to shifting priorities)
Iconic was borne out of Europe and we’d be only too happy to give back in whatever way possible (for the right price)
TECHNICAL DETAILS SPECIFIC TO POST-EDIT ANALYSES THAT WE LEARNED
In terms of analysing information, there are a number of things to look out for to make sure we’re getting more accurate results.
Safe to say, now would be the right time to look at quality evaluation and make sure post-editing is not affecting things
MT is now!
“Domain” adaptation is more than just similar documents – it involves taking into account style and variations across languages
Patents are hard – plenty of room for improvement