This was a presentation given at the European Patent Office's annual Patent Information Conference in Madrid, Spain on November 10th, 2016.
In it, we give an overview of how machine translation works, the latest advances in neural MT, and how these can be applied to patents and intellectual property content, not only for translation but also for information extraction and other NLP applications.
Past, Present, and Future: Machine Translation & Natural Language Processing for Patent Information
1. ‘Past, Present, and Future’
Machine Translation & Natural Language
Processing for Patent Information
Dr. John Tinsley
CEO, Iconic Translation Machines Ltd.
EPOPIC. Madrid. 10th November 2016
2. Why listen to me?
BSc in Computational Linguistics
PhD in Machine Translation
Language Technology consultant
Founder of Iconic Translation Machines
Machine Translation is what I do!
The world’s first and only patent specific machine translation platform
3. Machine Translation: The Basics
Machine Translation = automatic translation
The use of computers to translate from one language into another
The use of computers to automate some, or all, of the translation process
Statistical Machine Translation (SMT)
An approach to Machine Translation, where translations for an input are estimated based on previously seen translation examples and associated (inferred) probabilities.
e.g. IPTranslator, Google Translate
Other approaches
Rule-based (or transfer-based): based on linguistic rules
• e.g. Systran; Altavista’s Babelfish
Example-based: based on translation examples and inferred linguistic patterns
SMT is now by far the predominant approach*
4. Bilingual Corpora
A corpus (pl. corpora) is a collection of texts, in electronic format, in a single language
document(s)
book(s)
A bilingual corpus is a collection of corresponding texts, in multiple languages
a document & its translation
a book in multiple languages
European Parliament proceedings
Note: source language = original language or language we’re translating from
target language = language we’re translating into
5. Aligned Bilingual Corpora
A document-aligned bilingual corpus corresponds on a document level
For translation, we require sentence-aligned bilingual corpora
The sentence on line 1 in the source language text corresponds
to (i.e. is a translation of) the sentence on line 1 in the target
language text etc.
Often referred to as parallel aligned corpora
Sentence aligned bilingual parallel corpora
are essential for statistical machine translation
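The line-by-line correspondence described above can be sketched in a few lines of Python. This is a minimal illustration, not a real alignment tool: the two toy sentence lists and the helper name `align_by_line` are assumptions made for the example.

```python
# Toy sentence-aligned corpus: line i of the source text is assumed to be
# translated by line i of the target text (illustrative data only).
source_lines = ["I have a cat", "The dog sleeps"]
target_lines = ["Tengo un gato", "El perro duerme"]


def align_by_line(source, target):
    """Pair sentences by line number; reject texts whose line counts differ."""
    if len(source) != len(target):
        raise ValueError("not sentence-aligned: line counts differ")
    return list(zip(source, target))


pairs = align_by_line(source_lines, target_lines)
for src, tgt in pairs:
    print(f"{src} <-> {tgt}")
```

A document-aligned corpus would only guarantee that the two files as a whole correspond; the line-count check here is the simplest possible guard against treating such data as sentence-aligned.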
6. Learning from Previous Translations
Suppose we already know
(from a sentence-aligned bilingual
corpus) that:
“dog” is translated as “perro”
“I have a cat” is translated as
“Tengo un gato”
We can theoretically translate:
“I have a dog” → “Tengo un perro”
Even though we have never seen “I
have a dog” before
Statistical machine translation induces information about unseen input, based on
previously known translations:
Primarily co-occurrence statistics
Takes contextual information into account
9. Statistical Machine Translation
From the corpus we can infer possible target (French)
translations for various source (English) words
We can then select the most probable translations
based on simple frequencies (co-occurrence statistics)
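As a rough sketch of selecting translations from simple co-occurrence frequencies, the Python below counts how often each source word appears in the same sentence pair as each target word. The four-sentence English-French corpus and the function name `most_probable` are assumptions for illustration; production SMT systems estimate far richer alignment models than raw counts.

```python
from collections import Counter, defaultdict

# Tiny illustrative sentence-aligned corpus (English -> French).
corpus = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the cheese", "le fromage"),
    ("a house", "une maison"),
]

# cooc[s][t] = number of sentence pairs in which source word s
# and target word t occur together.
cooc = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1


def most_probable(word):
    """Return (target word, relative frequency) for the most frequent co-occurrence."""
    counts = cooc.get(word)
    if not counts:
        return None  # word never seen in the training data
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


print(most_probable("house"))  # "maison" co-occurs in 3 of 7 counted pairings
```

With so little data, frequent words such as "the" tie between several candidates, which is one reason the amount of training data matters so much.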
11. Advanced MT
All modern approaches are based on building translations for complete sentences by putting together smaller pieces of translation
Previous example is very simplistic
In reality SMT systems calculate much more complex statistical models over millions of sentence pairs for a pair of languages
Upwards of 2M sentence pairs on average for large-scale systems
Other statistics calculated include
Word-to-word translation probabilities
Phrase-to-phrase translation probabilities
Word order probabilities
Linguistic information (are the words nouns, verbs?)
Fluency of the final output
12. Data is Key
For SMT data is key
Information (word/phrase correspondences and associated statistics) is only based
on what we have seen before in the data
Important that data used to train SMT systems is:
Of sufficient size
avoid sparseness/skewed statistics
Representative and relevant
contains the right type of language
High-quality
absence of misspellings,
incorrect alignments etc.
Proofed by human
translators
training data
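One way to picture the "high-quality" requirement is a simple filter that discards sentence pairs that are probably misaligned. The sketch below is an assumption-laden illustration: the length-ratio heuristic, the threshold `max_ratio=3.0`, and the function name `looks_well_aligned` are not from the talk.

```python
def looks_well_aligned(src, tgt, max_ratio=3.0):
    """Heuristic check: both sides non-empty and of roughly comparable length.

    max_ratio is an arbitrary illustrative threshold, not a recommended value.
    """
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if src_len == 0 or tgt_len == 0:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio


pairs = [
    ("I have a cat", "Tengo un gato"),  # 4 vs 3 words: kept
    ("Hello", ""),                      # empty target: dropped
    ("Yes", "La seguridad de este aeropuerto es responsabilidad de Israel"),  # 1 vs 9: dropped
]

clean = [pair for pair in pairs if looks_well_aligned(*pair)]
print(clean)
```

A filter like this only addresses gross alignment errors; it says nothing about whether the data is representative of the content to be translated.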
13. Why is MT Difficult?
A word or a phrase can have more than one meaning (ambiguity – lexical or
structural)
e.g. “bank”, “dive”, “I saw the man with the telescope”
People use language creatively
New words are cropping up all the time
Linguistic differences between languages
e.g. structure of Irish sentences vs. structure of English sentences:
“Tá (Is) ocras (hunger) orm (on me)” <-> “I am hungry”
There can be more than one way to express the same meaning.
“New York”, “The Big Apple”, “NYC”
14. Why is MT Difficult?
Israeli officials are responsible for airport security.
Israel is in charge of the security at this airport.
The security work for this airport is the responsibility of the Israel government.
Israeli side was in charge of the security of this airport.
Israel is responsible for the airport’s security.
Israel is responsible for safety work at this airport.
Israel presides over the security of the airport.
Israel took charge of the airport security.
The safety of this airport is taken charge of by Israel.
This airport’s security is the responsibility of the Israeli security officials.
15. No single solution for all languages
Number agreement: the house / the houses vs. la maison / les maisons
Gender agreement: the house / the cheese vs. la maison / le fromage
English - Spanish
English - French
16. No single solution for all languages
English - German
English - Chinese
种水果的农民
The farmer who grows fruit
[Lit: “grow fruit (particle) farmer”]
17. Not all languages are created equal
French German Turkish Finnish
Spanish Chinese Korean Hungarian
Portuguese Japanese Thai Basque
18. The Challenge of Patents
L is an organic group selected from -CH2-
(OCH2CH2)n-, -CO-NR'-, with R'=H or
C1-C4 alkyl group; n=0-8; Y=F, CF3 …
maximum stress of 1.2 to 3.5 N/mm<2>
and a maximum elongation of 700 to
1,300% at 0[deg.] C.
Long Sentences
Technical constructions
Largest single document: 249,322 words
Longest Sentence: 1,417 words
19. The Challenge of Patents
Very long sentences as standard
Grammatically incomplete using
nominal and telegraphic style (!)
Passive forms are frequent
Frequent use of subordinate clauses,
participles, implicit constructs
Inconsistent and incorrect spelling
High use of neologisms
Instances of synonymy and polysemy
Spurious use of punctuation
Authoring guide
for “to be
translated” text
Patents break
almost all of the
rules!
20. Evaluating Machine Translation Quality
Automatic Evaluation
Judge the quality of an MT system by comparing its output against a human-produced “reference” translation
Pros: Quick, cheap, consistent
Cons: Inflexible, cannot be used on ‘new’ input
Human Evaluation
Pros: Reliable, flexible, multi-faceted (fluency, error analyses, benchmarking)
Cons: Slow, expensive, subjective
Fluency vs. Adequacy
Task-Based Evaluation
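To make "comparing output against a reference translation" concrete, here is a minimal sketch of a clipped n-gram precision score. It is a simplified stand-in for illustration, not any specific published metric; the function name `clipped_precision`, the example sentences, and the reference translation are assumptions.

```python
from collections import Counter


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference.

    Each candidate n-gram is credited at most as often as it occurs
    in the reference ("clipping").
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split())
    ref = ngrams(reference.lower().split())
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "The big red house"  # assumed human reference translation
for output in ("The big blue house", "The big house red"):
    unigram = clipped_precision(output, reference, n=1)
    bigram = clipped_precision(output, reference, n=2)
    print(f"{output!r}: unigram={unigram:.2f}, bigram={bigram:.2f}")
```

In this toy case the word-level score rewards the output with the right words in the wrong order, while the bigram score penalises it, which hints at why a score of this kind is quick and consistent but inflexible: it only knows the one reference it was given.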
21. Evaluating Machine Translation Quality
Task Based Evaluation
Standalone evaluation of MT systems is necessary to get a sense of the
overall quality of a system
To determine the ultimate usability of an MT system, intrinsic task-based
evaluation is required
Why? Fluency vs. Adequacy
Fluency: how fluent and grammatically correct the translation output is
Adequacy: how accurately the translation conveys the meaning of the source
Source: La gran casa roja
Output 1: The big blue house
Output 2: The big house red
22. Practical uses of Machine Translation
Understand its limitations and you’ll understand
its capabilities!
No
Translate a patent for filing
Translate literature for
publication
Translate marketing
materials
Anything mission critical
without review
Yes
Productivity tool for
professional translation
Understand foreign patents
Localisation processes and
“controlled” content
High volume, e.g. eDiscovery
23. Use cases in practice
Product descriptions
to open new markets
MT for post-editing
productivity across
industries
Developer, and user
for web content
Tens of thousands of
people using online
tools daily
24. Neural Networks
Using artificial intelligence and deep learning to develop a
completely new way of doing machine translation!
Quality Estimation
Functionality through which machine translation can “self-
assess” the quality of the translations it produces.
Online Adaptive Translation
Machine translations that can automatically learn and improve
based on feedback, particularly from revisions.
Use-case specific MT
Just like patent MT, but for countless other areas.
Current Hot Topics
25. About Iconic
We are a Machine Translation and Natural
Language Processing software and
services provider, delivering expert
solutions with Subject Matter Expertise
30. Speed, Cost, and Quality
What is the difference between machine translation vs. manual translation when
translating a 10 page patent document from Chinese into English?
Machine Translation is not
designed to replace
professional translation but
there are many cases
where costly and time-
consuming manual
translation is simply not
necessary.
31. - Data confidentiality
- File formats
- Potential for customisation,
enhancements, and
improvement for specific
domains
32. More than just translation
DATA PROCESSING
E.G. OPTICAL CHARACTER
RECOGNITION, DIGITISATION
DATABASE BUILDING
E.G. COMBINING THE ABOVE, WITH
TRANSLATION, FOR EXPORT
DATA UNDERSTANDING
E.G. SUMMARISATION, CONCEPT &
KEY TERM IDENTIFICATION
INFORMATION EXTRACTION
E.G. CITATION ANALYSIS, CROSS-
LINGUAL SEARCH
34. Citation Analysis
Assessment of record and reference patterns
Application for record extraction
Tracking variations across years
Application for bibliographic data fielding
Second point is important. It has different uses and usability. The concept of FAHQMT (fully automatic high-quality MT) is no more. Focus is now on HAMT (human-aided MT) and PEMT (post-edited MT).
The problem with rule-based systems is that they didn’t scale
You need bilingual experts for each language pair
SMT is the predominant approach
Starting point for all systems is data.
The most important aspect is the quality of the data…
They are essential and the quality is crucial.
The translations must be accurate and the alignment must be correct, otherwise we infer the wrong things and introduce “noise” into our systems.
How do we use these corpora? It’s all about learning and remembering things we’ve seen before, the same way you might go about translating something
Ok, so the translation isn’t exactly right here. It should be “Je parle à la fille” but we haven’t seen enough examples (don’t have enough data) for reliable estimates, we’re just going on the counts of the words
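To make “going on the counts of the words” concrete, here is a minimal sketch that counts how often each source word appears alongside each target word in aligned sentence pairs. The toy corpus and function names are hypothetical, and plain co-occurrence counting is a deliberate simplification of how statistical systems estimate word translations.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus (English, French); not from the presentation.
corpus = [
    ("the girl", "la fille"),
    ("the house", "la maison"),
    ("the girl speaks", "la fille parle"),
    ("i speak", "je parle"),
]

def cooccurrence_counts(pairs):
    """Count how often each source word shares a sentence pair with each target word."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                counts[s][t] += 1
    return counts

def relative_frequencies(counts, word):
    """Turn the raw counts for one source word into relative frequencies."""
    total = sum(counts[word].values())
    return {t: c / total for t, c in counts[word].items()}

counts = cooccurrence_counts(corpus)
print(counts["the"].most_common(1))   # the most frequent partner of "the"
print(dict(counts["girl"]))           # "la" and "fille" tie: too little data
```

With only four sentence pairs, “girl” has been seen with “la” exactly as often as with “fille”, so the counts alone cannot separate them. More data is what makes the estimates reliable.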
How likely a word is to translate to another word – as you have seen
How likely the different phrases are to translate as one another
What’s the likelihood a certain word will have a different position in the target sentence
Sometimes we take into account linguistic information about the words, is it a verb, then it should go here, articles should precede nouns, etc.
Look at models of the target language and see if what we have produced makes sense (can these words go together in this order?)
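The last point, checking output against a model of the target language, can be sketched with a tiny bigram language model. The training sentences, function names, and the choice of add-one smoothing are all assumptions for illustration; production systems use far larger models.

```python
import math
from collections import Counter

# Hypothetical monolingual target-language text; not from the presentation.
sentences = ["the big red house", "the red car", "the big house", "a big red car"]

def train_bigrams(sentences):
    """Count context words and adjacent word pairs, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def log_prob(sentence, contexts, bigrams, vocab_size):
    """Log-probability of a sentence under the bigram model (add-one smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total

contexts, bigrams = train_bigrams(sentences)
# Vocabulary of predictable tokens: every word seen, plus the end marker.
vocab_size = len({w for s in sentences for w in s.split()}) + 1

for candidate in ["the big red house", "the big house red"]:
    print(candidate, round(log_prob(candidate, contexts, bigrams, vocab_size), 3))
```

The word order the model has seen before scores higher than the scrambled one, which is how the language model steers the system towards fluent output.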
Google Translate aims to be a general system, but what happens when you’re translating a sports website? Quality issues can be caused by the fact that there’s a lot of other data in their models than sports news.
Similarly, if I have a translation system for car manuals, it won’t be any good at translating sports websites.
This is reflected in our systems at IPTranslator too where all of our models are built using patents which have been filed in multiple languages to ensure we get the style correct
(patents are a bigger fish than this though)
The simple answer is that language is complex! Which is what makes it difficult to learn but also so interesting at the same time!
Who has the telescope, him or I?
New words, especially in patents. And new usage of words. The verb “to tweet” didn’t exist so long ago…
The last piece in the puzzle is understanding the languages you’re developing MT systems for. And that’s not understanding them in isolation – that’s understanding, for each language pair, what the differences are between them, e.g. many of the things we need to look out for when developing English-Spanish translation engines we don’t need to do for French-Spanish translation
With certain language pairs, things get more complex. The processes that we need to develop are harder to develop, less studied, require smarter people!
Chinese, need to identify these DE constructions so we know to move the head noun
No tense, going into English, how do we know what tense?
There’s no article! We have to generate it!
DE particle has many translations, which one!
FIRST THINGS FIRST, which ones are the words!? We need to segment the Chinese!
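As a minimal sketch of what segmentation involves, the code below applies greedy forward maximum matching against a word list: at each position it takes the longest dictionary word it can find. The lexicon and function name are hypothetical, and this is only one simple approach, not necessarily the one used in any production system.

```python
def segment(text, lexicon, max_len=4):
    """Split unspaced text into words by greedy longest-match lookup."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted, so unknown characters
            # are emitted on their own and the loop always advances.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical mini-lexicon, for illustration only.
lexicon = {"专利", "机器", "翻译", "机器翻译", "的", "质量"}
print(segment("专利机器翻译的质量", lexicon))
```

Because the match is greedy, “机器翻译” (machine translation) is kept as one unit rather than split into “机器” and “翻译”. Greedy matching can choose wrongly when a longer dictionary entry overlaps the intended words, which is part of why segmentation needs care.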
ONLY WITH THESE SKILLS CAN YOU EXPLOIT THE TECHNOLOGY TO ITS FULLEST – AND WHAT DO WE GET IN DOING THIS? MT WITH SUBJECT MATTER EXPERTISE
**EFFECT ON FEASIBILITY**
Basically, some languages are easier for MT than others.
As a general rule, the closer two languages are to one another in terms of word order and grammatical structure, the easier.
Here’s some rules of thumb (with English)
But of course it’s not just that easy.
Patents for example have a range of highly complex linguistic characteristics that make this challenging, both for PROFESSIONAL translators as well as for Translation Software.
Let’s look for example at this patent – what’s highlighted in blue is a SINGLE sentence (which is an individual legal claim).
Additionally, we have to deal with complex technical constructions such as chemical formulae, alphanumeric sequences, even genomic and amino acid sequences.
And then we have patents which introduce a whole new level of complexity on top of the language issues…
Patents are hard to read, never mind translate, never mind try to teach a computer how to translate them!
Sometimes it’s hard to tell whether the translation is bad or that’s simply how the original patent was written
Commercial machine translation is plagued by misleading marketing with unrealistic claims and promises. We need to manage expectations.
When I say NO, I mean no in a fully-automatic manner with no human intervention
Filing – not when meaning is CRUCIAL
Publication – no, there will be errors
Marketing – no, not with subtleties, idioms, etc.
MT solutions and services provider, specializing in providing customised solutions with subject matter expertise for specific technical sectors, such as Patents/IP, life sciences, and financial.
We are the MT partner of choice for some of the world’s largest translation companies, information providers, and government and enterprise organisations.
For Translation Companies: We help translation companies to translate more content, more accurately for faster project turnaround, resulting in significant cost savings and increased revenue.
For Enterprise Clients: We help enterprises to translate more content in less time, resulting in faster products to market and enhanced global reach.
For Information Providers: We help information providers to translate knowledge, literature and documentary information faster and more accurately, resulting in broader knowledge offerings and faster time to market.
THERE’S VALUE TO BE ADDED, HOW CAN WE HARNESS?
We literally already have the perfect environment to allow NMT to be another string in the bow and let us use the most appropriate MT for the job
WHETHER IT BE NEURAL FOR KOREAN, FOR CHAT TEXT, OR WHATEVER THE CASE MAY BE
It’s not a one size fits all solution and who knows when it will be, but we have developed a framework that allows us to leverage its strengths on a case by case basis to deliver the best possible translation for a given task.
Over time we fully expect the “brain to grow” and become the best MT on offer for various language pairs and content types, and when it is, WE’RE PERFECTLY POSITIONED FROM A TECHNOLOGY AND EXPERTISE PERSPECTIVE to capitalise on this wave.
We’ve launched a new product this year which is essentially repurposing the technology that we have and focusing on very particular use cases…
Firstly, let’s just look at the stark motivation for using MT for patent information in the first place…
The “standard” solution to the problem of foreign language documents is translations.
But translation is costly, not that quick, and often it is complete overkill for what is required!!
This is where MT comes in as a much more cost-effective, rapid solution that allows you to make a QUICK determination as to whether something is relevant or not before you invest in a professional translation.
And, while we all know that MT isn’t perfect, the reality of the situation is that the quality is often “good enough” or fit for the purpose of making this determination.
SO IT’S A NO-BRAINER
So going back to IPTranslator, the elephant in the room for us for a long time has been Google Translate. The first question we get asked always is “is it better than Google Translate?”
The answer is yes, the majority of the time for most of the languages that we cover. However, is that increase in quality enough to justify the cost of our service over Google, which is a free service? It’s hard to beat free! The reality now is that the “fit for purpose / good enough” quality level is something that Google can often achieve, especially since it started working with the EPO.
So where does IPTranslator fit?
Confidential Data
File formats incl. pdf
Potential for customisation, enhancements, and improvement for specific domains
Not just for patents, but for journals and other non-patent literature
Why was it challenging?
Exceptions to patterns
OCR errors
Lack of formatting information
The record extraction example is from Pattern B
The bib data example is from Pattern 5