Bilingual Terminology Extraction from TMX
A State-of-the-Art Overview
Chelo Vargas-Sierra, PhD
University of Alicante,
Spain
INDEX
Main points of this presentation
1. Key words
Overview of terms involved in the process
2. Terminology and extractors
Terminology management
Its timeline
BATE (approaches, state of the art)
3. Evaluation
BATE under evaluation
Measures for accuracy
Quality in use model and tasks
4. Results
Precision & Recall
Parameters & Questionnaire
KEY WORDS
Terms involved in the process
Parallel corpus: TMX
Alignment levels: paragraph, sentence and word level
ATE & BATE Precision/Recall: getting only terms (precision) and all terms (recall)
Gold standard: exhaustive, manually created bilingual glossary
Validation: term validation facility. Which TCs (term candidates) are real terms?
Usability: software used to achieve user's objectives with effectiveness, efficiency, and satisfaction
Quality in use model: ISO standard
2. Terminology & Extractors
IMPORTANCE OF TERMINOLOGY
Translators were the first professionals to be aware of term-related issues
IDENTIFY: Identify and interpret the terminology in the source text adequately
FIND: Find and use proper documentation and information resources
RETRIEVE: Retrieve and store terminological data
TERMINOLOGY MANAGEMENT
Managing terminology (extracting, validating, importing, adding, editing, deleting, revising, updating, exporting, publishing) is a time-consuming process.
+40%: time spent solving terminological problems in specialized translation (Arntz 1993, Walker 1993).
Terminology work happens "backstage", and customers or employers may not be fully aware of its benefits for QA.
90%: Return on Investment (ROI) on terminology management reported by some corporate studies (Childress, 2007; Popiolek, 2015).
TERMINOLOGY MANAGEMENT
General model for project terminology creation (Popiolek, 2015: 351)
Extraction (monolingual extraction & validation)
• List of terms extracted from ST
• List of terms to validate (accept or reject)
Translation (importing & looking for equivalents)
• List is added to a termbase
• List is translated and additional data added
Approval
• List approved by a person in charge of terminology
• When the client has requested it, there is an additional step for client approval
TIMELINE in Terminology Management with bilingual extraction
Preparation: TMX import
- Preparing the files and importing them into the BATE
Bilingual extraction
- List of candidate term pairs extracted from TMX
Validation (& data entry)
- List of term pairs to validate (accept or reject terms and suggested equivalents)
- Terms, one by one, and additional data are added to a term base (Synchroterm)
Export/Import
- Export bilingual terms and additional data in an available file format (.xls, .txt, .TBX, …)
- Import output file to a TDB system (to be integrated into an MT system)
Approval
- Person in charge of terminology or client
Finish
- Ready to use
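The TMX import step in the timeline can be illustrated with a minimal sketch that reads translation units from a TMX document and returns source/target segment pairs. This is only an illustration, not how any of the tools discussed here work internally: the sample TMX content, the function name `read_tmx_pairs` and the exact-match handling of language codes are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX sample, invented for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The term base stores validated terms.</seg></tuv>
      <tuv xml:lang="es"><seg>La base terminológica almacena términos validados.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Import the translation memory first.</seg></tuv>
      <tuv xml:lang="es"><seg>Importe primero la memoria de traducción.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the two requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        # Assumption: language codes must match exactly (no "en" vs "en-US" fallback).
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


PAIRS = read_tmx_pairs(SAMPLE_TMX, "en", "es")
for source, target in PAIRS:
    print(source, "|", target)
```

The resulting list of aligned segments is the kind of sentence-level parallel input that a bilingual extractor would then process.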
Bilingual Automatic Term Extractors
Two approaches (Foo, 2012)
EXTRACT-ALIGN
1st step: monolingual terminology extraction in both languages.
2nd step: cross-linguistic matching using word alignment or co-occurrence statistics to find equivalents.
Commercial systems follow this approach.
ALIGN-FILTER
1st step: word alignment on the parallel texts.
2nd step: rank the aligned units to finally select the most likely pairs of candidates (statistics).
Example: TExSIS (Macken et al., 2013)
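The second step of the extract-align approach (matching monolingual candidates across languages with co-occurrence statistics) can be sketched as follows. This is a toy illustration under stated assumptions: the Dice coefficient is one common co-occurrence measure chosen here for simplicity, not necessarily the statistic used by any of the systems named in this presentation, and the segments, candidate lists and the function name `dice_align` are invented for the example.

```python
from collections import Counter
from itertools import product


def dice_align(segment_pairs, src_terms, tgt_terms):
    """Map each source candidate to the target candidate with the highest Dice score."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src_seg, tgt_seg in segment_pairs:
        # Naive substring matching stands in for real monolingual term spotting.
        src_found = {t for t in src_terms if t in src_seg.lower()}
        tgt_found = {t for t in tgt_terms if t in tgt_seg.lower()}
        src_freq.update(src_found)
        tgt_freq.update(tgt_found)
        # Count every source/target candidate pair seen in the same aligned segment.
        joint.update(product(src_found, tgt_found))

    best = {}
    for (src, tgt), both in joint.items():
        # Dice coefficient: 2 * joint occurrences / (source count + target count).
        score = 2 * both / (src_freq[src] + tgt_freq[tgt])
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    return best


# Hypothetical aligned segments and monolingual candidate lists (step 1 output).
SEGMENTS = [
    ("The term base stores validated terms.",
     "La base terminológica almacena términos validados."),
    ("Import the translation memory first.",
     "Importe primero la memoria de traducción."),
    ("The translation memory feeds the term base.",
     "La memoria de traducción alimenta la base terminológica."),
]
SRC_TERMS = ["term base", "translation memory"]
TGT_TERMS = ["base terminológica", "memoria de traducción"]

BEST = dice_align(SEGMENTS, SRC_TERMS, TGT_TERMS)
for src, (tgt, score) in BEST.items():
    print(f"{src} -> {tgt} (Dice = {score:.2f})")
```

An align-filter system would reverse the order: align words first, then rank and filter the aligned units.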
Bilingual Automatic Term Extractors
Academic / In-house
90s
- English-French TERMIGHT (Dagan & Church, 1994)
- English-French (Kupiec, 1993)
- English-Dutch (Eijk, 1993)
- English-French (Gaussier, 1995)
- English and Swedish (Ahrenberg et al., 1998)
2000-2009
- French-German (Blank, 2000)
- Japanese-English, MNH (Nakagawa & Mori, 2003)
- Spanish-Basque, Elexbi (Hernaiz et al., 2006), from a TMX
- Spanish-German, Autoterm (Haller, 2008)
- English-Spanish, Mutual Bilingual Term Extractor (Ha et al., 2008)
- French-English, French-Italian and French-Dutch (Lefever et al., 2009)
2010-2016
- French-Japanese (Morin et al., 2010, from ACABIT, Daille, 2003): not bilingual, but multilingual
- Slovene and English, Luiz (Vintar, 2010)
- English and Swedish, ITools suite (Foo & Merkel, 2010)
- English and German (Gojun et al., 2012)
- English, French, German, Spanish, TTC TermSuite (Daille, 2012)
- English-Spanish, TBXTools (Oliver & Vázquez, 2015) (under development)
- Chinese, Czech, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish: Sketch Engine (Baisa et al., 2015; Kovář et al., 2016)
Bilingual Automatic Term Extractors
Other BATE (free / commercial), 90s to 2010-2016
MONOLINGUAL ATE
- TermExtractor (Shimohata et al., 2001)
- MemoQ's built-in term extractor
- Déjà Vu - Lexicon
- TermoStat Web: http://termostat.ling.umontreal.ca/
- Yate (IULA)
- Okapi
- TerMine: http://www.nactem.ac.uk/software/termine/
- TerminologyExtractor: https://goo.gl/yA2Cuf
- PRoMT
- FiveFilters (web-based): http://fivefilters.org/term-extraction/
- Concordance programs: WordSmith Tools, AntConc (free), …
BILINGUAL
- Xerox Terminology Suite (2001)
- SDL Multiterm Extract
- Synchroterm
- CrossMining (Across)
- MultiTrans Term Extractor
- Similis™ (by Lingua et Machina™)
- Anchovy (by Swordfish)
- Araya Term Extractor
- Analysis software: Sketch Engine (terminology extraction from TMX)
3. Evaluation
BATE UNDER EVALUATION
Sketch Engine
SIMILIS
Multiterm Extract
Synchroterm
MAIN FEATURES
Tools compared: Multiterm Extract, SynchroTerm, Similis, SkE, Araya
Features compared:
- Import TMX
- Extraction config.
- Extraction scores
- Validation facility
- Term base indexation
- Export to TBX (xls, txt…)
Cell entries: Trados TMX; Others
MEASURES FOR ACCURACY
              EXTRACTED   NON-EXTRACTED
TERMS             A             B
NO TERMS          C             D
RECALL = A / (A + B)
PRECISION = A / (A + C)
QUALITY IN USE MODEL
Characteristics (ISO/IEC 25010:2011)
Effectiveness: accuracy and completeness with which the user achieves objectives
Efficiency: resources expended in relation to the accuracy and completeness
Satisfaction: degree to which user needs are satisfied when software is used in a specified context of use
Freedom from risk: no risk for the security of users, software, context or the environment
Context coverage: degree to which the product understands the complete context of its usage; flexibility
6 TASKS TO EVALUATE
when performing bilingual extraction
1. CONFIGURATION: Setting up the extraction project
2. TMX IMPORT: Importing the source file
3. EXTRACTION: Performing the extraction to get a bilingual list
4. VALIDATION: Selecting the real terms
5. RECORD CREATION: Creating and managing term entries
6. EXPORTATION: Exporting the final result for later use in CAT systems
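The validation and exportation tasks can be pictured with a small sketch: candidate pairs carry an accept/reject decision, and only accepted pairs are written to a tab-delimited text file of the kind a CAT tool could import. The candidate data, the column names and the function name `export_validated` are hypothetical; real tools use their own formats (for example TBX), which this sketch does not attempt to reproduce.

```python
import csv
import io


def export_validated(candidates, out):
    """Write accepted (source, target) pairs as tab-separated lines; return how many were kept."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["source_term", "target_term"])  # hypothetical column names
    kept = 0
    for source, target, accepted in candidates:
        if accepted:  # validation: keep only candidates judged to be real terms
            writer.writerow([source, target])
            kept += 1
    return kept


# Hypothetical candidate list with the validator's decision as the third field.
CANDIDATES = [
    ("term base", "base terminológica", True),
    ("the first", "primero la", False),
    ("translation memory", "memoria de traducción", True),
]

BUFFER = io.StringIO()  # stands in for an output file opened in text mode
KEPT = export_validated(CANDIDATES, BUFFER)
print(BUFFER.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file object instead would produce the .txt export.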
4. Results
PRECISION & RECALL IN %
[Bar chart: precision and recall (%) for Sketch, MTE, Synchr and Similis; vertical axis from 0 to 70]
                EXTRACTED           NON-EXTRACTED
            TERMS   NO TERMS     TERMS   NO TERMS      GOLD
             (A)      (C)         (B)      (D)       STANDARD    TCs    PRECISION   RECALL
Sketch       283      717         370                   653     1,000     28.30      43.34
MTE           97      813         556                   653       910     10.66      14.85
SynchroT.    407    1,505         246                   653     1,912     21.29      62.33
Similis      337      405         316                   653       742     45.42      51.61
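The precision and recall columns can be recomputed from the A, B and C counts in the table using the two formulas from the accuracy slide. The sketch below does exactly that; the function and variable names are chosen for the example, and rounding to two decimals is an assumption about how the published figures were produced.

```python
def precision_recall(a, b, c):
    """Return (precision, recall) in percent.

    a: terms that were extracted, b: terms that were not extracted,
    c: extracted candidates that are not terms.
    """
    precision = 100 * a / (a + c)  # share of extracted candidates that are real terms
    recall = 100 * a / (a + b)     # share of gold-standard terms that were extracted
    return round(precision, 2), round(recall, 2)


# (A, B, C) counts as given in the results table.
COUNTS = {
    "Sketch": (283, 370, 717),
    "MTE": (97, 556, 813),
    "SynchroT.": (407, 246, 1505),
    "Similis": (337, 316, 405),
}

RESULTS = {tool: precision_recall(*abc) for tool, abc in COUNTS.items()}
for tool, (precision, recall) in RESULTS.items():
    print(f"{tool}: precision {precision:.2f} %, recall {recall:.2f} %")
```

In every row A + B equals 653, the size of the gold standard, and A + C equals the number of term candidates (TCs).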
PARAMETERS
Characteristics and sub-characteristics to be measured, with their metrics

EFFECTIVENESS: value between 0 (minimum) and 5 (maximum) = (EFE1+EFE2+EFE3)/3
EFE1. Degree of accuracy (precision of tasks & results): (P1+P7+P13+P19+P25+P31)/6
EFE2. Degree of completeness (tasks are accomplished and results are not missing): (P2+P8+P14+P20+P26+P32)/6
EFE3. Frequency of errors: (P3+P9+P15+P21+P27+P33)/6

EFFICIENCY: value between 0 (minimum) and 5 (maximum) = (EFI2+EFI3+EFI4)/3
EFI1. Time spent in the accomplishment of the task: (TM1+TM2+TM3+TM4+TM5+TM6)
EFI2. Need to use additional sources (material, software, etc.) for the task: (P4+P10+P16+P22+P28+P34)/6
EFI3. Productivity (effort exerted by the user to carry out the task): (P5+P11+P17+P23+P29+P35)/6
EFI4. Need to consult the software Help to perform the task: (P6+P12+P18+P24+P30+P36)/6

SATISFACTION: value between 0 (minimum) and 5 (maximum) = (P37+P38+P39)/3
SAT1. Usefulness
SAT2. Trust
SAT3. Pleasure

CONTEXT COVERAGE: value between 0 (minimum) and 5 (maximum) = (P40+P41+P42)/3
COB1. Context of use
COB2. Flexibility
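The metrics above can be turned into a short calculation. The sketch below assumes that questions P1 to P36 come in six blocks of six (one block per evaluated task), which is what the index pattern in the formulas suggests, and that the total quality-in-use score is the sum of the four characteristic scores, which is consistent with the totals on the final results slide. The answer values and all function names are hypothetical.

```python
from statistics import mean

TASKS = 6      # assumed: one block of questions per evaluated task
PER_TASK = 6   # questions P1..P36 are read as six blocks of six


def sub_score(answers, offset):
    """Mean of the question at position `offset` (1..6) in each of the six blocks."""
    return mean(answers[f"P{task * PER_TASK + offset}"] for task in range(TASKS))


def quality_in_use(answers):
    """Compute the four characteristic scores (0 to 5) and their total."""
    scores = {
        # EFE1..EFE3 use positions 1, 2 and 3 of each block.
        "effectiveness": mean(sub_score(answers, k) for k in (1, 2, 3)),
        # EFI2..EFI4 use positions 4, 5 and 6; EFI1 (time) is not part of the average.
        "efficiency": mean(sub_score(answers, k) for k in (4, 5, 6)),
        "satisfaction": mean(answers[f"P{n}"] for n in (37, 38, 39)),
        "context_coverage": mean(answers[f"P{n}"] for n in (40, 41, 42)),
    }
    # Assumption: the total is the plain sum of the four characteristics.
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical questionnaire answers on the 0-5 scale.
ANSWERS = {f"P{n}": 4 for n in range(1, 37)}
ANSWERS.update({"P37": 5, "P38": 5, "P39": 5, "P40": 2, "P41": 2, "P42": 2})

SCORES = quality_in_use(ANSWERS)
print(SCORES)
```

With these invented answers each task-based characteristic scores 4, satisfaction 5 and context coverage 2, for a total of 15 out of a possible 20.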
QUESTIONNAIRE
42 questions grouped by tasks
RESULTS FOR EXTRACTION & VALIDATION
[Bar chart, vertical axis from 0 to 30; values listed in legend order]
            EXTRACTION   VALIDATION
Sketch          16           13
MTE             14           25
Synchr          20           26
Similis         21           24
FINAL RESULTS FOR QUALITY IN USE
[Bar chart, vertical axis from 0 to 18; values listed in legend order]
           EFFECTIVENESS   EFFICIENCY   SATISFACTION   CONTEXT COVERAGE   TOTAL QIU
Sketch         3.33           3.00          4.00             3.50           13.83
MTE            4.06           4.44          3.00             1.50           13.00
Synchr         4.11           4.22          4.33             3.00           15.67
Similis        3.72           3.11          3.00             3.00           12.83
CONCLUSIONS
• Managing terminology still takes a lot of time and effort, even in
this increasingly computerized profession.
• Research on automatic terminology extraction has been
around for more than 20 years and significant enhancements
concerning bilingual extraction and bilingual corpora
exploitation have been introduced.
• I briefly described the BATE under evaluation and illustrated
some results obtained for accuracy and with the QIU model.
• Results make it clear that much more work has to be done for
BATE to be considered of real help to translators and
terminologists, mainly due to poor accuracy results.
Some references
• Baisa, Vit, Barbora Ulipová, and Michal Cukr. 2015. “Bilingual Terminology Extraction in Sketch Engine.” In 9th
Workshop on Recent Advances in Slavonic Natural Language Processing (RASLAN 2015), 61–67.
• Childress, Mark D. 2007. “Terminology Work Saves More Time than It Costs.” Multilingual, no. April/May: 43–46.
• Foo, Jody. 2012. Computational Terminology: Exploring Bilingual and Monolingual Term Extraction.
• Foo, Jody, and Magnus Merkel. 2010. “Computer Aided Term Bank Creation and Standardization. Building Standardized
Term Banks through Automated Term Extraction and Advanced Editing Tools.” In Terminology in Everyday Life,
edited by Marcel Thelen and Frieda Steurs, 163–80. John Benjamins Publishing Company.
doi:10.1075/tlrp.13.12foo.
• Kovář, Vojtěch, Vít Baisa, and Miloš Jakubíček. 2016. “Sketch Engine for Bilingual Lexicography.” International
Journal of Lexicography 29 (3): 339–52. doi:10.1093/ijl/ecw029.
• Macken, Lieve, Els Lefever, and Veronique Hoste. 2013. “TExSIS: Bilingual Terminology Extraction from Parallel
Corpora Using Chunk-Based Alignment.” Terminology 19 (2013): 1–30. doi:10.1075/term.19.1.01mac.
• Oliver, Antoni, and M. Vazquez. 2015. “TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology
Extraction.” In Proceedings of Recent Advances in Natural Language Processing, 473–79.
• Popiolek, Monika. 2015. “Terminology Management within a Translation Quality Assurance Process.” In Handbook
of Terminology (Volume 1), edited by Hendrik J Kockaert and Frieda Steurs, 341–59. John Benjamins Publishing
Company. doi:10.1075/hot.1.ter6.
• Sauron, Véronique. 2002. “Tearing out the Terms: Evaluating Terms Extractors.” In Translating and the Computer
24: Proceedings from the Aslib Conference, 21-22 November 2002.
• Vintar, Špela. 2010. “Bilingual Term Recognition Revisited: The Bag-of-Equivalents Term Alignment Approach
and Its Evaluation.” Terminology 16 (2010): 141–58. doi:10.1075/term.16.2.01vin.
University of Alicante
IULMA
Campus de San Vicente
Apdo. 99
03080 Alicante
Phone & Fax
Direct Line: +34 965903438
Fax: +34 965903800
chelo.vargas@ua.es
Social Media
@chelovargas
Many thanks for your attention
Chelo Vargas-Sierra

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 

Chelo Vargas-Sierra

  • 1. Bilingual Terminology Extraction from TMX A State-of-the-Art Overview Chelo Vargas-Sierra, PhD University of Alicante, Spain
  • 2. INDEX: Main points of this presentation. 1st point: Key words (overview of terms involved in the process). 2nd point: Terminology and extractors (terminology management, its timeline, BATE approaches and state of the art). 3rd point: Evaluation (BATE under evaluation, measures for accuracy, quality in use model and tasks). 4th point: Results (precision & recall, parameters & questionnaire).
  • 3. KEY WORDS: Terms involved in the process. Parallel corpus: TMX. Alignment levels: paragraph, sentence and word level. ATE & BATE. Precision/Recall: getting only terms and all terms. Gold standard: exhaustive, manually created bilingual glossary. Validation: term validation facility; which TCs (term candidates) are real terms? Usability: software used to achieve the user's objectives with effectiveness, efficiency, and satisfaction. Quality in use model: ISO standard.
  • 5. IMPORTANCE OF TERMINOLOGY: Translators were the first professionals to be aware of term-related issues. IDENTIFY: identify and interpret the terminology in the source text adequately. RETRIEVE: retrieve and store terminological data. FIND: find and use proper documentation and information resources.
  • 6. TERMINOLOGY MANAGEMENT. In specialized translation: +40% time spent to solve terminological problems (Arntz 1993, Walker 1993).
  • 7. TERMINOLOGY MANAGEMENT (continued). Managing terminology (extracting, validating, importing, adding, editing, deleting, revising, updating, exporting, publishing) is a time-consuming process.
  • 8. TERMINOLOGY MANAGEMENT (continued). Terminology work is "backstage", and customers or employers may not be fully aware of its benefits for QA.
  • 9. TERMINOLOGY MANAGEMENT (continued). 90%: Return on Investment (ROI) on terminology management reported by some corporate studies (Childress, 2007; Popiolek, 2015).
  • 10. TERMINOLOGY MANAGEMENT: General model for project terminology creation (Popiolek, 2015: 351). Extraction (monolingual extraction & validation): list of terms extracted from ST; list of terms to validate (accept or reject). Translation (importing & looking for equivalents): list is added to a termbase; list is translated and additional data added. Approval: list approved by a person in charge of terminology; when the client has requested it, there is an additional step for client approval.
  • 11. TIMELINE in Terminology Management with bilingual extraction. Preparation (TMX import): preparing the files and importing them into the BATE. Bilingual extraction: list of candidate term pairs extracted from TMX.
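The TMX import step can be pictured with a minimal sketch that reads aligned segment pairs from a TMX file, which is the raw material a bilingual extractor works on. This is only an illustration using Python's standard library, not how any of the tools named in this presentation is implemented; the function name `read_tmx_pairs` and the sample segments are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny made-up TMX document (two translation units, English-Spanish).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-GB"><seg>The term base is ready.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>La base terminológica está lista.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-GB"><seg>Import the translation memory.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Importe la memoria de traducción.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the two language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        src = tgt = None
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is None:
                continue
            # itertext() also collects text nested inside inline markup elements.
            text = "".join(seg.itertext()).strip()
            if lang.startswith(src_lang):
                src = text
            elif lang.startswith(tgt_lang):
                tgt = text
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


pairs = read_tmx_pairs(SAMPLE_TMX, "en", "es")
for source, target in pairs:
    print(source, "|", target)
```

Matching language codes by prefix (so that `en` also accepts `en-GB`) is a simplification chosen for the sketch; a real import step would let the user pick the exact language pair.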
  • 12. Validation (& data entry): list of pairs of terms to validate (accept or reject terms and suggested equivalents); term by term, additional data are added to a term base (Synchroterm). Export/Import: export bilingual terms and additional data in an available file format (.xls, .txt, .TBX, …); import the output file to a TDB system (to be integrated into an MT system).
  • 13. Approval: person in charge of terminology or client. Finish: ready to use.
  • 14. Bilingual Automatic Term Extractors: two approaches (Foo, 2012). EXTRACT-ALIGN. 1st step: monolingual terminology extraction in both languages. 2nd step: cross-linguistic matching using word alignment or co-occurrence statistics to find equivalents. Commercial systems follow this approach.
  • 15. Bilingual Automatic Term Extractors: two approaches (Foo, 2012). ALIGN-FILTER. 1st step: word alignment on the parallel texts. 2nd step: rank the aligned units to select the most likely pairs of candidates (statistics). Example: TExSIS (Macken et al., 2013).
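As a rough illustration of the second step of the extract-align approach, the sketch below scores candidate equivalents by how often a source term and a target term occur in the same aligned segment pair. It uses the Dice coefficient as one possible co-occurrence statistic; that choice, the function names, the plain substring matching and the sample data are all assumptions for the example, and the monolingual candidate lists are taken as given.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs, src_terms, tgt_terms):
    """Dice coefficient for every source/target term pair seen together."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src_seg, tgt_seg in pairs:
        # Naive substring matching stands in for real term spotting.
        s_found = {t for t in src_terms if t in src_seg.lower()}
        t_found = {t for t in tgt_terms if t in tgt_seg.lower()}
        src_count.update(s_found)
        tgt_count.update(t_found)
        joint.update(product(s_found, t_found))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_equivalents(scores):
    """Keep the highest-scoring target candidate for each source term."""
    best = {}
    for (s, t), score in scores.items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Made-up aligned segments and monolingual candidate lists.
PAIRS = [
    ("The term base stores each term.", "La base terminológica almacena cada término."),
    ("Export the term base.", "Exporte la base terminológica."),
    ("The translation memory is aligned.", "La memoria de traducción está alineada."),
    ("Import the translation memory.", "Importe la memoria de traducción."),
    ("Link the translation memory to the term base.",
     "Vincule la memoria de traducción a la base terminológica."),
]
SRC_TERMS = ["term base", "translation memory"]
TGT_TERMS = ["base terminológica", "memoria de traducción"]

scores = dice_scores(PAIRS, SRC_TERMS, TGT_TERMS)
best = best_equivalents(scores)
for term, (equivalent, score) in best.items():
    print(term, "->", equivalent, round(score, 2))
```

An align-filter system would instead start from word alignments and rank the aligned units; the ranking idea is similar, but the candidates come from the alignment rather than from two monolingual lists.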
  • 16. Bilingual Automatic Term Extractors: Academic / In-house
      90s
      - English-French, TERMIGHT (Dagan & Church, 1994)
      - English-French (Kupiec, 1993)
      - English-Dutch (Eijk, 1993)
      - English-French (Gaussier, 1995)
      - English and Swedish (Ahrenberg et al., 1998)
      2000-2009
      - French-German (Blank, 2000)
      - Japanese-English, MNH (Nakagawa & Mori, 2003)
      - Spanish-Basque, Elexbi (Hernaiz et al., 2006), from a TMX
      - Spanish-German, Autoterm (Haller, 2008)
      - English-Spanish, Mutual Bilingual Term Extractor (Ha et al., 2008)
      - French-English, French-Italian and French-Dutch (Lefever et al., 2009)
      2010-2016
      - French-Japanese (Morin et al., 2010, from ACABIT, Daille, 2003): not bilingual, but multilingual
      - Slovene and English, Luiz (Vintar, 2010)
      - English and Swedish, ITools suite (Foo & Merkel, 2010)
      - English and German (Gojun et al., 2012)
      - English, French, German, Spanish, TTC TermSuite (Daille, 2012)
      - English-Spanish, TBXTools (Oliver & Vázquez, 2015) (under development)
      - Chinese, Czech, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish: Sketch Engine (Baisa et al., 2015; Kovář et al., 2016)
  • 17. Bilingual Automatic Term Extractors: Other BATE (free / commercial)
      MONOLINGUAL ATE
      - TermExtractor (Shimohata et al., 2001)
      - MemoQ's built-in term extractor
      - Déjà Vu - Lexicon
      - TermoStat Web: http://termostat.ling.umontreal.ca/
      - Yate (IULA)
      - Okapi
      - TerMine: http://www.nactem.ac.uk/software/termine/
      - TerminologyExtractor: https://goo.gl/yA2Cuf
      - PRoMT
      - FiveFilters (web-based): http://fivefilters.org/term-extraction/
      - Concordance programs: WordSmith Tools, AntConc (free), …
      BILINGUAL
      - Xerox Terminology Suite (2001)
      - SDL Multiterm Extract
      - Synchroterm
      - CrossMining (Across)
      - MultiTrans Term Extractor
      - Similis™ (by Lingua et Machina™)
      - Anchovy (by Swordfish)
      - Araya Term Extractor
      - Analysis software: Sketch Engine (terminology extraction from TMX)
  • 19. BATE UNDER EVALUATION: Sketch Engine, Similis, Multiterm Extract, Synchroterm.
  • 20. MAIN FEATURES (comparison table). Tools: Multiterm Extract, SynchroTerm, Similis, SkE, Araya. Features compared: Import TMX, Extraction config., Extraction scores, Validation facility, Term base indexation, Export to TBX (xls, txt…). Cell entries: Trados TMX, Others, Others.
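  • 21. MEASURES FOR ACCURACY. Contingency table: A = terms, extracted; B = terms, non-extracted; C = no terms, extracted; D = no terms, non-extracted. $\mathrm{RECALL} = \frac{A}{A+B}$, $\mathrm{PRECISION} = \frac{A}{A+C}$.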
  • 22. QUALITY IN USE MODEL: Characteristics (ISO/IEC 25010:2011). Effectiveness: accuracy and completeness with which the user achieves objectives. Efficiency: resources expended in relation to the accuracy and completeness. Satisfaction: degree to which user needs are satisfied when software is used in a specified context of use. Freedom from risk: no risk for the security of users, software, context or the environment. Context coverage (including flexibility): degree to which the product understands the complete context of its usage.
  • 23. 6 TASKS TO EVALUATE when performing bilingual extraction. CONFIGURATION: setting up the extraction project. TMX IMPORT: importing the source file. EXTRACTION: performing the extraction to get a bilingual list. VALIDATION: selecting the real terms. RECORD CREATION: creating and managing term entries. EXPORTATION: exporting the final result for later use in CAT systems.
  • 25. PRECISION & RECALL IN % (bar chart: Sketch, MTE, Synchr, Similis). Gold standard: 653 terms. A = extracted terms; C = extracted no terms; B = non-extracted terms; D = non-extracted no terms (not reported); TCs = A + C.
      Tool        A     C      B     TCs     PRECISION   RECALL
      Sketch      283   717    370   1,000   28.30       43.34
      MTE         97    813    556   910     10.66       14.85
      SynchroT.   407   1505   246   1,912   21.29       62.33
      Similis     337   405    316   742     45.42       51.61
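The percentages on this slide can be recomputed from the A, B and C counts with the precision and recall formulas from the accuracy measures slide. The short sketch below does that; the function name and the dictionary layout are assumptions for the example, while the counts are the ones reported on the slide.

```python
def precision_recall(a, b, c):
    """Return (precision, recall) in percent, rounded to two decimals.

    a: extracted candidates that are real terms
    b: real terms from the gold standard that were not extracted
    c: extracted candidates that are not terms
    """
    precision = 100 * a / (a + c)
    recall = 100 * a / (a + b)
    return round(precision, 2), round(recall, 2)


# (A, B, C) counts as reported for each tool.
COUNTS = {
    "Sketch": (283, 370, 717),
    "MTE": (97, 556, 813),
    "SynchroT.": (407, 246, 1505),
    "Similis": (337, 316, 405),
}

results = {tool: precision_recall(*abc) for tool, abc in COUNTS.items()}
for tool, (precision, recall) in results.items():
    print(f"{tool}: precision {precision:.2f}%, recall {recall:.2f}%")
```

In every row A + B equals 653, the size of the gold standard, so recall is always measured against the same manually created glossary, whereas precision depends on how many candidates each tool proposes.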
  • 26. PARAMETERS: Characteristics and sub-characteristics to be measured, with their metrics.
      EFFECTIVENESS, value between 0 (minimum) and 5 (maximum): (EFE1+EFE2+EFE3)/3
      - EFE1. Degree of accuracy, precision of tasks & results: (P1+P7+P13+P19+P25+P31)/6
      - EFE2. Degree of completeness (tasks are accomplished and results are not missing): (P2+P8+P14+P20+P26+P32)/6
      - EFE3. Frequency of errors: (P3+P9+P15+P21+P27+P33)/6
      EFFICIENCY, value between 0 (minimum) and 5 (maximum): (EFI2+EFI3+EFI4)/3
      - EFI1. Time spent in the accomplishment of the task: (TM1+TM2+TM3+TM4+TM5+TM6)
      - EFI2. Need to use additional sources (material, software, etc.) for the task: (P4+P10+P16+P22+P28+P34)/6
      - EFI3. Productivity, effort exerted by the user to carry out the task: (P5+P11+P17+P23+P29+P35)/6
      - EFI4. Need to consult the software Help to perform the task: (P6+P12+P18+P24+P30+P36)/6
      SATISFACTION, value between 0 (minimum) and 5 (maximum): (P37+P38+P39)/3
      - SAT1. Usefulness
      - SAT2. Trust
      - SAT3. Pleasure
      CONTEXT COVERAGE, value between 0 (minimum) and 5 (maximum): (P40+P41+P42)/3
      - COB1. Context of use
      - COB2. Flexibility
  • 28. RESULTS FOR EXTRACTION & VALIDATION (bar chart; data labels in series order, each as extraction, validation): Sketch 16, 13; MTE 14, 25; Synchr 20, 26; Similis 21, 24.
      FINAL RESULTS FOR QUALITY IN USE
      Tool      EFFECTIVENESS   EFFICIENCY   SATISFACTION   CONTEXT COVERAGE   TOTAL QIU
      Sketch    3.33            3.00         4.00           3.50               13.83
      MTE       4.06            4.44         3.00           1.50               13.00
      Synchr    4.11            4.22         4.33           3.00               15.67
      Similis   3.72            3.11         3.00           3.00               12.83
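To make the scoring scheme concrete, the sketch below averages questionnaire answers into a sub-characteristic, averages sub-characteristics into a characteristic, and adds the four characteristic scores into a total. The averaging follows the formulas on the parameters slide; treating the total QIU as the plain sum of the four characteristics is an assumption inferred from the reported figures, and the questionnaire answers used here are hypothetical.

```python
def average(values):
    """Arithmetic mean, used for both questionnaire items and sub-characteristics."""
    return sum(values) / len(values)


def total_qiu(effectiveness, efficiency, satisfaction, context_coverage):
    # Assumption: the total is the plain sum of the four characteristic scores.
    return round(effectiveness + efficiency + satisfaction + context_coverage, 2)


# Hypothetical answers (0 to 5), one per task, for the three effectiveness items.
efe1 = average([4, 3, 4, 3, 3, 4])  # accuracy items P1, P7, P13, P19, P25, P31
efe2 = average([3, 3, 4, 3, 3, 2])  # completeness items P2, P8, ...
efe3 = average([4, 4, 3, 3, 3, 4])  # error-frequency items P3, P9, ...
effectiveness = average([efe1, efe2, efe3])

# Characteristic scores as reported in the final results:
# (effectiveness, efficiency, satisfaction, context coverage).
REPORTED = {
    "Sketch": (3.33, 3.00, 4.00, 3.50),
    "MTE": (4.06, 4.44, 3.00, 1.50),
    "Synchr": (4.11, 4.22, 4.33, 3.00),
    "Similis": (3.72, 3.11, 3.00, 3.00),
}

totals = {tool: total_qiu(*scores) for tool, scores in REPORTED.items()}
for tool, total in totals.items():
    print(tool, total)
```

Summing the rounded scores reproduces the reported totals for Sketch, MTE and Similis; for Synchr the sum comes to 15.66 against a reported 15.67, which is consistent with the total having been computed from unrounded values.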
  • 29. CONCLUSIONS
      - Managing terminology still takes a lot of time and effort, even in this increasingly computerized profession.
      - Research on automatic terminology extraction has been around for more than 20 years and significant enhancements concerning bilingual extraction and bilingual corpora exploitation have been introduced.
      - I briefly described the BATE under evaluation and illustrated some results obtained for accuracy and with the QIU model.
      - Results make it clear that much more work has to be done for BATE to be considered of real help to translators and terminologists, mainly due to poor accuracy results.
  • 30. Some references
      - Baisa, Vit, Barbora Ulipová, and Michal Cukr. 2015. “Bilingual Terminology Extraction in Sketch Engine.” In 9th Workshop on Recent Advances in Slavonic Natural Language Processing (RASLAN 2015), 61–67.
      - Childress, Mark D. 2007. “Terminology Work Saves More Time than It Cost.” Multilingual, no. April/May: 43–46.
      - Foo, Jody. 2012. Computational Terminology: Exploring Bilingual and Monolingual Term Extraction.
      - Foo, Jody; Merkel, Magnus. 2010. “Computer Aided Term Bank Creation and Standardization. Building Standardized Term Banks through Automated Term Extraction and Advanced Editing Tools.” In Terminology in Everyday Life, edited by Marcel Thelen and Frieda Steurs, 163–80. John Benjamins Publishing Company. doi:10.1075/tlrp.13.12foo.
      - Kovář, Vojtěch, Vít Baisa, and Miloš Jakubíček. 2016. “Sketch Engine for Bilingual Lexicography.” International Journal of Lexicography 29 (3): 339–52. doi:10.1093/ijl/ecw029.
      - Macken, Lieve, Els Lefever, and Veronique Hoste. 2013. “TExSIS: Bilingual Terminology Extraction from Parallel Corpora Using Chunk-Based Alignment.” Terminology 19 (2013): 1–30. doi:10.1075/term.19.1.01mac.
      - Oliver, Antoni, and M. Vazquez. 2015. “TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction.” In Proceedings of Recent Advances in Natural Language Processing, 473–79.
      - Popiolek, Monika. 2015. “Terminology Management within a Translation Quality Assurance Process.” In Handbook of Terminology (Volume 1), edited by Hendrik J Kockaert and Frieda Steurs, 341–59. John Benjamins Publishing Company. doi:10.1075/hot.1.ter6.
      - Sauron, Véronique. 2002. “Tearing out the Terms: Evaluating Terms Extractors.” In Translating and the Computer 24: Proceedings from the Aslib Conference, 21-22 November 2002.
      - Vintar, Špela. 2010. “Bilingual Term Recognition Revisited: The Bag-of-Equivalents Term Alignment Approach and Its Evaluation.” Terminology 16 (2010): 141–58. doi:10.1075/term.16.2.01vin.
  • 31. University of Alicante IULMA Campus de San Vicente Apdo. 99 03080 Alicante Phone & Fax Direct Line: +34 965903438 Fax: +34 965903800 chelo.vargas@ua.es Social Media @chelovargas Many thanks for your attention Chelo Vargas-Sierra