RUSSIAN LEARNER TRANSLATOR CORPUS:
design, research potential and
applications
Andrey Kutuzov
National Research University Higher School of Economics
Maria Kunilovskaya
Tyumen State University
17th International Conference on Text, Speech and Dialogue
Brno, Czech Republic, September 8–12 2014
General description
• inspired by MeLLANGE
• online and downloadable http://rus-ltc.org
• 1.3 million tokens
• translations from 10 universities
• 11 source text genres (inc. essays, educational,
informational)
• multiple: 263 sources, 1952 translations
• bi-directional:
approx. 200 English STs (≈300K tokens) with their 1,300
Russian translations (≈700K tokens), and
over 40 Russian STs and approx. 600 English translations
• 10 types of linguistic and extralinguistic metadata
• Lexical and POS query interface (Freeling-based linguistic
mark-up)
RusLTC at TSD-2014 2
Corpus design
1) Txt-archive structured by file-naming conventions
RU_1_23.txt and EN_1_23_9.txt
RU_1_23.head.txt and EN_1_23_9.head.txt
2) TMX file
• pair-wise alignment with LF aligner batch mode
• manual correction (Olifant /Heartsome tmx-editors)
• merging TUVs with identical source segments + adding XML tags to
link segments to head files (a homegrown script)
3) Error-tagged subcorpus
• a collection of 265 annotated translations (for 33 sources);
• stand-off machine readable annotation
• pre-defined error classification
• 6,471 error tags
• online tag-editor based on brat http://brat.nlplab.org/index.html
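The file-naming convention in 1) can be unpacked mechanically. A minimal sketch, assuming the pattern inferred from the two examples (a translation repeats the numeric fields of its source and appends one more); the slide does not say what each number stands for, so the field names below are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from RU_1_23.txt / EN_1_23_9.txt (an assumption, not a
# documented specification): LANG_<a>_<b>[_<c>][.head].txt
NAME_RE = re.compile(r"^(RU|EN)_(\d+)_(\d+)(?:_(\d+))?(\.head)?\.txt$")


class CorpusFile(NamedTuple):
    lang: str                    # "RU" or "EN"
    ids: tuple                   # the two numeric fields shared by a source and its translations
    translation: Optional[int]   # extra field found only on translations (hypothetical name)
    is_head: bool                # True for *.head.txt files


def parse_name(filename: str) -> CorpusFile:
    """Split a corpus file name into its fields."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised file name: {filename}")
    lang, a, b, c, head = m.groups()
    return CorpusFile(lang, (int(a), int(b)), int(c) if c else None, head is not None)


def is_translation_of(target: str, source: str) -> bool:
    """True if `target` looks like a translation of `source` under the assumed pattern."""
    t, s = parse_name(target), parse_name(source)
    return (t.translation is not None and s.translation is None
            and t.ids == s.ids and t.lang != s.lang)
```

Under this reading, EN_1_23_9.txt pairs with RU_1_23.txt, and each text pairs with its own .head.txt file.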
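The merging step in 2) can be pictured as follows. This is a minimal sketch, not the authors' homegrown script: it assumes pair-wise TMX input in which each tu element holds one source tuv and one target tuv, and it collects all target variants under a single tu per distinct source segment. The XML tags that link segments to head files are left out because their format is not given here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def merge_identical_sources(tmx_text: str, src_lang: str) -> ET.Element:
    """Return the TMX root with TUs that share a source segment merged into one.

    Assumption: src_lang matches the xml:lang value of source TUVs exactly.
    """
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    merged = {}  # source segment text -> the <tu> that collects its variants
    for tu in list(body):  # iterate over a copy because duplicates are removed
        tuvs = tu.findall("tuv")
        source = next(t for t in tuvs if t.get(XML_LANG) == src_lang)
        key = source.findtext("seg", default="").strip()
        if key not in merged:
            merged[key] = tu  # first occurrence becomes the merged unit
        else:
            # Move the target variants of the duplicate into the first unit.
            for tuv in tuvs:
                if tuv is not source:
                    merged[key].append(tuv)
            body.remove(tu)
    return root
```

Matching on the exact segment text is a deliberate simplification; a real script might normalise whitespace or punctuation before comparing.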
Query interface
BRAT-based online error tag editor
Application and Research
RusLTC is a general purpose data source for translation studies
and translation education research, inc. study of
1. variation and choice in translation;
2. ‘translationese’ and the translator interlanguage;
3. interdependence between the translation characteristics
and various metadata (direction and conditions of
translation, source text genre);
4. translation-related “problem areas” or rich points in source
texts;
5. translation quality and translation quality assessment (TQA)
Direct use
• in the curriculum and materials design
• as a teaching and learning aid.
RusLTC research: gender asymmetry
in translated texts
1) The same gender asymmetry in male and
female translations as in Russian originals
(based on lexical variety)
2) Sentence length figures for female
translations contradict similar statistics for
originals
Research based on RusLTC: splitting in
EN-RU translation
1) types of syntactic structures that undergo
splitting in English-Russian translation:
– coordination with “, and”
– non-restrictive relative clauses
2) most frequent mistakes associated with splitting:
– loss or misinterpretation of semantic relations
between propositions,
– issues with anaphora resolution and
– greater communicative value acquired by upgraded
sentences.
Error-tagged part: inter-rater reliability
AIM: to gauge the reliability of mark-up results based on
the proposed error classification and to establish the areas of
disagreement
[Figure: inter-rater agreement chart; data labels not recoverable]
α=0.734 versus α=0.569
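The agreement figures are reported as α without naming the coefficient. Assuming it is Krippendorff's alpha over nominal error labels (an assumption, not something stated here), a value of this kind could be computed roughly as below; the input layout is hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that different
    annotators gave to the same unit (hypothetical input layout).
    Raises ZeroDivisionError if only one label value ever occurs.
    """
    coincidences = Counter()  # (label_a, label_b) -> weighted count
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit rated once says nothing about agreement
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()  # marginal count per label
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement, both left unnormalised by n
    # (the common factor cancels in the ratio).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    return 1.0 - observed / expected
```

An alpha of 1 means perfect agreement and 0 means agreement no better than chance, which is why the two reported values are compared rather than read in isolation.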
Error statistics analysis to inform translation
didactics
Hypothesis 1: The better one knows L1, the better she
understands the source / the better the transfer skills.
Hypothesis 2: Final-year students make fewer mistakes than
4th-year students
Hypothesis 3: Test translations show better results than routine
translations because students are more motivated to
perform better
Hypothesis 4: The quantitative results of the error annotation
depend on the order of translations in the set (“order
effect”)
Use in the classroom
1) Students have online access to:
• their own error-tagged and commented translations;
• peer translations;
• mistake statistics that reflect their individual
progress and difficulties.
2) Students’ rating based on the
quality of final translation
Quality parameters used for consecutive ranking to arrive at relative evaluation:
1. number of critical errors,
2. number of content errors and
3. total number of mistakes.
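One way to read "consecutive ranking" is as a sort on the three parameters in the order listed, with each later parameter breaking ties left by the earlier ones. A minimal sketch under that assumption; the record layout and student identifiers are hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    student: str   # hypothetical identifier
    critical: int  # number of critical errors
    content: int   # number of content errors
    total: int     # total number of mistakes


def rank(results):
    """Order translations from best to worst.

    Fewer critical errors come first; ties are broken by content errors,
    then by the total number of mistakes (assumed reading of the slide).
    """
    return sorted(results, key=lambda r: (r.critical, r.content, r.total))
```

Because the key is a tuple, a translation with no critical errors always outranks one with a critical error, however many minor mistakes it contains.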
3) Follow students’ individual
progress over the year
(based on the total number of mistakes normalized by the text
size)
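The normalization can be sketched as a simple rate. This assumes text size is counted in tokens and scales to mistakes per 100 tokens; both choices are assumptions, since only "normalized by the text size" is stated.

```python
def mistakes_per_100_tokens(total_mistakes: int, token_count: int) -> float:
    """Mistake rate that is comparable across translations of different length."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return 100.0 * total_mistakes / token_count


def progress(series):
    """Turn (total_mistakes, token_count) pairs, in chronological order, into rates."""
    return [mistakes_per_100_tokens(m, n) for m, n in series]
```

A falling sequence of rates over the year would then indicate progress, independent of how long each assignment was.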
4) Think of remedial activities
The top ten mistakes in the sample
1) Theory-based exercises utilizing multiple
concordances
• discussing translation strategies, identifying translation problems
and comparing/evaluating solutions
• developing skills to overcome known transfer issues in English-
Russian translation which are due to interlingual typological
differences
2) Corpus-driven exercises to prevent most
common mistakes
• developing L1 competence through building up corpus-querying
and documentary research skills;
• extending the scope of world knowledge through information
search and developing text analysis and text comprehension
aptitude.
5) Design materials and teaching aids
Summary
1) Russian Learner Translator Corpus is an available and
extensive source of data for translation studies and
translator education research (http://www.rus-ltc.org/);
2) The error-tagged subcorpus (http://dev.rus-ltc.org/brat/#/rusltc/)
is a means of giving students extensive feedback on their
translations
3) and of accumulating research data on TQA;
4) RusLTC content is used in designing teaching materials.
Thank you!

RusLTC at TSD-2014 (Brno)

  • 1. RUSSIAN LEARNER TRANSLATOR CORPUS: design, research potential and applications Andrey Kutuzov National Research University Higher School of Economics Maria Kunilovskaya Tyumen State University 17th International Conference on Text, Speech and Dialogue Brno, Czech Republic, September 8–12 2014
  • 2. General description • inspired by MeLLANGE • online and downloadable http://rus-ltc.org • 1.3 mln tokens • translations from 10 universities • 11 source text genres (inc. essays, educational, informational) • multiple: 263 sources, 1952 translations • bi-directional: approx. 200 English ST(≈300K tokens) with their 1300 Russian translations (≈700 thousand tokens), and over 40 Russian ST and approx. 600 English translations • 10 types of linguistic and extralinguistic meta data • Lexical and POS query interface (Freeling-based linguistic mark-up) RusLTC at TSD-2014 2
  • 3. Corpus design 1) Txt-archive structured by file-naming conventions RU_1_23.txt and EN_1_23_9.txt RU_1_23.head.txt and EN_1_23_9.head.txt 2) TMX file • pair-wise alignment with LF aligner batch mode • manual correction (Olifant /Heartsome tmx-editors) • merging TUVs with identical source segments + adding XML tags to link segments to head files (a homegrown script) 3) Error-tagged subcorpus • a collection of 265 annotated translations (for 33 sources); • stand-off machine readable annotation • pre-defined error classification • 6,471 error tags • online tag-editor based of brat http://brat.nlplab.org/index.html RusLTC at TSD-2014 3
  • 5. BRAT-based online error tag editor
  • 6. Application and Research
      RusLTC is a general-purpose data source for translation studies and translation education research, incl. the study of
      1. variation and choice in translation;
      2. "translationese" and the translator interlanguage;
      3. interdependence between the translation characteristics and various metadata (direction and conditions of translation, source text genre);
      4. translation-related "problem areas" or rich points in source texts;
      5. translation quality and translation quality assessment (TQA).
      Direct use
      • in curriculum and materials design
      • as a teaching and learning aid.
  • 7. RusLTC research: gender asymmetry in translated texts
      1) Male and female translations show the same gender asymmetry as Russian originals (based on lexical variety)
      2) Sentence length figures for female translations contradict similar statistics for originals
  • 8. Research based on RusLTC: splitting in EN-RU translation
      1) Types of syntactic structures that undergo splitting in English-Russian translation:
          – coordination with ", and"
          – non-restrictive relative clauses
      2) Most frequent mistakes associated with splitting:
          – loss or misinterpretation of semantic relations between propositions,
          – issues with anaphora resolution, and
          – greater communicative value acquired by upgraded sentences.
  • 9. Error-tagged part: inter-rater reliability
      AIM: to gauge the reliability of mark-up results based on the proposed error classification and to establish the areas of disagreement
      [Chart: annotator agreement counts; labels not recoverable from the transcript]
      α = 0.734 versus α = 0.569
  • 10. Error statistics analysis to inform translation didactics
      Hypothesis 1: The better one knows L1, the better one understands the source and the better one's transfer skills.
      Hypothesis 2: Final-year students make fewer mistakes than 4th-year students.
      Hypothesis 3: Test translations show better results than routine translations because students are more motivated to perform better.
      Hypothesis 4: The quantitative results of the error annotation depend on the order of translations in the set ("order effect").
  • 11. Use in the classroom
      1) Students have online access to:
          • their own error-tagged and commented translations;
          • peer translations;
          • mistake statistics that reflect their individual progress and difficulties.
  • 12. 2) Students' rating based on the quality of the final translation
      Quality parameters used for consecutive ranking to arrive at a relative evaluation:
      1. number of critical errors,
      2. number of content errors, and
      3. total number of mistakes.
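One way to read "consecutive ranking" is as a lexicographic sort: compare critical errors first, break ties on content errors, then on the total number of mistakes. The sketch below assumes that reading; the slides do not spell out the procedure, and the record layout and sample figures are hypothetical.

```python
def rank_translations(records):
    """Order translations from best to worst.

    Each record is a dict with hypothetical keys 'id', 'critical',
    'content' and 'total'. Fewer errors rank higher; ties on an earlier
    parameter are broken by the next one.
    """
    return sorted(
        records,
        key=lambda r: (r["critical"], r["content"], r["total"]),
    )


# Hypothetical error counts for four student translations.
students = [
    {"id": "A", "critical": 1, "content": 2, "total": 10},
    {"id": "B", "critical": 0, "content": 4, "total": 12},
    {"id": "C", "critical": 0, "content": 4, "total": 9},
    {"id": "D", "critical": 1, "content": 1, "total": 15},
]
ranking = [r["id"] for r in rank_translations(students)]
print(ranking)
```

Under this reading, student D outranks student A despite having more mistakes overall, because content errors are compared before the total.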
  • 13. 3) Follow students' individual progress over the year (based on the total number of mistakes normalized by the text size)
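Normalizing by text size makes translations of different lengths comparable. A minimal sketch follows, assuming the rate is expressed per 100 tokens; the slide does not state the unit, and the function name and sample figures are hypothetical.

```python
def errors_per_100_tokens(total_errors, token_count):
    """Return the number of mistakes per 100 tokens of translated text."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return 100.0 * total_errors / token_count


# Hypothetical history for one student: (label, mistakes, tokens in the text).
history = [
    ("autumn", 18, 450),
    ("winter", 15, 500),
    ("spring", 9, 450),
]
progress = [
    (label, errors_per_100_tokens(errors, tokens))
    for label, errors, tokens in history
]
for label, rate in progress:
    print(f"{label}: {rate:.1f} mistakes per 100 tokens")
```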
  • 14. 4) Think of remedial activities
      [Chart: the top ten mistakes in the sample]
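A list of the most frequent mistakes can be derived from stand-off annotation by counting tag types. The sketch below assumes brat's standard `.ann` layout for text-bound annotations (tab-separated ID, then "Type start end", then the covered text); the error labels in the sample are hypothetical and are not the corpus's actual classification.

```python
from collections import Counter


def count_error_tags(ann_text):
    """Count annotation types among text-bound ('T') lines of a brat .ann file."""
    counts = Counter()
    for line in ann_text.splitlines():
        fields = line.split("\t")
        # Text-bound annotations have an ID starting with 'T'.
        if len(fields) >= 2 and fields[0].startswith("T"):
            tag_type = fields[1].split(" ")[0]
            counts[tag_type] += 1
    return counts


# Hypothetical annotation content with made-up error labels.
sample_ann = "\n".join([
    "T1\tlexis 0 5\tHello",
    "T2\tsyntax 10 18\tthe world",
    "T3\tlexis 25 31\tplanet",
    "#1\tAnnotatorNotes T1\twrong register",
    "T4\tomission 40 44\tthen",
    "T5\tlexis 50 55\tgreet",
])
top = count_error_tags(sample_ann).most_common(2)
print(top)
```

Passing 10 to `most_common` instead of 2 would give the top ten tags for a real sample.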
  • 15. 5) Design materials and teaching aids
      1) Theory-based exercises utilizing multiple concordances
          • discussing translation strategies, identifying translation problems and comparing/evaluating solutions
          • developing skills to overcome known transfer issues in English-Russian translation which are due to interlingual typological differences
      2) Corpus-driven exercises to prevent the most common mistakes
          • developing L1 competence through building up corpus-querying and documentary research skills;
          • extending the scope of world knowledge through information search and developing text analysis and text comprehension aptitude.
  • 16. Summary
      1) The Russian Learner Translator Corpus is an available and extensive source of data for translation studies and translator education research (http://www.rus-ltc.org/);
      2) The error-tagged subcorpus (http://dev.rus-ltc.org/brat/#/rusltc/) is a way to give students extensive feedback on their translations
      3) and a means of accumulating research data on TQA;
      4) RusLTC content is used in designing teaching materials.
      Thank you!