Query Translation for Cross-lingual Search in
the Academic Search Engine PubPsych
Cristina España-Bonet (1), Juliane Stiller (2), Roland Ramthun (3),
Josef van Genabith (1) and Vivien Petras (2)
(1) Universität des Saarlandes & DFKI, Saarbrücken, Germany
(2) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin, Germany
(3) Leibniz Institute for Psychology Information (ZPID), Trier, Germany
Table of contents
1. Motivation: CLuBS - Cross-lingual Bibliographic Search
2. Queries & Languages in PubPsych
3. Query Translation
4. Evaluation
5. Results and Future Work
Motivation: CLuBS - Cross-lingual Bibliographic Search
Motivation of the CLuBS project
• academic discourse happens in several languages
• language barrier ⇒ researchers might not understand or even
find relevant articles
• basing research solely on results published in dominant
languages such as English bears the risk of drawing conclusions
on sub-populations [3]
Goals of the CLuBS project
• aims to improve multilingual access to academic bibliographic
information
• develops, implements & evaluates four different approaches to
enable Cross-lingual Information Retrieval (CLIR)
• prototypical domain: psychology with PubPsych search engine
(https://pubpsych.eu)
PubPsych
• database of psychological literature, treatments, test and
research data
• aggregates bibliographic metadata from nine databases
produced by several international partners, e.g. MEDLINE,
PSYNDEX and NORART
• metadata mainly in English, German, French and Spanish
• very uneven distribution due to different indexing practices
• 20% of all content has no English metadata; 5% of content
retrievable with Spanish
• ⇒ results on a topic vary depending on query language
Approaches for CLIR
• translation of content (metadata in this case)
• translation of queries
• research question: can PubPsych queries be translated into the
four target languages by mapping them to purpose-built lexical
resources?
Queries & Languages in PubPsych
Query languages
Query types and distribution in different domains
[Figure: percentage (0-100) of informational, navigational and
transactional queries in Psychology, Comp. Sci [4], General
Academic [5], Library [1] and Web Search [2]]
88.4% of queries in PubPsych are informational, so we chose to
translate them by mapping query terms to lexical resources
Query Translation
Creation of quadrilingual lexicon
QuadLex: aligned dictionary in English, French, German and Spanish,
built from four sources

Source                   German    English   French    Spanish
MeSH                     70,694    175,004   96,333    66,828
WP (titles/categories)   (81,369/38,038)
Apertium                 7,792     5,935     6,020     5,846
Manual                   4,262     4,142     4,047     4,081
Total unique (Lex)       202,128   304,277   225,607   195,937
Flowchart for query term translation
INPUT: parsed term (example: action potentials)
• term in quad-lexicon?
  - yes: extract entry, e.g.
    action potentials ||| es:potenciales de acción ||| de:aktionspotentiale ||| fr:potentiels d’action
    and go to OUTPUT
  - no: split by tokens (token alternatives)
• for each token: token in quad-lexicon?
  - yes: extract entry
  - no: singular form in quad-lexicon?
    yes: extract entry
    no: copy token
• more tokens?
  - yes: continue with the next token
  - no: recompose full entry
OUTPUT: quad-term
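The lookup order in the flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the function names (parse_entry, singular, translate_term), the English-keyed dictionary, the trailing-"s" singularisation rule and the "task" entry are all hypothetical, and the "token alternatives" step is not modelled.

```python
from typing import Dict, List, Optional

LANGS = ("en", "de", "fr", "es")
Entry = Dict[str, str]


def parse_entry(line: str, source_lang: str = "en") -> Entry:
    """Parse 'term ||| es:... ||| de:... ||| fr:...' into {lang: term}."""
    fields = [field.strip() for field in line.split("|||")]
    entry = {source_lang: fields[0]}
    for field in fields[1:]:
        lang, _, term = field.partition(":")
        entry[lang.strip()] = term.strip()
    return entry


def singular(token: str) -> str:
    """Naive singular form (assumption): drop a trailing 's'."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def lookup_token(token: str, lexicon: Dict[str, Entry]) -> Optional[Entry]:
    """Try the token itself, then its singular form."""
    return lexicon.get(token) or lexicon.get(singular(token))


def translate_term(term: str, lexicon: Dict[str, Entry]) -> Entry:
    key = term.lower().strip()
    # 1. Whole term found in the lexicon: return its entry directly.
    if key in lexicon:
        return dict(lexicon[key])
    # 2. Otherwise translate token by token; unknown tokens are copied.
    parts: Dict[str, List[str]] = {lang: [] for lang in LANGS}
    for token in key.split():
        entry = lookup_token(token, lexicon)
        for lang in LANGS:
            parts[lang].append(entry.get(lang, token) if entry else token)
    # 3. Recompose the full entry from the per-token results.
    return {lang: " ".join(tokens) for lang, tokens in parts.items()}


# Toy lexicon keyed by the English term (the second entry is illustrative).
lexicon: Dict[str, Entry] = {}
for line in (
    "action potentials ||| es:potenciales de acción ||| de:aktionspotentiale ||| fr:potentiels d’action",
    "task ||| es:tarea ||| de:aufgabe ||| fr:tâche",
):
    parsed = parse_entry(line)
    lexicon[parsed["en"]] = parsed

whole = translate_term("action potentials", lexicon)
fallback = translate_term("unfinished tasks", lexicon)
print(whole["de"])
print(fallback["de"])
```

In this toy run the second query falls through to the token level: "tasks" is matched via its singular form, while "unfinished" is copied unchanged because it has no entry.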
Evaluation
Coverage of MeSH & quadrilingual lexicon
How many terms and tokens out of the 536,479 queries can we
translate?
• whole terms with MeSH lexicon: 7.7%
• whole terms with quadrilingual lexicon: 14.9%
• token level with MeSH lexicon: 64.2%
• token level with quadrilingual lexicon: 85.0%
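As a rough illustration of the two coverage levels, the sketch below counts how many whole terms and how many individual tokens are found in a lexicon. The slides do not say how terms and tokens were counted (for example, unique or per occurrence), so the per-occurrence counting, the function name coverage and the toy data are assumptions.

```python
from typing import Iterable, Set, Tuple


def coverage(terms: Iterable[str], lexicon: Set[str]) -> Tuple[float, float]:
    """Return (term-level, token-level) coverage as percentages."""
    normalised = [t.lower().strip() for t in terms]
    tokens = [tok for t in normalised for tok in t.split()]
    term_hits = sum(t in lexicon for t in normalised)
    token_hits = sum(tok in lexicon for tok in tokens)
    term_cov = 100.0 * term_hits / len(normalised) if normalised else 0.0
    token_cov = 100.0 * token_hits / len(tokens) if tokens else 0.0
    return term_cov, token_cov


# Toy data, for illustration only.
toy_lexicon = {"action potentials", "task", "memory"}
toy_terms = ["action potentials", "unfinished task", "memory"]
term_cov, token_cov = coverage(toy_terms, toy_lexicon)
print(f"term level: {term_cov:.1f}%, token level: {token_cov:.1f}%")
```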
Translation quality
Corpus of 500 queries, manually rated by 3 annotators:
• 100 queries in each language (en, de, fr, es)
• 100 queries without a definite language identification (many
named entities)
Evaluation according to adequacy:
• no gold translation existed
• How much of the meaning in the source query was expressed in
the translation?
• use of a three-point scale:
  0: none of the meaning was transferred
  1: part of the meaning was transferred
  2: all of the meaning was transferred
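One way to turn such ratings into summary figures is sketched below. The slides do not state how the three annotators' scores were combined, so averaging them per query, the function name summarise and the toy ratings are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def summarise(ratings: Sequence[Sequence[int]]) -> Dict[str, float]:
    """ratings[i] holds the 0/1/2 scores the annotators gave to query i."""
    per_query = [mean(scores) for scores in ratings]
    n = len(per_query)
    return {
        "mean_adequacy": mean(per_query),
        # all annotators gave 2
        "pct_all": 100.0 * sum(s == 2 for s in per_query) / n,
        # all annotators gave 0
        "pct_none": 100.0 * sum(s == 0 for s in per_query) / n,
        # anything in between counts as partially translated
        "pct_partial": 100.0 * sum(0 < s < 2 for s in per_query) / n,
    }


# Toy ratings: 4 queries, 3 annotators each.
toy_ratings = [[2, 2, 2], [0, 0, 0], [2, 1, 2], [1, 1, 0]]
summary = summarise(toy_ratings)
print(summary)
```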
Inter-Annotator Agreement
Fleiss’ kappa of the three raters for different language pairs
(rows: source language, columns: target language):

source   to de   to en   to fr   to es
de       n/a     0.616   0.658   0.598
en       0.442   n/a     0.455   0.521
fr       0.243   0.268   n/a     0.384
es       0.422   0.354   0.472   n/a
none     0.494   0.458   0.513   0.440

disagreement example
• source: "unfinished task" -> DE: "unfinished aufgabe"
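For reference, Fleiss' kappa for a fixed number of raters can be computed as in the sketch below. It follows the standard definition (mean observed per-item agreement compared with chance agreement from the overall category shares); the function name and the toy ratings are illustrative and not taken from the study.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """ratings[i] = labels given to item i; every item has the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]
    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the share of all assignments per category.
    p_cats = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_exp = sum(p ** 2 for p in p_cats)
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_bar - p_exp) / (1 - p_exp)


# Toy ratings on the 0/1/2 adequacy scale: 4 queries, 3 raters each.
toy = [[2, 2, 2], [0, 0, 0], [2, 1, 2], [1, 1, 0]]
kappa = fleiss_kappa(toy)
print(round(kappa, 3))
```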
Results and Future Work
Results
• average adequacy of 1.4 ⇒ most of the queries had at least
some of their terms properly translated
• 54%±20% of the queries had the maximum adequacy score
when looking at the mean over languages
• only 14%±8% of the queries got completely incorrect
translations
• the remaining 33%±15% were partially well translated
Results
Results were quite similar across languages, with two clear exceptions:
• translations of German queries had lower quality (mean
adequacy 1.1), mainly because German compounding
increases the number of untranslated tokens relative to the
other languages
• queries with undetermined language had a very high adequacy
(1.8) because they are shorter and, in most cases, leaving
the source token untranslated resulted in a good translation
Future work
• increase coverage through multilingual word embeddings
• improve approach by removing systematic errors
• improve translation for German (due to its compound nature)
• in addition to translation quality, we will evaluate the impact
on retrieval
Questions?
Cristina España i Bonet <cristinae@dfki.de>
Juliane Stiller <juliane.stiller@ibi.hu-berlin.de>
Roland Ramthun <rr@leibniz-psychology.org>
Josef van Genabith <josef.van_genabith@dfki.de>
Vivien Petras <vivien.petras@ibi.hu-berlin.de>
https://www.clubs-project.eu/en/
Acknowledgements
This research was supported by the Leibniz-Gemeinschaft under grant
SAW-2016-ZPID-2.
References
[1] C. Behnert. Evaluation Methods within the LibRank Project. Working Paper, LibRank, 2016.
[2] A. Broder. A taxonomy of web search. In ACM SIGIR Forum, volume 36, pages 3–10. ACM, 2002.
[3] J. Henrich, S. J. Heine, and A. Norenzayan. Most people are not WEIRD. Nature, 466(7302):29–29, Jul 2010.
[4] M. Khabsa, Z. Wu, and C. L. Giles. Towards better understanding of academic search. In Joint Conference on Digital Library (JCDL), pages 111–114. ACM, 2016.
[5] X. Li, B. J. Schijvenaars, and M. de Rijke. Investigating queries and search failures in academic search. Information Processing and Management, 53(3):666–683, 2017.

More Related Content

Similar to Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych

Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)Juliane Stiller
 
TPCK: Use of ICT to teach/improve competence in listening to English
TPCK: Use of ICT to teach/improve competence in listening to EnglishTPCK: Use of ICT to teach/improve competence in listening to English
TPCK: Use of ICT to teach/improve competence in listening to Englishpaula hodgson
 
ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...
ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...
ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...PretaLLOD
 
Tim Goodier: Implementing the new CEFR Companion Volume
Tim Goodier: Implementing the new CEFR Companion VolumeTim Goodier: Implementing the new CEFR Companion Volume
Tim Goodier: Implementing the new CEFR Companion Volumeeaquals
 
Siop model-and-research-findings
Siop model-and-research-findingsSiop model-and-research-findings
Siop model-and-research-findingsrobinsondurango67
 
Siop model-and-research-findings
Siop model-and-research-findingsSiop model-and-research-findings
Siop model-and-research-findingsrobinsondurango67
 
How to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningHow to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningLena Shakurova
 
Finding the gaps
Finding the gapsFinding the gaps
Finding the gapsMairLloyd
 
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...Digital Classicist Seminar Berlin
 
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdf
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdfApplied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdf
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdfDr.Badriya Al Mamari
 
EDRD 6000 - Language issues in qualitative research shiyuan zhou
EDRD 6000 - Language issues in qualitative research   shiyuan zhouEDRD 6000 - Language issues in qualitative research   shiyuan zhou
EDRD 6000 - Language issues in qualitative research shiyuan zhouEmma Shiyuan Zhou
 
Tracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language DevelopmentTracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language DevelopmentCALPER
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrievaldannyijwest
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Presentation of Adaptive Software at CLIL 2010 Conference
Presentation of Adaptive Software at CLIL 2010 ConferencePresentation of Adaptive Software at CLIL 2010 Conference
Presentation of Adaptive Software at CLIL 2010 ConferenceTon Koenraad
 

Similar to Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych (20)

Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)
 
NCIHC WEBINAR: Translation as a Tool in the Interpreter Toolbox
NCIHC WEBINAR: Translation as a Tool in the Interpreter ToolboxNCIHC WEBINAR: Translation as a Tool in the Interpreter Toolbox
NCIHC WEBINAR: Translation as a Tool in the Interpreter Toolbox
 
TPCK: Use of ICT to teach/improve competence in listening to English
TPCK: Use of ICT to teach/improve competence in listening to EnglishTPCK: Use of ICT to teach/improve competence in listening to English
TPCK: Use of ICT to teach/improve competence in listening to English
 
#Alfa14
#Alfa14#Alfa14
#Alfa14
 
ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...
ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...
ELSE IF 2019: Multilingual Text Analytics for Extracting Pharma Real-World Ev...
 
Tim Goodier: Implementing the new CEFR Companion Volume
Tim Goodier: Implementing the new CEFR Companion VolumeTim Goodier: Implementing the new CEFR Companion Volume
Tim Goodier: Implementing the new CEFR Companion Volume
 
Translation & Localization of E-learning Courses How to Get Started
Translation & Localization of E-learning Courses How to Get StartedTranslation & Localization of E-learning Courses How to Get Started
Translation & Localization of E-learning Courses How to Get Started
 
ASE Poster
ASE PosterASE Poster
ASE Poster
 
Siop model-and-research-findings
Siop model-and-research-findingsSiop model-and-research-findings
Siop model-and-research-findings
 
Siop model-and-research-findings
Siop model-and-research-findingsSiop model-and-research-findings
Siop model-and-research-findings
 
How to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningHow to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learning
 
Finding the gaps
Finding the gapsFinding the gaps
Finding the gaps
 
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
[DCSB] Gregory Crane, Stella Dee, Maryam Foradi, Monica Lent, Maria Moritz (U...
 
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdf
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdfApplied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdf
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdf
 
EDRD 6000 - Language issues in qualitative research shiyuan zhou
EDRD 6000 - Language issues in qualitative research   shiyuan zhouEDRD 6000 - Language issues in qualitative research   shiyuan zhou
EDRD 6000 - Language issues in qualitative research shiyuan zhou
 
Tracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language DevelopmentTracking Learning: Using Corpus Linguistics to Assess Language Development
Tracking Learning: Using Corpus Linguistics to Assess Language Development
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrieval
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Presentation of Adaptive Software at CLIL 2010 Conference
Presentation of Adaptive Software at CLIL 2010 ConferencePresentation of Adaptive Software at CLIL 2010 Conference
Presentation of Adaptive Software at CLIL 2010 Conference
 
Lecture 1 dr isabel tayao
Lecture 1 dr isabel tayaoLecture 1 dr isabel tayao
Lecture 1 dr isabel tayao
 

More from Juliane Stiller

KOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit FluchtbiografieKOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit FluchtbiografieJuliane Stiller
 
KOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im GesundheitsbereichKOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im GesundheitsbereichJuliane Stiller
 
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...Juliane Stiller
 
Berlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open ResearchBerlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open ResearchJuliane Stiller
 
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...Juliane Stiller
 
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der JobsucheZur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der JobsucheJuliane Stiller
 
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...Juliane Stiller
 
The Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of RefugeesThe Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of RefugeesJuliane Stiller
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for MultilingualityEvaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for MultilingualityJuliane Stiller
 
Have You Hired a Refugee? - Hiring Success 2018 Europe
 Have You Hired a Refugee? - Hiring Success 2018 Europe  Have You Hired a Refugee? - Hiring Success 2018 Europe
Have You Hired a Refugee? - Hiring Success 2018 Europe Juliane Stiller
 
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...Juliane Stiller
 
Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03Juliane Stiller
 
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & CriteriaA Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & CriteriaJuliane Stiller
 
Data Quality Assessment in Europeana: Metrics for Multilinguality
Data Quality Assessment in Europeana:  Metrics for MultilingualityData Quality Assessment in Europeana:  Metrics for Multilinguality
Data Quality Assessment in Europeana: Metrics for MultilingualityJuliane Stiller
 

More from Juliane Stiller (14)

KOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit FluchtbiografieKOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
 
KOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im GesundheitsbereichKOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
 
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
 
Berlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open ResearchBerlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open Research
 
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
 
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der JobsucheZur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
 
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
 
The Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of RefugeesThe Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of Refugees
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for MultilingualityEvaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for Multilinguality
 
Have You Hired a Refugee? - Hiring Success 2018 Europe
 Have You Hired a Refugee? - Hiring Success 2018 Europe  Have You Hired a Refugee? - Hiring Success 2018 Europe
Have You Hired a Refugee? - Hiring Success 2018 Europe
 
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
 
Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03
 
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & CriteriaA Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
 
Data Quality Assessment in Europeana: Metrics for Multilinguality
Data Quality Assessment in Europeana:  Metrics for MultilingualityData Quality Assessment in Europeana:  Metrics for Multilinguality
Data Quality Assessment in Europeana: Metrics for Multilinguality
 

Recently uploaded

Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureSérgio Sacani
 
GBSN - Microbiology Lab (Compound Microscope)
GBSN - Microbiology Lab (Compound Microscope)GBSN - Microbiology Lab (Compound Microscope)
GBSN - Microbiology Lab (Compound Microscope)Areesha Ahmad
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyanmuralinath2
 
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdf
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdfPests of sugarcane_Binomics_IPM_Dr.UPR.pdf
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdfPirithiRaju
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Sérgio Sacani
 
Hemoglobin metabolism: C Kalyan & E. Muralinath
Hemoglobin metabolism: C Kalyan & E. MuralinathHemoglobin metabolism: C Kalyan & E. Muralinath
Hemoglobin metabolism: C Kalyan & E. Muralinathmuralinath2
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsSérgio Sacani
 
A Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on EarthA Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on EarthSérgio Sacani
 
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
SCHISTOSOMA HEAMATOBIUM life cycle  .pdfSCHISTOSOMA HEAMATOBIUM life cycle  .pdf
SCHISTOSOMA HEAMATOBIUM life cycle .pdfDebdattaGhosh6
 
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Sahil Suleman
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!University of Hertfordshire
 
INSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversityINSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversitySteffi Friedrichs
 
Jet reorientation in central galaxies of clusters and groups: insights from V...
Jet reorientation in central galaxies of clusters and groups: insights from V...Jet reorientation in central galaxies of clusters and groups: insights from V...
Jet reorientation in central galaxies of clusters and groups: insights from V...Sérgio Sacani
 
Triploidy ...............................pptx
Triploidy ...............................pptxTriploidy ...............................pptx
Triploidy ...............................pptxCherry
 
NuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent UniversityNuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent Universitypablovgd
 
GBSN - Biochemistry (Unit 4) Chemistry of Carbohydrates
GBSN - Biochemistry (Unit 4) Chemistry of CarbohydratesGBSN - Biochemistry (Unit 4) Chemistry of Carbohydrates
GBSN - Biochemistry (Unit 4) Chemistry of CarbohydratesAreesha Ahmad
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfmarcuskenyatta275
 
Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Sérgio Sacani
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfRevenJadePalma
 
The solar dynamo begins near the surface
The solar dynamo begins near the surfaceThe solar dynamo begins near the surface
The solar dynamo begins near the surfaceSérgio Sacani
 

Recently uploaded (20)

Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a Technosignature
 
GBSN - Microbiology Lab (Compound Microscope)
GBSN - Microbiology Lab (Compound Microscope)GBSN - Microbiology Lab (Compound Microscope)
GBSN - Microbiology Lab (Compound Microscope)
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyan
 
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdf
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdfPests of sugarcane_Binomics_IPM_Dr.UPR.pdf
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdf
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
 
Hemoglobin metabolism: C Kalyan & E. Muralinath
Hemoglobin metabolism: C Kalyan & E. MuralinathHemoglobin metabolism: C Kalyan & E. Muralinath
Hemoglobin metabolism: C Kalyan & E. Muralinath
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discs
 
A Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on EarthA Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on Earth
 
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
SCHISTOSOMA HEAMATOBIUM life cycle  .pdfSCHISTOSOMA HEAMATOBIUM life cycle  .pdf
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
 
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!
 
INSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversityINSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere University
 
Jet reorientation in central galaxies of clusters and groups: insights from V...
Jet reorientation in central galaxies of clusters and groups: insights from V...Jet reorientation in central galaxies of clusters and groups: insights from V...
Jet reorientation in central galaxies of clusters and groups: insights from V...
 
Triploidy ...............................pptx
Triploidy ...............................pptxTriploidy ...............................pptx
Triploidy ...............................pptx
 
NuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent UniversityNuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent University
 
GBSN - Biochemistry (Unit 4) Chemistry of Carbohydrates
GBSN - Biochemistry (Unit 4) Chemistry of CarbohydratesGBSN - Biochemistry (Unit 4) Chemistry of Carbohydrates
GBSN - Biochemistry (Unit 4) Chemistry of Carbohydrates
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdf
 
Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
 
The solar dynamo begins near the surface
The solar dynamo begins near the surfaceThe solar dynamo begins near the surface
The solar dynamo begins near the surface
 

Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych

  • 1. Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych Cristina España-Bonet1 , Juliane Stiller2 , Roland Ramthun3 , Josef van Genabith1 and Vivien Petras2 1 Universität des Saarlandes & DFKI, Saarbrücken, Germany 2 Berlin School of Library and Information Science, Humboldt-Universität zu Berlin, Germany 3 Leibniz Institute for Psychology Information (ZPID), Trier, Germany 1
  • 2. Table of contents
      1. Motivation: CLuBS - Cross-lingual Bibliographic Search
      2. Queries & Languages in PubPsych
      3. Query Translation
      4. Evaluation
      5. Results and Future Work
  • 3. Motivation: CLuBS - Cross-lingual Bibliographic Search
  • 4. Motivation of the CLuBS project
      • academic discourse happens in several languages
      • language barrier ⇒ researchers might not understand or even find relevant articles
      • basing research solely on results published in dominant languages such as English risks drawing conclusions that hold only for sub-populations [3]
  • 5. Goals of the CLuBS project
      • aims to improve multilingual access to academic bibliographic information
      • develops, implements & evaluates four different approaches to enable Cross-lingual Information Retrieval (CLIR)
      • prototypical domain: psychology with PubPsych search engine (https://pubpsych.eu)
  • 6. PubPsych
      • database of psychological literature, treatments, test and research data
      • aggregates bibliographic metadata from nine databases produced by several international partners, e.g. MEDLINE, PSYNDEX and NORART
      • metadata mainly in English, German, French and Spanish
      • very uneven distribution due to different indexing practices
      • 20% of all content has no English metadata; 5% of content is retrievable with Spanish queries
      • ⇒ results on a topic vary depending on query language
  • 7. Approaches for CLIR
      • translation of content (metadata in this case)
  • 8. Approaches for CLIR
      • translation of queries
      • research question: can PubPsych queries be translated into the four target languages by mapping them to purpose-built lexical resources?
  • 9. Queries & Languages in PubPsych
  • 11. Query types and distribution in different domains
      [Figure: percentage (0 to 100) of informational, navigational and transactional queries in Psychology, Comp. Sci [4], General Academic [5], Library [1] and Web Search [2]]
      88.4% of queries in PubPsych are informational, so we chose to translate them by mapping them to lexical resources
  • 13. Creation of quadrilingual lexicon
      QuadLex: aligned dictionary in English, French, German and Spanish from four sources
      Source                   German    English   French    Spanish
      MeSH                     70,694    175,004   96,333    66,828
      WP (titles/categories)   (81,369/38,038)
      Apertium                 7,792     5,935     6,020     5,846
      Manual                   4,262     4,142     4,047     4,081
      Total unique (Lex)       202,128   304,277   225,607   195,937
  • 14. Flowchart for query term translation
      [Flowchart; node labels as extracted, yes/no edge labels omitted]
      • INPUT: Parsed term (example: action potentials)
      • Decision: term in quad-lexicon?
      • Extract entry: action potentials ||| es:potenciales de acción ||| de:aktionspotentiale ||| fr:potentiels d’action
      • OUTPUT: Quad-term
      • Split by tokens
      • Decision: token alternatives in quad-lexicon?
      • Extract entry
      • Decision: singular form in quad-lexicon?
      • Copy token
      • Decision: more tokens?
      • Recompose full entry
      • OUTPUT: Quad-term
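The flowchart nodes on slide 14 can be read as a lookup with two fallbacks. The Python sketch below is one plausible reading, not the authors' implementation: the function names, the toy lexicon, the English-keyed dictionary, and the naive "drop a trailing s" singularization are all assumptions made for illustration. Only the `|||` entry format and the example entry come from the slide.

```python
from typing import Dict, Tuple

LANGS = ("es", "de", "fr")  # target languages of this toy example (assumption)


def parse_entry(line: str) -> Tuple[str, Dict[str, str]]:
    """Parse 'term ||| es:... ||| de:... ||| fr:...' into (term, translations)."""
    source, *fields = [part.strip() for part in line.split("|||")]
    translations = dict(field.split(":", 1) for field in fields)
    return source, translations


def singularize(token: str) -> str:
    """Naive singular form: drop a trailing 's' (hypothetical stand-in)."""
    return token[:-1] if len(token) > 1 and token.endswith("s") else token


def translate_token(token: str, lexicon: Dict[str, Dict[str, str]], lang: str) -> str:
    """Try the token, then its singular form; otherwise copy it untranslated."""
    for candidate in (token, singularize(token)):
        if candidate in lexicon:
            return lexicon[candidate][lang]
    return token


def translate_term(term: str, lexicon: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Whole-term lookup first; fall back to token-by-token translation."""
    term = term.lower().strip()
    if term in lexicon:
        return dict(lexicon[term])
    tokens = term.split()
    return {
        lang: " ".join(translate_token(tok, lexicon, lang) for tok in tokens)
        for lang in LANGS
    }


# Toy lexicon: the first entry is the slide's example, the second is invented.
TOY_LEXICON = dict([
    parse_entry("action potentials ||| es:potenciales de acción ||| de:aktionspotentiale ||| fr:potentiels d'action"),
    parse_entry("task ||| es:tarea ||| de:aufgabe ||| fr:tâche"),
])
```

With this toy lexicon, `translate_term("unfinished task", TOY_LEXICON)` copies the unknown token and translates the known one, which reproduces the kind of mixed output ("unfinished aufgabe") that slide 18 cites as a disagreement example.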
  • 16. Coverage of MeSH & quadrilingual lexicon
      How many terms and tokens out of the 536,479 queries can we translate?
      • whole terms with MeSH lexicon: 7.7%
      • whole terms with Quadrilingual Lexicon: 14.9%
      • token level with MeSH lexicon: 64.2%
      • token level with Quadrilingual Lexicon: 85.0%
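The slide does not say exactly how the coverage figures were computed. A minimal sketch, assuming that term-level coverage is the share of whole query terms found in the lexicon and token-level coverage is the share of whitespace-separated tokens found in it, could look like this (the function name and the toy data are hypothetical):

```python
from typing import Iterable, Set, Tuple


def coverage(terms: Iterable[str], lexicon: Set[str]) -> Tuple[float, float]:
    """Return (term-level, token-level) coverage as fractions in [0, 1]."""
    normalized = [t.lower().strip() for t in terms]
    tokens = [tok for t in normalized for tok in t.split()]
    term_hits = sum(t in lexicon for t in normalized)
    token_hits = sum(tok in lexicon for tok in tokens)
    term_cov = term_hits / len(normalized) if normalized else 0.0
    token_cov = token_hits / len(tokens) if tokens else 0.0
    return term_cov, token_cov


# Toy data (invented): one of three terms and three of six tokens are covered.
toy_terms = ["action potentials", "memory task", "unfinished task"]
toy_lexicon = {"action potentials", "memory", "task"}
term_cov, token_cov = coverage(toy_terms, toy_lexicon)
```

Token-level coverage is at least as easy to reach as term-level coverage in this setup, which is consistent with the gap between the two kinds of figures reported on the slide.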
  • 17. Translation quality
      Corpora of 500 queries manually rated by 3 annotators:
      • 100 queries in each language (en, de, fr, es)
      • 100 queries without a definite language identification (many named entities)
      Evaluation according to adequacy:
      • no gold translation existed
      • How much of the meaning in the source query was expressed in the translation?
      • use of a three-point scale:
          0: none of the meaning was transferred
          1: part of the meaning was transferred
          2: all meaning was transferred
  • 18. Inter-Annotator Agreement
      Fleiss’ kappa of the three raters for different language pairs:
      source   2de     2en     2fr     2es
      de       n/a     0.616   0.658   0.598
      en       0.442   n/a     0.455   0.521
      fr       0.243   0.268   n/a     0.384
      es       0.422   0.354   0.472   n/a
      none     0.494   0.458   0.513   0.440
      disagreement example
      • source: “unfinished task” -> DE: “unfinished aufgabe”
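For readers unfamiliar with the statistic on slide 18, the sketch below computes Fleiss' kappa from the standard textbook definition for a fixed number of raters per item. It is not the authors' evaluation script; the function name and the toy ratings are assumptions, and the labels 0, 1 and 2 stand for the adequacy scale from slide 17.

```python
from collections import Counter
from typing import List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for items that were each rated by the same number of raters.

    ratings[i] holds the category labels assigned to item i by each rater.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    per_item_agreement: List[float] = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    total = n_items * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Toy ratings (invented): four queries, three raters, adequacy scores 0 to 2.
toy_ratings = [[0, 0, 1], [1, 1, 1], [2, 2, 0], [2, 2, 2]]
toy_kappa = fleiss_kappa(toy_ratings)
```

Values near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance, so the table shows stronger agreement for German source queries than for French ones.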
  • 20. Results
      • average adequacy of 1.4: most of the queries had at least some of their terms properly translated
      • 54%±20% of the queries had the maximum adequacy score when looking at the mean over languages
      • only 14%±8% of the queries got completely incorrect translations
      • the remaining 33%±15% were partially well translated
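The "mean ± deviation over languages" figures on slide 20 can be illustrated with a small sketch. The slide does not state how the three annotators' scores were combined per query or which deviation was used, so the code below assumes one adequacy score (0, 1 or 2) per query and a population standard deviation across languages; the function names and the toy scores are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

LEVELS = (0, 1, 2)  # adequacy scale from the evaluation


def score_shares(scores: List[int]) -> Dict[int, float]:
    """Share of queries at each adequacy level for one language."""
    return {level: sum(s == level for s in scores) / len(scores) for level in LEVELS}


def summarize(scores_by_lang: Dict[str, List[int]]) -> Dict[int, Tuple[float, float]]:
    """Mean and population standard deviation of each level's share across languages."""
    shares = [score_shares(scores) for scores in scores_by_lang.values()]
    return {
        level: (mean(s[level] for s in shares), pstdev(s[level] for s in shares))
        for level in LEVELS
    }


# Toy scores (invented), four queries per language.
toy_scores = {"en": [2, 2, 1, 0], "de": [2, 1, 1, 0]}
toy_summary = summarize(toy_scores)
```

Each value of `toy_summary` is a `(mean, deviation)` pair, the same shape as a figure such as 54%±20% on the slide.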
  • 21. Results
      Languages were quite similar with two clear exceptions:
      • translation of German queries had a lower quality (mean adequacy 1.1), mainly because the compound nature of German increases the number of untranslated tokens with respect to the other languages, and
      • queries with undetermined language had a very high adequacy (1.8) because they are shorter and, in most cases, leaving the source token untranslated resulted in a good translation.
  • 22. Future work
      • increase coverage through multilingual word embeddings
      • improve approach by removing systematic errors
      • improve translation for German (due to its compound nature)
      • in addition to translation quality, we will evaluate the impact on retrieval
  • 23. Questions?
      Cristina España i Bonet <cristinae@dfki.de>
      Juliane Stiller <juliane.stiller@ibi.hu-berlin.de>
      Roland Ramthun <rr@leibniz-psychology.org>
      Josef van Genabith <josef.van_genabith@dfki.de>
      Vivien Petras <vivien.petras@ibi.hu-berlin.de>
      https://www.clubs-project.eu/en/
      Acknowledgements: This research was supported by the Leibniz-Gemeinschaft under grant SAW-2016-ZPID-2.
  • 24. References i
      [1] C. Behnert. Evaluation Methods within the LibRank Project. Working Paper, LibRank, 2016.
      [2] A. Broder. A taxonomy of web search. In ACM SIGIR Forum, volume 36, pages 3–10. ACM, 2002.
      [3] J. Henrich, S. J. Heine, and A. Norenzayan. Most people are not WEIRD. Nature, 466(7302):29–29, July 2010.
      [4] M. Khabsa, Z. Wu, and C. L. Giles. Towards better understanding of academic search. In Joint Conference on Digital Library (JCDL), pages 111–114. ACM, 2016.
  • 25. References ii
      [5] X. Li, B. J. Schijvenaars, and M. de Rijke. Investigating queries and search failures in academic search. Information Processing and Management, 53(3):666–683, 2017.