1/37
Cross-lingual information retrieval
Shadi Saleh
Institute of Formal and Applied Linguistics
Charles University
saleh@ufal.mff.cuni.cz
27 Nov. 2017
2/37
Information Retrieval Task
Definition
Information retrieval (IR) is finding material (usually documents)
of an unstructured nature (usually text) that satisfies an
information need from within large collections (usually
stored on computers).
source: trec.nist.gov
3/37
Information Retrieval Task
4/37
Information Retrieval Task
The heat-map test (the "golden triangle") was carried out by Enquiro,
Eyetools, and Didit with search engine users.
5/37
Monolingual IR system structure
6/37
IR evaluation
An IR system returns a ranked list of documents (scored by degree
of relevance)
Users are interested in the top k documents
Development:
Set of documents
Set of training/test queries
Metric: P@10, the percentage of relevant documents among the
10 highest-ranked retrieved ones
How to judge documents as relevant or irrelevant? Through an
assessment process
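A minimal sketch of how P@10 could be computed for one query. The function name, the document identifiers, and the choice to divide by k even when fewer than k documents are returned are assumptions made for illustration, not details taken from the slides.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """Return the percentage of relevant documents among the top k results."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    # Assumption: divide by k (not by len(top_k)), so a result list
    # shorter than k cannot reach 100.
    return 100.0 * hits / k


# Hypothetical ranked list and relevance judgements for a single query.
ranked = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d6", "d5", "d0"]
relevant = {"d3", "d1", "d4", "d2", "d5"}
score = precision_at_k(ranked, relevant)
```

The system-level figures reported in the tables would then be the mean of this value over all queries in a set.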
7/37
Data & tools
CLEF eHealth 2015 IR task document collection (corpus)
For searching, queries from CLEF eHealth IR tasks
2013–2015, 166 queries in total
Queries were created in 2013 and 2014 by medical experts
In 2015, queries were created to simulate the way laypeople
write queries
Randomly split into 100 queries for training, 66 for test
Relevance assessment is done by medical experts
8/37
Sample query: CLEF 2013
<topic>
<id>qtest4</id>
<title>nausea and vomiting and hematemesis</title>
<desc>What are nausea, vomiting and hematemesis</desc>
<narr>What is the connection with nausea, vomiting and hematemesis</narr>
<profile>A 64-year old emigrant who is not sure what nausea, vomiting and hematemesis mean in his discharge summary</profile>
</topic>
9/37
Sample queries: CLEF 2015
<topic>
<id>clef2015.test.9</id>
<title>red itchy eyes</title>
</topic>
<topic>
<id>clef2015.test.16</id>
<title>red patchy bruising over legs</title>
</topic>
<topic>
<id>clef2015.test.44</id>
<title>nail getting dark</title>
</topic>
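Topics in this format can be read with a standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree`; the wrapping `<topics>` root element and the `read_topics` helper are assumptions for illustration, since the slides show only individual `<topic>` elements.

```python
import xml.etree.ElementTree as ET

# Two of the sample topics, wrapped in an assumed <topics> root element
# so that the text is a single well-formed XML document.
TOPICS_XML = """
<topics>
  <topic>
    <id>clef2015.test.9</id>
    <title>red itchy eyes</title>
  </topic>
  <topic>
    <id>clef2015.test.16</id>
    <title>red patchy bruising over legs</title>
  </topic>
</topics>
"""


def read_topics(xml_text):
    """Map each topic id to its title (the query string)."""
    root = ET.fromstring(xml_text)
    return {
        topic.findtext("id").strip(): topic.findtext("title").strip()
        for topic in root.iter("topic")
    }


topics = read_topics(TOPICS_XML)
```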
10/37
Assessment process
11/37
Monolingual experiment
Indexing and searching are done using Terrier (an open source
IR system, http://terrier.org)
Set of tuning experiments
P@10: 47.10 (training set) and 50.30 (test set)
12/37
Cross-lingual IR problem
Definition
Cross-lingual information retrieval (CLIR) allows a user to
pose a query in their native language and retrieve documents
written in a different language.
Czech query
Query: nevolnost a zvracení a hematemeze?
13/37
Cross-lingual IR approaches: query translation
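[Diagram: query translation pipeline. The indexer builds an index from the documents (EN). The user poses a query (CS), the MT system translates it into an EN query, and the retrieval system returns a ranked list of the top-K documents.]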
Reducing CLIR task into monolingual task
14/37
Cross-lingual data
The 166 English queries were translated by medical experts who are
native speakers into Czech, French, German, Hungarian, Polish,
Spanish and Swedish
The task is reduced to monolingual IR: the same relevance data apply
15/37
Query translation experiment
Translate queries in all languages into the collection language
using public online MT systems:
Google Translate
Bing translator
P@10 by query language:
Sys     Czech  French  German  Hungarian  Polish  Spanish  Swedish
Mono    50.30  50.30   50.30   50.30      50.30   50.30    50.30
Google  51.06  49.85   49.55   42.42      43.33   50.61    38.48
Bing    47.88  48.79   46.67   38.79      40.91   50.61    44.70
16/37
Baseline CLIR system
Translate queries into English using SMT systems developed
by colleagues at UFAL
Trained to translate search queries (medical domain)
Return a list of alternative translations (N-best list)
P@10 by query language:
Sys       Czech  French  German  Hungarian  Polish  Spanish  Swedish
Mono      50.30  50.30   50.30   50.30      50.30   50.30    50.30
Baseline  45.76  47.88   42.58   40.76      36.82   44.09    36.67
Google    51.06  49.85   49.55   42.42      43.33   50.61    38.48
Bing      47.88  48.79   46.67   38.79      40.91   50.61    44.70
17/37
Reranking approach
Motivation
The single best translation returned by the SMT system
is not selected with respect to CLIR performance.
[Figure: histograms of the ranks (1 to 20) of the translation hypotheses with the highest P@10 for
each training query, with one panel each for Czech, French and German]
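A small sketch of how such a histogram could be derived, assuming that the P@10 of every hypothesis in each query's N-best list has already been measured. The query ids and scores are invented, and breaking ties in favour of the lower rank is an assumption.

```python
from collections import Counter


def oracle_rank(p10_by_rank):
    """Return the 1-based rank of the hypothesis with the highest P@10.

    Assumption: ties go to the lowest rank, i.e. the hypothesis the
    SMT system itself preferred.
    """
    best = max(p10_by_rank)
    return p10_by_rank.index(best) + 1


# Hypothetical P@10 values per query, ordered by SMT rank (rank 1 first).
training = {
    "q1": [30.0, 50.0, 50.0],
    "q2": [60.0, 40.0, 10.0],
    "q3": [0.0, 0.0, 20.0],
}

# rank -> number of queries whose best hypothesis sits at that rank
histogram = Counter(oracle_rank(scores) for scores in training.values())
```

If the SMT ranking were already optimal for retrieval, all the mass would sit at rank 1; mass at other ranks is the motivation for reranking.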
18/37
Reranking approach
Trained to select the best translation for CLIR performance
P@10 as an objective function (predict the translation that
gives the highest P@10)
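[Diagram: reranking pipeline. The indexer builds an index from the documents (EN). The query "nevolnost a zvracení a hematemeze" is translated by the MT system into an N-best list (EN), the reranker selects one EN query from that list, and the retrieval system returns a ranked list of the top-K documents.]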
19/37
Feature set
SMT scores: translation model, language model and
reordering models
Rank features: SMT rank and a Boolean feature (1 for best
rank, 0 otherwise)
Features based on blind relevance feedback
IDF from the collection (inverse document frequency)
Translation pool
Retrieval status value
Features that are based on external resources (UMLS,
Wikipedia)
UMLS (the Unified Medical Language System) is a large, multi-purpose, and
multi-lingual thesaurus that contains millions of biomedical and health related
concepts
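As an illustration of one feature in this list, the sketch below computes an IDF value from a toy collection and averages it over the terms of a translation hypothesis. The smoothed formula, the whitespace tokenisation, and the averaging step are assumptions; the slides do not say which IDF variant or aggregation was used.

```python
import math


def inverse_document_frequency(term, documents):
    """Smoothed IDF of a term over a collection of term sets."""
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    # Assumption: add-one smoothing keeps the value finite for unseen terms.
    return math.log((n_docs + 1) / (doc_freq + 1))


def mean_idf(hypothesis, documents):
    """Hypothetical feature: average IDF of the terms in one hypothesis."""
    terms = hypothesis.lower().split()
    return sum(inverse_document_frequency(t, documents) for t in terms) / len(terms)


# Toy collection: each document is reduced to its set of terms.
docs = [{"nausea", "vomiting"}, {"nausea", "fever"}, {"rash"}]
idf_common = inverse_document_frequency("nausea", docs)
idf_rare = inverse_document_frequency("hematemesis", docs)
feature = mean_idf("nausea hematemesis", docs)
```

Rare terms receive a higher IDF than common ones, so a hypothesis that keeps a specific medical term scores higher on this feature.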
20/37
Training
100 queries for training, with a 15-best list of translation hypotheses for each query.
Two approaches for training:
Language-Specific: Model for each language
Language-Independent: One model for all languages
Leave-One-Out cross validation
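Leave-one-out cross validation can be sketched as a generator of train/held-out splits. Holding out one query at a time (together with all of its hypotheses) is an assumption about the unit of the split, and `leave_one_out` is a hypothetical helper name.

```python
def leave_one_out(query_ids):
    """Yield (training_ids, held_out_id) pairs, one per query."""
    for i, held_out in enumerate(query_ids):
        training_ids = query_ids[:i] + query_ids[i + 1:]
        yield training_ids, held_out


# Hypothetical query ids; the real training set has 100 queries.
queries = ["q1", "q2", "q3", "q4"]
folds = list(leave_one_out(queries))
```

Each fold would train the model on the hypotheses of the training queries and evaluate it on the hypotheses of the single held-out query.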
21/37
Reranker testing
Generate vectors of feature values for each query
The trained regression model predicts the hypothesis that
gives the highest P@10
Run retrieval with that hypothesis as the query string
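The selection step can be sketched as scoring every hypothesis with the trained model and keeping the one with the highest predicted P@10. The linear model, its weights, the three-feature layout and all feature values below are made up for illustration; the slides state only that a regression model is used.

```python
def predict_p10(features, weights, bias=0.0):
    """Assumed linear regression: bias plus weighted sum of the features."""
    return bias + sum(w * x for w, x in zip(weights, features))


def select_hypothesis(n_best, weights):
    """Return the text of the hypothesis with the highest predicted P@10."""
    best = max(n_best, key=lambda hyp: predict_p10(hyp["features"], weights))
    return best["text"]


# Hypothetical feature layout: [SMT score, is-top-rank flag, mean IDF].
weights = [10.0, 5.0, 2.0]
n_best = [
    {"text": "nausea and vomiting and hematemesis", "features": [0.9, 1.0, 3.0]},
    {"text": "sickness and vomiting and hematemesis", "features": [0.8, 0.0, 6.5]},
    {"text": "nausea and vomit", "features": [0.5, 0.0, 2.0]},
]
query_string = select_hypothesis(n_best, weights)
```

With these invented numbers the second hypothesis is selected even though the SMT system ranked it below the first, which is the behaviour the reranker is meant to allow.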
22/37
Results - test set
Results of the final evaluation on the test set queries (P@10)
System    Czech  French  German
Mono      50.30  50.30   50.30
Baseline  45.61  47.73   42.42
Reranker  50.15  51.06   45.30
Google    50.91  49.70   49.39
Bing      47.88  48.64   46.52
Improvements: 9 queries in Czech, 15 queries in German, and
8 queries in French
Degradations: 2 cases for Czech, 4 cases for German, and 3
cases for French
23/37
System comparisons
Examples of translations of training queries including reference (ref), oracle
(ora), baseline (base) and best (best) translations (system Reranker). The
scores in parentheses refer to query P@10 scores.
24/37
Adapting reranker to new languages
25/37
Queries in new languages
New SMT systems (Spanish, Hungarian, Polish and Swedish)
were recently developed, also within Khresmoi.
Human experts translated the original English queries into these
languages under the KConnect project.
We want to develop a CLIR system for these languages.
26/37
Adapting reranker
To adapt the reranker, two sources of data are used to create the
training set:
Merged data from existing languages (Czech, French and
German)
Data from each new language (Spanish, Hungarian, Polish
and Swedish)
The data are used to create language-independent models
27/37
Language-independent model performance
Final evaluation results of language-independent models on the test set (P@10)
System    Spanish  Hungarian  Polish  Swedish
Mono      50.30    50.30      50.30   47.10
Baseline  44.09    40.76      36.82   36.67
Reranker  46.36    43.18      36.67   38.79
28/37
Document translation
In recent years, SMT systems have improved significantly
Existing research on document translation (DT) is quite old
Reinvestigate the research question of whether query translation (QT)
is really better than DT
29/37
Document translation
Queries are posed by users in their language
Translate the English collection into: Czech, French and
German
Create separate index for each language
Perform the retrieval using original query and the relevant
index
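[Diagram: document translation pipeline. The MT system translates the documents (EN) into documents (CS), and the indexer builds an index (CS) from them. The user poses a query (CS) and the retrieval system returns a ranked list of the top-K documents.]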
30/37
Morphological processing
Both queries and documents are processed as follows:
Translate into Czech, French and German
Stemming using the Snowball stemmer (http://snowball.tartarus.org/)
Lemmatizing using TreeTagger for French and German
(http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger) and
MorphoDiTa for Czech (http://ufal.mff.cuni.cz/morphodita)
31/37
Results - Document Translation
Results of the final evaluation on the test set queries (P@10)
System    Czech  French  German
Mono      50.30  50.30   50.30
Baseline  45.61  47.73   42.42
DT        37.42  41.67   36.21
DT Stem   41.67  42.73   36.67
DT Lem    39.39  41.06   33.18
32/37
Query expansion
Users sometimes fail to create a query that represents their
information need
Query expansion is the process of adding terms to their query
(also called query reformulation)
Our approach is based on a machine learning model
33/37
Query expansion
Algorithm
Get the 20-best list of translations for each query
Create a translation pool as a bag of words from these
translations
Use the best translation as the original query
A model predicts the term that will give the highest P@10
when it is added to the original query
Features: IDF, TF (pool), similarity between term and query
(word embeddings)
Expand the query with one term from the translation pool
Run the retrieval with our baseline setting using the
expanded queries.
The translation pool was limited for some queries, so it is expanded
with terms from Wikipedia articles
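The steps above can be sketched as follows. The whitespace tokenisation, the helper names, and the scoring function are assumptions: in place of the trained model the example ranks candidate terms by their frequency in the pool only, which is a stand-in and not the method described on the slide.

```python
def build_translation_pool(n_best):
    """Bag of words over all hypotheses: term -> frequency in the pool."""
    pool = {}
    for hypothesis in n_best:
        for term in hypothesis.lower().split():
            pool[term] = pool.get(term, 0) + 1
    return pool


def expand_query(n_best, score_term):
    """Append the best-scoring pool term that is not already in the query."""
    original = n_best[0]  # the best translation is the original query
    original_terms = set(original.lower().split())
    pool = build_translation_pool(n_best)
    candidates = [term for term in pool if term not in original_terms]
    if not candidates:
        return original
    best_term = max(candidates, key=lambda term: score_term(term, pool))
    return f"{original} {best_term}"


# Hypothetical N-best list (3 instead of 20 hypotheses, for brevity).
n_best = [
    "white coating mouth",
    "white coating oral cavity",
    "white film mouth oral",
]
# Stand-in scorer: pool term frequency instead of a trained model.
expanded = expand_query(n_best, lambda term, pool: pool[term])
```

A trained model would replace the lambda and would combine IDF, pool TF and an embedding similarity between the term and the query.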
34/37
Results - test set
Results of the final evaluation on the test set queries (P@10)
System    Czech  French  German
Mono      50.30  50.30   50.30
Baseline  45.61  47.73   42.42
QE        42.12  46.21   37.88
35/37
Query expansion (QE) improved 10 queries on average over the
baseline system. Assessment coverage is only 60%, so conclusions
wait until the assessment is completed.
35/37
Query expansion examples
Mono: white patchiness in mouth, P@10: 10.00
Base: white coating mouth, P@10: 10.00
Expanded: white coating mouth oral cavity, P@10: 70.00
Mono: SOB, P@10: 50.00
Base: dyspnoea, P@10: 60.00
Expanded: dyspnoea rash breathing dyspnea, P@10: 70.00
36/37
Conclusion and future work
Monolingual IR system evaluation and assessment
Cross-lingual IR approaches:
Query translation
Document translation and morphological analysis
Query expansion based on translation pool and Wikipedia
Reranking model to predict, for each query, which translation
hypothesis gives the highest P@10
Contribution to the CLIR community by releasing dataset with
high coverage (doc/query pair)
37/37
Our publications
Shadi Saleh and Pavel Pecina. CUNI at the ShARe/CLEF eHealth Evaluation
Lab 2014. In Working Notes of CLEF 2014 - Conference and Labs of the
Evaluation forum, Sheffield, UK, 2014
Shadi Saleh, Feraena Bibyna and Pavel Pecina. CUNI at the CLEF eHealth 2015
Task 2. In Working Notes of CLEF 2015 - Conference and Labs of the
Evaluation forum, CEUR-WS, Toulouse, France, 2015
Shadi Saleh and Pavel Pecina. Adapting SMT Query Translation Reranker to
New Languages in Cross-Lingual Information Retrieval. In Medical Information
Retrieval (MedIR) Workshop, Association for Computational Linguistics, Pisa,
Italy, 2016
Shadi Saleh and Pavel Pecina. Reranking hypotheses of machine-translated
queries for cross-lingual information retrieval. In Experimental IR Meets
Multilinguality, Multimodality, and Interaction: 7th International Conference of
the CLEF Association, CLEF 2016, Evora, Portugal, 2016
Shadi Saleh and Pavel Pecina. Task3 Patient-Centred Information Retrieval: Team
CUNI. In CLEF 2016 Working Notes, CEUR-WS, Evora, Portugal, 2016
Shadi Saleh and Pavel Pecina. Task3 Patient-Centred Information Retrieval: Team
CUNI. In CLEF 2017 Working Notes, CEUR-WS, Dublin, Ireland, 2017

More Related Content

What's hot

Information retrieval 10 tf idf and bag of words
Information retrieval 10 tf idf and bag of wordsInformation retrieval 10 tf idf and bag of words
Information retrieval 10 tf idf and bag of words
Vaibhav Khanna
 
Text summarization
Text summarizationText summarization
Text summarization
kareemhashem
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
9866825059
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
nimmyjans4
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
prashantdahake
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
Rubén Izquierdo Beviá
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Lec1,2
Lec1,2Lec1,2
Lec1,2
alaa223
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
VenkateshMurugadas
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
Robert Lujo
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
Alejandro Bellogin
 
Multilingualism in Information Retrieval System
Multilingualism in Information Retrieval SystemMultilingualism in Information Retrieval System
Multilingualism in Information Retrieval System
Ariel Hess
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
Viet-Trung TRAN
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
 
Introduction to Deep Generative Models
Introduction to Deep Generative ModelsIntroduction to Deep Generative Models
Introduction to Deep Generative Models
Hao-Wen (Herman) Dong
 
Term weighting
Term weightingTerm weighting
Term weighting
Primya Tamil
 
Text clustering
Text clusteringText clustering
Text clustering
KU Leuven
 
Language models
Language modelsLanguage models
Language models
Maryam Khordad
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
silambu111
 
Parallel and Distributed Information Retrieval System
Parallel and Distributed Information Retrieval SystemParallel and Distributed Information Retrieval System
Parallel and Distributed Information Retrieval System
vimalsura
 

What's hot (20)

Information retrieval 10 tf idf and bag of words
Information retrieval 10 tf idf and bag of wordsInformation retrieval 10 tf idf and bag of words
Information retrieval 10 tf idf and bag of words
 
Text summarization
Text summarizationText summarization
Text summarization
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Lec1,2
Lec1,2Lec1,2
Lec1,2
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
Multilingualism in Information Retrieval System
Multilingualism in Information Retrieval SystemMultilingualism in Information Retrieval System
Multilingualism in Information Retrieval System
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Introduction to Deep Generative Models
Introduction to Deep Generative ModelsIntroduction to Deep Generative Models
Introduction to Deep Generative Models
 
Term weighting
Term weightingTerm weighting
Term weighting
 
Text clustering
Text clusteringText clustering
Text clustering
 
Language models
Language modelsLanguage models
Language models
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
 
Parallel and Distributed Information Retrieval System
Parallel and Distributed Information Retrieval SystemParallel and Distributed Information Retrieval System
Parallel and Distributed Information Retrieval System
 

Similar to Cross-lingual Information Retrieval

Report on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents
Report on the CLEF-IP 2012 Experiments: Search of Topically Organized PatentsReport on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents
Report on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents
Mike Salampasis
 
Mt summit2015 jdu_v2
Mt summit2015 jdu_v2Mt summit2015 jdu_v2
Mt summit2015 jdu_v2
Jinhua Du
 
Symbexecsearch
SymbexecsearchSymbexecsearch
Symbexecsearch
Abhik Roychoudhury
 
HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...
HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...
HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...
Nawanan Theera-Ampornpunt
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
Paolo Missier
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
NAVER Engineering
 
Creating a dataset of peer review in computer science conferences published b...
Creating a dataset of peer review in computer science conferences published b...Creating a dataset of peer review in computer science conferences published b...
Creating a dataset of peer review in computer science conferences published b...
Aliaksandr Birukou
 
Multi-system machine translation using online APIs for English-Latvian
Multi-system machine translation using online APIs for English-LatvianMulti-system machine translation using online APIs for English-Latvian
Multi-system machine translation using online APIs for English-Latvian
Matīss ‎‎‎‎‎‎‎  
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
Ethnograph 11 Jul07
Ethnograph 11 Jul07Ethnograph 11 Jul07
Ethnograph 11 Jul07
Clara Kwan
 
ntcir14centre-overview
ntcir14centre-overviewntcir14centre-overview
ntcir14centre-overview
Tetsuya Sakai
 
My experiment
My experimentMy experiment
My experiment
Boshra Albayaty
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
Konstantinos Zagoris
 
Semantic-based Process Analysis
Semantic-based Process AnalysisSemantic-based Process Analysis
Semantic-based Process Analysis
Mauro Dragoni
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
ISSEL
 
Course-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course AuthoringCourse-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course Authoring
Peter Brusilovsky
 
II-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent OfficeII-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent Office
Dr. Haxel Consult
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
ivan provalov
 
Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...
Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...
Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...
Association for Computational Linguistics
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
Waheeb Ahmed
 

Similar to Cross-lingual Information Retrieval (20)

Report on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents
Report on the CLEF-IP 2012 Experiments: Search of Topically Organized PatentsReport on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents
Report on the CLEF-IP 2012 Experiments: Search of Topically Organized Patents
 
Mt summit2015 jdu_v2
Mt summit2015 jdu_v2Mt summit2015 jdu_v2
Mt summit2015 jdu_v2
 
Symbexecsearch
SymbexecsearchSymbexecsearch
Symbexecsearch
 
HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...
HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...
HL7 & HL7 CDA: The Implementation of Thailand's Healthcare Messaging Exchange...
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
 
Creating a dataset of peer review in computer science conferences published b...
Creating a dataset of peer review in computer science conferences published b...Creating a dataset of peer review in computer science conferences published b...
Creating a dataset of peer review in computer science conferences published b...
 
Multi-system machine translation using online APIs for English-Latvian
Multi-system machine translation using online APIs for English-LatvianMulti-system machine translation using online APIs for English-Latvian
Multi-system machine translation using online APIs for English-Latvian
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Ethnograph 11 Jul07
Ethnograph 11 Jul07Ethnograph 11 Jul07
Ethnograph 11 Jul07
 
ntcir14centre-overview
ntcir14centre-overviewntcir14centre-overview
ntcir14centre-overview
 
My experiment
My experimentMy experiment
My experiment
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
 
Semantic-based Process Analysis
Semantic-based Process AnalysisSemantic-based Process Analysis
Semantic-based Process Analysis
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
Course-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course AuthoringCourse-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course Authoring
 
II-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent OfficeII-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent Office
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...
Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...
Daniel Preotiuc-Pietro - 2015 - An analysis of the user occupational class th...
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
 

Recently uploaded

reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
一比一原版雷丁大学毕业证(UoR毕业证书)学历如何办理
mbawufebxi
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
ArianaRamos54
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 

Recently uploaded (20)

reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
Cross-lingual Information Retrieval

  • 1. 1/37 Cross-lingual information retrieval Shadi Saleh Institute of Formal and Applied Linguistics Charles University saleh@ufal.mff.cuni.cz 27 Nov. 2017
  • 2. 2/37 Information Retrieval Task Definition: Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). source: trec.nist.gov
  • 5. 4/37 Information Retrieval Task Heat-map test (golden triangle) is done by Enquiro, Eyetools, and Didit with search engine users.
  • 7. 6/37 IR evaluation An IR system returns a ranked list of documents (scored by degree of relevance). Users are interested in the top k documents. Development requires a set of documents and a set of training/test queries. Metric: P@10, the percentage of relevant documents among the 10 highest-ranked retrieved ones. How to judge relevant/irrelevant documents? Through an assessment process.
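The P@10 metric on this slide can be written down in a few lines. This is a minimal sketch, not the evaluation tool used in the experiments: the function names and the toy document IDs are invented for illustration, and dividing by k even when fewer than k documents are returned is an assumption about the convention.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    # Assumption: divide by k, so a result list shorter than k is penalised.
    return hits / k


def mean_precision_at_k(runs, qrels, k=10):
    """Average P@k over all queries, reported as a percentage."""
    scores = [precision_at_k(runs[q], qrels.get(q, set()), k) for q in runs]
    return 100.0 * sum(scores) / len(scores)


# Toy data: two queries, ten retrieved documents each (IDs are made up).
runs = {
    "q1": ["d%d" % i for i in range(1, 11)],
    "q2": ["d%d" % i for i in range(11, 21)],
}
qrels = {
    "q1": {"d1", "d2", "d5"},                  # 3 relevant documents in the top 10
    "q2": {"d11", "d12", "d13", "d14", "d15"}, # 5 relevant documents in the top 10
}
score = mean_precision_at_k(runs, qrels)
```

Scores such as 50.30 in the tables are this kind of average over the test queries, expressed as a percentage.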
  • 8. 7/37 Data & tools Corpus: the CLEF eHealth 2015 IR task document collection. Queries: taken from the CLEF eHealth IR tasks 2013–2015, 166 queries in total. In 2013 and 2014 the queries were created by medical experts. In 2015 the queries were created to simulate the way laypeople write queries. The queries are randomly split into 100 for training and 66 for test. Relevance assessment is done by medical experts.
  • 9. 8/37 Sample query: CLEF 2013
      <topic>
        <id>qtest4</id>
        <title>nausea and vomiting and hematemesis</title>
        <desc>What are nausea, vomiting and hematemesis</desc>
        <narr>What is the connection with nausea, vomiting and hematemesis</narr>
        <profile>A 64-year old emigrant who is not sure what nausea, vomiting and hematemesis mean in his discharge summary</profile>
      </topic>
  • 10. 9/37 Sample queries: CLEF 2015
      <topic>
        <id>clef2015.test.9</id>
        <title>red itchy eyes</title>
      </topic>
      <topic>
        <id>clef2015.test.16</id>
        <title>red patchy bruising over legs</title>
      </topic>
      <topic>
        <id>clef2015.test.44</id>
        <title>nail getting dark</title>
      </topic>
  • 14. 11/37 Monolingual experiment Indexing and searching are done using Terrier, an open source IR system (http://terrier.org). A set of tuning experiments was run. P@10: 47.10 (training set) and 50.30 (test set).
  • 15. 12/37 Cross-lingual IR problem Definition: Cross-lingual information retrieval allows a user to pose a query in their native language and to retrieve documents in a different language. Example Czech query: nevolnost a zvracení a hematemeze?
  • 16. 13/37 Cross-lingual IR approaches: query translation Diagram: the user poses a query (CS); an MT system translates it into an EN query; the retrieval system searches the index that the indexer built from the documents (EN) and returns a ranked list of the top-K documents. This reduces the CLIR task to a monolingual task.
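The query-translation flow on this slide can be sketched end to end with toy components. Everything here is a stand-in: the word dictionary replaces a real MT system, the term-frequency scoring replaces Terrier's ranking model, and the documents are invented. Only the order of the steps (translate the query, then run monolingual retrieval) reflects the slide.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the MT system: a tiny Czech-to-English dictionary.
TOY_CS_EN = {
    "nevolnost": "nausea",
    "a": "and",
    "zvracení": "vomiting",
    "hematemeze": "hematemesis",
}


def translate_query(cs_query):
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(TOY_CS_EN.get(tok, tok) for tok in cs_query.lower().split())


def build_index(docs):
    """Inverted index: term -> Counter mapping doc_id to term frequency."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            index[tok][doc_id] += 1
    return index


def retrieve(index, query, k=10):
    """Score documents by summed term frequency and return the top k IDs."""
    scores = Counter()
    for tok in query.lower().split():
        for doc_id, tf in index.get(tok, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common(k)]


docs_en = {
    "d1": "nausea and vomiting are common symptoms",
    "d2": "hematemesis is vomiting of blood",
    "d3": "red itchy eyes",
}
index = build_index(docs_en)
en_query = translate_query("nevolnost a zvracení a hematemeze")
ranked = retrieve(index, en_query)
```

Because the index and the relevance judgements are in English only, any translation error in `translate_query` shows up directly as a lower P@10.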
  • 17. 14/37 Cross-lingual data The 166 English queries were translated by native-speaker medical experts into Czech, French, German, Hungarian, Polish, Spanish and Swedish. Because the task is reduced to monolingual IR, the same relevance data can be used for every language.
  • 18. 15/37 Query translation experiment Translate queries in all languages into the collection language using public online MT systems: Google Translate and Bing Translator.
      Sys     Czech  French  German  Hungarian  Polish  Spanish  Swedish
      Mono    50.30  50.30   50.30   50.30      50.30   50.30    50.30
      Google  51.06  49.85   49.55   42.42      43.33   50.61    38.48
      Bing    47.88  48.79   46.67   38.79      40.91   50.61    44.70
  • 19. 16/37 Baseline CLIR system Translate queries into English using SMT systems developed by colleagues at UFAL. The systems are trained to translate search queries (medical domain) and return a list of alternative translations (N-best list).
      Sys       Czech  French  German  Hungarian  Polish  Spanish  Swedish
      Mono      50.30  50.30   50.30   50.30      50.30   50.30    50.30
      Baseline  45.76  47.88   42.58   40.76      36.82   44.09    36.67
      Google    51.06  49.85   49.55   42.42      43.33   50.61    38.48
      Bing      47.88  48.79   46.67   38.79      40.91   50.61    44.70
  • 20. 17/37 Reranking approach Motivation: The single best translation returned by the SMT system is not selected with respect to CLIR performance. Figure: histograms of ranks (1 to 20) of the translation hypotheses with the highest P@10 for each training query, for Czech, French and German.
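  • 21. 18/37 Reranking approach The reranker is trained to select the best translation for CLIR performance, with P@10 as the objective function (predict the translation that gives the highest P@10). Diagram: the query (e.g. nevolnost a zvracení a hematemeze) goes to the MT system, which returns an N-best list (EN); the reranker selects one EN query from it; the retrieval system searches the index that the indexer built from the documents (EN) and returns a ranked list of the top-K documents.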
  • 22. 19/37 Feature set
      SMT scores: translation model, language model and reordering models
      Rank features: SMT rank and a Boolean feature (1 for best rank, 0 otherwise)
      Features based on blind relevance feedback
      IDF from the collection (inverse document frequency)
      Translation pool
      Retrieval status value
      Features based on external resources (UMLS, Wikipedia)
      UMLS is the Unified Medical Language System: a large, multi-purpose, multi-lingual thesaurus that contains millions of biomedical and health-related concepts.
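To make the feature list concrete, here is a small sketch that builds a feature vector from three of the listed groups: an SMT score, the two rank features, and an IDF feature. The smoothed IDF formula, the averaging over terms, the function names and the toy collection are all assumptions made for illustration; the slide does not say how each feature is computed.

```python
import math


def inverse_document_frequency(term, documents):
    """Smoothed IDF (an assumed variant): log((N + 1) / (df + 1))."""
    df = sum(1 for doc in documents if term in doc)
    return math.log((len(documents) + 1) / (df + 1))


def hypothesis_features(text, smt_score, smt_rank, documents):
    """Feature vector for one translation hypothesis.

    Layout (hypothetical): [SMT score, SMT rank, is-best-rank flag, mean IDF].
    """
    terms = text.lower().split()
    mean_idf = sum(inverse_document_frequency(t, documents) for t in terms) / max(len(terms), 1)
    is_best = 1.0 if smt_rank == 1 else 0.0
    return [smt_score, float(smt_rank), is_best, mean_idf]


# Toy collection: each document is a set of tokens (invented data).
collection = [
    {"nausea", "and", "vomiting"},
    {"vomiting", "blood"},
    {"red", "itchy", "eyes"},
]
features = hypothesis_features("vomiting blood", -4.2, 1, collection)
```

Rare terms get a higher IDF than common ones, so a hypothesis that contains specific medical vocabulary scores higher on this feature than one made of frequent words.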
  • 23. 20/37 Training 100 queries are used for training, with a 15-best list of hypotheses for each query. Two approaches for training: Language-Specific (a model for each language) and Language-Independent (one model for all languages). Evaluation uses leave-one-out cross validation.
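A leave-one-out split over the training queries can be sketched as follows. Holding out one whole query per fold, so that all of its hypotheses stay together, is an assumption; the slide only names the validation scheme. The query IDs are invented.

```python
def leave_one_query_out(query_ids):
    """Yield (training_ids, held_out_id) pairs, one fold per query."""
    for held_out in query_ids:
        training = [q for q in query_ids if q != held_out]
        yield training, held_out


# 100 hypothetical training query IDs, matching the size of the training set.
query_ids = ["q%03d" % i for i in range(1, 101)]
folds = list(leave_one_query_out(query_ids))
```

Each fold trains on 99 queries and evaluates on the remaining one, so every training query is used for validation exactly once.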
  • 24. 21/37 Reranker testing Generate vectors of feature values for each query. The trained regression model predicts the hypothesis that gives the highest P@10. Run retrieval using that hypothesis as the query string.
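The selection step at test time can be sketched with a linear model. This is an illustration under stated assumptions: the slide says only that a trained regression model is used, so the linear form, the weights, the two-feature vectors and the tie-breaking rule are all hypothetical, and the hypothesis strings are invented.

```python
def predict_p10(weights, bias, features):
    """Linear regression prediction of P@10 for one hypothesis."""
    return bias + sum(w * x for w, x in zip(weights, features))


def rerank(hypotheses, weights, bias=0.0):
    """Return the hypothesis text with the highest predicted P@10.

    hypotheses: list of (text, feature_vector) in SMT rank order.
    Ties are broken in favour of the better SMT rank (an assumption).
    """
    best_index = max(
        range(len(hypotheses)),
        key=lambda i: (predict_p10(weights, bias, hypotheses[i][1]), -i),
    )
    return hypotheses[best_index][0]


# Invented 2-best list with made-up feature vectors and weights.
n_best = [
    ("sickness and vomiting blood", [1.0, 0.2]),
    ("nausea vomiting hematemesis", [0.0, 0.9]),
]
weights = [0.1, 0.5]
selected = rerank(n_best, weights)
```

The selected string is then submitted to the retrieval system in place of the SMT system's 1-best translation.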
  • 25. 22/37 Results - test set Results of the final evaluation on the test set queries (P@10)
      system    Czech  French  German
      Mono      50.30  50.30   50.30
      Baseline  45.61  47.73   42.42
      Reranker  50.15  51.06   45.30
      Google    50.91  49.70   49.39
      Bing      47.88  48.64   46.52
      Improvements: 9 queries in Czech, 15 queries in German, and 8 queries in French
      Degradations: 2 cases for Czech, 4 cases for German, and 3 cases for French
  • 26. 23/37 System comparisons Examples of translations of training queries, including reference (ref), oracle (ora), baseline (base) and best (best) translations (system Reranker). The scores in parentheses refer to query P@10 scores.
  • 27. 24/37 Adapting reranker to new languages
  • 28. 25/37 Queries in new languages New SMT systems (Spanish, Hungarian, Polish and Swedish) were recently developed, also within Khresmoi. Human experts translated the original English queries into these languages under the KConnect project. We want to develop a CLIR system for these languages.
  • 29. 26/37 Adapting reranker To adapt the reranker, two sources of data are used to create the training set: merged data from the existing languages (Czech, French and German), and data from each new language (Spanish, Hungarian, Polish and Swedish). The data is used to create language-independent models.
  • 30. 27/37 Language-independent model performance Final evaluation results of language-independent models on the test set (P@10)
      system    Spanish  Hungarian  Polish  Swedish
      Mono      50.30    50.30      50.30   47.10
      Baseline  44.09    40.76      36.82   36.67
      Reranker  46.36    43.18      36.67   38.79
  • 31. 28/37 Document translation In recent years, SMT systems have improved significantly, but the existing research on document translation (DT) is quite old. We reinvestigate the research question of whether query translation (QT) is really better than DT.
  • 32. 29/37 Document translation Queries are posed by users in their own language. Translate the English collection into Czech, French and German. Create a separate index for each language. Perform the retrieval using the original query and the relevant index. Diagram: the MT system translates the documents (EN) into documents (CS); the indexer builds an index (CS) from them; the user poses a query (CS); the retrieval system searches the index (CS) and returns a ranked list of the top-K documents.
  • 33. 30/37 Morphological processing Both queries and documents are processed as follows:
      Translate into Czech, French and German
      Stemming using the Snowball stemmer (http://snowball.tartarus.org/)
      Lemmatizing using TreeTagger for French and German (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger) and MorphoDiTa for Czech (http://ufal.mff.cuni.cz/morphodita)
  • 34. 31/37 Results - Document Translation Results of the final evaluation on the test set queries (P@10)
      system    Czech  French  German
      Mono      50.30  50.30   50.30
      Baseline  45.61  47.73   42.42
      DT        37.42  41.67   36.21
      DT Stem   41.67  42.73   36.67
      DT Lem    39.39  41.06   33.18
  • 35. 32/37 Query expansion Users sometimes fail to formulate a query that represents their information need. Query expansion is the process of adding terms to the query (also called query reformulation). Our approach is based on a machine learning model.
  • 36. 33/37 Query expansion Algorithm
      Get the 20-best list of translations for each query
      Create a translation pool as a bag of words from these translations
      Use the best translation as the original query
      The model predicts the term that will give the highest P@10 when it is added to the original query
      Features: IDF, TF (pool), similarity between term and query (word embeddings)
      Expand the query with one term from the translation pool
      Run the retrieval with our baseline setting using the expanded queries
      The translation pool was limited for some queries, so it was extended with terms from Wikipedia articles
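The expansion steps above can be sketched as follows. The pool construction and the "add one term" step follow the slide; the scoring function is a placeholder for the trained model, which is not described in enough detail to reproduce. The n-best list is invented, and using raw pool frequency as the score is purely an assumption for the example.

```python
from collections import Counter


def build_translation_pool(n_best):
    """Bag of words over all translation hypotheses, with pool frequencies."""
    pool = Counter()
    for hypothesis in n_best:
        pool.update(hypothesis.lower().split())
    return pool


def expand_query(n_best, score_term):
    """Add the single best-scoring pool term to the 1-best translation.

    score_term(term, query_terms, pool) stands in for the trained model.
    """
    original = n_best[0].lower().split()
    pool = build_translation_pool(n_best)
    candidates = [term for term in pool if term not in original]
    if not candidates:
        return " ".join(original)  # nothing new in the pool, keep the query
    best_term = max(candidates, key=lambda t: score_term(t, original, pool))
    return " ".join(original + [best_term])


def pool_frequency(term, query_terms, pool):
    """Placeholder scorer: frequency of the term in the translation pool."""
    return pool[term]


# Invented n-best list for illustration.
n_best = ["dyspnoea", "shortness of breath", "breath shortness", "dyspnea"]
expanded = expand_query(n_best, pool_frequency)
```

In the described system the scorer would combine IDF, pool TF and embedding similarity instead of frequency alone, and the pool could also contain terms taken from Wikipedia articles.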
  • 37. 34/37 Results - test set Results of the final evaluation on the test set queries (P@10)
      system    Czech  French  German
      Mono      50.30  50.30   50.30
      Baseline  45.61  47.73   42.42
      QE        42.12  46.21   37.88
  • 38. 35/37 On average, query expansion (QE) improved 10 queries over the baseline system. Assessment coverage is only 60%, so final conclusions must wait until the assessment is complete.
  • 39. 35/37 Query expansion examples
      Example 1
        Mono: white patchiness in mouth (P@10: 10.00)
        Base: white coating mouth (P@10: 10.00)
        Expanded: white coating mouth oral cavity (P@10: 70.00)
      Example 2
        Mono: SOB (P@10: 50.00)
        Base: dyspnoea (P@10: 60.00)
        Expanded: dyspnoea rash breathing dyspnea (P@10: 70.00)
  • 40. 36/37 Conclusion and future work
      Monolingual IR system evaluation and assessment
      Cross-lingual IR approaches:
        Query translation
        Document translation and morphological analysis
        Query expansion based on translation pool and Wikipedia
        Reranking model to predict, for each query, which translation hypothesis gives the highest P@10
      Contribution to the CLIR community by releasing a dataset with high coverage (doc/query pairs)
  • 41. 37/37 Our publications
      Shadi Saleh and Pavel Pecina. CUNI at the ShARe/CLEF eHealth Evaluation Lab 2014. In Working Notes of CLEF 2014 - Conference and Labs of the Evaluation forum, Sheffield, UK, 2014
      Shadi Saleh, Feraena Bibyna and Pavel Pecina. CUNI at the CLEF eHealth 2015 Task 2. In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, CEUR-WS, Toulouse, France, 2015
      Shadi Saleh and Pavel Pecina. Adapting SMT Query Translation Reranker to New Languages in Cross-Lingual Information Retrieval. In Medical Information Retrieval (MedIR) Workshop, Association for Computational Linguistics, Pisa, Italy, 2016
      Shadi Saleh and Pavel Pecina. Reranking hypotheses of machine-translated queries for cross-lingual information retrieval. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, 7th International Conference of the CLEF Association, CLEF 2016, Evora, Portugal, 2016
      Shadi Saleh and Pavel Pecina. Task3 Patient-Centred Information Retrieval: Team CUNI. In CLEF 2016 Working Notes, CEUR-WS, Evora, Portugal, 2016
      Shadi Saleh and Pavel Pecina. Task3 Patient-Centred Information Retrieval: Team CUNI. In CLEF 2017 Working Notes, CEUR-WS, Dublin, Ireland, 2017