Biomedical Search
Sarvnaz Karimi 
BioTALA group, NICTA Victoria Research Laboratory 
Tagging and Genomics Information Retrieval
Genomics Information Retrieval
• The TREC Genomics track defined a text retrieval task in which a user seeks information in a sub-area of biology related to genomics.
Task Definition
• TREC 2006 and 2007 Genomics:
  – Passage retrieval.
  – The collection consisted of full-text articles (12 GB) from 49 medical journals.
  – Queries were medical questions.
  – Evaluation was performed at the passage, document, and aspect levels.
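Document-level effectiveness is summarized here as MAP (mean average precision), the measure named in the conclusions. The sketch below assumes the usual non-interpolated definition, as used by trec_eval's `map`; the Genomics track's passage- and aspect-level measures are computed over different units and are not reproduced. The run and judgement data are hypothetical toy values.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list of document IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant doc
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a judged query missing from `runs` scores 0."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy run and relevance judgements.
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
qrels = {"q1": {"d1", "d7"}, "q2": {"d5"}}
print(mean_average_precision(runs, qrels))
```

Here q1 scores (1/2 + 2/3) / 2 = 7/12 and q2 scores 0, because its only relevant document is never retrieved, so MAP is 7/24.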
Queries
• Questions were based on Generic Topic Types (GTTs).
• 2006: five different GTTs, such as “Find articles describing the role of a gene involved in a given disease.”
  e.g. What is the role of DRD4 in alcoholism?
• 2007: questions asking for a specific entity type drawn from controlled terminologies (14 types).
  e.g. What [DISEASES] are associated with lysosomal abnormalities in the nervous system?
• Relevance judgements: human experts identified the relevant passages and the relevant concepts.
Can Tagging Entity Types Help Retrieval?
• Tagging: the manual or automatic association of a tag from a controlled vocabulary with a term or phrase in a text.
• Controlled vocabulary: the set of entity types defined for the TREC tasks.
  e.g. What is the role of DRD4 in alcoholism?
  What is the role of DRD4 [GENE] in alcoholism [DISEASE]?
• Possible benefits:
  – Disambiguation: e.g. nur-77 can refer to either a gene or a protein.
  – A higher chance of retrieving a document, because tagging increases the number of terms it shares with the query.
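The tagging example above can be reproduced with a simple dictionary lookup. This is a minimal sketch, not the tagger used in the study: the lexicon `LEXICON` and the function `tag_text` are hypothetical, and the tokenizer handles only single-word entries.

```python
import re
from typing import Dict

# Hypothetical toy lexicon: lower-cased surface form -> entity type tag.
LEXICON: Dict[str, str] = {
    "drd4": "GENE",
    "alcoholism": "DISEASE",
}

# A token is a run of letters, digits and inner hyphens (so "nur-77" stays whole).
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def tag_text(text: str, lexicon: Dict[str, str] = LEXICON) -> str:
    """Append an [ENTITY TYPE] tag after every token found in the lexicon."""
    def annotate(match: re.Match) -> str:
        word = match.group(0)
        tag = lexicon.get(word.lower())
        return f"{word} [{tag}]" if tag else word
    # Punctuation is not matched by TOKEN, so it is left in place.
    return TOKEN.sub(annotate, text)


print(tag_text("What is the role of DRD4 in alcoholism?"))
```

Because a plain lookup assigns one fixed type per surface form, it cannot resolve an ambiguous name such as nur-77 (gene or protein); that would need context. When the same tagger is applied to documents, matching tag words such as `GENE` become extra shared terms between query and document, which is the second benefit listed above.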
Inside the Text Collection
• Distribution of tag terms (entity types) in the documents:
  – Most documents contain more than one distinct tag (57.4% for 2006 and 94.6% for 2007).
  – In the 2006 collection, on average 46.5% of the irrelevant documents contain the tag terms, compared to 63.4% of the relevant documents. For 2007, these figures are 47.8% and 80.3%, respectively.
  – Relevant documents that would not be retrieved without tagging are rare: 0.4% of relevant documents in the 2006 collection and only 0.005% in 2007.
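The slides do not say exactly how these proportions were computed. As a sketch under a stated assumption (documents and queries treated as sets of terms, and "retrieved" meaning "shares at least one term with the query"), the two statistics could be obtained like this. The function names and toy data are hypothetical.

```python
from typing import Iterable, Set


def fraction_containing(docs: Iterable[Set[str]], tag_terms: Set[str]) -> float:
    """Fraction of documents that contain at least one tag term."""
    docs = list(docs)
    if not docs:
        return 0.0
    return sum(1 for d in docs if d & tag_terms) / len(docs)


def only_reachable_via_tags(query: Set[str], docs: Iterable[Set[str]],
                            tag_terms: Set[str]) -> float:
    """Fraction of documents whose only overlap with the query is through tag terms.

    Assumes term-overlap retrieval: such documents would not be retrieved
    if the tag terms were removed from both query and documents.
    """
    docs = list(docs)
    if not docs:
        return 0.0
    plain_query = query - tag_terms
    count = sum(1 for d in docs
                if not (d & plain_query) and (d & query & tag_terms))
    return count / len(docs)


# Hypothetical toy data: one tagged query, term sets for judged documents.
tags = {"proteins"}
query = {"serum", "lupus", "proteins"}
relevant = [{"lupus", "proteins"}, {"proteins", "kidney"}, {"serum"}]
irrelevant = [{"proteins"}, {"cell"}, {"cell", "growth"}, {"lupus"}]

print(fraction_containing(relevant, tags), fraction_containing(irrelevant, tags))
print(only_reachable_via_tags(query, relevant, tags))
```

In this toy example only the second relevant document depends on the tag term alone; the other two are still reachable through "lupus" or "serum", mirroring the observation that very few relevant documents need tagging to be retrieved.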
Inside the Text Collection: Conclusion 1
• Tag terms occur somewhat more frequently in relevant documents than in irrelevant ones (no extra information or disambiguation is expected from tagging).
• Without annotation, nearly all relevant documents would still be retrievable because of other terms they share with the queries.
Inside the Text Collection: IR Flavoured
[Figure: recall versus documents sorted by descending frequency of the tag words]
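A curve of this kind can be computed by ranking documents on tag-word frequency alone and tracking recall down the list. This is a minimal sketch with hypothetical names and counts; ties in frequency are broken by document ID only to make the ordering deterministic, which is an assumption, not something the slides specify.

```python
from typing import Dict, List, Set


def recall_curve(tag_counts: Dict[str, int], relevant: Set[str]) -> List[float]:
    """Recall after each document, with documents sorted by descending tag count."""
    ordered = sorted(tag_counts, key=lambda d: (-tag_counts[d], d))
    found = 0
    curve: List[float] = []
    for doc_id in ordered:
        if doc_id in relevant:
            found += 1
        curve.append(found / len(relevant) if relevant else 0.0)
    return curve


# Hypothetical tag-word counts per document and relevant set for one query.
counts = {"d1": 5, "d2": 0, "d3": 2, "d4": 7}
print(recall_curve(counts, {"d1", "d2"}))
```

If tag frequency carried strong relevance information, the curve would rise steeply near the start of the list.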
Inside the Text Collection: Conclusion 2
• There is only a small correlation (ρ = 0.09) between the number of distinct tags in a document and the likelihood that it is relevant.
• For most tags, tag frequency appears to be related to relevance (ρ = 0.84).
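The slides do not state which correlation coefficient ρ denotes. Assuming it is Spearman's rank correlation, a self-contained sketch is shown below: values are converted to ranks (ties share their average rank, which matters for binary relevance labels) and Pearson's correlation is computed on the ranks. The function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: distinct tags per document vs. binary relevance.
distinct_tags = [0, 1, 2, 3]
is_relevant = [0, 0, 1, 1]
print(spearman(distinct_tags, is_relevant))
```

If ρ in the slides is instead Pearson's coefficient on the raw values, `pearson` can be called directly.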
Retrieval Experiments
What's happening? 
• An example query:
– Q1. What serum [PROTEINS] change expression in association with high 
disease activity in lupus? (original query) 
– Q2. What serum change expression in association with high disease 
activity in lupus 
– Q3. proteins 
– Q4. lupus
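The slides do not describe how the variants Q2 to Q4 were built. One reading, sketched here as an assumption, is: Q2 drops the bracketed tag from the original question, Q3 keeps only the tag word, and Q4 is a single content term chosen by hand (so it is passed in as a parameter). The function `query_variants` is hypothetical.

```python
import re
from typing import Dict

# Matches an upper-case entity-type tag in square brackets, e.g. "[PROTEINS]".
TAG = re.compile(r"\[([A-Z][A-Z ]*)\]")


def query_variants(tagged_query: str, focus_term: str) -> Dict[str, str]:
    """Build the Q1-Q4 variants from a tagged question (assumed construction)."""
    tags = TAG.findall(tagged_query)
    untagged = TAG.sub("", tagged_query)
    # Collapse the double space left by the removed tag and drop the question mark.
    untagged = re.sub(r"\s+", " ", untagged).strip().rstrip("?").strip()
    return {
        "Q1": tagged_query,                          # original query
        "Q2": untagged,                              # tag removed
        "Q3": " ".join(t.lower() for t in tags),     # tag word only
        "Q4": focus_term,                            # hand-picked content term
    }


q1 = ("What serum [PROTEINS] change expression in association with "
      "high disease activity in lupus?")
for name, text in query_variants(q1, "lupus").items():
    print(name, text)
```

Running the variants separately against the same index isolates how much of a query's effectiveness comes from the tag word and how much from ordinary content terms.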
What's happening? 
Protein 
Lupus
Conclusions
• Does tagging help a text retrieval task? We still do not have strong evidence that it does.
• Tags may be too general to be discriminative.
• What level of accuracy does a tagger need to be beneficial?
  – Within the framework described, we could not define a threshold: even random assignment of tags improved MAP over the baseline.
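The slides do not describe the random-tagging procedure. A minimal sketch of one possible control condition, under hypothetical assumptions: each token independently receives, with probability `p`, a tag drawn uniformly from a fixed tag set, using a seeded generator so runs are repeatable. The function name, tag set, and probability are illustrative only.

```python
import random
import re
from typing import Optional, Sequence

TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def random_tagging(text: str, tag_set: Sequence[str], p: float,
                   seed: Optional[int] = None) -> str:
    """Append a uniformly random tag after each token with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)  # private generator: same seed, same tagging

    def annotate(match: re.Match) -> str:
        word = match.group(0)
        if rng.random() < p:  # random() is in [0, 1), so p=1 always tags
            return f"{word} [{rng.choice(tag_set)}]"
        return word

    return TOKEN.sub(annotate, text)


print(random_tagging("What is the role of DRD4 in alcoholism?",
                     ["GENE", "DISEASE", "PROTEINS"], p=0.3, seed=42))
```

Comparing MAP for a real tagger against such a random baseline (and against no tagging) is one way to probe whether any gain comes from correct entity types or merely from the added tag terms.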
