SlideShare a Scribd company logo
Five years of DisGeNET: Lessons 
learned and challenges ahead 
Laura 
I. 
Furlong 
IMIM-­‐UPF 
Big 
Data 
in 
Biomedicine 
Barcelona 
November 
11, 
2014
• Knowledge 
plaDorm 
on 
human 
diseases 
and 
their 
genes 
• Aims 
to 
cover 
all 
disease 
therapeuHc 
areas 
• Developed 
by 
integraHon 
of 
data 
from 
expert-­‐curated 
resources 
and 
from 
the 
literature 
by 
text 
mining 
• Centered 
on 
gene-­‐disease 
associaHon 
(GDA) 
and 
its 
supporHng 
evidence 
and 
provenance 
11/11/14 
Laura 
I. 
Furlong 
2
Re# 
Syndrome 
UMLS:C0035372 
MCEP2 
NCBI:4204 
11/11/14 
Laura 
I. 
Furlong 
3
Re# 
Syndrome 
UMLS:C0035372 
MCEP2 
NCBI:4204 
ü 
ü 
ü 
x 
ü 
11/11/14 
Laura 
I. 
Furlong 
4
UniProt 
CTD 
MGD 
LHGDN 
CTD 
Curated 
RGD 
BEFREE 
GAD 
Predicted 
Literature 
11/11/14 
Laura 
I. 
Furlong 
5
UniProt 
CTD 
MGD 
LHGDN 
381,056 
GDAs 
16,666 
CTD 
genes 
13,172 
diseases 
Curated 
RGD 
BEFREE 
GAD 
Predicted 
Literature 
11/11/14 
Laura 
I. 
Furlong 
6
Key 
points 
1. FragmentaPon 
of 
informaPon 
as 
a 
barrier 
to 
knowledge 
on 
the 
mechanisms 
of 
human 
diseases 
11/11/14 
Laura 
I. 
Furlong 
7
11/11/14 
Laura 
I. 
Furlong 
8
hVp://semanHcscience.org 
hVp://lod-­‐cloud.net/ 
StandardizaPon 
• Controlled 
vocabularies 
and 
ontologies 
• DisGeNET 
associaHon 
type 
ontology 
Open 
access 
• RDF 
and 
nanopublicaHons 
• Data 
distributed 
under 
the 
Open 
database 
commons 
license 
hVp://opendatacommons.org/
Key 
points 
2. High 
rate 
of 
data 
generaPon 
on 
GDAs 
poses 
challenges 
to 
biocuraPon 
pipelines 
11/11/14 
Laura 
I. 
Furlong 
10
PublicaHons 
on 
diseases 
and 
genes 
from 
1980 
(737,712 
publicaHons) 
Supervised 
machine-­‐learning 
approach 
for 
relaHon 
extracHon 
between 
Text 
Mining 
530,347 
associaPons 
14,777 
genes 
12,650 
diseases 
355,976 
publicaHons 
genes 
and 
diseases 
Aprox. 
30,000 
GDAs 
11/11/14 
Laura 
I. 
Furlong 
11
• Nearly 
70 
% 
of 
GDAs 
supported 
by 
only 
one 
publicaHon! 
• 900 
GDAs 
supported 
by 
> 
200 
publicaHons 
• Average 
2.8 
publicaHons/GDA 
11/11/14 
Laura 
I. 
Furlong 
12
GDAs 
supported 
by 
only 
one 
publicaHon 
11/11/14 
Laura 
I. 
Furlong 
13
Li#le 
overlap 
between 
text 
mined 
data 
and 
data 
present 
in 
curated 
repositories 
• Other 
publicaHons, 
full 
text, 
etc 
• FN 
from 
TM 
• Non-­‐relevant 
data 
for 
DBs 
• Relevant 
data 
but 
sHll 
not 
curated 
• FP 
from 
TM 
11/11/14 
Laura 
I. 
Furlong 
14
• This 
is 
not 
raw 
data 
(e.g. 
from 
NGS 
analysis), 
is 
data 
already 
collated 
and 
filtered 
that 
have 
passed 
peer 
review 
before 
publicaHon 
• Text 
mined 
from 
abstracts: 
• Not 
mining 
full 
text, 
nor 
supplementary 
material 
• Not 
mining 
tables, 
figures 
• Focus 
on 
relaHons 
stated 
on 
sentences, 
not 
handling 
anaphoras 
• We 
are 
just 
looking 
into 
a 
small 
frac)on 
of 
all 
the 
available 
data!!! 
11/11/14 
Laura 
I. 
Furlong 
15
Key 
points 
2. High 
rate 
of 
data 
generaPon 
on 
GDAs 
poses 
challenges 
to 
biocuraPon 
pipelines 
• Need 
to 
find 
alternaHve 
strategies 
to 
expert 
curaHon 
• “wisdom 
of 
the 
crowds” 
approaches 
(e.g. 
crowdsourcing) 
• ? 
11/11/14 
Laura 
I. 
Furlong 
16
Key 
points 
3. Data 
prioriPzaPon 
to 
support 
interpretaPon 
of 
data 
on 
the 
genePc 
determinants 
of 
human 
diseases 
11/11/14 
Laura 
I. 
Furlong 
17
Ranks 
gene-­‐disease 
associaHons 
based 
on 
the 
supporHng 
evidence 
DisGeNET 
score 
= 
SCURATED 
+ 
SPREDICTED 
+ 
SLITERATURE 
GDAs 
prioriPzaPon 
SCURATED 
= 
WUNIPROT 
+ 
WCTD 
SPREDICTED 
= 
WRat 
+ 
WMouse 
SLITERATURE 
= 
WGAD 
+ 
WLHGDN 
+ 
WBeFree 
11/11/14 
Laura 
I. 
Furlong 
18
Disease 
Gene 
Score 
UniProt 
CTD 
Rat 
Mouse 
Number 
of 
publicaPons 
BeFree 
GAD 
LHGDN 
Wilson’s 
Disease 
ATP7B 
0.99 
0.3 
0.3 
0.1 
0.1 
174 
31 
23 
ReV 
Syndrome 
MECP2 
0.9 
0.3 
0.3 
0 
0.1 
438 
27 
43 
CysHc 
Fibrosis 
CFTR 
0.9 
0.3 
0.3 
0 
0.1 
1429 
150 
78 
Obesity 
MC4R 
0.94 
0.3 
0.3 
0.1 
0.1 
220 
46 
0 
Alzheimer 
Disease 
APP 
0.88 
0.3 
0.3 
0 
0.1 
1096 
18 
81 
11/11/14 
Laura 
I. 
Furlong 
19
Key 
points 
3. Data 
prioriPzaPon 
to 
support 
interpretaPon 
of 
data 
on 
the 
genePc 
determinants 
of 
human 
diseases 
• SHll 
we 
have 
a 
large 
dataset 
with 
low 
score 
(350,000 
GDAs) 
• Other 
approaches 
based 
on: 
• type 
of 
associaHon 
of 
the 
gene 
to 
the 
disease 
• experimental 
evidence 
for 
the 
GDAs 
• network-­‐based 
gene-­‐prioriHzaHon 
algorithms 
• Take 
into 
account 
contradictory 
findings 
• Different 
prioriHzaHon 
approaches 
for 
different 
purposes? 
11/11/14 
Laura 
I. 
Furlong 
20
Key 
points 
4. Large 
number 
of 
genes 
associated 
to 
some 
diseases 
might 
reflect 
phenotypic 
diversity 
11/11/14 
Laura 
I. 
Furlong 
21
11/11/14 
Laura 
I. 
Furlong 
22
Need 
for 
deep 
phenotyping 
of 
diseases 
to 
idenHfy 
disease 
subtypes 
associated 
to 
different 
gene 
networks 
Human 
Phenotype 
Ontology 
(HPO) 
hVp://www.human-­‐phenotype-­‐ontology.org/ 
• Ontology 
of 
phenotypic 
abnormaliHes 
of 
human 
diseases 
• Based 
on 
OMIM, 
Orphanet 
and 
DECIPHER 
11/11/14 
Laura 
I. 
Furlong 
23
DisGeNET-­‐HPO 
annotaPons 
33 
% 
of 
DisGeNET 
diseases 
annotated 
with 
HPO 
terms 
Mainly 
come 
from 
OMIM 
diseases 
(28 
% 
of 
DisGeNET 
diseases 
come 
from 
OMIM) 
Need 
for 
other 
approaches 
to 
annotate 
the 
full 
spectrum 
of 
diseases 
at 
phenotypic 
level 
11/11/14 
Laura 
I. 
Furlong 
24
Key 
points 
1. FragmentaHon 
of 
informaHon 
as 
a 
barrier 
to 
knowledge 
on 
the 
mechanisms 
of 
human 
diseases 
2. High 
rate 
of 
data 
generaHon 
on 
GDAs 
poses 
challenges 
to 
biocuraHon 
pipelines 
3. Data 
prioriHzaHon 
to 
support 
interpretaHon 
of 
data 
on 
the 
geneHc 
determinants 
of 
human 
diseases 
4. Large 
number 
of 
genes 
associated 
to 
some 
diseases 
might 
reflect 
phenotypic 
diversity
IBI 
Group 
Alba 
GuHérrez 
Àlex 
Bravo 
Janet 
Piñero 
Núria 
Queralt-­‐Rosiñach 
Miguel 
A. 
Mayer 
Pablo 
Carbonell 
Laura 
I. 
Furlong 
Ferran 
Sanz 
IBI 
Past 
members 
Montserrat 
Cases 
Michael 
Rautschka 
Anna 
Bauer-­‐Mehren 
Solène 
Grosdidier 
11/11/14 
Laura 
I. 
Furlong 
26

More Related Content

Similar to Laura Furlong. Big Data in Biomedicine debate. Barcelona, Nov 11 2014

Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...
Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...
Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...
Dr. Juan Rodriguez-Tafur
 
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Ryan Squire
 
Fondale et al PLOSone 2014 (dragged) 2
Fondale et al PLOSone 2014 (dragged) 2Fondale et al PLOSone 2014 (dragged) 2
Fondale et al PLOSone 2014 (dragged) 2
Hyunsun Park
 
Schizophrenia
SchizophreniaSchizophrenia
Schizophrenia
guest0781e91
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformatica
Martín Arrieta
 
Bioinformatics & It's Scope in Biotechnology
Bioinformatics & It's Scope in BiotechnologyBioinformatics & It's Scope in Biotechnology
Bioinformatics & It's Scope in Biotechnology
Tuhin Samanta
 
Pone.0034901
Pone.0034901Pone.0034901
Pone.0034901
鋒博 蔡
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Jeremy Yang
 
The Monarch Initiative: From Model Organism to Precision Medicine
The Monarch Initiative: From Model Organism to Precision MedicineThe Monarch Initiative: From Model Organism to Precision Medicine
The Monarch Initiative: From Model Organism to Precision Medicine
mhaendel
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Michel Dumontier
 
Harnessing the Power of Infectious Disease Information with a Relational Data...
Harnessing the Power of Infectious Disease Information with a Relational Data...Harnessing the Power of Infectious Disease Information with a Relational Data...
Harnessing the Power of Infectious Disease Information with a Relational Data...
Jay Brown
 
Epigenetics_Poster_Nafisah.v1
Epigenetics_Poster_Nafisah.v1Epigenetics_Poster_Nafisah.v1
Epigenetics_Poster_Nafisah.v1
Nafisah Islam
 
Angela VanVeldhuizen (Sanford Health) Registries: CoRDS
Angela VanVeldhuizen (Sanford Health) Registries: CoRDSAngela VanVeldhuizen (Sanford Health) Registries: CoRDS
Angela VanVeldhuizen (Sanford Health) Registries: CoRDS
Canadian Organization for Rare Disorders
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team update
mhaendel
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in disease
Chirag Patel
 
BRN Seminar 12/06/14 Integrative Bioinformatics
BRN Seminar 12/06/14 Integrative Bioinformatics BRN Seminar 12/06/14 Integrative Bioinformatics
BRN Seminar 12/06/14 Integrative Bioinformatics
brnmomentum
 
Integrative Bioinformatics
Integrative BioinformaticsIntegrative Bioinformatics
Integrative Bioinformatics
brnbarcelona
 
GWAS Study.pdf
GWAS Study.pdfGWAS Study.pdf
GWAS Study.pdf
RayhanulMasud1
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
Eli Rosenthal
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
teamchaotex
 

Similar to Laura Furlong. Big Data in Biomedicine debate. Barcelona, Nov 11 2014 (20)

Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...
Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...
Associations of MHC Ancestral Haplotypes with Resistance/Susceptibility to AI...
 
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
 
Fondale et al PLOSone 2014 (dragged) 2
Fondale et al PLOSone 2014 (dragged) 2Fondale et al PLOSone 2014 (dragged) 2
Fondale et al PLOSone 2014 (dragged) 2
 
Schizophrenia
SchizophreniaSchizophrenia
Schizophrenia
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformatica
 
Bioinformatics & It's Scope in Biotechnology
Bioinformatics & It's Scope in BiotechnologyBioinformatics & It's Scope in Biotechnology
Bioinformatics & It's Scope in Biotechnology
 
Pone.0034901
Pone.0034901Pone.0034901
Pone.0034901
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
 
The Monarch Initiative: From Model Organism to Precision Medicine
The Monarch Initiative: From Model Organism to Precision MedicineThe Monarch Initiative: From Model Organism to Precision Medicine
The Monarch Initiative: From Model Organism to Precision Medicine
 
Making the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discoveryMaking the most of phenotypes in ontology-based biomedical knowledge discovery
Making the most of phenotypes in ontology-based biomedical knowledge discovery
 
Harnessing the Power of Infectious Disease Information with a Relational Data...
Harnessing the Power of Infectious Disease Information with a Relational Data...Harnessing the Power of Infectious Disease Information with a Relational Data...
Harnessing the Power of Infectious Disease Information with a Relational Data...
 
Epigenetics_Poster_Nafisah.v1
Epigenetics_Poster_Nafisah.v1Epigenetics_Poster_Nafisah.v1
Epigenetics_Poster_Nafisah.v1
 
Angela VanVeldhuizen (Sanford Health) Registries: CoRDS
Angela VanVeldhuizen (Sanford Health) Registries: CoRDSAngela VanVeldhuizen (Sanford Health) Registries: CoRDS
Angela VanVeldhuizen (Sanford Health) Registries: CoRDS
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team update
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in disease
 
BRN Seminar 12/06/14 Integrative Bioinformatics
BRN Seminar 12/06/14 Integrative Bioinformatics BRN Seminar 12/06/14 Integrative Bioinformatics
BRN Seminar 12/06/14 Integrative Bioinformatics
 
Integrative Bioinformatics
Integrative BioinformaticsIntegrative Bioinformatics
Integrative Bioinformatics
 
GWAS Study.pdf
GWAS Study.pdfGWAS Study.pdf
GWAS Study.pdf
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
 

Recently uploaded

ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
LengamoLAppostilic
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 

Recently uploaded (20)

ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 

Laura Furlong. Big Data in Biomedicine debate. Barcelona, Nov 11 2014

  • 1. Five years of DisGeNET: Lessons learned and challenges ahead Laura I. Furlong IMIM-­‐UPF Big Data in Biomedicine Barcelona November 11, 2014
  • 2. • Knowledge plaDorm on human diseases and their genes • Aims to cover all disease therapeuHc areas • Developed by integraHon of data from expert-­‐curated resources and from the literature by text mining • Centered on gene-­‐disease associaHon (GDA) and its supporHng evidence and provenance 11/11/14 Laura I. Furlong 2
  • 3. Re# Syndrome UMLS:C0035372 MCEP2 NCBI:4204 11/11/14 Laura I. Furlong 3
  • 4. Re# Syndrome UMLS:C0035372 MCEP2 NCBI:4204 ü ü ü x ü 11/11/14 Laura I. Furlong 4
  • 5. UniProt CTD MGD LHGDN CTD Curated RGD BEFREE GAD Predicted Literature 11/11/14 Laura I. Furlong 5
  • 6. UniProt CTD MGD LHGDN 381,056 GDAs 16,666 CTD genes 13,172 diseases Curated RGD BEFREE GAD Predicted Literature 11/11/14 Laura I. Furlong 6
  • 7. Key points 1. FragmentaPon of informaPon as a barrier to knowledge on the mechanisms of human diseases 11/11/14 Laura I. Furlong 7
  • 8. 11/11/14 Laura I. Furlong 8
  • 9. hVp://semanHcscience.org hVp://lod-­‐cloud.net/ StandardizaPon • Controlled vocabularies and ontologies • DisGeNET associaHon type ontology Open access • RDF and nanopublicaHons • Data distributed under the Open database commons license hVp://opendatacommons.org/
  • 10. Key points 2. High rate of data generaPon on GDAs poses challenges to biocuraPon pipelines 11/11/14 Laura I. Furlong 10
  • 11. PublicaHons on diseases and genes from 1980 (737,712 publicaHons) Supervised machine-­‐learning approach for relaHon extracHon between Text Mining 530,347 associaPons 14,777 genes 12,650 diseases 355,976 publicaHons genes and diseases Aprox. 30,000 GDAs 11/11/14 Laura I. Furlong 11
  • 12. • Nearly 70 % of GDAs supported by only one publicaHon! • 900 GDAs supported by > 200 publicaHons • Average 2.8 publicaHons/GDA 11/11/14 Laura I. Furlong 12
  • 13. GDAs supported by only one publicaHon 11/11/14 Laura I. Furlong 13
  • 14. Li#le overlap between text mined data and data present in curated repositories • Other publicaHons, full text, etc • FN from TM • Non-­‐relevant data for DBs • Relevant data but sHll not curated • FP from TM 11/11/14 Laura I. Furlong 14
  • 15. • This is not raw data (e.g. from NGS analysis), is data already collated and filtered that have passed peer review before publicaHon • Text mined from abstracts: • Not mining full text, nor supplementary material • Not mining tables, figures • Focus on relaHons stated on sentences, not handling anaphoras • We are just looking into a small frac)on of all the available data!!! 11/11/14 Laura I. Furlong 15
  • 16. Key points 2. High rate of data generaPon on GDAs poses challenges to biocuraPon pipelines • Need to find alternaHve strategies to expert curaHon • “wisdom of the crowds” approaches (e.g. crowdsourcing) • ? 11/11/14 Laura I. Furlong 16
  • 17. Key points 3. Data prioriPzaPon to support interpretaPon of data on the genePc determinants of human diseases 11/11/14 Laura I. Furlong 17
  • 18. Ranks gene-­‐disease associaHons based on the supporHng evidence DisGeNET score = SCURATED + SPREDICTED + SLITERATURE GDAs prioriPzaPon SCURATED = WUNIPROT + WCTD SPREDICTED = WRat + WMouse SLITERATURE = WGAD + WLHGDN + WBeFree 11/11/14 Laura I. Furlong 18
  • 19. Disease Gene Score UniProt CTD Rat Mouse Number of publicaPons BeFree GAD LHGDN Wilson’s Disease ATP7B 0.99 0.3 0.3 0.1 0.1 174 31 23 ReV Syndrome MECP2 0.9 0.3 0.3 0 0.1 438 27 43 CysHc Fibrosis CFTR 0.9 0.3 0.3 0 0.1 1429 150 78 Obesity MC4R 0.94 0.3 0.3 0.1 0.1 220 46 0 Alzheimer Disease APP 0.88 0.3 0.3 0 0.1 1096 18 81 11/11/14 Laura I. Furlong 19
  • 20. Key points 3. Data prioriPzaPon to support interpretaPon of data on the genePc determinants of human diseases • SHll we have a large dataset with low score (350,000 GDAs) • Other approaches based on: • type of associaHon of the gene to the disease • experimental evidence for the GDAs • network-­‐based gene-­‐prioriHzaHon algorithms • Take into account contradictory findings • Different prioriHzaHon approaches for different purposes? 11/11/14 Laura I. Furlong 20
  • 21. Key points 4. Large number of genes associated to some diseases might reflect phenotypic diversity 11/11/14 Laura I. Furlong 21
  • 22. 11/11/14 Laura I. Furlong 22
  • 23. Need for deep phenotyping of diseases to idenHfy disease subtypes associated to different gene networks Human Phenotype Ontology (HPO) hVp://www.human-­‐phenotype-­‐ontology.org/ • Ontology of phenotypic abnormaliHes of human diseases • Based on OMIM, Orphanet and DECIPHER 11/11/14 Laura I. Furlong 23
  • 24. DisGeNET-­‐HPO annotaPons 33 % of DisGeNET diseases annotated with HPO terms Mainly come from OMIM diseases (28 % of DisGeNET diseases come from OMIM) Need for other approaches to annotate the full spectrum of diseases at phenotypic level 11/11/14 Laura I. Furlong 24
  • 25. Key points 1. FragmentaHon of informaHon as a barrier to knowledge on the mechanisms of human diseases 2. High rate of data generaHon on GDAs poses challenges to biocuraHon pipelines 3. Data prioriHzaHon to support interpretaHon of data on the geneHc determinants of human diseases 4. Large number of genes associated to some diseases might reflect phenotypic diversity
  • 26. IBI Group Alba GuHérrez Àlex Bravo Janet Piñero Núria Queralt-­‐Rosiñach Miguel A. Mayer Pablo Carbonell Laura I. Furlong Ferran Sanz IBI Past members Montserrat Cases Michael Rautschka Anna Bauer-­‐Mehren Solène Grosdidier 11/11/14 Laura I. Furlong 26