Keynote presented at the Phenotype Foundation first annual meeting.
Amsterdam, January 18, 2016
Prof. Chris Evelo
Department of Bioinformatics – BiGCaT
Maastricht University
@Chris_Evelo
The use and needs of data sharing in biology
Data
• Things we know
• Things we measure
Knowledge is hard to get
And it doesn’t even play it…
But you can gamify collection
Since we structure it, it can be easier to store
Sharing Data
I would like to exploit common genotype-phenotype relations
between Alzheimer’s Disease and Huntington’s Disease…
I need to combine AD and HD data…
I can help with
that!
I can help with
that!
Source: Marco Roos
Who wants to share data?
• People who want to use data
• Funders
• Publishers
• But the researchers?
You only need MS-Excel
People hide data
• I did all this work; I want to reuse it
• They don’t need this part, might be my next…
• I might get a patent on this
• Or… It needs a patent to be valuable
• I can’t even patent because ...
How?
• Don’t add specifics
(oh, those really were knockout cells, but…)
• Leave out important steps
(I did these PCRs, why show the array)
• And “we used an approach slightly modified from…”
• ...
FAIR data
• Findable
• Accessible
• Interoperable
• Reusable
Sharing Data
Source: Marco Roos
???
Here’s my data,
have fun!
Here’s my data,
have fun!
Sharing Linkable Data
Source: Marco Roos
I can go straight to answering my questions with data from
multiple data owners!
Patients will be so pleased with this speed-up!
Here’s my
Linked Data,
have fun!
Here’s my
Linked Data,
have fun!
Really?
From terms “liver, hepar, hepatic tissue”
To URIs:
http://identifiers.org/tissueont1/liver
http://identifiers.org/tissueont2/hepar
….
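A minimal sketch of this first step, assuming a hand-made lookup table: the two URIs are the placeholders shown above, and the function name term_to_uri is hypothetical. It also shows why this is only a first step: two of the terms end up with different URIs for the same tissue, and the third has no URI at all.

```python
# Illustrative only: the URIs are the placeholders from the slide and the
# lookup table is a hypothetical, hand-made example.
TERM_TO_URI = {
    "liver": "http://identifiers.org/tissueont1/liver",
    "hepar": "http://identifiers.org/tissueont2/hepar",
}


def term_to_uri(term):
    """Return the URI registered for a free-text term, or None if unknown."""
    return TERM_TO_URI.get(term.strip().lower())


if __name__ == "__main__":
    for label in ["Liver", "hepar", "hepatic tissue"]:
        print(label, "->", term_to_uri(label))
```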
Just a first step
And we didn’t even get that…
Reality:
Ontology-inspired pull-down menus
Nothing is ever “same-as”
• We may need more meaningful predicates
• Or learn to use them better
• We need lenses, context matters
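One way to picture predicates and lenses, as a sketch rather than any existing implementation: each mapping carries a predicate, and a lens is the set of predicates that counts as "the same" in a given context. The SKOS predicate names are real vocabulary terms; the mappings, the identifiers under tissueont3, and the lens names "strict" and "relaxed" are assumptions made up for illustration.

```python
# Hypothetical mappings between tissue terms, each with a SKOS predicate.
MAPPINGS = [
    ("tissueont1/liver", "skos:exactMatch", "tissueont2/hepar"),
    ("tissueont1/liver", "skos:closeMatch", "tissueont3/hepatic_tissue"),
    ("tissueont1/liver", "skos:relatedMatch", "tissueont3/hepatocyte"),
]

# A lens names the predicates that are treated as equivalence in a context.
LENSES = {
    "strict": {"skos:exactMatch"},
    "relaxed": {"skos:exactMatch", "skos:closeMatch"},
}


def equivalents(identifier, lens):
    """Return identifiers linked to `identifier` by a predicate the lens allows."""
    allowed = LENSES[lens]
    found = set()
    for subject, predicate, obj in MAPPINGS:
        if predicate not in allowed:
            continue
        if subject == identifier:
            found.add(obj)
        elif obj == identifier:
            found.add(subject)
    return found


if __name__ == "__main__":
    for name in LENSES:
        print(name, sorted(equivalents("tissueont1/liver", name)))
```

The same query gives a different answer under each lens, which is the sense in which context matters.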
Too many standards
Source XKCD: https://xkcd.com/927/
Too many standards
And ontologies…
But they are there for a reason!
Research fields have different focus/needs
Don’t standardise, map!
We need mapping
• Ontology mapping
• Identifier mapping
• Identity (text) mapping
• Chemistry mapping
We need mapping
• Ontology mapping: NCBO
• Identifier mapping: BridgeDb, IMS
• Identity (text) mapping: Conceptwiki?
• Chemistry mapping: CRS??
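To make identifier mapping concrete, here is a small in-memory sketch. It does not use the BridgeDb or IMS APIs; the link table, build_index and map_identifier are hypothetical names, and the three identifiers are the commonly cited ones for human TP53, used purely as an illustration. Identifiers are followed transitively, so two databases can be connected through a third.

```python
from collections import defaultdict, deque

# Illustrative cross-references for one gene (human TP53).
LINKS = [
    (("Ensembl", "ENSG00000141510"), ("NCBI Gene", "7157")),
    (("NCBI Gene", "7157"), ("UniProt", "P04637")),
]


def build_index(links):
    """Store every link in both directions."""
    index = defaultdict(set)
    for a, b in links:
        index[a].add(b)
        index[b].add(a)
    return index


def map_identifier(index, source, target_db):
    """Breadth-first search from `source` to all identifiers in `target_db`."""
    seen = {source}
    queue = deque([source])
    hits = set()
    while queue:
        current = queue.popleft()
        for neighbour in index.get(current, ()):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            queue.append(neighbour)
            if neighbour[0] == target_db:
                hits.add(neighbour[1])
    return hits


if __name__ == "__main__":
    idx = build_index(LINKS)
    print(map_identifier(idx, ("Ensembl", "ENSG00000141510"), "UniProt"))
```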
There is a lot out there
Discussed last Friday:
Serum and adipose tissue amino acid homeostasis in the MHO (Badoud 2014)
– Objective: Integrate metabolite and gene expression profiling to elucidate the molecular distinctions between Metabolically Healthy Obese (MHO) and Metabolically Unhealthy Obese (MUO)
• Conclusion: SAT gene expression profiling revealed that genes related to branched-chain amino acid catabolism and the tricarboxylic acid cycle were less down-regulated in MHO individuals compared to MUO individuals. Together, this integrated analysis revealed that MHO individuals have an intermediate amino acid homeostasis compared to LH and MUO individuals.
– (Diabetes Risk Assessment study) 3 groups: Lean Healthy (LH), MHO and MUO
• Fasting serum samples from all participants, and adipose tissue from the periumbilical region taken under local anesthesia after an overnight fast
– Initially 30 participants, 10 in each group (7 women, 3 men), but for the microarray analysis they analyzed SAT from 7 LH, 8 MHO and 8 MUO, each group having 2 men. Not very clear why -> they selected samples with an RNA integrity number higher than 8
– Gene expression data only for the 23 participants
– No gender or biological information (e.g. glucose, total triglycerides, etc.)
– No initial serum metabolite concentrations (only means)
– dx.doi.org/10.1021/pr500416v
– Data can be found: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55200
Adding phenotypic data
Diversity, not size, makes big data hard
SAM module
- small assays
- diverse assays
For now annotation, used after you find it
Repositories are technology driven
• Expression data
• Protein data
• Metabolomics data
• Genetic variation data
Repositories are technology driven
• Expression data: ArrayExpress, GEO
• Protein data: PRIDE
• Metabolomics data: MetaboLights
• Genetic variation data: dbSNP
Start with the samples?
Or the studies?
ISA-Tab inspired
investigations link to studies
which link to assays
samples
and the actual data
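The hierarchy above can be sketched as nested records. This is only an illustration of the investigation, study, assay, sample and data-file nesting, not the ISA-Tab file format itself; all class and field names, and the helper data_files, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str


@dataclass
class Assay:
    technology: str
    samples: List[Sample] = field(default_factory=list)
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def data_files(investigation):
    """Walk investigation -> studies -> assays and collect the data files."""
    return [
        path
        for study in investigation.studies
        for assay in study.assays
        for path in assay.data_files
    ]


if __name__ == "__main__":
    # Hypothetical example content.
    assay = Assay("microarray", [Sample("s1"), Sample("s2")], ["s1.cel", "s2.cel"])
    inv = Investigation("demo", [Study("demo study", [assay])])
    print(data_files(inv))
```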
Study capturing…
Capturing needs meta-ontologies
Examples:
EFO (experimental factor ontology),
eNanoMapper (nanomaterials)
• Combine
• Map
• Slim
• Extend
• Feed extensions back to source
• Reproduce from (extended) source
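A rough sketch of the "combine" and "slim" steps, under strong simplifying assumptions: an ontology is reduced to a plain map from each term to its parent terms, combining is a union of such maps, and slimming keeps only the requested terms plus their ancestors. The function names and the toy terms are hypothetical; real ontology tooling does far more than this.

```python
def combine(*ontologies):
    """Merge child -> parents maps; every parent also becomes a key."""
    merged = {}
    for onto in ontologies:
        for term, parents in onto.items():
            merged.setdefault(term, set()).update(parents)
            for parent in parents:
                merged.setdefault(parent, set())
    return merged


def slim(ontology, keep):
    """Keep the requested terms and all of their ancestors."""
    selected = set()
    stack = list(keep)
    while stack:
        term = stack.pop()
        if term in selected or term not in ontology:
            continue
        selected.add(term)
        stack.extend(ontology[term])
    return {term: set(ontology[term]) for term in selected}


if __name__ == "__main__":
    # Toy source ontologies, made up for illustration.
    onto_a = {"liver": {"organ"}, "organ": set()}
    onto_b = {"hepatocyte": {"liver"}, "kidney": {"organ"}}
    merged = combine(onto_a, onto_b)
    print(slim(merged, ["hepatocyte"]))
```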
If you can find it in a database
Can you find the database?
Discoverable fairports?
What about institute repos?
If a study is in dbNP
• Large data in repos (e.g. MetaboLights)
• Study descriptions still hidden
Combine with knowledge
• Can you find a study by the results?
• Integrate results
(pathway and ontology profiles)
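Finding a study by its results could, for instance, mean comparing pathway profiles. The sketch below assumes each study is summarised as a set of affected pathways and ranks studies by Jaccard overlap with a query profile. The study names and profiles are hypothetical; the Jaccard index is one possible similarity measure among many.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_studies(query, profiles):
    """Return study names with non-zero overlap, best match first."""
    scored = [(jaccard(query, profile), name) for name, profile in profiles.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked if score > 0]


if __name__ == "__main__":
    # Hypothetical pathway profiles per study.
    profiles = {
        "study_A": {"BCAA catabolism", "TCA cycle"},
        "study_B": {"TCA cycle", "glycolysis", "fatty acid oxidation"},
        "study_C": {"apoptosis"},
    }
    print(rank_studies({"BCAA catabolism", "TCA cycle"}, profiles))
```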
Challenges needed
Teams answering real questions
• Finds needs and solutions
• Combines across communities
• Fun! And inspiring
• Interesting, publishable results
Starting a database is easy
• What about sustainability?
• Core resources need:
– Long-term funding
– Regular monitoring
• Integration in communities
Use of data

 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 

Use of data

  • 1. Keynote presented at the Phenotype Foundation first annual meeting. Amsterdam, January 18, 2016 Prof. Chris Evelo Department Bioinformatics – BiGCaT Maastricht University @Chris_Evelo The use and needs of data sharing in biology
  • 2. Data • Things we know • Things we measure
  • 3. Knowledge is hard to get And it doesn’t even play it… But you can gamify collection Since we structure it, it can be easier to store
  • 4. Sharing Data I would like to exploit common genotype-phenotype relations between Alzheimer’s Disease and Huntington’s Disease… I need to combine AD and HD data… I can help with that! I can help with that! Source: Marcos Roos
  • 5. Who wants to share data? • People who want to use data • Funders • Publishers • But the researchers?
  • 6. You only need MS-Excel
  • 7. People hide data • I did all this work, I want to reuse it • They don’t need this part, might be my next… • I might get a patent on this • Or… It needs a patent to be valuable • I can’t even patent because ...
  • 8. How? • Don’t add specifics (ohh those really were knockout cells, but..) • Leave out important steps (I did these PCRs, why show the array) • And “we used an approach slightly modified from…” • ...
  • 9. FAIR data • Findable • Accessible • Interoperable • Reusable
  • 10. Sharing Data I would like to exploit common genotype-phenotype relations between Alzheimer’s Disease and Huntington’s Disease… I need to combine AD and HD data… I can help with that! I can help with that! Source: Marcos Roos
  • 11. Sharing Data Source: Marcos Roos ??? Here’s my data, have fun! Here’s my data, have fun!
  • 12. Sharing Linkable Data Source: Marcos Roos I can go straight to answering my questions with data from multiple data owners! Patients will be so pleased with this speed-up! Here’s my Linked Data, have fun! Here’s my Linked Data, have fun!
  • 13. Really? From terms “liver, hepar, hepatic tissue” To URIs: http://identifiers.org/tissueont1/liver http://identifiers.org/tissueont2/hepar …. Just a first step
  • 14. And we didn’t even get that… Reality: Ontology-inspired pull-down menus
  • 15. Nothing is ever “same-as” • We may need more meaningful predicates • Or learn to use them better • We need lenses, context matters
  • 16. Too many standards Source XKCD: https://xkcd.com/927/
  • 17. Too many standards And ontologies… But they are there for a reason! Research fields have different focus/needs Don’t standardise, map!
  • 18. We need mapping • Ontology mapping • Identifier mapping • Identity (text) mapping • Chemistry mapping
  • 19. We need mapping • Ontology mapping: NCBO • Identifier mapping: BridgeDb, IMS • Identity (text) mapping: Conceptwiki? • Chemistry mapping: CRS??
  • 20. There is a lot out there
  • 21. Discussed last Friday: Serum and adipose tissue amino acid homeostasis in the MHO (Badoud 2014)
      – Objective: Integrate metabolite and gene expression profiling to elucidate the molecular distinctions between Metabolically Healthy Obese (MHO) and Metabolically Unhealthy Obese (MUO)
          • Conclusion: SAT gene expression profiling revealed that genes related to branched-chain amino acid catabolism and the tricarboxylic acid cycle were less down-regulated in MHO individuals compared to MUO individuals. Together, this integrated analysis revealed that MHO individuals have an intermediate amino acid homeostasis compared to LH and MUO individuals.
      – (Diabetes Risk Assessment study) 3 groups: Lean Healthy (LH), MHO and MUO
          • Fasting serum samples from all participants and adipose tissue from the periumbilical region under local anesthesia after an overnight fast
      – Initially 30 participants, 10 in each group (7 women, 3 men), but for the microarray analysis they analyzed SAT from 7 LH, 8 MHO and 8 MUO, each group having 2 men. Not very clear why: they selected samples with an RNA integrity number higher than 8
      – Gene expression data only for the 23 participants
      – No gender or biological information (e.g. glucose, total triglycerides, etc.)
      – No initial serum metabolite concentrations (only means)
      – dx.doi.org/10.1021/pr500416v
      – Data can be found: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55200
  • 23. Adding phenotypic data Diversity, not size, makes big data hard SAM module - small assays - diverse assays For now annotation, used after you find it
  • 24. Repositories are technology driven • Expression data • Protein data • Metabolomics data • Genetic variation data
  • 25. Repositories are technology driven • Expression data: ArrayExpress, GEO • Protein data: PRIDE • Metabolomics data: MetaboLights • Genetic variation data: dbSNP
  • 26. Start with the samples?
  • 27. Or the studies? ISA-Tab inspired: investigations link to studies, which link to assays, samples and the actual data. Study capturing…
  • 28. Capturing needs meta-ontologies Examples: EFO (Experimental Factor Ontology), eNanoMapper (nanomaterials) • Combine • Map • Slim • Extend • Feed extensions back to source • Reproduce from (extended) source
  • 29. If you can find it in a database Can you find the database? Discoverable FAIRports? What about institute repos?
  • 30. If study in dbNP • Large data in repos (e.g. MetaboLights) • Study descriptions still hidden
  • 31. Combine with knowledge • Can you find a study by the results? • Integrate results (pathway and ontology profiles)
  • 33. Teams answering real questions • Finds needs and solutions • Combines across communities • Fun! And inspiring • Interesting, publishable results
  • 34. Starting a database is easy • What about sustainability? • Core resources need: – Long-term funding – Regular monitoring • Integration in communities