Use of data

Keynote presented at the Phenotype Foundation first annual meeting.
Amsterdam, January 18, 2016
Prof. Chris Evelo
Department of Bioinformatics – BiGCaT
Maastricht University
@Chris_Evelo
The use and needs of data sharing in biology

Data
• Things we know
• Things we measure

Knowledge is hard to get
And it doesn’t even play it…
But you can gamify collection
Since we structure it, it can be easier to store

Sharing Data
I would like to exploit common genotype-phenotype relations between Alzheimer’s Disease and Huntington’s Disease…
I need to combine AD and HD data…
I can help with that!
I can help with that!
Source: Marco Roos

Who wants to share data?
• People who want to use data
• Funders
• Publishers
• But the researchers?

People hide data
• I did all this work, so I want to reuse it myself
• They don’t need this part, might be my next…
• I might get a patent on this
• Or… It needs a patent to be valuable
• I can’t even patent because ...

How?
• Don’t add specifics
(oh, those really were knockout cells, but...)
• Leave out important steps
(I did these PCRs, why show the array)
• And “we used an approach slightly modified from…”
• ...

FAIR data
• Findable
• Accessible
• Interoperable
• Reusable

Sharing Data
Source: Marco Roos
???
Here’s my data, have fun!
Here’s my data, have fun!

Sharing Linkable Data
Source: Marco Roos
I can go straight to answering my questions with data from multiple data owners!
Patients will be so pleased with this speed-up!
Here’s my Linked Data, have fun!
Here’s my Linked Data, have fun!

Really?
From terms “liver, hepar, hepatic tissue”
To URIs:
http://identifiers.org/tissueont1/liver
http://identifiers.org/tissueont2/hepar
….
Just a first step
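
A minimal sketch of this first step, assuming a hand-made lookup table. The two URIs are the placeholder examples from the slide, and the helper names (`normalise`, `to_uri`) are hypothetical. It shows why this is only a first step: the same organ ends up with two different URIs, and some free-text terms have no URI at all.

```python
from typing import Optional

# Placeholder URIs taken from the slide; not guaranteed to resolve.
TERM_TO_URI = {
    "liver": "http://identifiers.org/tissueont1/liver",
    "hepar": "http://identifiers.org/tissueont2/hepar",
}


def normalise(term: str) -> str:
    """Lower-case a free-text term and collapse surrounding/internal whitespace."""
    return " ".join(term.lower().split())


def to_uri(term: str) -> Optional[str]:
    """Return the URI for a free-text term, or None if the table has no entry."""
    return TERM_TO_URI.get(normalise(term))


annotations = ["Liver", " hepar ", "hepatic tissue"]
resolved = {term: to_uri(term) for term in annotations}
unresolved = [term for term, uri in resolved.items() if uri is None]
```

Here "Liver" and " hepar " resolve to different URIs, and "hepatic tissue" stays unresolved, so a mapping between the two ontologies is still needed.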

And we didn’t even get that…
Reality:
Ontology-inspired pull-down menus

Nothing is ever “same-as”
• We may need more meaningful predicates
• Or learn to use them better
• We need lenses, context matters
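
One way to picture a lens, as a sketch under assumptions: each mapping carries a predicate (here the SKOS mapping properties), and a lens is simply the set of predicates that a given context accepts. The predicate assigned to each pair, the hepatocyte URI, and the lens names are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set


class Mapping(NamedTuple):
    source: str
    predicate: str
    target: str


LIVER_1 = "http://identifiers.org/tissueont1/liver"        # placeholder from the slides
LIVER_2 = "http://identifiers.org/tissueont2/hepar"        # placeholder from the slides
HEPATOCYTE = "http://example.org/hypothetical/hepatocyte"  # hypothetical URI

# Hypothetical mappings; the chosen predicates are assumptions.
MAPPINGS = [
    Mapping(LIVER_1, "skos:exactMatch", LIVER_2),
    Mapping(LIVER_1, "skos:relatedMatch", HEPATOCYTE),
]

# A lens names the predicates that count as "the same" in one context.
LENSES = {
    "strict": {"skos:exactMatch"},
    "exploratory": {"skos:exactMatch", "skos:closeMatch", "skos:relatedMatch"},
}


def equivalents(uri: str, lens: str) -> Set[str]:
    """Return the URIs linked to `uri` by any predicate the lens allows.

    Mappings are followed in both directions, which suits the symmetric
    SKOS predicates used here.
    """
    allowed = LENSES[lens]
    found = set()
    for mapping in MAPPINGS:
        if mapping.predicate not in allowed:
            continue
        if mapping.source == uri:
            found.add(mapping.target)
        elif mapping.target == uri:
            found.add(mapping.source)
    return found
```

With the "strict" lens only the other liver URI is returned; the "exploratory" lens also returns the related hepatocyte term.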

Too many standards
Source: XKCD, https://xkcd.com/927/

Too many standards
And ontologies…
But they are there for a reason!
Research fields have different focus/needs
Don’t standardise, map!

We need mapping
• Ontology mapping
• Identifier mapping
• Identity (text) mapping
• Chemistry mapping

We need mapping
• Ontology mapping: NCBO
• Identifier mapping: BridgeDb, IMS
• Identity (text) mapping: Conceptwiki?
• Chemistry mapping: CRS??
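
Identifier mapping can be sketched with a plain in-memory table. This is not the BridgeDb or IMS API; the table layout and the `map_id` helper are hypothetical. The TP53 identifiers are real and serve only as an illustration. Note that the UniProt entry is the protein rather than the gene, which is exactly the kind of "not quite the same" relation discussed earlier.

```python
from typing import Dict, FrozenSet, List, Tuple

Xref = Tuple[str, str]  # (database name, identifier)

# Each group lists identifiers treated as equivalent for mapping purposes.
EQUIVALENT: List[FrozenSet[Xref]] = [
    frozenset({
        ("Ensembl", "ENSG00000141510"),  # TP53 gene
        ("Entrez Gene", "7157"),         # TP53 gene
        ("UniProt", "P04637"),           # TP53 protein product
    }),
]

# Index every identifier to the group it belongs to.
INDEX: Dict[Xref, FrozenSet[Xref]] = {
    xref: group for group in EQUIVALENT for xref in group
}


def map_id(source: str, identifier: str, target: str) -> List[str]:
    """Return identifiers in the `target` database equivalent to the given one."""
    group = INDEX.get((source, identifier), frozenset())
    return sorted(ident for db, ident in group if db == target)
```

For example, `map_id("Entrez Gene", "7157", "Ensembl")` looks up the group for the Entrez Gene identifier and returns its Ensembl member; an unknown identifier yields an empty list.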

Discussed last Friday:
Serum and adipose tissue amino acid homeostasis in the MHO (Badoud 2014)
– Objective: integrate metabolite and gene expression profiling to elucidate the molecular distinctions between Metabolically Healthy Obese (MHO) and Metabolically Unhealthy Obese (MUO)
• Conclusion: subcutaneous adipose tissue (SAT) gene expression profiling revealed that genes related to branched-chain amino acid catabolism and the tricarboxylic acid cycle were less down-regulated in MHO individuals compared to MUO individuals. Together, this integrated analysis revealed that MHO individuals have an intermediate amino acid homeostasis compared to LH and MUO individuals.
– Diabetes Risk Assessment study, 3 groups: Lean Healthy (LH), MHO and MUO
• Fasting serum samples from all participants, and adipose tissue taken from the periumbilical region under local anesthesia after an overnight fast
– Initially 30 participants, 10 in each group (7 women, 3 men), but for the microarray analysis they analyzed SAT from 7 LH, 8 MHO and 8 MUO, each group having 2 men. Not very clear why -> they selected samples with an RNA integrity number higher than 8
– Gene expression data only for those 23 participants
– No gender or biological information (e.g. glucose, total triglycerides, etc.)
– No initial serum metabolite concentrations (only the mean)
– dx.doi.org/10.1021/pr500416v
– Data can be found at: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55200

Adding phenotypic data
Diversity, not size, makes big data hard
SAM module
- small assays
- diverse assays
For now annotation, used after you find it

Repositories are technology driven
• Expression data
• Protein data
• Metabolomics data
• Genetic variation data

Repositories are technology driven
• Expression data: ArrayExpress, GEO
• Protein data: PRIDE
• Metabolomics data: MetaboLights
• Genetic variation data: dbSNP

Or the studies?
ISA-tab inspired:
investigations link to studies,
which link to assays,
samples
and the actual data
Study capturing…
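
The investigation, study, assay, sample and data hierarchy can be sketched with a few data classes. This is an illustration of the nesting only, not the ISA-tab file format; all class fields, titles and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    technology: str
    sample_names: List[str] = field(default_factory=list)  # samples measured
    data_files: List[str] = field(default_factory=list)    # the actual data


@dataclass
class Study:
    title: str
    sample_names: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def files_for_sample(investigation: Investigation, sample_name: str) -> List[str]:
    """Collect every data file from assays that measured the given sample."""
    return [
        data_file
        for study in investigation.studies
        for assay in study.assays
        if sample_name in assay.sample_names
        for data_file in assay.data_files
    ]


# Hypothetical example content.
investigation = Investigation(
    title="Hypothetical investigation",
    studies=[
        Study(
            title="Hypothetical study",
            sample_names=["S1", "S2"],
            assays=[
                Assay("transcriptomics", ["S1", "S2"], ["expression.txt"]),
                Assay("metabolomics", ["S1"], ["metabolites.txt"]),
            ],
        )
    ],
)
```

Because the links are explicit, a question such as "which data exist for sample S1?" becomes a simple traversal with `files_for_sample(investigation, "S1")`.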

Capturing needs meta-ontologies
Examples:
EFO (Experimental Factor Ontology),
eNanoMapper (nanomaterials)
• Combine
• Map
• Slim
• Extend
• Feed extensions back to source
• Reproduce from (extended) source
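
The "slim" step can be sketched as keeping only the terms a community needs, plus their ancestors so the hierarchy stays connected. The tiny ontology below and the `slim` helper are hypothetical; real ontologies would be loaded from their source files, which is what makes the result reproducible after the source is extended.

```python
from typing import Dict, Iterable, List

# Hypothetical ontology fragment: term -> list of parent terms.
PARENTS: Dict[str, List[str]] = {
    "liver": ["organ"],
    "kidney": ["organ"],
    "organ": ["anatomical structure"],
    "anatomical structure": [],
}


def slim(parents: Dict[str, List[str]], keep: Iterable[str]) -> Dict[str, List[str]]:
    """Return the sub-ontology containing `keep` and all of their ancestors."""
    selected = set()
    stack = list(keep)
    while stack:
        term = stack.pop()
        if term in selected:
            continue
        selected.add(term)
        # Walk upwards so every kept term remains connected to the root.
        stack.extend(parents.get(term, []))
    return {term: list(parents.get(term, [])) for term in selected}


liver_slim = slim(PARENTS, ["liver"])
```

In this sketch `liver_slim` keeps "liver", "organ" and "anatomical structure" and drops "kidney".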

If you can find it in a database
Can you find the database?
Discoverable FAIRports?
What about institute repos?

If study in dbNP
• Large data in repos (e.g. MetaboLights)
• Study descriptions still hidden

Combine with knowledge
• Can you find a study by the results?
• Integrate results
(pathway and ontology profiles)

Teams answering real questions
• Finds needs and solutions
• Combines across communities
• Fun! And inspiring
• Interesting, publishable results

Starting a database is easy
• What about sustainability?
• Core resources need:
– Long-term funding
– Regular monitoring
• Integration in communities
