ICG-11 - genomic data projects around the world - nov 5 2016

ICG-11: Genomic Data Projects around the World
- How to find data for your research
Fiona Nielsen – November 4th 2016

We are always looking for data
Genetics, cancer, rare disease research
We need access to the right data at the right time
DNA interpretation requires lots of data

How much data do you need to publish a paper?
2001: 1 human genome
2012: 1000 Genomes (1092 genomes, since increased to ~2500)
2015: UK10K, Icelandic population (2,636 + 100k imputed), The Cancer Genome Atlas ~11,000 genomes
2016: ExAC consortium 65,000 exomes, gnomAD ~126,000 exomes
2020: ?

Data is not easy to find and access
FRAGMENTED: Poor visibility of available genomic data
ADMIN BURDEN: Huge overhead to manage data access
BAD CULTURE: Lack of data sharing habits in research culture

Finding and accessing data can take months
Time spent data scouting per project: < 1 week, 1-3 months, +6 months (chart values as extracted: 40%, 48%, 11%)

Why the barrier?
Barriers
• Difficult to find data, let alone find the RIGHT data
• Time-consuming and difficult to apply for access to data
• Complicated and laborious to submit data to public repositories
http://blog.repositive.io/tag/data-access/
http://blog.repositive.io/tag/data-sharing/

But where in the world is the data?
?

How to make data easy to discover?

We have identified hundreds of data sources
Universities – or repositories affiliated with a university.
Projects/Consortia – have a specific purpose/aim, often focused on a specific research question or disease.
Public repositories – allow download and upload of data from multiple institutions.
Companies – for-profit organisations making data available for free or as a service.
Biobanks – many have sequence data of their biological samples.
Researchers know on average 4-5 data sources
More data sources appear every day; to date we have identified 350+

And indexed them on the Repositive platform
Discover and access
Efficient search, see related results
Find colleagues & their data interests
Co-annotate data & community feedback
Simpler workflow for data access
Free to use: http://discover.repositive.io

Platform launched in Sept 2016
1 Million+ human genomic datasets indexed

177k whole exomes
213k whole genomes
2400 23andMe samples

61+ countries and 426+ research organisations using Repositive
PDX Consortium with AstraZeneca

Data sources across the globe
Map labels (as extracted, regions not recoverable): 11, 155, 2, 2, 4, 4, 7, 780
Bar chart by country: GB FI NL FR DE CH EE BE DK ES SI IE SE
Bar chart by US state: CA MD MA WA NY TX AZ DC NJ NC PA UT TN CO IN FL LA VA IL ME OH MO MI SC OR
Geolocation of 278 data sources analysed, found by tracking the IP address of the source.
These include:
• Public repositories
• Universities
• Companies
• Biobanks
• Research consortia
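The IP-based geolocation mentioned on this slide could be approximated roughly as follows. This is only a minimal sketch, not the method actually used for the 278 sources: the function names are illustrative, and `ip_to_country` is a hypothetical callable (for example a wrapper around whichever GeoIP database is available) that maps an IP address to a country code.

```python
import socket
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Return the hostname part of a data source URL (None if absent)."""
    return urlparse(url).hostname


def count_sources_by_country(urls, ip_to_country, resolve=socket.gethostbyname):
    """Tally data sources per country code.

    ip_to_country: hypothetical callable, IP string -> country code.
    resolve: hostname -> IP string; defaults to a real DNS lookup.
    """
    counts = Counter()
    for url in urls:
        host = host_of(url)
        if host is None:
            continue  # skip entries that are not proper URLs
        try:
            ip = resolve(host)
        except OSError:
            # DNS failures (socket.gaierror is an OSError) are counted separately
            counts["unresolved"] += 1
            continue
        counts[ip_to_country(ip)] += 1
    return counts
```

Note that an IP address reflects where a server is hosted, which may differ from where the organisation running the data source is based.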

Data source content
Assay Types
Dedicated to…

Sequenced ethnicities
Aboriginals
African Americans
Africans
Australians
Chinese
Malays
Indians
Danish
Dutch
Estonian
Russian
European Ancestry
Finnish
Icelandic
Japanese
Korean
Latin Americans
Saudi
Swedish

Machines & Data sources
Map labels (as extracted, regions not recoverable): 947, 5600, 88, 660, 26, 68, 50, 62, 3, 25, 0, 0; 23 International
Interesting site to look at:
http://omicsmaps.com/stats

More efficient data access
• Repositive is supporting the whole research workflow
• Faster, more efficient data discovery
• Streamlining data access applications
• Developing technology for efficient data access
• Setting up pre-competitive data sharing agreements
• Running workshops and training programmes
• Read about our pre-competitive PDX data resource in collaboration with AstraZeneca: http://repositive.io/pdx

Building upon best practices
MAKE DATA DISCOVERABLE
SIMPLIFY WORKFLOWS
CONTRIBUTE TO COMMUNITY
First 30 data sources listed here:
DNAdigest and Repositive – Connecting the world of genomic data
http://www.tinyurl.com/plos-biology-repositive

Connecting the world of genomic data
Visit us at: http://repositive.io
Or tweet us @repositiveio
Free to use: http://discover.repositive.io
Fiona Nielsen, CEO
Email us: info@repositive.io
