Biosb2017_Repositive

We are always looking for data
Finding & Accessing Human Genomic Data for Research
BioSB 2017
Tweets welcome
#dataeureka
@repositiveio

Genomic data is important for research
Pre-clinical
drug discovery
Diagnostics and treatments
of genetic diseases

“Consensus among researchers, clinicians,
politicians & the public that
genomics will transform biomedical
research, healthcare and lifestyle choices”
Stephan Beck, UCL
OPPORTUNITY

Genome Technology Evolution
2001: 1 human genome
2005: Personal Genome Project
Human Genome Diversity Project
HapMap
2008: 1000 Genomes (1092 genomes, since increased to ~2500)
Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE)
2011: H3Africa
2012: International Cancer Genome Consortium
2016: 2M AstraZeneca - HLI

Large amounts of data, but not accessible (2006-2015)
≈ 0.5 PB sequence available: WGS data available in public repos
80+ PB sequenced every year
Exponential growth rate
Under-utilised data
has huge potential for
medical research

How many data sources?
How many sources of human
genomic data do you know about?

Hundreds of data sources
…but they aren’t easy to find!
First 30 data sources listed here: http://tinyurl.com/plos-biology-repositive
Data Sources Identified
Jan-15: 10, Mar-15: 25, Jun-15: 33, Sep-15: 35, Dec-15: 102,
Mar-16: 174, Jun-16: 239, Sep-16: 314, Dec-16: 506, Mar-17: 582
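The growth in identified data sources can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that the ten counts on the chart pair up, in order, with the ten date labels on its axis, and the function name `growth_steps` is made up for this example.

```python
# Counts read off the "Data Sources Identified" chart.
# Assumption: each count belongs to the date label in the same position.
dates = ["Jan-15", "Mar-15", "Jun-15", "Sep-15", "Dec-15",
         "Mar-16", "Jun-16", "Sep-16", "Dec-16", "Mar-17"]
counts = [10, 25, 33, 35, 102, 174, 239, 314, 506, 582]


def growth_steps(labels, values):
    """Return (label, sources_added, growth_factor) for each step after the first."""
    steps = []
    for i in range(1, len(values)):
        added = values[i] - values[i - 1]
        factor = values[i] / values[i - 1]
        steps.append((labels[i], added, round(factor, 2)))
    return steps


steps = growth_steps(dates, counts)
overall_factor = counts[-1] / counts[0]

for label, added, factor in steps:
    print(f"{label}: +{added} sources (x{factor})")
print(f"Overall: x{overall_factor:.1f} from {dates[0]} to {dates[-1]}")
```

Under that pairing assumption, the list grows from 10 to 582 sources over the period shown.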

The researchers’ pain points
FRAGMENTED

[Charts: number of data sources by country code (GB, FI, NL, FR, DE, CH, EE, BE, DK, ES, SI, IE, SE) and by US state code (CA, MD, MA, WA, NY, TX, AZ, DC, NJ, NC, PA, UT, TN, CO, IN, FL, LA, VA, IL, ME, OH, MO, MI, SC, OR)]
Data sources across the globe
Geolocation of 278 data sources analysed,
found by tracking the IP address of each source.
These include:
• Public Repositories
• Universities
• Companies
• BioBanks
• Research consortia

CONFUSING

Public Repositories
• Required by funders
• Cannot publish unless accession number given
• Specialised for genomics
  • ArrayExpress
  • EGA
  • dbGaP
  • GEO…
• Generalist
  • Dryad
  • Figshare…
See http://discover.repositive.io for more

FRAGMENTED
No holistic approach
to discover new data
HIDDEN

ADMIN BURDEN

GOVERNANCE Models
Open Access
• e.g. PGP, CC0
• Bermuda Accord
Managed (Restricted or Controlled Access)
• Data Access Committee
• No effective agreement (policy vacuum)

Data accessibility
Can download the
data straight away
or after logging in.
Need to apply for
access to the data.
Has both Open and Restricted
access data within one repository.

Access to Restricted Data
Benefits:
• Strict governance
• Individuals are protected
• Review of consent
• Applicant signs for full
responsibility for governance
Disadvantages:
• No control of data once access
is given
• High barrier for access – too
high?

Often a long process
Bottlenecks:
• Finding relevant and usable
data
• Getting authorisation to
access data
• Formatting data
• Storing and moving data
We studied the problem with
qualitative interviews followed
by a survey of researchers in
human genetics
T. A. van Schaik et al.,
The need to redefine genomic data sharing: a focus on
data accessibility, Applied & Translational Genomics, 2014
http://tinyurl.com/schaik-dnadigest

NIH / eRA Commons login
No
Yes
Organisation registered with eRA
Organisation has DUNS number
No
No
Write research proposal
Yes
+ 2-3 days
+ 1-2 weeks
+ 1 week
Yes
Submit proposal
+ 1-2 days
Access granted
Find/Download/Decrypt data
+ 1-4 weeks
Science…
+ 1-2 days
Pro tip: If you use human genomic data, apply for the
GRU (General Research Use) datasets in dbGaP: one
application gives access to all the GRU datasets.
dbGaP application process
Blog Post:
http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/

Sanger eDAM Account
No
Write research proposal
+ 1 hour
Yes
Submit proposal
+ 1-2 days
Access granted
Find/Download/Decrypt data
+ 2-7 days
Science…
+ 1-2 days
EGA application process
Blog Post:
http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/
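To compare the two application processes, the delays printed on the dbGaP and EGA slides can be added up. This is a rough sketch under stated assumptions: it treats every delay as sequential and every step as required (the worst-case path, where no account or registration exists yet), and it makes no claim about which delay belongs to which step. The names `DBGAP_DELAYS`, `EGA_DELAYS` and `total_range_days` are illustrative.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY

# (min_hours, max_hours) for each delay shown on the slides.
# Assumption: all delays apply and run one after another.
DBGAP_DELAYS = [
    (2 * HOURS_PER_DAY, 3 * HOURS_PER_DAY),    # + 2-3 days
    (1 * HOURS_PER_WEEK, 2 * HOURS_PER_WEEK),  # + 1-2 weeks
    (1 * HOURS_PER_WEEK, 1 * HOURS_PER_WEEK),  # + 1 week
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
    (1 * HOURS_PER_WEEK, 4 * HOURS_PER_WEEK),  # + 1-4 weeks
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
]
EGA_DELAYS = [
    (1, 1),                                    # + 1 hour
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
    (2 * HOURS_PER_DAY, 7 * HOURS_PER_DAY),    # + 2-7 days
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
]


def total_range_days(delays):
    """Sum the minimum and maximum delays and return both totals in days."""
    min_hours = sum(low for low, _ in delays)
    max_hours = sum(high for _, high in delays)
    return min_hours / HOURS_PER_DAY, max_hours / HOURS_PER_DAY


for name, delays in (("dbGaP", DBGAP_DELAYS), ("EGA", EGA_DELAYS)):
    low, high = total_range_days(delays)
    print(f"{name}: {low:.1f} to {high:.1f} days before the science starts")
```

Under these assumptions the dbGaP path adds up to roughly 25 to 56 days and the EGA path to roughly 4 to 11 days. An applicant who already has the required accounts would skip some of these delays.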

Where Repositive came from…
Fiona Nielsen
FOUNDER & CEO

We are enabling best practices
MAKE DATA
DISCOVERABLE
SIMPLIFY
WORKFLOWS
CONTRIBUTE TO
COMMUNITY
A platform to make human genomic data accessible for research

1-click to human genomic data access
to make finding data as easy as finding a book
on Amazon or booking a hotel on Expedia!
Repositive

Simpler workflow
for data access
Our expertise is data search platforms
Discover and
access
Search, see
related results
Find colleagues &
their data interests
Co-annotate data &
community feedback

Connecting the world of genomic data

http://discover.repositive.io
charlotte@repositive.io
