Presented in session 48 - Sharing of sensitive data - presented by Fiona Nielsen on September 12, 2016 at #SciDataCon http://scidatacon.org
We have addressed the most pressing problem for public genomic data, that of data discoverability, by indexing worldwide resources for genomic research data on an online platform (repositive.io) providing a single point of entry to find and access available genomic research data.
http://www.scidatacon.org/2016/sessions/48/paper/26/
http://www.scidatacon.org/2016/sessions/48/
International data week - #RDAPlenary #IDW2016
SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data
1. SciDataCon: How to increase accessibility and reuse for
clinical and personal genomic data
Fiona Nielsen – September 12th 2016
2. We are always looking for data
Genetics,
Cancer,
Rare disease
research
We need
access to the
right data at
the right time
DNA
interpretation
requires
lots of data
3. How much data do you need to publish a paper?
2001: 1 human genome
2012: 1000 Genomes (1092 genomes, since increased to ~2500)
2015: UK10K; Icelandic population (2,636 + 100k imputed); The Cancer Genome Atlas, ~11,000 genomes
2016: ExAC consortium, 65,000 exomes
2020: ?
4. Data is not easy to find and access
FRAGMENTED
Poor visibility of available
genomic data
ADMIN BURDEN
Huge overhead to manage
data access
BAD CULTURE
Lack of data sharing habits in
research culture
5. Finding and accessing data can take months
Time spent data scouting per project
Chart categories: < 1 week, 1-3 months, +6 months
Chart values: 40%, 48%, 11%
6. Why the barrier?
Barriers
• Difficult to find data, let alone
find the RIGHT data
• Time-consuming and difficult
to apply for access to data
• Complicated and laborious to
submit data to public
repositories
http://blog.repositive.io/tag/data-access/
http://blog.repositive.io/tag/data-sharing/
7. Data access applications for sensitive data
• Benefits: strict governance, review of consent, applicant signs for full
responsibility for governance
8. Data access applications for sensitive data
• Benefits: strict governance, review of consent, applicant signs for full
responsibility for governance
• Disadvantages: No control of data once access is given, high barrier for
access – too high?
9. Alternative process – castle and moat
• Vetted users are allowed into the system where they can investigate and
analyse data.
• No raw data exports are allowed and results for export are manually
reviewed
• Example: Genomics England
10. Alternative process – controlled disclosure
• Allow vetted users access to privacy-preserving or manually curated
exports from the data
• Example: Browsing UK census data – available for all
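The three access models on slides 7 to 10 can be compared with a small sketch. This is only an illustration of the rules as worded on the slides, not how any repository or Genomics England actually implements them; the names `AccessModel`, `ExportRequest` and `may_export` are hypothetical, and treating an approved application as "vetted" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AccessModel(Enum):
    APPLICATION = "data access application"          # slides 7-8
    CASTLE_AND_MOAT = "castle and moat"              # slide 9
    CONTROLLED_DISCLOSURE = "controlled disclosure"  # slide 10


@dataclass
class ExportRequest:
    user_vetted: bool                # assumption: approved applicant or vetted user
    wants_raw_data: bool             # True if the user asks for the raw records
    manually_reviewed: bool = False  # True once a human has reviewed the results


def may_export(model: AccessModel, req: ExportRequest) -> bool:
    """Return True if data may leave the system under the given model."""
    if not req.user_vetted:
        return False
    if model is AccessModel.APPLICATION:
        # Once access is given there is no further control over the data.
        return True
    if model is AccessModel.CASTLE_AND_MOAT:
        # No raw exports; results leave only after manual review.
        return (not req.wants_raw_data) and req.manually_reviewed
    # Controlled disclosure: only privacy-preserving or curated exports.
    return not req.wants_raw_data
```

For example, `may_export(AccessModel.CASTLE_AND_MOAT, ExportRequest(True, True))` is `False`, because raw exports are never allowed inside the moat, while the same request under `AccessModel.APPLICATION` is `True`.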
Read about our pre-competitive PDX data resource in collaboration with AstraZeneca http://repositive.io/pdx
12. Building upon best practices
MAKE DATA
DISCOVERABLE
SIMPLIFY
WORKFLOWS
CONTRIBUTE TO
COMMUNITY
DNAdigest and Repositive – Connecting the world of genomic data
http://www.tinyurl.com/plos-biology-repositive
16. Machines & Data sources
Figure values (labels lost in extraction): 947, 5600, 88, 660, 26, 68, 50, 62, 3, 25, 0, 0
23 International
Interesting site to look at:
http://omicsmaps.com/stats
17. Main Repository funders
BGI = 4
EBI = 9
NIH = 10
NCBI = 9
The Broad = 8
Wellcome = 4
EBI total 104 services, 19 repositories http://www.ebi.ac.uk/services/all
NCBI total 67 databases http://www.ncbi.nlm.nih.gov/guide/all/#databases_
18. We have identified hundreds of data sources
Universities – Or repositories
affiliated to a university.
Projects/Consortia – Has a
specific purpose/aim. Often
focussed on a specific
research question or disease.
Public repositories – Allows
download and upload of
data from multiple
institutions.
Companies – For-profit
organisations making data
available for free or as a
service.
Biobanks – many have sequence data of their biological samples.
Researchers
know on
average 4-5
data sources
More data sources appear every day;
to date we have identified 270+
19. Simpler workflow
for data access
And indexed them on the Repositive platform
Discover and
access
Efficient Search,
see related results
Find colleagues &
their data interests
Co-annotate data &
community feedback
Free to use: http://discover.repositive.io
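As a rough illustration of what indexing many data sources behind a single point of entry can mean, here is a minimal keyword index over dataset metadata. It is a sketch under stated assumptions, not a description of the Repositive platform's implementation: the record fields, the example records and the functions `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records, one per dataset, harvested from different sources.
RECORDS = [
    {"id": 1, "title": "Breast cancer exome cohort", "source": "repository-a"},
    {"id": 2, "title": "Rare disease whole genome trios", "source": "repository-b"},
    {"id": 3, "title": "Cancer cell line exome panel", "source": "biobank-c"},
]


def build_index(records):
    """Map each lower-cased word in a record's title or source to record ids."""
    index = defaultdict(set)
    for rec in records:
        text = f"{rec['title']} {rec['source']}".lower()
        for token in text.split():
            index[token].add(rec["id"])
    return index


def search(index, query):
    """Return ids of records that contain every word of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)


INDEX = build_index(RECORDS)
```

With these example records, `search(INDEX, "cancer exome")` matches datasets held by two different sources in a single query, which is the point of a shared index: the researcher does not need to know in advance which repository holds the data.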
20. Benefit for both sides of data collaborations
Data consumers Data producers
Find relevant data faster
Feedback from other users
through ratings and comments to
evaluate data quality
Find collaborators with data
Make your data visible
Build credibility as a trusted
provider of quality data
Find collaborators to analyse
your data
21. More efficient data access
• Supporting the whole research workflow
• Faster, more efficient data discovery
• Streamlining data access applications
• Developing technology for efficient data access
• Setting up pre-competitive data sharing agreements
• Running workshops and training programmes
Read about our pre-competitive PDX data resource in collaboration with AstraZeneca http://repositive.io/pdx
22. Recap: Still a lot of work to do
Barriers
• Difficult to find data, let alone
find the RIGHT data
• Time-consuming and difficult
to apply for access to data
• Complicated and laborious to
submit data to public
repositories
http://blog.repositive.io/tag/data-access/
http://blog.repositive.io/tag/data-sharing/
23. Connecting the world of genomic data
Visit us at: http://repositive.io
Or tweet us @repositiveio
Editor's Notes
Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data
Because interpretation requires LOTS of data
And although data exists around the world, it is siloed, and even if available, it is not accessible
This is Jenn, a genetic researcher (our target customer), seeking to interpret data from genetic diseases and cancer
She needs data from other patients to compare and interpret Mabel's DNA
She also has data available in her own lab, but she cannot share it because of concerns about how to handle secure access to sensitive data and vetting of users
Data is fragmented in unconnected silos – makes it very difficult to discover data
Tracking data and working with data access requests is a time-consuming and bureaucratic exercise
Difficult to build a user community without best practices and tools/platforms where users can share their data experience / findings
Because interpretation requires LOTS of data
And although data exists around the world, it is siloed, and even if available, it is not accessible
This is Jenn, a genetic researcher (our target customer), seeking to interpret data from genetic diseases and cancer
She needs data from other patients to compare and interpret Mabel's DNA
She also has data available in her own lab, but she cannot share it because of concerns about how to handle secure access to sensitive data and data governance, e.g. vetting of users
Public repositories: default is apply for access -> full access
Benefits: strict governance, review of consent, applicant signs for full responsibility for governance
Disadvantages: No control of data once access is given, high barrier for access – too high? (researchers giving up, even patients can’t get access to their own data)
There are many public repositories, but it can be hugely confusing to know where to look for the right kind of data
The Repositive platform is an online community and marketplace connecting data consumers with data providers.
On Repositive, Jenn has
Easy, interactive search
Faster data access workflow
Easy access to new data collaborators
Feedback on data from the community and colleagues, to assess data quality and utility
The Repositive platform and technology will remove barriers to data sharing and will incentivise users to explore, contribute and collaborate in alignment with best practices
DNA.land
OpenSNP
PersonalGenomesProject
Direct to consumer genetic tests & microbiome