Presented at The Data Dialogue, Time to Share: Navigating Boundaries & Benefits. Afternoon session: Sharing difficult data.
July 28, 2016, University of Cambridge
http://www.ses.ac.uk/event/data-dialogue-time-share-navigating-boundaries-benefits/
In this talk I present an overview of human genomic data sources around the world, their funding, their access policies and the types of data they contain. I also discuss why data sharing is hard, including issues of data privacy and a research culture that does not incentivise the sharing of data and results.
Presented by Fiona Nielsen, founder and CEO of Repositive
http://repositive.io
1. Human Genomic Data Discoverability
Fiona Nielsen – Data Dialogue, Cambridge – July 28th 2016
2. The surge of genomics data
• High throughput technologies – biology is moving from the lab to the computer
Chart: Genomes sequenced, 2006 to 2015. 80+ PB sequenced every year.
4. Where is the data?
• A researcher in human genomics knows on average 4-5 data sources
The need to redefine data sharing: http://www.sciencedirect.com/science/article/pii/S2212066114000386
5. Hundreds of data sources
• Content overview of 163 data sources
Assay Types
Dedicated to…
6. Hundreds of data sources
• Sizes vary from tens to 100s of thousands of samples
Chart: Number of samples per data source (log10 scale, 1 to 1,000,000)
Top 5:
GEO (1.8M)
PMI Cohort Program (1M)
Auria Biopankki (1M)
EGA (~0.6M)
SRA (~0.5M)
7. Which populations are represented?
Aboriginals
African Americans
Africans
Australians
Chinese
Malays
Indians
Danish
Dutch
Estonian
Russian
European Ancestry
Finnish
Icelandic
Japanese
Korean
Latin Americans
Saudi
Swedish
8. Where does the data come from?
Chart: Data sources by origin, including International
Interesting site to look at:
http://omicsmaps.com/stats
9. Why is some data not shared?
• Challenges for international research community: How to work across borders and silos?
10. Why is some data not shared?
• Additional challenges for biomedical research: Data privacy, data governance, patient consent, medical legislation
12. What needs to change?
• Increased data visibility and accessibility benefit both researchers and patients
13. Pain points
FRAGMENTED: Poor visibility of available genomic data
ADMIN BURDEN: Huge overhead to manage data access
BAD CULTURE: Lack of data sharing habits in research culture
15. Panel discussion
• What are best practices for sharing difficult data?
FAIR data: Findable, Accessible, Interoperable, Reusable
16. Translating and Commercialising Genomic Research
7-9 December 2016 | Wellcome Genome Campus, Hinxton, Cambridge, UK
Applications open soon!
Scientific programme committee
Emmanuelle Astoul, Wellcome Trust Sanger Institute, UK
Fiona Nielsen, Repositive/DNAdigest, UK
Abel Ureta-Vidal, Eagle Genomics, UK
Ross Rounsevell, Wellcome Trust Sanger Institute, UK
Full details at:
www.wellcomegenomecampus.org/coursesandconferences
Topics will include:
• Commercial opportunities arising from data aggregation
• Exploiting bioinformatics tools
• Externalising bioinformatics pipelines
• Translating biomarkers, genetic signatures or gene panels
17. CEO Fiona Nielsen, fiona@repositive.io
Try our free platform for discovering human genomic data http://repositive.io
Follow us on Twitter @repositiveio
Editor's Notes
Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data
Falling cost of sequencing and technological advances
General intro to the subject – assume audience are novices but from technical background
What’s hot, what’s not
Major recent advances
Key tactical challenges
Strategic issues faced
Relevance to Pistoia Alliance activity and strategy (if appropriate)
DNA.land
OpenSNP
PersonalGenomesProject
Direct to consumer genetic tests & microbiome
In the light of the increasing costs of drug development, this is an opportunity not to miss!
The Repositive platform is an online community and marketplace connecting data consumers with data providers.
On Repositive, Jenn has:
• Easy, interactive search
• A faster data access workflow
• Easy access to new data collaborators
• Feedback on data from the community and colleagues, to assess data quality and utility
The Repositive platform and technology will remove barriers to data sharing and will incentivise users to explore, contribute and collaborate in alignment with best practices
Data is fragmented across unconnected silos, which makes it very difficult to discover
Tracking data and handling data access requests is a time-consuming and bureaucratic exercise
It is difficult to build a user community without best practices and tools or platforms where users can share their data experiences and findings