Open Research Cambridge, December 16, 2013: presentation by Fiona Nielsen, DNAdigest


On December 16th, 2013, Fiona Nielsen from DNAdigest gave a presentation at the Open Research meetup in Cambridge. The meetup organiser had invited DNAdigest to participate in a discussion on genomics and data sharing. Keren introduced the evening with a video explaining what a genome is and what it means to have your genome sequenced. Fiona then gave a general presentation on genetic data sharing, covering topics such as data sharing for research, patient consent and direct-to-consumer genetic testing.
The audience was split roughly 50/50 between researchers and other professions, which made for an insightful discussion of the advantages of genetic research, the potential risks of data sharing, and the high hopes for the impact of genetics on the future of medicine.
The Open Research meetup is a meetup group initiated by members of the Open Knowledge Foundation. You can read about their upcoming events on their OKFN web page:

Read more about DNAdigest online:
and follow the Twitter feed @DNAdigest

  1. Secure the data – share the knowledge. Open Research Cambridge. Fiona Nielsen, December 16, 2013
  2. Take home messages: • DNA sequencing = exciting new opportunities + new challenges • Your genome is your data (like your EHR) • Options for sharing today are limited mostly to all-or-nothing • Take the opportunity to voice your opinion. My aim with this talk: to give you a brief overview of how you can share genetic data today, and an idea of the challenges involved. I will end the presentation with a brief introduction to the DNAdigest project, and then open the floor for questions and discussion.
  3. Data is donated to research: Individuals are offered the option to opt in, consenting to their data being used for research to aid the development of diagnostics and treatments for genetic diseases. Genetic data is needed for research into inborn illnesses, hereditary diseases, rare diseases and cancer.
  4. Genome research today: the patient, the researcher, the sample, the data
  5. Direct-to-consumer genetic testing: • You can order your own personal genotyping kit online for only $99 • You can assess your own carrier status for known disease genes before you get pregnant • You can obtain non-invasive prenatal testing by detecting foetal DNA in the mother's blood • You can have your whole genome sequenced for about $7,000
  6. Example: 23andMe. Manuel Corpas used direct-to-consumer testing for himself and his family
  7. The “Corpasome”: Family genome and analysis published open access online. Family pedigree figure, with data per family member: • Deceased: 1M 23andMe v3 SNP chip • Age 75: 1M 23andMe v3 SNP chip; 15,823,554 HiSeq exome PE reads • Age 51: 1M 23andMe v3 SNP chip; 14,123,580 HiSeq exome PE reads • Age 79: 1M 23andMe v3 SNP chip; 15,190,489 HiSeq exome PE reads • Age 36: 0.5M 23andMe v2 SNP chip; 32,116,828 HiSeq exome SE reads; metagenomics
  8. The “Corpasome”: Family genome and analysis published open access online as a free resource for research
  9. But what about privacy? There is large variation between individuals, and we are all unique. This means that your genome sequence can identify you, or your heritage. Similarly, your medical record may contain information that is unique to you. The human genome contains 3,000,000,000 bp (about 3 billion base pairs).
  10. Consent for research: The head of the research project will create a custom consent form: • Purpose • How, when and who
  11. Consent forms vary. Guidelines from NHS on informed consent:
  12. Consent forms vary. Guidelines from NHS on privacy: Example wordings
  13. Consent is obtained in the interest of the patient. However, contact with the patient may be lost after data collection. The institution acts as the custodian of the collected data,
  14. and data is locked up. Institutions do not freely share data because revealing entire datasets breaches confidentiality. Data access is restricted, if available at all. If available, access requires a specific application per dataset per project.
  15. Results are published: When sufficient data is collected for a project, an analysis is made and a paper may be published to report the results. Results from confidential data are reported in anonymized form. Anonymization = removal of identifiers: • Name, birthdate, NHS number, town of birth
  16. Published information: Level of data detail varies per project and per publication. Rare disease research usually includes family pedigrees and a detailed description of the disease symptoms per individual. GWAS studies usually include only aggregated statistics.
  17. Problem for data sharing. Trade-off: details are necessary for data re-use! Restricted access repositories: • Advantage: access to complete datasets of genetics and medical data • Disadvantage: cumbersome, time-consuming and slow processing of applications for access • Disadvantage: difficult to discover the data you need. Completely public data: • Advantage: easy access • Disadvantage: if medical data is removed = no value for research • Disadvantage: no guarantee of privacy • Example: the Personal Genome Project
  18. Limitations of current mechanisms: • Not easy to discover data • Not easy to apply for access to data • Not easy to deal with bulk datasets. As a consequence: • Researchers do not cross-check their results • Data is not re-used for analysis • Researchers duplicate existing work • Results are published based on small sample sizes
  19. What if? • What if every individual were the custodian of their own data? • What if there were different ways of sharing data? • What if you could share just part of your data? • What if the consent form included options for the level of sharing of the data? • What if you gave patients the option to share their data with no restrictions? • What if you could share data in aggregate statistics? • What if you could share your data today and change your mind tomorrow?
  20. New approaches: • Crowdsourcing of genetic testing results: #freethedata for breast cancer genes BRCA1 and BRCA2 • Share your 23andMe data with OpenSNP • Control your EHR with PatientsKnowBest • DNAdigest: Allow sharing of aggregated data to enable discovery and faster access for research
  21. Our mission: To create a self-sustainable platform that supports the widest possible sharing and access of genomic data in accordance with patient consent.
  22. DNAdigest is designing an ethical data sharing platform, allowing hypothesis-centered queries and returning anonymised aggregated data by a patented mechanism
  23. Results are delivered as anonymised aggregated statistics. Demo.
  24. Further reading: • What to consider before undergoing a DNA test, article in the Wall Street Journal • Manuel Corpas blog: • Interview about the DNAdigest project on • Genetic Privacy Network (launched Dec 2013): resources about the risks and legal issues for US residents • Anonymization and re-identification: Routes for breaching and protecting genetic privacy, by Erlich and Narayanan
  25. Take home messages: • DNA sequencing = exciting new opportunities + new challenges • Your genome is your data (like your EHR) • Options for sharing today are limited mostly to all-or-nothing • Take the opportunity to voice your opinion • It is question time!
  26. Thanks for listening. And thanks to OpenResCam and Panton Arms for hosting! DNAdigest is a not-for-profit organisation, founded for the purpose of enabling faster and easier access and sharing of genomic data for research. Please visit us at and on Twitter @dnadigest. Thank you!
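
For readers who want a concrete picture of two ideas from the slides, here is a minimal Python sketch: anonymization as removal of identifiers (slide 15) and sharing results as aggregated statistics (slides 16, 22 and 23). It is an illustration only, not DNAdigest's mechanism, which the slides do not describe. The field names, the toy records and the `min_count` threshold for hiding small groups are all assumptions made up for this example. As slide 9 points out, removing identifiers does not by itself guarantee privacy, because a genome sequence can identify a person.

```python
from collections import Counter

# Hypothetical field names, modelled on the identifiers listed on slide 15.
IDENTIFIERS = {"name", "birthdate", "nhs_number", "town_of_birth"}


def anonymise(record):
    """Return a copy of the record without the direct identifier fields."""
    return {key: value for key, value in record.items() if key not in IDENTIFIERS}


def aggregate(records, field, min_count=2):
    """Count how often each value of `field` occurs.

    Groups smaller than `min_count` are left out. The threshold is an
    assumption of this sketch, not something stated in the talk.
    """
    counts = Counter(record[field] for record in records)
    return {value: n for value, n in counts.items() if n >= min_count}


# Invented toy records, for illustration only.
records = [
    {"name": "Person A", "birthdate": "1950-01-01", "nhs_number": "0000000001",
     "town_of_birth": "Town X", "genotype": "AA", "affected": True},
    {"name": "Person B", "birthdate": "1962-02-02", "nhs_number": "0000000002",
     "town_of_birth": "Town Y", "genotype": "AA", "affected": False},
    {"name": "Person C", "birthdate": "1977-03-03", "nhs_number": "0000000003",
     "town_of_birth": "Town X", "genotype": "AG", "affected": False},
    {"name": "Person D", "birthdate": "1934-04-04", "nhs_number": "0000000004",
     "town_of_birth": "Town Z", "genotype": "AA", "affected": True},
]

anonymised = [anonymise(record) for record in records]
summary = aggregate(anonymised, "genotype")

print(anonymised[0])
print(summary)
```

In this sketch a researcher would only ever see `summary`, never the individual records, which mirrors the contrast the slides draw between all-or-nothing access and sharing in aggregate.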