openSNP - Crowdsourcing Genome Wide Association Studies
23.01.12, Bastian Greshake
some words about me

• BSc in Life Sciences (2010)

• Working at Biodiversity & Climate
  Research Center (since 2010)

• MSc studies at the Goethe University in
  Frankfurt/Main (since 2011)

• Not exactly a biologist with much
  professional background in human
  genetics, but...
some words about me

• some background in data mining (mainly
  transcriptomics)

• some experience with web applications

• interest in social media & crowd-sourcing

• customer of DTC genetic testing myself
finding DTC results up to now
mining DTC genetic tests

• results are hidden somewhere
  on the web

• often no phenotypic annotation

• not easily re-usable
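
To make "re-usable" concrete: a minimal sketch of reading a DTC raw-data export, assuming the tab-separated layout used by 23andMe downloads (comment lines starting with `#`, then rsid, chromosome, position, genotype). The sample rows and the function name `parse_genotypes` are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical excerpt in a 23andMe-style layout; the rsids are placeholders.
SAMPLE_EXPORT = """\
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tAG
rs0000003\t2\t3000\t--
"""


def parse_genotypes(lines: Iterable[str]) -> Dict[str, Tuple[str, int, str]]:
    """Map rsid -> (chromosome, position, genotype).

    Comment lines, blank lines and no-calls ("--") are skipped.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        if genotype == "--":  # the chip produced no call for this SNP
            continue
        genotypes[rsid] = (chrom, int(pos), genotype)
    return genotypes


calls = parse_genotypes(SAMPLE_EXPORT.splitlines())
print(calls)
```

Other DTC companies use different column layouts, so a shared repository would need one such parser per export format.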
let’s code it:

• openSNP wants to be a central repository for sharing DTC results

• enables users to share phenotypes as well

• lowers the barrier to participation

• motivation to share through benefits for users

• can we take it a step further and provide data for GWAS?
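
What "data for GWAS" could mean for a single SNP, as a rough sketch: an allelic chi-square test comparing allele counts between users with and without a self-reported phenotype. The genotypes below are toy values, not openSNP data, and the function names are hypothetical. A real GWAS repeats such a test for every SNP and conventionally uses a genome-wide significance threshold of 5e-8.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allele_counts(genotypes, allele):
    """Return (copies of `allele`, copies of any other allele)."""
    hits = sum(g.count(allele) for g in genotypes)
    total = sum(len(g) for g in genotypes)
    return hits, total - hits


def allelic_chi_square(cases, controls, allele):
    """Chi-square test (1 degree of freedom) on the 2x2 allele count table."""
    a, b = allele_counts(cases, allele)
    c, d = allele_counts(controls, allele)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy genotypes at one SNP, grouped by a self-reported yes/no phenotype.
cases = ["AA", "AG", "AA", "AG"]
controls = ["GG", "AG", "GG", "GG"]

chi2, p_value = allelic_chi_square(cases, controls, "A")
print(f"chi2={chi2:.3f}, p={p_value:.4f}, "
      f"genome-wide significant: {p_value < GENOME_WIDE_ALPHA}")
```

With eight people the test cannot come close to the genome-wide threshold, which is why sample size is the central question for crowd-sourced data.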
mining DTC genetic tests

• lots of potential for open data (100k+ customers)

• cheap data source for scientists
Would you share DTC test results? (n=226)
[pie chart; slices: 68 %, 26 %, 6 %; legend: Yes, Only with DTC company, No]
the front
technical implementation

• framework: Ruby on Rails

• database: PostgreSQL

• background job processing via Resque (known from GitHub)

• basic API via JSON queries
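
How a client might consume such a JSON API, as a sketch only: the response body and its field names (`user_id`, `snp`, `genotype`) are assumptions for illustration, not the documented openSNP schema, and no real endpoint is called.

```python
import json
from collections import defaultdict

# Hypothetical response body; field names and rsids are placeholders.
RESPONSE_BODY = """
[
  {"user_id": 1, "snp": "rs0000001", "genotype": "AG"},
  {"user_id": 2, "snp": "rs0000001", "genotype": "GG"},
  {"user_id": 1, "snp": "rs0000002", "genotype": "CC"}
]
"""


def genotypes_by_snp(body):
    """Group genotype calls by SNP: {snp: {user_id: genotype}}."""
    grouped = defaultdict(dict)
    for record in json.loads(body):
        grouped[record["snp"]][record["user_id"]] = record["genotype"]
    return dict(grouped)


by_snp = genotypes_by_snp(RESPONSE_BODY)
print(by_snp["rs0000001"])
```

Grouping by SNP first keeps the per-SNP comparison across users (the shape a GWAS needs) a simple dictionary lookup.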
other resources

• Personal Genome Project

     • data is open

     • participation is not
Personal Genome Project
other resources

• Personal Genome Project

      • data is open

      • participation is not

      • no easy way to download data, no API etc.

• genomera

      • participation will be open (currently invite-only beta)

      • focus on small scale studies/experiments
genomera
problems & potential of patient-driven/crowd-sourced research

• problems

    • sample sizes

    • bias in participants

    • motivation of participants

    • accuracy of data

• potential

    • possible sample sizes

    • low costs

    • "warm fuzzy feeling inside" for patients
positive examples: PatientsLikeMe

• around since ~2006

• published a dozen studies since then

• famous example: ALS research on lithium carbonate
  intake (149 patients, 447 controls)




Paul Wicks et al. (2011) Accelerated clinical discovery using self-reported patient data
collected online and a patient-matching algorithm. Nature Biotechnology 29, 411–414
positive examples: 23andMe

• published some studies in 2010/2011

• done with self-reported data

• studies include 10,000+ to 30,000+ participants
positive examples: 23andMe – general traits




“Replications of associations [...] for hair color, eye color,
and freckling validate the Web-based, self-reporting
paradigm. The identification of novel associations for hair
morphology [...], freckling [...], the ability to smell the
methanethiol produced after eating asparagus [...], and
photic sneeze reflex [...] illustrates the power of the
approach.”



Nicolas Eriksson et al. (2010) Web-Based, Participant-Driven Studies Yield Novel
Genetic Associations for Common Traits. PLoS Genet 6(6): e1000993.
doi:10.1371/journal.pgen.1000993
positive examples: 23andMe – Parkinson’s Disease




“We discovered two novel, genome-wide significant
associations with [Parkinson’s Disease]—both replicated
in an independent cohort. We also replicated 20
previously discovered genetic associations (including
LRRK2, GBA, SNCA, MAPT, GAK, and the HLA region),
providing support for our novel study design.”




Chuong B. Do et al. (2011) Web-Based Genome-Wide Association Study Identifies
Two Novel Loci and a Substantial Genetic Component for Parkinson's Disease.
PLoS Genet 7(6): e1002141. doi:10.1371/journal.pgen.1002141
Quantified Self and Science
Quantified Self Movement
QS projects

• tracking health in response to work-outs (minimizing
  impacts of disease/genetic predisposition)

• tracking response to different drugs

• tracking well-being in response to eating habits (butter vs.
  arithmetic)
butter vs. arithmetic




                        source: Seth Roberts - quantifiedself.com
my conclusions

• technology enables new kinds of research

• DTC results and patient-driven research can lead to new
  scientific knowledge

• can be a valuable addition to traditional research
openSNP: now & future

• won the Mendeley/PLoS Binary Battle in 2011

• got some funding from the German Wikimedia foundation to
  get more people genotyped

• collaborating with Consent to Research to get an
  IRB-approved consent process

• working on implementing the Distributed Annotation
  System
thanks for your attention
                            source: xkcd.com
                            CC-BY-NC
