RootsTech 2016

rule your genome: democratize health
RootsTech 2016
Salt Lake City, UT

Published in: Science

  • I’m sharing this story because this study is just one small-scale example of the importance of data sharing.
  • What is imputation?
  • - Common words are easier to impute; rare words are harder, just as rare genetic variants are.
    - Success rates vary across different populations.
  • DNA.Land is nothing without use
  • - Responses on social media are crucial to improving and refining our algorithms.
  • We are indebted to these early adopters and to anyone who sends feedback. You, the users, are our most valuable asset.
    We’re listening!
  • 1. Relative matching: feedback from users helps us validate our relative-matching algorithms and also unearths interesting family structures.
    2. Ancestry: feedback about the expected ethnicity helps us validate and improve the ancestry-detection algorithm.
    3. Segment sharing: some users even go as far as providing us with their results from other websites, which is truly helpful in refining our pipeline parameters. We are very grateful for that.
  • Collecting data is not enough. How can we up our game and what can we do with the data?
  • Note: we used ONLY publicly available online data, and its use was approved by both Geni and our IRB.
  • In DNA.Land we’ve implemented lessons learned from previous work that converged on these 3 topics.
    The data is very noisy, so we must:
    1. clean it, and 2. validate it before we can draw any sort of conclusions.

  • After we clean the data, we have this enormous pedigree. Is it correct?
    1st validation step: what is Obama’s “Bacon number”? 6th cousin twice removed.
  • 1. Record hyping
  • The data is cleaned, but is it correct?
  • Geni profile concordance with known geographic settlements
  • Place-of-birth distances (log scale) between siblings, cousins, and parent-child pairs.
    Take-home: even 5th cousins were born <1000 km apart. People don't move that much.
  • We saw how to clean and validate crowd-sourced data from >40 million public profiles.
    The data is scientifically usable.
  • RootsTech 2016

    1. Roots Tech: Rule Your Genome, Democratize Health. 3 Feb 2016. Dina Zielinski, @dinazielinski, @dl1dl1. (Slide footer: Yaniv Erlich, 2/3/16, Rule Your Genome, Democratize Health, @dl1dl1)
    2. Goldenhar Syndrome / Hemifacial microsomia (values listed as proband / grandmother / cousin):
       Exonic mutations: 23,175 / 22,252 / 22,746
       Rare mutations: 660 / 591 / 646
       Harmful: 358 / 308 / 331
       Shared: 8 variations
    5. If it takes a village to raise a child… …it takes the world to help a child with a genetic disorder image: Victor Ngai
    6. feedback reciprocity genealogy crowd sourcing data sharing
    7. https://dna.land - Free to use - Not for profit - Run by scientists
    9. DNA.Land consent
    10. DNA.Land consent
    12. legalgenealogist.com “…from the standpoint of the rules of the road, there’s no reason not to consider playing in the DNA.Land playground.” Judy G. Russell, JD
    13. feedback reciprocity genealogy crowd sourcing data sharing
    14. DNA.Land features
    15. Ancestry report
    16. Ancestry report
    17. Ancestry across all users
    18. Relative matching
    19. Relative matching sam@dna.land cherie@dna.land
    20. Relative matching
    21. Relative matching >90% of users have at least 1 match
    22. DNA.Land features in depth
    23. Imputation: 26 possible solutions; 8 make actual words; 1 probable solution. “You had a blue ca_ yesterday.” b, n, p, r, t? “You had a blue ca_ on your head.” p
    24. Power to detect recent common ancestry between pairs of individuals known to be related at varying degrees. Chad D. Huff et al. Genome Res. 2011;21:768-774
    25. NEW FEATURES
    26. Relatives of relatives: Dena (Dena@dna.land), Bruce (bruce@dna.land), Cherie (Cherie@dna.land). Legend: relative of relative; actual match.
    27. Import Geni profile
    28. feedback reciprocity genealogy crowd sourcing data sharing
    29. DNA.Land: the first days
    30. DNA.Land early adopters days
    31. CeCe Moore, genetic genealogist; Carl Zimmer, science writer; Henry Louis Gates, Jr., historian/journalist; AJ Jacobs, journalist/author
    32. info@dna.land We value your feedback! Richard Aufrichtig, support specialist
    33. From:  To: info@dna.land Subject: Match relationships on my DNA match predictions
        I thought it might be helpful to you to know my cousin  and I are estimated to be  cousins, and we are, in fact,  cousins once removed. We have a documented paper trail with many cousin marriages, so it is a case of endogamy. Thanks for your ongoing research!
        From:  To: info@dna.land Subject: Ethnic Feedback
        Hey, I just wanted to give ethnic feedback to help improve the ancestry algorithm, at least, if the ethnic data Ancestry gave me from the Autosomal Test is any help. The regions are as follows: % Europe West % Scandinavia % Ireland % Italy/Greece % Iberian Peninsula <1% Caucasus
        From:  To: info@dna.land Subject: Feedback
        Both  and I have published our data on gedmatch.com, and ftdna puts us in the  to  cousin range with  shared cM, with a longest block of . [...] I know that the different companies use different defaults of cMs and other data for comparison. It will be interesting to know what you find out in comparing our data. Thank you very much.
    34. facebook.com/knowyourgenome/
    35. feedback reciprocity genealogy crowd sourcing data sharing
    36. Social media paradigm for large pedigrees: >40 million public profiles; IRB approval; Geni approval
    37. genealogy crowd sourcing data sharing
    38. Cleaning the graph. Ideal: each union should contain up to 2 individuals. What we see (0.4%): >2 parents; cycles; biologically impossible situations… (Legend: union, individual)
    39. Cleaning the graph in three steps. Pipeline: pre-graph, cycle removal, merging nodes, removing illegal nodes, graph. (Example node labels: Mike, Ann, Al, Anni, Bert, Betty, Charlie, Ed, Fred, Bert, Charlie, Diane, Eddie, Frank, Victor, Brad, Chris, Sam, Hillary)
    40. Cleaning the graph: removing illegal nodes. Pipeline: pre-graph, cycle removal, merging nodes, removing illegal nodes, graph. (Example node labels: Mike, Ann, Al, Bert, Betty, Charlie, Ed, Fred, Bert, Charlie, Diane, Eddie, Frank)
    41. Can we obtain large family trees? A family tree with 6000 people (legend: individual, marriage). Family tree of 13 million people… (Scale labels: 1440 px, 900 px, ~1 million px.) 70,000 (0.5% of the data)
    42. Validation using genetic markers
        Maternal line (mito): total edges (meioses) 1768; mismatches 5; error rate per edge 0.3%
        Paternal line (y-chr): total edges (meioses) 324; mismatches 6; error rate per edge 2.0%
        Andreson, 2006
    43. Longevity: cleaning
    44. Longevity: further validation. [Plot: distribution of age of death, 1910, HMD vs. Geni]
    45. Validating life expectancy. [Plots: average lifespan by year of death, 1840-1990; Oeppen et al., Science, 2002 vs. our resource; R2=0.96]
    46. Use case: the genetics of longevity. MZ twins, sibs, 2nd cousins, 3rd-5th cousins, avuncular, 1st cousins, relatives from consanguineous marriages. >1 million Geni profiles with date of birth and death
    47. Big data visualization: lifespan. [Plot panels: lifespan by year of birth and year of death; fraction of profiles/year (log10); Comparing Geni to HMD: histograms and QQ plot of age of death, # profiles]
    48. Environment: location. Events with a location (# events): BIRTH 7,352,478; RESIDENCES 1,667,895; DEATH 1,492,908; BURIAL 314,344. Examples: “died in infancy, Upshur, (West) Virginia”; “Санкт-Петербург, Россия” (Saint Petersburg, Russia). How to convert free text into GPS coordinates? Yahoo! Geoparser: “Санкт-Петербург, Россия” gives Lat: 59.9408, Long: 29.6728, Quality: 10
    50. Location validation
    51. Environment
    52. Where is your family?
    53. Quantitative anthropology. Where is the love of your life? (by year of birth) Who is the love of your life? (by year of birth) ~4th cousins
    54. genealogy crowd sourcing data sharing
    55. 13,900 genomes and counting! genealogy crowd sourcing data sharing It takes more than a village
    56. https://dna.land
    57. Acknowledgements
        DNA Land: Yaniv Erlich, Joe Pickrell, Assaf Gordon, Jie Yuan, Tristan Hayeck, Richard Munoz, Mary Wahl, Kevin Shi, Nathan Pearson, Robert Aboukhalil
        Goldenhar Syndrome: Barak Marcus, Mona Sheikh, Balaji Srinivasan, Clement Chu, Melissa Gymrek, Dror Aizenbud
        Funding: Burroughs Wellcome Career Award, Broad Institute SPARC Award, Andria and Paul Heafy, Whitehead Institute
        Geni: Joanna Kaplanis, Assaf Gordon, Mary Wahl, Mickey Gershovits, Mona Sheikh, Barak Marcus, Pratheek Nagaraj, Alkes Price, Daniel MacArthur
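
Slides 38 to 40 describe removing biologically impossible records from the pedigree graph. As a minimal sketch of two of the checks named there (more than two parents, and cycles), assuming a hypothetical mapping from each child to the set of recorded parents rather than the actual Geni data model or the authors' pipeline:

```python
def individuals_with_too_many_parents(parents):
    """Return individuals recorded with more than two parents."""
    return sorted(child for child, ps in parents.items() if len(ps) > 2)


def individuals_in_cycles(parents):
    """Return individuals who appear among their own ancestors."""

    def is_own_ancestor(start):
        # Iterative walk up the ancestor chain starting from `start`.
        stack = list(parents.get(start, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        return False

    people = set(parents) | {p for ps in parents.values() for p in ps}
    return sorted(person for person in people if is_own_ancestor(person))


# Hypothetical toy records (child -> recorded parents); not data from the talk.
toy_parents = {
    "Charlie": {"Al", "Ann", "Mike"},  # three recorded parents
    "Ed": {"Charlie", "Betty"},
    "Betty": {"Ed"},                   # Betty's parent is her own child: a cycle
}

print(individuals_with_too_many_parents(toy_parents))  # ['Charlie']
print(individuals_in_cycles(toy_parents))              # ['Betty', 'Ed']
```

Flagging is only the first half of the job; how flagged records are then merged or removed is the part the slides call "merging nodes" and "removing illegal nodes", and it is not reproduced here.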
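
The notes on place-of-birth distances, together with the geoparsed coordinates on slide 48, suggest a simple calculation: the great-circle distance between two birthplaces, viewed on a log scale. A sketch assuming the standard haversine formula and a mean Earth radius of 6371 km; the function names and the second coordinate pair (approximately Moscow) are illustrative and do not come from the talk:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def log10_distance_km(lat1, lon1, lat2, lon2):
    """log10 of the distance in km; undefined for identical points."""
    return math.log10(haversine_km(lat1, lon1, lat2, lon2))


# Coordinates returned for "Санкт-Петербург, Россия" on slide 48.
st_petersburg = (59.9408, 29.6728)
# Approximate coordinates for Moscow; a hypothetical relative's birthplace.
moscow = (55.7558, 37.6173)

distance = haversine_km(*st_petersburg, *moscow)
print(f"{distance:.0f} km, log10 = {math.log10(distance):.2f}")
```

Computing this for every sibling, cousin, or parent-child pair and plotting the log10 values would give the kind of distribution the notes describe.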
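
The word-completion analogy on slide 23 (26 candidates, 8 real words, 1 probable solution) can be acted out in a few lines. This sketches only the analogy, not genotype imputation itself; the vocabulary and the "worn on the head" hint are hypothetical and were chosen so that the counts match the slide:

```python
import string

# Hypothetical mini-vocabulary; it contains exactly 8 words of the form "ca_".
VOCABULARY = {"cab", "cad", "cam", "can", "cap", "car", "cat", "caw", "hat", "dog"}
# Hypothetical context knowledge: things one wears on the head.
WORN_ON_HEAD = {"cap", "hat"}


def candidates(pattern):
    """Fill the single '_' in pattern with each of the 26 letters."""
    return [pattern.replace("_", letter) for letter in string.ascii_lowercase]


def real_words(pattern):
    """Keep only the candidates found in the vocabulary."""
    return [word for word in candidates(pattern) if word in VOCABULARY]


def impute(pattern, sentence):
    """Narrow the real words using context, if the context is informative."""
    words = real_words(pattern)
    tokens = sentence.lower().replace(".", "").split()
    if "head" in tokens:
        return [word for word in words if word in WORN_ON_HEAD]
    return words  # uninformative context: every real word stays possible


print(len(candidates("ca_")))                             # 26
print(len(real_words("ca_")))                             # 8
print(impute("ca_", "You had a blue ca_ on your head."))  # ['cap']
```

With the less informative sentence "You had a blue ca_ yesterday." this sketch leaves the ambiguity unresolved and returns all eight words, which echoes the point in the notes that less constrained cases are harder to impute.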
