
Renaissance in Medicine - Strata - NoSQL and Genomics

Renaissance in Medicine: Next-Generation Big Data Workloads

Instead of using 1s and 0s (base2), biological software is encoded as A, T, C, and G (base4). DNA sequencers are simply devices for converting information encoded in base4 to base2. Improvements in DNA sequencing technology are happening at a rate that outstrips even Moore’s Law of Computing. As a result, the number of human genomes converted to base2 and uploaded for analysis is rapidly increasing.

Medicine is undergoing a renaissance made possible by analyzing and creating insights from this huge and growing number of genomes. Personalized medicine is simply the practical application of these insights.

In this session, I will show how ETL and MapReduce can be applied in a clinical setting. I will also show how NoSQL and advanced analytics can be used to “reverse engineer” the genetic causes of disease. Such information can be used to predict and prevent individual suffering, as well as to increase the overall health of a society.

Published in: Health & Medicine

Renaissance in Medicine - Strata - NoSQL and Genomics

  1. 1. © 2014 MapR Technologies 1 “…all men are created equal…”
  2. 2. © 2014 Star Wars V: The Empire Strikes Back MapR Technologies 2
  3. 3. © 2014 Dahm Triplets, Playboy 1998 MapR Technologies 3
  4. 4. All Men Are Created Equal • Data Scientist: “depends on the distance metric” © 2014 MapR Technologies 4 DLaw(xa, xb) DGenetic(xa, xb)
  5. 5. DGenetic(x, x) = ??? Copenhagen, Denmark 1911 © 2014 MapR Technologies 5
  6. 6. © 2014 MapR Technologies 6 An Unusual Observation • Wilhelm Johannsen. 1911, Copenhagen – Coins the word “gene” • Discovery: sizes are not identical for genetically identical peas. The Genotype Concept of Heredity. Johannsen, 1911
  7. 7. © 2014 MapR Technologies 7 An Unusual Observation • Wilhelm Johannsen. 1911, Copenhagen – Coins the word “gene” • Discovery: sizes are not identical for genetically identical peas. DGenetic(x, x) != 0 The Genotype Concept of Heredity. Johannsen, 1911 WTF?
  8. 8. © 2014 MapR Technologies 8 An Unusual Observation • Wilhelm Johannsen. 1911, Copenhagen – Coins the word “gene” • Discovery: sizes are not identical for genetically identical peas. • Introduces new concept: – hidden and causal genotype distinct from – observed phenotype The Genotype Concept of Heredity. Johannsen, 1911
  9. 9. © 2014 MapR Technologies 9 An Unusual Observation • Wilhelm Johannsen. 1911, Copenhagen – Coins the word “gene” • Discovery: sizes are not identical for genetically identical peas. Further, size follows a Gaussian distribution, where P ~ G + E and Var(P) ~ Var(G) + Var(E) + 2Cov(G, E) • This is the basis of quantitative genetics https://www.youtube.com/watch?v=AjI_LcQOOs4
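The variance decomposition on slide 9 can be checked numerically. The sketch below is only an illustration: the simulated genotype and environment values, their distributions, and the helper names `variance` and `covariance` are assumptions, not data from the talk. It uses population (divide-by-n) statistics, for which the identity holds up to floating-point rounding whenever P = G + E.

```python
import random

def variance(xs):
    """Population variance (divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance (divide by n)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

# Hypothetical genotype (G) and environment (E) contributions to a trait.
# E is made to partly track G so that Cov(G, E) is not zero.
g = [rng.gauss(10.0, 2.0) for _ in range(10_000)]
e = [0.3 * gi + rng.gauss(0.0, 1.0) for gi in g]
p = [gi + ei for gi, ei in zip(g, e)]  # phenotype P = G + E

lhs = variance(p)
rhs = variance(g) + variance(e) + 2 * covariance(g, e)
print(f"Var(P) = {lhs:.6f}")
print(f"Var(G) + Var(E) + 2Cov(G, E) = {rhs:.6f}")
```

For genetically identical individuals Var(G) and Cov(G, E) are zero, so any remaining spread in P comes from Var(E), which matches the observation on the slide that identical genotypes still differ in size.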
  10. 10. © 2014 Star Wars: Episode VII MapR Technologies 10
  11. 11. © 2014 Star Wars V: The Empire Strikes Back MapR Technologies 11
  12. 12. Renaissance In Medicine: Next Generation BigData Workloads © 2014 MapR Technologies 12
  13. 13. © 2014 The Big Lebowski MapR Technologies 13
  14. 14. 1st Renaissance In Medicine: 1400-1700, Europe. Enabling Factor => Effect • Movable Type (1450, Gutenberg) => Rapid diffusion of ideas • Weak Church => New data sources (human dissection) • Compound Microscope (1624, Galileo) => Dense, precise data (diagrams) • Math-driven Hypotheses (1687, Newton) => Paradigm shift in reasoning © 2014 MapR Technologies 14
  15. 15. 1st Renaissance In Medicine: 1400-1700, Europe. Enabling Factor => Effect • Movable Type (1450, Gutenberg) => Rapid diffusion of ideas • Weak Church => New data sources (human dissection) • Compound Microscope (1624, Galileo) => Dense, precise metrics (diagrams) • Math-driven Hypotheses (1687, Newton’s H0) => Paradigm shift in discovery © 2014 MapR Technologies 15
  16. 16. © 2014 MapR Technologies 16 2nd Renaissance In Medicine: 1900-. Enabling Factor => Effect • Telecom Networks (1969, ARPANET. 2008) => Rapid diffusion of ideas • Globalization => New data sources (GMOs, stem cells) • Next-Gen DNA Sequencer (1992, Lynx Therapeutics) => Dense, precise metrics (human genomes) • Data-driven Hypotheses (4th scientific paradigm) => Paradigm shift in discovery
  17. 17. DNA Sequencer – Dense, Precise Metrics © 2014 MapR Technologies 17
  18. 18. Current Generation “Next-Generation” Sequencer • Jan 2014 – launch of Illumina XTen @ $10M (cap cost) • Produces 6T basepairs / day @ $20K (ops cost) © 2014 MapR Technologies 18 Here’s how it works…
  19. 19. © 2014 MapR Technologies 19
  20. 20. © 2014 MapR Technologies 20
  21. 21. © 2014 MapR Technologies 21
  22. 22. Current Generation “Next-Generation” Sequencer • Jan 2014 – launch of Illumina XTen @ $10M (cap cost) • Produces 6T basepairs / day @ $20K (ops cost) A few other facts… ★ 300G basepairs / human @ $1000 (medical grade genome) ★ 4M births / year (2012, United States) © 2014 MapR Technologies 22
  23. 23. ILMN HiSeq XTen (Jan 2014) $1000 Genome “Even Moore’s Law” begins in 2004 © 2014 MapR Technologies 23 Even Moore’s Law Storage: MB/$ Stein. 2010. The case for cloud computing in genome informatics DNA: bp/$
  24. 24. © 2014 MapR Technologies, confidential Impact of XTen on Genomic Medicine • 6T BASEPAIRS / DAY: at a cost of $20K • 7K HUMANS / YEAR: at medical grade of 100x oversampling, $1000 / human • $4B ALL BABIES’ DNA: operating cost of 500 XTen, capital cost of $5B …and remember, throughput doubles every 5 months
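The headline numbers on slides 22 and 24 follow from a short calculation. The sketch below reproduces it using only figures stated on the slides, plus the assumption that 300G basepairs per human means a roughly 3G-basepair genome read at 100x oversampling. The variable names are illustrative.

```python
# Figures taken from slides 18, 22 and 24.
BASEPAIRS_PER_DAY = 6e12                 # one XTen: 6T basepairs / day
BASEPAIRS_PER_GENOME = 300e9             # 300G basepairs / human (assumed: ~3G bp at 100x)
COST_PER_GENOME_USD = 1_000              # medical grade genome
US_BIRTHS_PER_YEAR = 4_000_000           # 2012, United States
CAPITAL_COST_PER_XTEN_USD = 10_000_000   # $10M per XTen

genomes_per_day = BASEPAIRS_PER_DAY / BASEPAIRS_PER_GENOME
genomes_per_year = genomes_per_day * 365
daily_operating_cost = genomes_per_day * COST_PER_GENOME_USD

xten_needed = US_BIRTHS_PER_YEAR / genomes_per_year
annual_operating_cost = US_BIRTHS_PER_YEAR * COST_PER_GENOME_USD
capital_cost = xten_needed * CAPITAL_COST_PER_XTEN_USD

print(f"genomes per XTen per day:  {genomes_per_day:.0f}")
print(f"genomes per XTen per year: {genomes_per_year:.0f}")
print(f"daily operating cost:      ${daily_operating_cost:,.0f}")
print(f"XTen systems for all US births: {xten_needed:.0f}")
print(f"annual operating cost:     ${annual_operating_cost:,.0f}")
print(f"capital cost:              ${capital_cost:,.0f}")
```

The result is about 7,300 genomes per machine per year and roughly 550 machines, which the slide rounds to 7K humans per year, 500 XTen, and $5B of capital.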
  25. 25. Ion Torrent Technology © 2014 MapR Technologies, confidential
  26. 26. © 2014 MapR Technologies, confidential
  27. 27. Nanopore Tech * Roche * Oxford Nanopore © 2014 MapR Technologies, confidential
  28. 28. Mobile Devices? IoT DNA Sensors? © 2014 MapR Technologies, confidential
  29. 29. Social Impact © 2014 MapR Technologies 29
  30. 30. “The genetic engineers gave him that birthmark as part of a sponsorship deal.” www.theness.com © 2014 MapR Technologies 30
  31. 31. © 2014 MapR Technologies 31 Problem Prevention Genetically deterministic diseases • Muscular dystrophy • Cystic fibrosis • Albinism • Phenylketonuria • Hemophilia • These are all completely preventable with pre-conception and pre-natal screening http://www.nature.com/scitable/topicpage/rare-genetic-disorders-learning-about-genetic-disease-979
  32. 32. © 2014 MapR Technologies 32 Problem Prevention? • FACT: US paternity fraud rate is 1 in 20 http://pandawhale.com/post/13851/my-report-card-came-in-my-paternity-test-came-in http://en.wikipedia.org/wiki/Paternity_fraud
  33. 33. Problem? Prevention – More troubling diseases • Huntington's disease – autosomal dominant • Inherited cancers – 10% of occurrences (!!!) • Total annual cancer spending is $50B over 1M people © 2014 MapR Technologies 33 http://www.nature.com/scitable/topicpage/rare-genetic-disorders-learning-about-genetic-disease-979 http://meps.ahrq.gov/mepsweb/data_files/publications/st80/stat80.pdf
  34. 34. © 2014 MapR Technologies 34 Genetic Architecture of Cancer Risk Inherited Cancer: Large proportion of incidence and cost http://www.cancer.gov/cancertopics/pdq/genetics/overview/healthprofessional/page1/AllPages
  35. 35. Problem? Prevention – More troubling diseases • Huntington's disease – autosomal dominant • Inherited cancers – 10% of occurrences (!!!) • Total annual cancer spending is $50B over 1M people Metrics => Diagnostics => Prognostics (domestication) © 2014 MapR Technologies 35
  36. 36. Problem? Prevention – More troubling diseases • Huntington's disease – autosomal dominant • Inherited cancers – 10% of occurrences (!!!) • Total annual cancer spending is $50B over 1M people Metrics => Diagnostics => Prognostics (domestication) Metrics => Diagnostics => Therapeutics (traditional medicine) © 2014 MapR Technologies 36
  37. 37. Singapore: Global Leader of Social Engineering “Our government wants smart ladies to meet smart guys to get smart children.” –Director, Club2040 (Singapore matchmaking agency) Domestication: the process whereby a population of living organisms is changed at the genetic level, through generations of selective breeding, to accentuate traits that ultimately benefit the interests of humans. http://www.nytimes.com/2008/04/29/world/asia/29iht-sing.1.12428974.html © 2014 MapR Technologies 37
  38. 38. © 2014 MapR Technologies 38 Science Fiction-like Unprecedented Control Dystopia / Eutopia Scenarios are Possible
  39. 39. © 2014 MapR Technologies 39 Many DNA-Based Apps Coming… [Chart: 2014 vs. 2020, Clinical vs. Non-Clinical, vertical axis 0 to 25] • 2014: US$ 2B, mostly research, mostly chemical costs • 2020: US$ 20B, mostly clinical, mostly analytics costs Macquarie Capital, 2014. Genomics 2.0: It’s just the beginning
  40. 40. DNA Diagnostics © 2014 MapR Technologies 40
  41. 41. © 2014 MapR Technologies 41 Traditional Therapeutics Trial & Error http://www.taylorwessing.com/synapse/regulatory_personalised_medicines.html
  42. 42. Unsuitable Therapy Suitable Therapy © 2014 MapR Technologies 42 Personalized Therapeutics Trial & Error http://www.taylorwessing.com/synapse/regulatory_personalised_medicines.html
  43. 43. © 2014 http://blog.covance.com/tag/companion-diagnostics/ MapR Technologies 43
  44. 44. Let’s look at some real data… © 2014 MapR Technologies 44 http://blog.covance.com/tag/companion-diagnostics/
  45. 45. © 2014 MapR Technologies 45 Folate (aka vitamin B9, aka glutamate) Yes, of Monosodium Glutamate (MSG) fame… Essential for fetal development and brain Function Also used to detoxify
  46. 46. Folate (aka vitamin B9, aka glutamate) rich foods © 2014 MapR Technologies 46 http://www.easytolovebut.com/?p=2782
  47. 47. Folate (aka vitamin B9, aka glutamate) rich foods © 2014 MapR Technologies 47 http://www.easytolovebut.com/?p=2782
  48. 48. Folate (aka vitamin B9, aka glutamate) rich foods © 2014 MapR Technologies 48 MTHFR is Frequently Mutated (50%) => Dysfunction http://www.easytolovebut.com/?p=2782
  49. 49. © 2014 MapR Technologies 49
  50. 50. Clinical Sequencing Business Process Workflow © 2014 MapR Technologies 50 Patient Physician Clinic blood/saliva Clinical Lab Analytics extract
  51. 51. © 2014 MapR Technologies 51 One Bad MTHFR MTHFR C677T Methylfolate helps make neurotransmitters in your brain. When methylfolate levels are low, so are your neurotransmitters. Low production of neurotransmitters may cause conditions of addictive behavior, depression, anxiety, ADHD, mania, irritability, insomnia, learning disorders and others. Everyone should get tested. Why? Because 1 in 2 people are affected and if one knows they have a MTHFR polymorphism, they know they have to be very proactive in taking care of themselves. http://thyroid.about.com/od/MTHFR-Gene-Mutations-and-Polymorphisms/fl/The-Link-Between-MTHFR-Gene-Mutations-and-Disease-Including-Thyroid-Health.htm Pulp Fiction Pulp Fiction & BMFwallets.com
  52. 52. © 2014 Star Wars: Episode VII MapR Technologies 52
  53. 53. © 2014 MapR Technologies 53
  54. 54. © 2014 MapR Technologies 54
  55. 55. Clinical Sequencing Business Process Workflow © 2014 MapR Technologies 55 Patient Physician Clinic blood/saliva Clinical Lab Analytics extract
  56. 56. Clinical Genomics, Information Systems Perspective © 2014 MapR Technologies 56 Stakeholder Analyst Patient Physician Compressed Structured Base4 Data Unstructured Base2 Data extract Base4=>Base2 Converter Reporting and Viz
  57. 57. Clinical Genomics, Information Systems Perspective © 2014 MapR Technologies 57 Stakeholder Analyst Patient Physician Compressed Structured Base4 Data Unstructured Base2 Data extract Base4=>Base2 Converter Reporting and Viz 1º analytics Base4=>Base2 Destroys Structure
  58. 58. Base4 => Base2 conversion, DNA fragmentation © 2014 MapR Technologies 58 Genome (base4) Sequencer DNA Fragments (base2)
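The base4 to base2 idea can be made concrete with a 2-bit encoding, since four symbols need exactly two bits each. This is a minimal sketch: the particular bit assignment and the function names `encode` and `decode` are assumptions chosen for illustration, not the format that any sequencer actually emits.

```python
# Hypothetical 2-bit code for the four DNA bases (the assignment is arbitrary).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base, first base highest."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value

def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # lowest two bits = last base
        value >>= 2
    return "".join(reversed(bases))

fragment = "GATTACA"
packed = encode(fragment)
print(bin(packed))
print(decode(packed, len(fragment)))
```

The length is passed to `decode` explicitly because leading `A` bases encode as zero bits and would otherwise be lost.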
  59. 59. Clinical Genomics, Information Systems Perspective 2º analytics Recovers Structure © 2014 MapR Technologies 59 Stakeholder Analyst Patient Physician Compressed Structured Base4 Data Unstructured Base2 Data extract Base4=>Base2 Converter Reporting and Viz 1º analytics Base4=>Base2 Destroys Structure
  60. 60. Clinical Genomics, Information Systems Perspective © 2014 MapR Technologies 60 Stakeholder Analyst Patient Physician ETL Reporting and Viz Data Store
  61. 61. Data Scientist © 2014 MapR Technologies 61 Clinical Genomics, Data Science Process Stakeholder Analyst Patient Physician ETL Reporting and Viz Data Store Analytics x Improve Tests
  62. 62. Improve Tests Data Scientist © 2014 MapR Technologies 62 Clinical Genomics, Political Process Stakeholder Analyst Patient Physician ETL Reporting and Viz Data Store Analytics x Improve Policy Public Health System
  63. 63. © 2014 MapR Technologies 63 Many DNA-Based Apps Coming… [Chart: 2014 vs. 2020, Clinical vs. Non-Clinical, vertical axis 0 to 25] • 2014: US$ 2B, mostly research, mostly chemical costs • 2020: US$ 20B, mostly clinical, mostly analytics costs Macquarie Capital, 2014. Genomics 2.0: It’s just the beginning
  64. 64. © 2014 MapR Technologies 64 Clinical Genomics, Data Science Process Stakeholder Analyst Patient Physician ETL Reporting and Viz Data Store Analytics Secondary Analytics Downstream Analytics Experiment Design DNA Sequencing
  65. 65. © 2014 MapR Technologies 65 DNA Sequencing Value Chain 100 % Effort 0 Pre-NGS ~2000 Future ~2020 Now Experiment Design DNA Sequencing Secondary Analytics Downstream Analytics Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  66. 66. Sequence is Becoming Free (Even Moore’s Law) © 2014 MapR Technologies 66 Commoditization Pattern Huge influx of inexpensive data Creates new medical and biotech use-cases 100 % Effort 0 Pre-NGS ~2000 Future ~ Now
  67. 67. Experiment Design and Downstream Analytics © 2014 MapR Technologies 67 Specialization will grow to 100% effort This is the desirable scenario Biologists ought to be doing biology 100 % Effort 0 Pre-NGS ~2000 Future ~ Now Downstream Analytics
  68. 68. © 2014 MapR Technologies 68 BigData Bottleneck Time currently being spent on BigData problems Not ideal 100 % Effort 0 Pre-NGS ~2000 Future ~ Now Physicians & Biologists need help from CS & SW Engineers
  69. 69. Optimize This © 2014 MapR Technologies 69 BigData Bottleneck Time currently being spent on BigData problems Not ideal 100 % Effort 0 Pre-NGS ~2000 Future ~ Now Physicians & Biologists need help from CS & SW Engineers
  70. 70. 2º Analytics (fragment re-assembly), Quick Overview Some of these differences occur in the personal DNA… http://www.bioinfor.com/zoom/support/tutorials.html © 2014 MapR Technologies 70
  71. 71. © 2014 MapR Technologies 71 Some Errors come from the Sequencer • This is why a “medical grade” genome is 100x oversampled http://www.medicinalgenomics.com/dna-assembly/
  72. 72. © 2014 MapR Technologies 72 Goal: Find the Real Differences Some of these differences occur in the personal DNA… http://www.bioinfor.com/zoom/support/tutorials.html
  73. 73. 1. What is the (Probable) Color of Each Column? © 2014 MapR Technologies 73
  74. 74. © 2014 MapR Technologies 74 2. Is the Probable Color Surprising? Expected
  75. 75. Row-major strategies: inefficient with CPU or Mem Expected © 2014 MapR Technologies 75
  76. 76. © 2014 MapR Technologies 76 Strategy 1: foreach column, foreach row. O(rows*cols) + O(1 col) mem [Figure label: Expected]
  77. 77. Strategy 2: foreach row, keep running tallies. O(rows) + O(rows*cols) mem [Figure label: Expected] © 2014 MapR Technologies 77
  78. 78. Strategy 3: rotate => foreach (transposed) column. O(rows log rows) + O(cols) + O(1 col) mem. REQUIRES SORT [Figure labels: Expected, Personal variants] © 2014 MapR Technologies 78
  79. 79. Fortunately, Hadoop is Really Good at Sorting © 2014 MapR Technologies 79
  80. 80. Comparison of Strategies • Strategy 1: O(rows*cols) + O(1 col) mem. Compute Intense, Memory Efficient • Strategy 2: O(rows) + O(rows*cols) mem. Compute Efficient, Memory Intense • Strategy 3: O(rows log rows) + O(cols) + O(1 col) mem. Compute Efficient, Memory Efficient © 2014 MapR Technologies 80
  81. 81. Comparison of Strategies • Strategy 1: O(rows*cols) + O(1 col) mem • Strategy 2: O(rows) + O(rows*cols) mem • Strategy 3: O(rows log rows) ÷ shards + O(cols) ÷ shards + O(1 col) mem • OK for small matrixes • Infinitely attractive for infinitely large matrixes © 2014 MapR Technologies 81
  82. 82. Comparison of Strategies • Strategy 1: O(rows*cols) + O(1 col) mem • Strategy 2: O(rows) + O(rows*cols) mem • Strategy 3: O(rows log rows) ÷ shards + O(cols) ÷ shards + O(1 col) mem • Distributed matrix math techniques are common in tech, but not common in genomics • Some commercial leaders: Seven Bridges Genomics, Ion Flux • Twitter Algebird: Matrix math on Hadoop © 2014 MapR Technologies 82
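The three strategies from slides 76 to 82 can be sketched on a toy pileup, where each row is an aligned read and each column is a genome position. Everything in this example is an assumption made for illustration: the reads, the reference string, the function names, and the use of an in-memory `sorted` call as a stand-in for the distributed sort that Hadoop would perform.

```python
from collections import Counter
from itertools import groupby

# Hypothetical aligned reads (rows) over four genome positions (columns).
reads = ["ACGT", "ATGT", "ATGT", "ATGA", "ATGT"]
expected = "ACGT"  # reference sequence, labelled "Expected" on the slides

def strategy1(rows):
    """foreach column, foreach row: only one column's tally is live at a time."""
    consensus = []
    for col in range(len(rows[0])):
        tally = Counter(row[col] for row in rows)
        consensus.append(tally.most_common(1)[0][0])
    return "".join(consensus)

def strategy2(rows):
    """foreach row, keep running tallies for every column at once."""
    tallies = [Counter() for _ in rows[0]]
    for row in rows:
        for col, base in enumerate(row):
            tallies[col][base] += 1
    return "".join(t.most_common(1)[0][0] for t in tallies)

def strategy3(rows):
    """rotate: emit (column, base) pairs, sort by column, scan one group at a time."""
    pairs = sorted((col, base) for row in rows for col, base in enumerate(row))
    consensus = []
    for _, group in groupby(pairs, key=lambda pair: pair[0]):
        tally = Counter(base for _, base in group)
        consensus.append(tally.most_common(1)[0][0])
    return "".join(consensus)

consensus = strategy3(reads)
# Positions where the probable base differs from the expected one.
variants = [(i, ref, obs)
            for i, (ref, obs) in enumerate(zip(expected, consensus))
            if ref != obs]
print(consensus, variants)
```

In this toy data the single `A` in the last column is outvoted, as a sequencer error would be under oversampling, while the second column consistently disagrees with the reference and is reported as a candidate personal variant.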
  83. 83. Paradigm Shift in Discovery © 2014 MapR Technologies 83
  84. 84. All next-gen drugs will require a “companion diagnostic”… …as part of Rx to determine your personal response segment © 2014 MapR Technologies 84 Patient Physician Patient Physician ETL Reporting and Viz Data Store
  85. 85. © 2014 MapR Technologies 85 Personalized Therapeutics Unsuitable Therapy Suitable Therapy http://www.taylorwessing.com/synapse/regulatory_personalised_medicines.html
  86. 86. How to Determine Response Segment? EHR Mining Building the graph of medicine from millions of clinical narratives. Finlayson, 2014 © 2014 MapR Technologies 86
  87. 87. How to Determine Response Segment? EHR Mining Metrics => Diagnostics => Prognostics Metrics => Diagnostics => Therapeutics © 2014 MapR Technologies 87
  88. 88. How to Determine Response Segment? EHR Mining Metrics => Diagnostics => Prognostics Metrics => Diagnostics => Therapeutics © 2014 MapR Technologies 88
  89. 89. How to Determine Response Segment? EHR Mining © 2014 MapR Technologies 89
  90. 90. Personal Genome in EHR => Better Therapeutics © 2014 MapR Technologies 90 Personal Genome
  91. 91. Personal Genome in EHR => Prognostic Medicine © 2014 MapR Technologies 91 Personal Genome >Predict personal outcome >Preventative action
  92. 92. © 2014 MapR Technologies 92 How to Improve? http://www.cancer.gov/cancertopics/pdq/genetics/overview/healthprofessional/page1/AllPages
  93. 93. Strategy 1: Improve Diagnostics & Therapeutics © 2014 MapR Technologies 93 Improve Health with Personalized Therapy http://www.cancer.gov/cancertopics/pdq/genetics/overview/healthprofessional/page1/AllPages
  94. 94. Strategy 2: Improve Prognostics => Domesticate © 2014 MapR Technologies 94 Reshape / Remove
  95. 95. Strategy 2: Improve Prognostics => Domestication © 2014 MapR Technologies 95 http://www.fao.org/docrep/field/009/v8720e/v8720e04.htm
  96. 96. Strategy 2: Improve Prognostics => Domestication Beware: Law of Unintended Consequences From: “The Unanticipated Consequences of Purposive Social Action” Merton, 1936 © 2014 MapR Technologies 96 http://www.fao.org/docrep/field/009/v8720e/v8720e04.htm
  97. 97. © 2014 Star Wars V: The Empire Strikes Back MapR Technologies 97
  98. 98. © 2014 MapR Technologies 98 “When Racism was Science”, 10/13 in • Current Exhibit at NYU (thru March 2015) • Reproduction of the “Eugenics Record Office” • @ Cold Spring Harbor Laboratory (30min from NYC) • Supported by many social progressives (Carnegie Institute, Rockefeller family, Harvard) • “There were many prominent New Yorkers involved in eugenics […] It was initially about how to become more efficient as a modern society.” • Today, viewed as a scientific disgrace by Americans. What about later? http://www.nytimes.com/2014/10/14/science/haunted-files-the-eugenics-record-office-recreates-a-dark-time-in-a-laboratorys-past.html
  99. 99. © 2014 MapR Technologies 99 How to Architect a Solution? http://www.cancer.gov/cancertopics/pdq/genetics/overview/healthprofessional/page1/AllPages
  100. 100. © 2014 MapR Technologies 100 How to Architect a Solution? Personal Genome Predict personal outcome
  101. 101. © 2014 MapR Technologies 101 How to Architect a Solution? Personal Genome Predict personal outcome Let’s look at an existing system that’s similar…
  102. 102. Largest Biometric Database in the World © 2014 MapR Technologies 102 1.2B PEOPLE
  103. 103. © 2014 MapR Technologies 103 Why Create Aadhaar? • India: 1.2 billion residents – 640,000 villages, ~60% lives under $2/day – ~75% literacy, <3% pay income tax, <20% have bank accounts – ~800 million mobile, ~200-300 million migrant workers • Govt. spends about $25-40 billion on direct subsidies – Residents have no standard identity document – Most programs plagued with ghost and multiple identities causing leakage of 30-40%
  104. 104. © 2014 MapR Technologies 104 Why Create Aadhaar? • India: 1.2 billion residents – 640,000 villages, ~60% lives under $2/day – ~75% literacy, <3% pay income tax, <20% have bank accounts – ~800 million mobile, ~200-300 million migrant workers • Govt. spends about $25-40 billion on direct subsidies – Residents have no standard identity document – Most programs plagued with ghost and multiple identities causing leakage of 30-40% Standardize identity => Stop leakage
  105. 105. © 2014 MapR Technologies 105 Aadhaar Biometric Capture & Index Raw Digital Fingerprint
  106. 106. © 2014 MapR Technologies 106 Aadhaar Biometric ID Creation F(x): unique features G(x): uncommon features H(x): other features • 600 to 800 million people loaded in 4 years • In production – 1 million registrations/day – 200+ trillion lookups/day • All built on MapR-DB (HBase)
  107. 107. © 2014 MapR Technologies 107 Aadhaar Biometric ID Creation F(x): unique features G(x): uncommon features H(x): other features • 600 to 800 million people loaded in 4 years • In production – 1 million registrations/day – 200+ trillion lookups/day • All built on MapR-DB (HBase)
  108. 108. © 2014 MapR Technologies 108 How Does this Relate to Genomics? F(x): unique features G(x): uncommon features H(x): other features Same data shape and size • Aadhaar: 1B humans, 5MB minutia • Genome: 6B humans, ~3M variants
  109. 109. © 2014 MapR Technologies 109 How Does this Relate to Genomics? Same data set operations F-1(x): common features F(x): unique features G(x): uncommon features H(x): other features Same data shape and size • Aadhaar: 1B humans, 5MB minutia • Genome: 6B humans, ~3M variants • Genome: variant × phenotype • Common variant => causal gene F-1(x) !
  110. 110. © 2014 MapR Technologies 110 How Does this Relate to Genomics? Same data set operations F-1(x): common features F(x): unique features Same set operations Same data access pattern Same algorithm / memory complexity
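The "same set operations" point can be illustrated with plain set algebra over per-person feature sets. This is a sketch under stated assumptions: the patient identifiers, the variant labels, and the split into affected and unaffected groups are all hypothetical, and intersecting affected genomes is only one simple reading of "common variant => causal gene".

```python
# Hypothetical variant sets per person, split by an observed phenotype.
affected = {
    "p1": {"v1", "v2", "v7"},
    "p2": {"v2", "v3", "v7"},
    "p3": {"v2", "v7", "v9"},
}
unaffected = {
    "p4": {"v1", "v2"},
    "p5": {"v2", "v3", "v9"},
}

# Common features: variants shared by every affected person.
common_in_affected = set.intersection(*affected.values())

# Remove anything also seen in unaffected people to get candidate causal variants.
seen_in_unaffected = set.union(*unaffected.values())
candidates = common_in_affected - seen_in_unaffected

# Unique features: variants that only p1 carries, as in identity matching.
everyone = {**affected, **unaffected}
others = set.union(*(v for person, v in everyone.items() if person != "p1"))
unique_to_p1 = everyone["p1"] - others

print(sorted(common_in_affected), sorted(candidates), sorted(unique_to_p1))
```

Identification looks for features that separate one person from everyone else, while the genomics use case looks for features shared within a phenotype group. Both reduce to intersections and differences over the same kind of per-person sets.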
  111. 111. © 2014 MapR Technologies 111 How Does this Relate to Genomics? Join and Regress Phenotype ~ Genotype Generate new Hypotheses
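"Join and Regress: Phenotype ~ Genotype" can be sketched as a join on patient ID followed by a one-variable least-squares fit. The tables, patient IDs, and values below are invented for illustration, and the genotype is assumed to be coded as the number of copies of a variant (0, 1, or 2). A real analysis would involve many variants and covariates.

```python
# Hypothetical tables keyed by patient ID.
genotype = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}   # copies of a variant
phenotype = {"a": 1.0, "b": 1.2, "c": 2.1, "d": 1.9, "e": 3.0, "f": 3.2, "g": 2.5}

# Join: keep only patients present in both tables.
joined = [(genotype[pid], phenotype[pid]) for pid in genotype if pid in phenotype]
xs = [x for x, _ in joined]
ys = [y for _, y in joined]

# Regress: ordinary least squares for phenotype = intercept + slope * genotype.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in joined)
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"phenotype ~ {intercept:.3f} + {slope:.3f} * genotype")
```

A slope clearly different from zero would suggest the variant as a hypothesis worth testing, which is the "Generate new Hypotheses" step on the slide.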
  112. 112. There’s Something Bigger Happening… © 2014 MapR Technologies 112
  113. 113. Medical Renaissance is Part of a Paradigm Shift © 2014 MapR Technologies 113 • The Structure of Scientific Revolutions, Kuhn, 1962 http://edtosavetheworld.com/2014/05/01/change-theory-do-we-need-a-new-paradigm/
  114. 114. © 2014 MapR Technologies 114 1st Paradigm: Observational (pre)Science measurement 1st paradigm: observational science description observation theory
  115. 115. simulation © 2014 MapR Technologies 115 2nd Paradigm: Theoretical (pre)Science measurement 1st paradigm: observational science description observation theory explanation 2nd paradigm: theoretical science mathematics computing
  116. 116. simulation © 2014 MapR Technologies 116 3rd Paradigm: Experimental Science measurement 1st paradigm: observational science description observation theory explanation 2nd paradigm: theoretical science mathematics computing directed control channel observation 3rd paradigm: experimental science
  117. 117. Medical Renaissance is Part of a Paradigm Shift © 2014 MapR Technologies 117 • The Structure of Scientific Revolutions, Kuhn, 1962
  118. 118. 4th Paradigm: eScience, enabled by BigData explore simulate explain © 2014 MapR Technologies 118 4th paradigm: eScience accumulated knowledge computing big data engine => “ escape velocity”
  119. 119. We are here http://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions © 2014 MapR Technologies 119
  120. 120. © 2014 MapR Technologies, confidential Thanks! Questions? @allenday, @mapr linkedin.com/in/allenday aday@mapr.com “Good news – I hear the paradigm is shifting” ~ New Yorker, Charles Barsotti
