
Genomics isn't Special


The science driving genomic analyses is rapidly changing, but the operational problems of processing data from DNA sequencers quickly and reliably are not new.

I present an analysis of the parallels between the fundamental limiting components of the '90s internet boom and the DNA sequencing boom now underway. I also show how Hadoop can be reused in the genomics sector. Hadoop is a proven application architecture that is widely used in BigData and commercial internet applications.
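Reusing Hadoop for genomics rests on the MapReduce programming model. The sketch below is minimal, single-process, plain Python; it is not the Hadoop API. It shows how one common genomics task decomposes into map, shuffle and reduce phases. The task is counting k-mers, the length-k substrings, across sequencing reads.

The function names and toy reads are hypothetical. On a real cluster, the framework would run many mappers and reducers in parallel and perform the shuffle over the network. Real k-mer counters also usually handle reverse complements and ambiguous bases, which this sketch ignores.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_read(read: str, k: int) -> Iterator[tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every length-k substring of one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: sum the partial counts for one k-mer."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int) -> dict[str, int]:
    """Chain the three phases over all reads (hypothetical driver, single process)."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(count_kmers(["ACGTAC", "GTACGT"], 3))
```

With the two toy reads above, this prints `{'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}`. Each mapper only needs its own read, which is what lets the map phase scale out across many nodes.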

Published in: Science


  1. © 2014 MapR Technologies. Genomics isn’t Special* [Diagram: Sequencer, Apps, Analytics] *http://www.slideshare.net/urilaserson/genomics-is-not-special-towards-data-intensive-biology
  2. Genomics Follows the Standard BigData Workflow [Diagram: Sensor, ETL, BI]
  3. Genomics is a Big Opportunity [Diagram: Sensor, ETL, BI, with MapR-FS and MapR-DB]
  4. Biggest Opportunity is to Save Lives (Clinical) [Diagram: Digitized DNA; application areas Clinical, Pharma, Agriculture, Manufacturing, Energy, …] ~28 PB of DNA digitized per year (2013). ~250K human genomes sequenced (2013). ~4M babies born (2013, USA). Source: http://www.technologyreview.com/news/531091/emtech-illumina-says-228000-human-genomes-will-be-sequenced-this-year/
  5. Clinical Applications are Launching Now
     • 2014: US$ 2B, mostly research, mostly chemical costs
     • 2020: US$ 20B, mostly clinical apps, mostly analytics costs
     [Chart: market size (US$ B), 2014 vs. 2020, Clinical vs. Non-Clinical]
     Source: Macquarie Capital, 2014. Genomics 2.0: It’s just the beginning.
  6. Clinical COGS: Analytics > Chemistry
     • 2014: US$ 2B, mostly research, mostly chemical costs
     • 2020: US$ 20B, mostly clinical apps, mostly analytics costs
     [Chart: market size (US$ B), 2014 vs. 2020, Clinical vs. Non-Clinical]
     Why?
  7. Historical Perspective – eCommerce Boom
  8. [Chart: CPU transistors/mm², HDD GB/mm², and Internet GB/s over the years]
  9. Early 1990s: Early eCommerce Vendor Setup [Diagram: Website and Back Office, each with read/write access to Storage]
  10. Late 1990s: Workload became too big [Diagram: multiple Websites and Back Offices, all with read/write access to the same Storage]
  11. 2003–04: GFS + MapReduce (Hadoop) Published [Diagram: multiple Websites and Back Offices with read/write access to a combined Storage + Compute Cluster]
  12. Genomics Boom
  13. DNA Sequencing, pre-2004 [Chart: CPU transistors/mm², HDD GB/mm², and DNA bp/$ (pre-2004) over the years]
  14. DNA Sequencing, pre-2004 [Diagram: Sequencer writes (write-only) to Storage; High-Performance Compute Cluster has read/write access to Storage; Coordinator / Edge Node]
  15. DNA Sequencing, 2004 Disruption [Chart: CPU transistors/mm², HDD GB/mm², DNA bp/$ pre-2004, and DNA bp/$ post-2004 over the years]
  16. DNA Sequencing, post-2004 [Diagram: DNA Sequencer Cluster (e.g. Illumina X-Ten) writes (write-only) to Storage; High-Performance Compute Cluster has read/write access to Storage; Coordinator / Edge Node. Callouts: HPC bottleneck; Sequencer back-pressure]
  17. DNA Sequencing, 2014, at a Major Sequencing Vendor [Diagram: DNA Sequencer Cluster (e.g. Illumina X-Ten) writes (write-only) to a Storage + Compute Cluster. Callout: Decentralize I/O]
  18. DNA Analytics Can Now Scale Out [Diagram: HPC Analytics vs. Hadoop / Spark Analytics]
  19. Back to Market Analysis…
  20. Clinical COGS: Analytics > Chemistry
      • 2014: US$ 2B, mostly research, mostly chemical costs
      • 2020: US$ 20B, mostly clinical apps, mostly analytics costs
      [Chart: market size (US$ B), 2014 vs. 2020, Clinical vs. Non-Clinical]
  21. Genomics Market Value Chain [Diagram: Basic R&D, Research Hospitals, Sequencing Tech, Pharma, CLIA, Patients]
  22. Seven Billion Humans Today [Diagram: Seq. Tech, CLIA, Pharma, Res. Hospitals, with MapR-FS and MapR-DB. Annotations: Linear Growth with # of Humans; Exponential Growth with # of Humans]
  23. Thanks! Questions? @allenday, @mapr, aday@mapr.com, linkedin.com/in/allenday
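Slides 16 and 17 contrast two designs. In the first, a central Storage array feeds an HPC cluster. In the second, a combined Storage + Compute Cluster decentralizes I/O. The idea behind the second is to run each task on a node that already holds its input.

The sketch below is minimal and rests on one assumption: each block of sequencer output is replicated on a few nodes, as distributed file systems such as HDFS do. The greedy heuristic is deliberately simple. It is not the scheduler that Hadoop or MapR actually use. `assign_tasks`, `locality_fraction`, the block IDs and the node names are all hypothetical.

```python
def assign_tasks(block_replicas: dict[str, list[str]], nodes: list[str]) -> dict[str, str]:
    """Assign one task per block, preferring nodes that already store a replica.

    Hypothetical greedy heuristic: among live nodes holding a replica of the
    block, pick the one with the fewest tasks so far (ties broken by name).
    If no replica holder is live, fall back to the least-loaded node overall,
    which would mean a remote read over the network.
    """
    load = {node: 0 for node in nodes}
    assignment: dict[str, str] = {}
    for block in sorted(block_replicas):
        local = [n for n in block_replicas[block] if n in load]
        candidates = local if local else nodes
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


def locality_fraction(assignment: dict[str, str], block_replicas: dict[str, list[str]]) -> float:
    """Fraction of tasks whose input block is stored on the node they were assigned to."""
    if not assignment:
        return 0.0
    local = sum(1 for block, node in assignment.items() if node in block_replicas[block])
    return local / len(assignment)


if __name__ == "__main__":
    # Hypothetical layout: four blocks of one sequencing run, each stored on two of three nodes.
    replicas = {
        "run1.fastq#0": ["n1", "n2"],
        "run1.fastq#1": ["n2", "n3"],
        "run1.fastq#2": ["n3", "n1"],
        "run1.fastq#3": ["n1", "n2"],
    }
    plan = assign_tasks(replicas, ["n1", "n2", "n3"])
    print(plan)
    print(locality_fraction(plan, replicas))
```

In this toy layout, every task reads its block from local disk and the work spreads across all three nodes. With a single shared storage array, every read would instead go to the same device. Adding compute nodes would not add I/O bandwidth.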
