© 2014 MapR Technologies 1
Apps
Sequencer
Genomics isn’t Special*
Analytics
http://www.slideshare.net/urilaserson/genomics-is-not-special-towards-data-intensive-biology
© 2014 MapR Technologies 2
BI
Sensor
Genomics Follows the Standard BigData Workflow
ETL
© 2014 MapR Technologies 3
BI
Sensor
Genomics is a Big Opportunity
ETL
MapR-DB
MapR-FS
© 2014 MapR Technologies 4
Biggest Opportunity is to Save Lives (Clinical)
Clinical
Pharma
Agriculture
Manufacturing
Energy
… …
Digitized DNA
~28PB of DNA digitized per year (2013).
~250K Human genomes sequenced (2013).
~4M Babies born (2013, USA).
http://www.technologyreview.com/news/531091/emtech-illumina-says-228000-human-genomes-will-be-sequenced-this-year/
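The three figures above invite a quick back-of-the-envelope comparison. The sketch below is only illustrative: it assumes decimal petabytes and, for the per-genome ceiling, pretends that all 28 PB were human data, which the slide does not claim.

```python
# Figures quoted on the slide (all approximate, 2013).
PB = 10**15  # decimal petabyte; an assumption, the slide does not specify
dna_bytes_per_year = 28 * PB
human_genomes_per_year = 250_000
us_births_per_year = 4_000_000

# Human genomes sequenced, expressed as a fraction of US births alone.
genomes_per_us_birth = human_genomes_per_year / us_births_per_year

# Rough ceiling on data per human genome, IF all digitized DNA were human.
gb_per_genome_ceiling = dna_bytes_per_year / human_genomes_per_year / 10**9

print(f"genomes per US birth: {genomes_per_us_birth:.2%}")
print(f"ceiling per genome:   {gb_per_genome_ceiling:.0f} GB")
```

Even sequencing every US newborn would need an order-of-magnitude increase over the 2013 sequencing volume.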
© 2014 MapR Technologies 5
Clinical Applications are Launching Now
• 2014: US$ 2B,
• mostly research,
• mostly chemical costs
• 2020: US$ 20B,
• mostly clinical apps,
• mostly analytics costs
Macquarie Capital, 2014. Genomics 2.0: It’s just the beginning
[Bar chart: 2014 vs. 2020, Clinical and Non-Clinical segments; vertical axis 0 to 20]
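As a rough check on the forecast above, the implied compound annual growth rate can be worked out from the two quoted endpoints. This is a sketch that assumes smooth compounding between 2014 and 2020; the source gives only the two figures.

```python
# Market sizes quoted on the slide (Macquarie Capital, 2014), in US$ billions.
market_2014 = 2.0
market_2020 = 20.0
years = 2020 - 2014

# Total growth multiple and the constant yearly rate that would produce it.
growth_multiple = market_2020 / market_2014
cagr = growth_multiple ** (1 / years) - 1

print(f"{growth_multiple:.0f}x over {years} years")
print(f"implied CAGR: {cagr:.1%}")
```

A tenfold increase over six years works out to a little under 47% per year.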
© 2014 MapR Technologies 6
Clinical COGS: Analytics > Chemistry
• 2014: US$ 2B,
• mostly research,
• mostly chemical costs
• 2020: US$ 20B,
• mostly clinical apps,
• mostly analytics costs
[Bar chart: 2014 vs. 2020, Clinical and Non-Clinical segments; vertical axis 0 to 20]
Why?
© 2014 MapR Technologies 7
Historical Perspective – eCommerce Boom
© 2014 MapR Technologies 8
[Chart over years: CPU (transistors/mm2), HDD (GB/mm2), Internet (GB/s)]
© 2014 MapR Technologies 9
Early 1990s: Early eCommerce Vendor Setup
Storage
read/write
read/write
Website
Back Office
© 2014 MapR Technologies 10
Late 1990s: Workload became too big
Storage
read/write
read/write
Website Website Website Website
Back Office Back Office
© 2014 MapR Technologies 11
2003-4: GFS + MapReduce Papers Published (Basis for Hadoop)
read/write
read/write
Website Website Website Website
Storage + Compute Cluster
Back Office Back Office
© 2014 MapR Technologies 12
Genomics Boom
© 2014 MapR Technologies 13
DNA Sequencing, pre-2004
[Chart over years: CPU (transistors/mm2), HDD (GB/mm2), DNA (bp/$, pre-2004)]
© 2014 MapR Technologies 14
DNA Sequencing, pre-2004
Storage
write-only
read/write
High-Performance Compute Cluster
Coordinator /
Edge Node
Sequencer
© 2014 MapR Technologies 15
DNA Sequencing, 2004 Disruption
[Chart over years: CPU (transistors/mm2), HDD (GB/mm2), DNA (bp/$, pre-2004), DNA (bp/$, post-2004)]
© 2014 MapR Technologies 16
DNA Sequencing, post-2004
Storage
write-only
read/write
High-Performance Compute Cluster
Coordinator /
Edge Node
DNA Sequencer Cluster (e.g. Illumina X-Ten)
HPC bottleneck
Sequencer
back-pressure
© 2014 MapR Technologies 17
DNA Sequencing, 2014 @ Major Sequencing Vendor
write-only
DNA Sequencer Cluster (e.g. Illumina X-Ten)
Storage + Compute Cluster
Decentralize I/O
Decentralize I/O
© 2014 MapR Technologies 18
DNA Analytics Can Now Scale Out
HPC
Analytics
Hadoop / Spark
Analytics
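To make the scale-out idea concrete, here is a minimal single-machine sketch of the map/reduce pattern that Hadoop and Spark distribute across nodes. The reads, the two-way partitioning, and the function names are hypothetical illustrations, not part of the talk; on a real cluster each partition would be processed where its data is stored.

```python
from collections import Counter
from functools import reduce


def count_bases(reads):
    """Map step: count nucleotides within one partition of reads."""
    counts = Counter()
    for read in reads:
        counts.update(read)
    return counts


def merge(a, b):
    """Reduce step: combine two partial counts into one."""
    return a + b


# Hypothetical toy reads, split into two partitions as a cluster would.
reads = ["ACGT", "AACC", "GGTT", "ACAC"]
partitions = [reads[0:2], reads[2:4]]

# Each partition is counted independently, then the partial results are merged.
partials = map(count_bases, partitions)
totals = reduce(merge, partials, Counter())

print(dict(totals))
```

Because each partition is counted independently, adding nodes adds throughput without routing every byte through a single coordinator.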
© 2014 MapR Technologies 19
Back to Market Analysis…
© 2014 MapR Technologies 20
Clinical COGS: Analytics > Chemistry
• 2014: US$ 2B,
• mostly research,
• mostly chemical costs
• 2020: US$ 20B,
• mostly clinical apps,
• mostly analytics costs
[Bar chart: 2014 vs. 2020, Clinical and Non-Clinical segments; vertical axis 0 to 20]
© 2014 MapR Technologies 21
Genomics Market Value Chain
Sequencing
Tech
Pharma CLIA Patients
Research Hospitals Basic R&D Patients
Sequencing
Tech
© 2014 MapR Technologies 22
Seven Billion Humans Today
Seq.
Tech CLIA
MapR-DB
MapR-FS
Linear Growth with
# of Humans
Exponential Growth with
# of Humans
Pharma
Res.
Hospitals
© 2014 MapR Technologies, confidential
Thanks!
Questions?
@allenday, @mapr
aday@mapr.com
linkedin.com/in/allenday



Editor's Notes

  1. clinical
  2. clinical
  3. clinical