SlideShare a Scribd company logo
Managing Genomes at Scale:
What We Learned
Rob Long
Monsanto: Big Data Engineer
Twitter: @plantimals
5-29-14
Big Data At Monsanto
What do we mean when we say “big data”?
Monsanto is a (Big) Data Company
A-Maizing
Molecular Machinery
A Reference
Genomic Data Warehouse
Architecture
Application Layer
Compute Farm
Query Genomic Data
Hadoop Cluster
Genomic Data
HbaseHadoop Map Reduce Engine
Data Services
Edge Node Edge Node
…
Unstructured Query
Solr Indexes
Lineage
Graph DB
VCF – Variant Call Format
##fileformat=VCFv4.1
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4
20 14370 rs6054257 G A 29 PASS …
Let’s Try HBase
HBase Layout
C1 C2 Cn
V1 Vn
V2 Vn
Rowkey1 ->
Rowkey2 ->
ColumnFamily1
Tall Narrow
Ind:chr:pos Id Ref SNP
rs1243 A true
rs321 C true
rs1243 A true
1a2b3c:chr1:1001 ->
1a2b3c:chr1:1000 ->
ColumnFamily:V
456def:chr1:1000 ->
Matrix Report Use Case
Pos VCF1 VCF2 VCF3 VCF4
chr1:100 A A . .
chr1:101 . C C C
chr1:102 . . . A
chr1:103 T T . T
chr1:104 . . . .
First Try
Mapper
Individual 1
Region 1
Mapper
Individual n
Region m
…
Reduce
chr1:1000
Chrom:Pos
Reduce
chr1:1001
Reduce
chrN:M
…
intermediate
result
intermediate
result
intermediate
result
Mapper
Individual 1
Region 1
Mind the gaps
8 12 24
Check For Gaps
Mapper
Individual 1
chr 1
Mapper
Individual 2
chr 1
Mapper
Individual n
chr m
…
Reduce
Abc123:chr1
Reduce
abc124:chr1
Reduce
n:m
…
intermediate
result
intermediate
result
intermediate
result
intermediate
result
intermediate
result
intermediate
result
Check Gaps against
coverage table
This Takes Some Time
0.00
5.00
10.00
15.00
20.00
25.00
30.00
35.00
1 2 3 4 5 6 7 8 9 10 11
Matrix Report Running Time
time in minutes
# of VCFs
Minutes
Flat Wide
genome:chr:pos
123:alt 123:qual 456:alt 456:qual
A 30 C 50
T 35 T 45
b73:chr1:1000 ->
b73:chr1:1001 ->
Improved workflow
Mapper
chr1:100
Mapper
chr1:100
Mapper
chrN:M
…
Reduce
123abc:chr1
Reduce
456def
…
intermediate
result
intermediate
result
join on
positions
Feature Search
Find genomic features like promoter regions, UTR,
exons, genes, etc.
Original Indexing Workflow
HBase
Feature Table
Table
Mapper
Solr Index
Reducer
Solr Index
Reducer
Index zip
Index zip
Table
Mapper
Table
Mapper
Solr Pulls In Updates
HDFS
Solr Server
cron
SolrCloud
Cloudera Search
HBase
Feature Table HDFS
HBase MapReduce
Indexer Tool
Morphline
file
Morphline File
{
extractAvroPaths {
flatten : true
paths : {
feature_name : /featureName
feature_type : /type
start : /start
stop : /stop
}
}
}
Lessons Learned
• Use HBase like a HashMap, not like a relational
database
• Denormalize HBase schemas, no foreign keys
• To scale solr indexes past one server, use
SolrCloud/Cloudera Search, don’t rebuild it on
your own
Questions

More Related Content

What's hot

Fast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocadoFast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocado
fnothaft
 
Spark meetup london share and analyse genomic data at scale with spark, adam...
Spark meetup london  share and analyse genomic data at scale with spark, adam...Spark meetup london  share and analyse genomic data at scale with spark, adam...
Spark meetup london share and analyse genomic data at scale with spark, adam...
Andy Petrella
 
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"..."Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
Dataconomy Media
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
Yasin Memari
 
ADAM—Spark Summit, 2014
ADAM—Spark Summit, 2014ADAM—Spark Summit, 2014
ADAM—Spark Summit, 2014
fnothaft
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?
Timothy Danford
 
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
Sri Ambati
 
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
Amazon Web Services
 
Strata-Hadoop 2015 Presentation
Strata-Hadoop 2015 PresentationStrata-Hadoop 2015 Presentation
Strata-Hadoop 2015 Presentation
Timothy Danford
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009
Ian Foster
 
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Spark Summit
 
Introduction to pig
Introduction to pigIntroduction to pig
Introduction to pig
Uday Vakalapudi
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 Literature
Databricks
 
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB
 
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data AgeSpark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
batchinsights
 
2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge
Prof. Wim Van Criekinge
 
Managing Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger InstituteManaging Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger Institute
inside-BigData.com
 
CrateDB 101: Geospatial data
CrateDB 101: Geospatial dataCrateDB 101: Geospatial data
CrateDB 101: Geospatial data
Claus Matzinger
 
2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge
Prof. Wim Van Criekinge
 

What's hot (20)

Fast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocadoFast Variant Calling with ADAM and avocado
Fast Variant Calling with ADAM and avocado
 
Spark meetup london share and analyse genomic data at scale with spark, adam...
Spark meetup london  share and analyse genomic data at scale with spark, adam...Spark meetup london  share and analyse genomic data at scale with spark, adam...
Spark meetup london share and analyse genomic data at scale with spark, adam...
 
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"..."Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
"Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age"...
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
 
ADAM—Spark Summit, 2014
ADAM—Spark Summit, 2014ADAM—Spark Summit, 2014
ADAM—Spark Summit, 2014
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?
 
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
 
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308...
 
Strata-Hadoop 2015 Presentation
Strata-Hadoop 2015 PresentationStrata-Hadoop 2015 Presentation
Strata-Hadoop 2015 Presentation
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009
 
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
Exploring Spark for Scalable Metagenomics Analysis: Spark Summit East talk by...
 
Introduction to pig
Introduction to pigIntroduction to pig
Introduction to pig
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 Literature
 
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
 
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data AgeSpark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
 
2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge
 
Managing Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger InstituteManaging Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger Institute
 
CrateDB 101: Geospatial data
CrateDB 101: Geospatial dataCrateDB 101: Geospatial data
CrateDB 101: Geospatial data
 
2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge
 

Viewers also liked

Chong Hing Jewelers
Chong Hing JewelersChong Hing Jewelers
Chong Hing Jewelers
ntrottjti
 
Sqrrl and Accumulo
Sqrrl and AccumuloSqrrl and Accumulo
Sqrrl and Accumulo
John Dougherty
 
Applying Big Data
Applying Big DataApplying Big Data
Applying Big Data
John Dougherty
 
Big Data ROI
Big Data ROIBig Data ROI
Big Data ROI
John Dougherty
 
Weblogic installation
Weblogic installationWeblogic installation
Weblogic installation
Aditya Bhuyan
 
Risk Culture, Risk What?
Risk Culture, Risk What?Risk Culture, Risk What?
Risk Culture, Risk What?
Ian Rich
 
Nine Social Media Mistakes Brands Make
Nine Social Media Mistakes Brands MakeNine Social Media Mistakes Brands Make
Nine Social Media Mistakes Brands Make
AAyuja, Inc.
 
Your guide to staying stylish while travelling
Your guide to staying stylish while travellingYour guide to staying stylish while travelling
Your guide to staying stylish while travelling
Sheena Agarwal
 
Semantic Web (Web 3.0)
Semantic Web (Web 3.0)Semantic Web (Web 3.0)
Semantic Web (Web 3.0)
John Dougherty
 
Why Outdoor-2016
Why Outdoor-2016Why Outdoor-2016
Why Outdoor-2016
Brandon R. Gibson
 
Visualizing Big Data – The Fundamentals
Visualizing Big Data – The FundamentalsVisualizing Big Data – The Fundamentals
Visualizing Big Data – The Fundamentals
StampedeCon
 
People & Places - Proximity, flexibility & affinity
People & Places - Proximity, flexibility & affinityPeople & Places - Proximity, flexibility & affinity
People & Places - Proximity, flexibility & affinity
Clear Channel Belgium Where brands meet people
 
Risk Culture
Risk CultureRisk Culture
Risk Culture
Ian-Edward Stafrace
 
Successful Industrial IoT Patterns
Successful Industrial IoT PatternsSuccessful Industrial IoT Patterns
Successful Industrial IoT Patterns
WSO2
 

Viewers also liked (14)

Chong Hing Jewelers
Chong Hing JewelersChong Hing Jewelers
Chong Hing Jewelers
 
Sqrrl and Accumulo
Sqrrl and AccumuloSqrrl and Accumulo
Sqrrl and Accumulo
 
Applying Big Data
Applying Big DataApplying Big Data
Applying Big Data
 
Big Data ROI
Big Data ROIBig Data ROI
Big Data ROI
 
Weblogic installation
Weblogic installationWeblogic installation
Weblogic installation
 
Risk Culture, Risk What?
Risk Culture, Risk What?Risk Culture, Risk What?
Risk Culture, Risk What?
 
Nine Social Media Mistakes Brands Make
Nine Social Media Mistakes Brands MakeNine Social Media Mistakes Brands Make
Nine Social Media Mistakes Brands Make
 
Your guide to staying stylish while travelling
Your guide to staying stylish while travellingYour guide to staying stylish while travelling
Your guide to staying stylish while travelling
 
Semantic Web (Web 3.0)
Semantic Web (Web 3.0)Semantic Web (Web 3.0)
Semantic Web (Web 3.0)
 
Why Outdoor-2016
Why Outdoor-2016Why Outdoor-2016
Why Outdoor-2016
 
Visualizing Big Data – The Fundamentals
Visualizing Big Data – The FundamentalsVisualizing Big Data – The Fundamentals
Visualizing Big Data – The Fundamentals
 
People & Places - Proximity, flexibility & affinity
People & Places - Proximity, flexibility & affinityPeople & Places - Proximity, flexibility & affinity
People & Places - Proximity, flexibility & affinity
 
Risk Culture
Risk CultureRisk Culture
Risk Culture
 
Successful Industrial IoT Patterns
Successful Industrial IoT PatternsSuccessful Industrial IoT Patterns
Successful Industrial IoT Patterns
 

Similar to Managing Genomes At Scale: What We Learned - StampedeCon 2014

Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
Jo-fai Chow
 
From Black Box to Black Magic, Pycon Ireland 2014
From Black Box to Black Magic, Pycon Ireland 2014From Black Box to Black Magic, Pycon Ireland 2014
From Black Box to Black Magic, Pycon Ireland 2014
Gloria Lovera
 
Py conie 2014
Py conie 2014Py conie 2014
Py conie 2014
Gloria Lovera
 
Automated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsAutomated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from Measurements
Weikun Wang
 
Macs course
Macs courseMacs course
Macs course
Luca Cozzuto
 
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Pluribus One
 
Macro Processor
Macro ProcessorMacro Processor
Macro Processor
Saranya1702
 
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.pptComputer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.ppt
RafiyaKouser2
 
Microprogrammed of organisation and architecture of computer.pptx
Microprogrammed of organisation and architecture of computer.pptxMicroprogrammed of organisation and architecture of computer.pptx
Microprogrammed of organisation and architecture of computer.pptx
SahithBeats
 
COA 2.1 Microprogrammed control systems of btech 2nd year students.pptx
COA 2.1 Microprogrammed control systems of btech 2nd year students.pptxCOA 2.1 Microprogrammed control systems of btech 2nd year students.pptx
COA 2.1 Microprogrammed control systems of btech 2nd year students.pptx
SahithBeats
 
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYONDIMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
Rabi Das
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2
Li Shen
 
Design of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and ExcelDesign of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and Excel
David Sandy
 
Circuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project ReportCircuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project Report
Michael Sandy
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
zukun
 
Math 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answersMath 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answers
BrittneDean
 
Math 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answersMath 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answers
NathanielZaleski
 
Math 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answersMath 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answers
DennisHine
 
Model Selection and Multi-model Inference
Model Selection and Multi-model InferenceModel Selection and Multi-model Inference
Model Selection and Multi-model Inference
richardchandler
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
Florian Uhlitz
 

Similar to Managing Genomes At Scale: What We Learned - StampedeCon 2014 (20)

Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
 
From Black Box to Black Magic, Pycon Ireland 2014
From Black Box to Black Magic, Pycon Ireland 2014From Black Box to Black Magic, Pycon Ireland 2014
From Black Box to Black Magic, Pycon Ireland 2014
 
Py conie 2014
Py conie 2014Py conie 2014
Py conie 2014
 
Automated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsAutomated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from Measurements
 
Macs course
Macs courseMacs course
Macs course
 
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph...
 
Macro Processor
Macro ProcessorMacro Processor
Macro Processor
 
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.pptComputer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_07.ppt
 
Microprogrammed of organisation and architecture of computer.pptx
Microprogrammed of organisation and architecture of computer.pptxMicroprogrammed of organisation and architecture of computer.pptx
Microprogrammed of organisation and architecture of computer.pptx
 
COA 2.1 Microprogrammed control systems of btech 2nd year students.pptx
COA 2.1 Microprogrammed control systems of btech 2nd year students.pptxCOA 2.1 Microprogrammed control systems of btech 2nd year students.pptx
COA 2.1 Microprogrammed control systems of btech 2nd year students.pptx
 
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYONDIMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
IMPLEMENTATION OF MACHINE LEARNING IN E-COMMERCE & BEYOND
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2
 
Design of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and ExcelDesign of Filter Circuits using MATLAB, Multisim, and Excel
Design of Filter Circuits using MATLAB, Multisim, and Excel
 
Circuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project ReportCircuit Theory 2: Filters Project Report
Circuit Theory 2: Filters Project Report
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
 
Math 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answersMath 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answers
 
Math 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answersMath 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answers
 
Math 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answersMath 533 ( applied managerial statistics ) final exam answers
Math 533 ( applied managerial statistics ) final exam answers
 
Model Selection and Multi-model Inference
Model Selection and Multi-model InferenceModel Selection and Multi-model Inference
Model Selection and Multi-model Inference
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 

More from StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
StampedeCon
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
StampedeCon
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
StampedeCon
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
StampedeCon
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
StampedeCon
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
StampedeCon
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
StampedeCon
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
StampedeCon
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 

More from StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 

Recently uploaded

Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 

Recently uploaded (20)

Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

Managing Genomes At Scale: What We Learned - StampedeCon 2014

  • 1. Managing Genomes at Scale: What We Learned Rob Long Monsanto: Big Data Engineer Twitter: @plantimals 5-29-14
  • 2. Big Data At Monsanto What do we mean when we say “big data”?
  • 3. Monsanto is a (Big) Data Company
  • 8. Architecture Application Layer Compute Farm Query Genomic Data Hadoop Cluster Genomic Data HbaseHadoop Map Reduce Engine Data Services Edge Node Edge Node … Unstructured Query Solr Indexes Lineage Graph DB
  • 9. VCF – Variant Call Format ##fileformat=VCFv4.1 ##fileDate=20090805 ##source=myImputationProgramV3.1 ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta ##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x> ##phasing=partial ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data"> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth"> ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency"> ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele"> ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129"> ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership"> ##FILTER=<ID=q10,Description="Quality below 10"> ##FILTER=<ID=s50,Description="Less than 50% of samples have data"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"> ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality"> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 20 14370 rs6054257 G A 29 PASS …
  • 11. HBase Layout C1 C2 Cn V1 Vn V2 Vn Rowkey1 -> Rowkey2 -> ColumnFamily1
  • 12. Tall Narrow Ind:chr:pos Id Ref SNP rs1243 A true rs321 C true rs1243 A true 1a2b3c:chr1:1001 -> 1a2b3c:chr1:1000 -> ColumnFamily:V 456def:chr1:1000 ->
  • 13. Matrix Report Use Case Pos VCF1 VCF2 VCF3 VCF4 chr1:100 A A . . chr1:101 . C C C chr1:102 . . . A chr1:103 T T . T chr1:104 . . . .
  • 14. First Try Mapper Individual 1 Region 1 Mapper Individual n Region m … Reduce chr1:1000 Chrom:Pos Reduce chr1:1001 Reduce chrN:M … intermediate result intermediate result intermediate result Mapper Individual 1 Region 1
  • 16. Check For Gaps Mapper Individual 1 chr 1 Mapper Individual 2 chr 1 Mapper Individual n chr m … Reduce Abc123:chr1 Reduce abc124:chr1 Reduce n:m … intermediate result intermediate result intermediate result intermediate result intermediate result intermediate result Check Gaps against coverage table
  • 17. This Takes Some Time 0.00 5.00 10.00 15.00 20.00 25.00 30.00 35.00 1 2 3 4 5 6 7 8 9 10 11 Matrix Report Running Time time in minutes # of VCFs Minutes
  • 18. Flat Wide genome:chr:pos 123:alt 123:qual 456:alt 456:qual A 30 C 50 T 35 T 45 b73:chr1:1000 -> b73:chr1:1001 ->
  • 20. Feature Search Find genomic features like promoter regions, UTR, exons, genes, etc.
  • 21. Original Indexing Workflow HBase Feature Table Table Mapper Solr Index Reducer Solr Index Reducer Index zip Index zip Table Mapper Table Mapper
  • 22. Solr Pulls In Updates HDFS Solr Server cron
  • 24. Cloudera Search HBase Feature Table HDFS HBase MapReduce Indexer Tool Morphline file
  • 25. Morphline File { extractAvroPaths { flatten : true paths : { feature_name : /featureName feature_type : /type start : /start stop : /stop } } }
  • 26. Lessons Learned • Use HBase like a HashMap, not like a relational database • Denormalize HBase schemas, no foreign keys • To scale solr indexes past one server, use SolrCloud/Cloudera Search, don’t rebuild it on your own

Editor's Notes

  1. Good Afternoon Rob Long Today, talk about big data at monsanto And some of the lessons we’ve learned. Started > year ago Little knowledge Something with plants But then, so cool
  2. Been doing big data since 2009 It wasn’t so big then, but… Big data not size, but handling Mutliple machines Denormalized nosql Mapreduce Also variety, unstructured and semistructured integration
  3. Monsanto not prev. assoc. w/ big data Sell seeds Recipes for soil air water into food We need to understand: Genomics, agronomy, breeding, chemistry, Data feedback for breeding Year over year improvement
  4. US average corn yields 1863 - 2002 Axes: x- year, y-bushels per acre ( 0 – 160 ) Up to 1930’s, yield was 30 bushels Increasing yields, due to breeding, treatments, automation, weather prediction, etc Now > 140 bshls/acre 5x increase Goal of 300 Bshls/acre by 2030 How?
  5. Knowledge of the molecular machinery, answers “how?” Highschool biology, applied Squiggles are important – called organelles Smaller scale ACGT’s Genes as recipes
  6. How many people familiar w/ hum gen proj? Some done on the east campus Actual books on shelves Map for a genome Shred a book Need a guide A reference, just like the library. Many references, one per species
  7. Store individual genomes Warehouse / data mart Archipelago of sources Bureaucratic friction Solution: GenRe
  8. <point out the parts> App servers are web logic, Using Cloudera, CDH4.6 hadoop cluster has 30 data nodes, 3 edge nodes Edge nodes host services Compute farm - blast to find sequences Graph db for lineage
  9. Header – standard/custom data 1 variant per line Variant ind. != ref Address, chrom/pos Chrom = file, pos = offset Count from 1 1 line per pos Maize 2.3B bases 1/1000 variants Keep variants only 1/1000th storage requirement Pay a price, explain later Three main access patterns. Dump ind. Matrix, sites passing filters Regions or whole genome Aligned to same ref Flanking regions Not real-time
  10. How many familiar w/ HBase? watch: “Introduction to NoSQL” by Martin Fowler on youtube HBase, big table Distributed persistent hashmap Keyspace partitioned into regions Region servers host the regions CAP, CP
  11. Rows, cf rowkeys Column families, similar datas Sparse columns, 1 c1 vs 2 c1
  12. Our approach Tall/narrow schema Fixed # cols Many rows 1 row / variant / VCF Id:chr:pos, groups indiv. data Good for indiv. And flanking (explain) But…
  13. Review matrix use case Filter: Pos w/ => 1 variant More joins in tall/narrow M * N scanners, worst case
  14. TableMapper, region/individual Join this on chrom:pos, get individuals at a pos Now we can filter Emit if needed Intermediate result, map files
  15. Blue: ref Green: data vert.boxes, 8 variant, 12 ref, 24 no data 24 complicates, no row Either ref or no data? Solved: store gaps
  16. Group by individual/chrom Gaps addressed this way Intersect gaps Then w/gaps, back to chr:pos orientation Dump results to n files
  17. X: num individuals Y: minutes Exponential growth Too many joins
  18. Swap individual with genome build One row per pos per genome One get HBase used correctly
  19. Single pass for M individ.
  20. Variants have a context Location, in a gene, known effects Search on any field Jbrowse, open source visualization
  21. An established pattern Features dumped from hbase Embedded solr in reducers Zips in hdfs
  22. Solr server runs cron Pulls zipped indexes Merges Incremental updates skip MR phase Problems: Brittle, lots of code to maintain Not distributed, a single solr server Must move data off HDFS
  23. Ramping up to billions of features The old solr server was choking. Enter, solrcloud. Indexes in HDFS, Coordinates through zookeeper Collection concept Same REST interface
  24. Cloudera search lily hbase indexer service (real time) MR indexer Morphline file defines mappings between inputs and solr docs Foreign key problem Add a step
  25. Mapping from avro to solr Can have multi-level records
  26. Hbase like hashmap Denormalize Scale with solrcloud, don’t rebuild
  27. Credit to: Jeff GenRe team Amandeep Khurana