SlideShare a Scribd company logo
©2012 MFMER | slide-1
CENTER FOR
INDIVIDUALIZED
MEDICINE
Key Problems in Addressing the
N+1 problem in Patient Care.
Daniel Quest
Note: The case study in this slide deck should not be
considered clinical advice. The goal is to understand the informatics challenges!
©2012 MFMER | slide-2
Center for INDIVIDUALIZED MEDICINE
Introduction
In 1900 David Hilbert in Paris at
the International Congress of
Mathematicians proposed 10
problems (later 23) that greatly
influenced the progression of
mathematics for the 20th century.
This is a similar journey – but in
‘genomic medicine’
Today I will present 9 problems:
AND SPARK IS AWESOME AT MOST OF THEM!
©2012 MFMER | slide-3
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-4
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-5
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-6
Center for INDIVIDUALIZED MEDICINE
Madelyn Shumaker
Madelyn was 8 years old when
diagnosed with DIPG (diffuse intrinsic
pontine glioma), which is nearly always
fatal and lacks an effective treatment.
Nearly all die within two years. She
underwent ‘personalized medicine’ in
an attempt to target her cancer.
January 26, 2015:
Maddie is diagnosed
with DIPG
January 29, 2015:
Maddie goes to St.
Jude Children’s
hospital for 6 weeks
chemo/radiation as
part of a DIPG
clinical trial.
March 30, 2015:
Maddie returns home
and starts school
again. Maddie, has
some slight hearing
loss
November 14, 2015:
Maddie goes to Dr.
Giselle Sholler Helen
DeVos Children's
Hospitalin Grand
Rapids, Mich for
genome and
transcriptome
sequencing. The
Tumor board
recommends
treatment based on
the findings
April 15, 2015:
Maddie returns to St.
Jude’s for a follow up
MRI. The results
show the cancer
virtually gone from
the brainstem
May 22, 2015:
Maddie undergoes a
second round of
chemo
June 12, 2015:
Maddie returns to St.
Jude’s for a follow up
MRI. There is
evidence of necrosis
(cell death) in the
brainstem.
October 20, 2015:
Maddie’s symptoms
worsen and become
persistent, the cancer
is recurrent
October 29, 2015:
Maddie’s undergoes
a revolutionary
surgery at Sloan
Kettering where they
do a biopsy on the
tumor for molecular
and pathology
analysis
December 1, 2015:
Maddie goes begins
to have adverse
reactions to chemo
December 10,
2015: Maddie
passes away from
DIPG
©2012 MFMER | slide-7
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-8
Center for INDIVIDUALIZED MEDICINE
Data Types (Available to the Tumor
Board)
„NGS Whole Exome (Somatic and Germline)
„RNAseq
„PDFs Describing Drug/Gene Interactions
©2012 MFMER | slide-9
Center for INDIVIDUALIZED MEDICINE
Raw Reads
Mapping
Pileup
Variant Calling
Problem 1: Variant Calling
Genomic Variants
https://github.com/bigdatagenomics/ -
Variant Calling in Spark!
~119M Reads ‘Normal Tissue’Exome Seq
~302M Reads ‘Tumor’ Exome Seq
62,116 Variants‘Normal Tissue’
4361 Variants ‘Tumor’
©2012 MFMER | slide-10
Center for INDIVIDUALIZED MEDICINE
Problem 2: Annotation
http://bioinformaticstools.mayo.edu/research/bior/
©2012 MFMER | slide-11
Center for INDIVIDUALIZED MEDICINE
Problem 2: Annotation
Information from data sources from other organizations and
institutions that give important and actionable background.
Our Data:
Genome, Transcriptome,
Epigenome, Microbiome,
Proteome
Drugs
Gene Functions
and Pathways
Gene regulation
Population Variation
Functional
Impact
Oncogenes, tumor
suppressors, epigenetic
readers/writers, etc.
Mayo Clinical Knowledge
©2012 MFMER | slide-12
Center for INDIVIDUALIZED MEDICINE
Document
Data Model
Original Data (normalized syntax)
Added Data
(normalized syntax + semantics)
©2012 MFMER | slide-13
Center for INDIVIDUALIZED MEDICINE
BioR Annotation Engine
>90% of
annotation
queries can be
handled by
genomic
position
search OR by
ID matching.
This is a Map-Reduce Problem! Spark to the rescue!
©2012 MFMER | slide-14
Center for INDIVIDUALIZED MEDICINE
Problem 3: Variant Filtering
Annotations
A1 A2 An…
…
Sample1 Sample2 Sample3
©2012 MFMER | slide-15
Center for INDIVIDUALIZED MEDICINE
Problem 3: Variant Filtering
Genetic Councilors at Mayo are Using
VCF Miner in the Clinic to find the
cause of disease
http://bioinformaticstools.mayo.edu/research/vcf-miner/
Each case requires a different ‘schema’
because each disease is different.
©2012 MFMER | slide-16
Center for INDIVIDUALIZED MEDICINE
VCF-Miner Stats for Madelyn
# Filter Count
1 All Variants 61971 (whole
exome – whole
genome is
usually ~3-5M)
2 Germline Mutations (Maddie relative to HG19) 54579
3 Somatic Mutations (tumor only) 7392
4 Variants in Cancer Genes (cosmic 595) && 3 202
5 SNPEFF Impact = HIGH || MODERATE && 3 257
6 Polyphen = possibly damaging, probably damaging,
unknown && 3
93
7 SIFT_TERM && 3 69
8 4 && 5 8
9 Variants Filtered by Annovar 83
10 Variants in final report to Tumor Board 7 (1 RARG, 6
PIK3CA)
©2012 MFMER | slide-17
Center for INDIVIDUALIZED MEDICINE
In Clinical Report
In Raw Report
In DIPG Literature – not in report
Novel found by VCF-Miner!
Missing in VCF-Miner ‘Damaging’Analysis - RARG (there are two if we consider all
somatic variants)
©2012 MFMER | slide-18
Center for INDIVIDUALIZED MEDICINE
Problem 4 Clinical Oncology
Mutations	in	Genes	Relevant	to	Cancer*
Gene	name Mut	type Location Specific	change COSMIC Gene	description	(NCBI)
ACVR1	(activin	A	
receptor	type	I)
SNV chr2:158630626,	
Pfam	Domain:	
Transforming	
growth	factor	beta	
type	I	GS-motif
Missense,	R206H,	
DNA	allele	ratio	0.46
150	coding	
mutations,	12	
R206H
Activins	are	dimeric	growth	and	differentiation	factors	which	
belong	to	the	transforming	growth	factor-beta	(TGF-beta)	
superfamily	of	structurally	related	signaling	proteins.	
Activins	signal	through	a	heteromeric	complex	of	receptor	
serine	kinases	which	include	at	least	two	type	I	(	I	and	IB)	
and	two	type	II	(II	and	IIB)	receptors.	These	receptors	are	all	
transmembrane	proteins,	composed	of	a	ligand-binding	
extracellular	domain	with	cysteine-rich	region,	a	
transmembrane	domain,	and	a	cytoplasmic	domain	with	
predicted	serine/threonine	specificity.	Type	I	receptors	are	
essential	for	signaling;	and	type	II	receptors	are	required	for	
binding	ligands	and	for	expression	of	type	I	receptors.	Type	I	
and	II	receptors	form	a	stable	complex	after	ligand	binding,	
resulting	in	phosphorylation	of	type	I	receptors	by	type	II	
receptors.	This	gene	encodes	activin	A	type	I	receptor	which	
signals	a	particular	transcriptional	response	in	concert	with	
activin	type	II	receptors.	Mutations	in	this	gene	are	
associated	with	fibrodysplasia	ossificans	progressive.
PIK3CA	
(phosphatidylinositol-
4,5,-bisphosphate	3-
kinase,	catalytic	subunit	
alpha)
SNV chr3:178936091,	
Pfam	Domain:	
Phosphoinositide	3-
kinase	family,	
accessory	domain	
(PIK)
Missense,		E545K,	
DNA	allele	ratio	0.14
8178	coding	
mutations,	
1277	E545K
Phosphatidylinositol	3-kinase	is	composed	of	an	85	kDa	
regulatory	subunit	and	a	110	kDa	catalytic	subunit.	The	
protein	encoded	by	this	gene	represents	the	catalytic	subunit,	
which	uses	ATP	to	phosphorylate	PtdIns,	PtdIns4P	and	
PtdIns(4,5)P2.	This	gene	has	been	found	to	be	oncogenic	and	
has	been	implicated	in	cervical	cancers.
©2012 MFMER | slide-19
Center for INDIVIDUALIZED MEDICINE
Molecular Guided Report
©2012 MFMER | slide-20
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-21
Center for INDIVIDUALIZED MEDICINE
Problem 5: Integrate RNA and other data
to get improved accuracy.
MAPRSeq
©2012 MFMER | slide-22
Center for INDIVIDUALIZED MEDICINE
RNAseq Stats:
Gene RPKM
Average RPKM
>1
16.2921
H3F3A 23.844392374
ACVR1 7.87217392506
PIK3CA 3.67493145428
FGFR3 27.3744042904
FIP1L1 11.8135796235
HLA-A 96.6034792985
STAG2 11.7003663476
Number of Transcripts
RPKM > 1 17651
RPKM > 0 38320
ALL 57773
©2012 MFMER | slide-23
Center for INDIVIDUALIZED MEDICINE
Problem 6: Metadata
Diagnosis UC vs. IC Vs. CD
Date of Current Diagnosis date listed in Month/Year format
Initial Diagnosis for example if 1st dx as UC, then dx with CD
Date of Past Diagnosis date listed in Month/Year format
Gender Male / Female
Birth date month/day/year
Race
Ethnicity
Pyoderma Gangrenosum Yes or No
Erythema Nodosum Yes or No
Metastatic Crohn's disease Yes or No
Uveitis/iritis Yes or No
Episcleritis/scleritis Yes or No
Primary Sclerosing Cholangitis Yes or No
Arthritis - small joints (hot
swollen joints) Yes or No
Arthritis - large joints (hot
swollen joints) Yes or No
Amyloidosis Yes or No
Ankylosing Spondylitis Yes or No
Sacroiliitis Yes or No
IBD-related mouth ulcers Yes or No
Venous thrombosis Yes or No
Arterial thrombosis Yes or No
Kidney stones Yes or No
Colon or rectal cancer Yes or No
perianal procedure Yes or No
abcess Yes or No
fistula Yes or No
stricture Yes or No
seton placed Yes or No
Crohn Phenotype B1 or B2 or B3
Crohn's Location Ileocolonic, Colonic, Ileum
UC Location E1 or E2 or E3
Number of IBD related
surgeries # value
Number of resections # value
Number of stricturoplasties # value
Anti-TNF Ever Yes or No
Any 1st degree family members
with CD Yes or No
Any 1st degree family members
with UC Yes or No
Ever Smoked Yes or No
Smoke When Diagnosed Yes or No
Currently Smoke Yes or No
Current Smoke Amount packs per day, can be decimal, or pack per week
Average Packs Per Day Over
History packs per day, can be decimal, or pack per week
Years Smoked # value
Medication for IBD See separate list
Example Study Collection Form
• Studies on the research side mostly ‘managed’ in excel files –
semi-structured/denormalized.
• Each investigator collects the information they need to answer a
specific question.
• Can come from clinical notes in the EMR; free text – this
requires NLP!
• Limited information can come from the EDT for example ICD 10
codes, birthdate, ect.
• Information is decentralized and not easy to query
Samples and the metadata about them link
genotypes and phenotypes
©2012 MFMER | slide-24
Center for INDIVIDUALIZED MEDICINE
Review: Components
Annotation
VCF-Miner
Filtering
Sorting
Sequencing
Primary Analysis
(read mapping)
GenomeGPS
Variant
Calling
CUSTOM
Linux
Tooling / R
VCF-Export
- Analytics
Metadata
Other Pipelines:
• MAP-RSEQ
• CHIP-SEQ
• Methyl-SEQ
• Microbiome
• Biomarkers
Variant
Warehouse
©2012 MFMER | slide-25
Center for INDIVIDUALIZED MEDICINE
Problem 7: Cohort Identification (N+1)
©2012 MFMER | slide-26
Center for INDIVIDUALIZED MEDICINE
Metadata Query Builder
select id from patient where (
gender = ‘f’,
smoking = ‘false’,
Contains(diagnosis, ‘DIPG’),
age < 10
…
)
©2012 MFMER | slide-27
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-28
Center for INDIVIDUALIZED MEDICINE
p1 1 0 1 1 1 1 0 1 1
p2 1 1 1 0 1 1 0 0 1
p3 1 0 0 1 1 1 0 1 1
…
pN 0 0 0 1 0 0 0 0 0
smoking
braincancer
age<10
female
PIC3CA
RARG
BRCA1
Genes
w/mutations
Phenotype
observations Pathways
Transcriptional
misregulationin
cancer
…
IGF2BP3
Up/Down
Regulated
Genes
p1
p3
pN
p2
©2012 MFMER | slide-29
Center for INDIVIDUALIZED MEDICINE
Problem 8 Dynamic Recalculation and
Analytics
A large set of records (~70 Million) records each needs be touched in
recalculating statistics.
• Statistics need to be recalculated because we often are dealing with
incomplete data or incompatable technologies (e.g. gene panels versus
whole genome sequencing)
After a user selects the cohort set, important statistics need to be
recalculated based on the cohort set.
Variant Database Samples Selected
Based on the Metadata
Bitwise OR used for
calculating frequencies
Resulting
Allele
Frequencies
(sortable and
dynamic)
©2012 MFMER | slide-30
Center for INDIVIDUALIZED MEDICINE
Different Coverage Results in Errors!
Sample 1
Sample 2
We have to normalize all of the data to make it comparable!
©2012 MFMER | slide-31
Center for INDIVIDUALIZED MEDICINE
S1 S2 S3
V1 0/1 . .
V2 1/1 1/1 0/0
V3 0/0 0/1 0/0
V4 1/0 0/1 .
V5 . . 1/0
VCF1 VCF2 VCF3
S1 S2 S3
V1:0/1 2:1/1 V5:1/0
V2:1/1 3:0/1
V4:1/0 4:0/1
Each of these files come in one at a time, could
be from different intuitions and could be years
apart!
The Cohort Analysis
requires a complete table
of the following form:
f(quality)
Variants
GVCF1 GVCF2 GVCF3
S1 S2 S3
V1 V2 V2
V2 V3 V3
V3 V4 V5
V4
Coverage
http://steven-n-hart.github.io/VariantDB_Challenge/
Different technologies have different coverage!
©2012 MFMER | slide-32
Center for INDIVIDUALIZED MEDICINE
Problem 9: Data Sharing – ConsortiaDB
SHARE
©2012 MFMER | slide-33
Center for INDIVIDUALIZED MEDICINE
Problem Legacy Future State / Being
Evaluated
1 – Variant Calling GATK GATK4 on Spark
/Adam
2 - Annotation Linux Commands
- SGE
Spark
3 – Variant Filtering MongoDB – can
only filter 10,000
samples
Spark – being
evaluated; could be a
Spark SOLR hybrid.
4 – Clinical Oncology Oracle Spark / Hortonworks
Stack
5 - Metadata Elastic Search +
STORM +
MapReduce
Spark/SOLR Hybrid?
6 – RNA and Other Data Linux Commands
- SGE
Spark/Hadoop?
7 – Cohort Identification DB2 Spark/Hbase Hybrid
8 – Dynamic Recalculation Custom
Distributed Java
Spark
9 – Consortium and Data
Sharing
None Spark / Hortonworks
Stack
©2012 MFMER | slide-34
©2012 MFMER | slide-35
Center for INDIVIDUALIZED MEDICINE
Thanks
Advanced Analytics
„ Dan Blezek
„ Yaxiong Lin
„ Paul Bleimeyer
Natural Language Proccessing
Vinod Kaggal, Josh Pankratz,
Sean Murphy, Pradip Kanjamala.
UDP
Mat Raveling, Brian Brownlow, Bob Domnick
Bioinformatics Systems
„ Iain Horton
„ Patrick Duffy
„ Mike Meiners
„ David Rider
„ Matt Bockol
„ Mike Kalembach
„ Greg Dougherty
„ David Mead
Adam Team!
Michael Heuer
Bioinformatics Core
„ JP Koche
„ Steve Hart
„ Raymond Moore
„ Mike Zimmerman
„ Dan O’Brien
„ Saurabh Baheti
Advanced Analytics and
Infrastructure Support
„ Jason Ross
Example Personalized
Medicine Case
„ Jesse Shumaker
„ Giselle Sholler
„ Jeffry Bond
©2012 MFMER | slide-36
Center for INDIVIDUALIZED MEDICINE
Data (!Hypothesis) Driven Discovery
What we do today – collect data to answer a question (GWAS):
An illustration of a Manhattan plot depicting several strongly associated risk loci. Each dot
represents a SNP, with the X-axis showing genomic location and Y-axis showing association
level. This example is taken from a GWA study investigating microcirculation, so the tops
indicates genetic variants that more often are found in individuals with constrictions in small
blood vessels.[1]
©2012 MFMER | slide-37
Center for INDIVIDUALIZED MEDICINE
Data (!Hypothesis) Driven Discovery
What we want to do – have the data to answer a multitude of questions (PheWAS):
An illustration of a Manhattan plot depicting several strongly associated risk variants for a
variety of diseases. Each dot represents a SNP, with the X-axis showing conditions and Y-axis
showing association level.
©2012 MFMER | slide-38
Center for INDIVIDUALIZED MEDICINE
©2012 MFMER | slide-39
Center for INDIVIDUALIZED MEDICINE

More Related Content

Viewers also liked

Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Spark Summit
 
Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...
Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...
Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...
Spark Summit
 
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Spark Summit
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
Andy Petrella
 
TAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambis
TAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambisTAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambis
TAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambis
robertstevens65
 
Why do you need Clever Cloud ?
Why do you need Clever Cloud ?Why do you need Clever Cloud ?
Why do you need Clever Cloud ?
Quentin Adam
 
How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014
How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014
How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014
Quentin Adam
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
Dan Sullivan, Ph.D.
 
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
Spark Summit
 
Genomics in Public Health
Genomics in Public HealthGenomics in Public Health
Genomics in Public Health
Jennifer Gardy
 
Utilizing Human Data Validation For KPI Analysis And Machine Learning
Utilizing Human Data Validation For KPI Analysis And Machine LearningUtilizing Human Data Validation For KPI Analysis And Machine Learning
Utilizing Human Data Validation For KPI Analysis And Machine Learning
Jen Aman
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
Jen Aman
 
Spark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van HamSpark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit
 
From MapReduce to Apache Spark
From MapReduce to Apache SparkFrom MapReduce to Apache Spark
From MapReduce to Apache Spark
Jen Aman
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
 
Finding Graph Isomorphisms In GraphX And GraphFrames
Finding Graph Isomorphisms In GraphX And GraphFramesFinding Graph Isomorphisms In GraphX And GraphFrames
Finding Graph Isomorphisms In GraphX And GraphFrames
Spark Summit
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
Jen Aman
 
MmmooOgle: From Big Data to Decisions for Dairy Cows
MmmooOgle: From Big Data to Decisions for Dairy CowsMmmooOgle: From Big Data to Decisions for Dairy Cows
MmmooOgle: From Big Data to Decisions for Dairy Cows
Spark Summit
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
Jen Aman
 

Viewers also liked (20)

Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
 
Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...
Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...
Algorithms and Tools for Genomic Analysis on Spark: Spark Summit East talk by...
 
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
TAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambis
TAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambisTAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambis
TAMBIS: Transparent Access to Multiple Bioinformatics Information SourcesTambis
 
Why do you need Clever Cloud ?
Why do you need Clever Cloud ?Why do you need Clever Cloud ?
Why do you need Clever Cloud ?
 
How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014
How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014
How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
 
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
 
Genomics in Public Health
Genomics in Public HealthGenomics in Public Health
Genomics in Public Health
 
Utilizing Human Data Validation For KPI Analysis And Machine Learning
Utilizing Human Data Validation For KPI Analysis And Machine LearningUtilizing Human Data Validation For KPI Analysis And Machine Learning
Utilizing Human Data Validation For KPI Analysis And Machine Learning
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Spark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van HamSpark Summit EU talk by Erwin Datema and Roeland van Ham
Spark Summit EU talk by Erwin Datema and Roeland van Ham
 
From MapReduce to Apache Spark
From MapReduce to Apache SparkFrom MapReduce to Apache Spark
From MapReduce to Apache Spark
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Finding Graph Isomorphisms In GraphX And GraphFrames
Finding Graph Isomorphisms In GraphX And GraphFramesFinding Graph Isomorphisms In GraphX And GraphFrames
Finding Graph Isomorphisms In GraphX And GraphFrames
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
MmmooOgle: From Big Data to Decisions for Dairy Cows
MmmooOgle: From Big Data to Decisions for Dairy CowsMmmooOgle: From Big Data to Decisions for Dairy Cows
MmmooOgle: From Big Data to Decisions for Dairy Cows
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
 

Similar to Solving The N+1 Problem In Personalized Genomics

Brain tumour patient forum Brinda Shivalingam keynote speech
Brain tumour patient forum Brinda Shivalingam keynote speechBrain tumour patient forum Brinda Shivalingam keynote speech
Brain tumour patient forum Brinda Shivalingam keynote speech
Cure Brain Cancer Foundation
 
0207 2 Jan Geissler - Hot topics in oncology big data
0207 2 Jan Geissler - Hot topics in oncology big data0207 2 Jan Geissler - Hot topics in oncology big data
0207 2 Jan Geissler - Hot topics in oncology big data
Workgroup of European Cancer Patient Advocacy Networks
 
O.O.Bogomolets National Medical University's Achivement
O.O.Bogomolets National Medical University's AchivementO.O.Bogomolets National Medical University's Achivement
O.O.Bogomolets National Medical University's Achivement
Mevar Nirav
 
Beyond Eye Treatment
Beyond Eye TreatmentBeyond Eye Treatment
Beyond Eye Treatment
Melanoma Research Foundation
 
DISEASES OF GENETIC MUTATION, TREATMENTS(2).pptx
DISEASES OF GENETIC MUTATION, TREATMENTS(2).pptxDISEASES OF GENETIC MUTATION, TREATMENTS(2).pptx
DISEASES OF GENETIC MUTATION, TREATMENTS(2).pptx
Tasiumujahid
 
2013 Aegis Capital Healthcare Conference, Sept 26th 2013
2013 Aegis Capital Healthcare Conference, Sept 26th 20132013 Aegis Capital Healthcare Conference, Sept 26th 2013
2013 Aegis Capital Healthcare Conference, Sept 26th 2013
Advanced Cell Technology, Inc.
 
EPC_September-15_pp.22-25 (EPC)
EPC_September-15_pp.22-25 (EPC)EPC_September-15_pp.22-25 (EPC)
EPC_September-15_pp.22-25 (EPC)
shelley bowers
 
Treating Cutaneous Warts
Treating Cutaneous WartsTreating Cutaneous Warts
Treating Cutaneous Warts
Dr. Patrick J. Treacy
 
Dr Patrick Treacy treating cutaneous warts
Dr Patrick Treacy treating cutaneous wartsDr Patrick Treacy treating cutaneous warts
Dr Patrick Treacy treating cutaneous warts
Dr. Patrick J. Treacy
 
RADIATION TREATMENT DROPOUTS
RADIATION TREATMENT DROPOUTSRADIATION TREATMENT DROPOUTS
RADIATION TREATMENT DROPOUTS
Kanhu Charan
 
Act corporate-presentation---september-2013---rodman-conference---final-print...
Act corporate-presentation---september-2013---rodman-conference---final-print...Act corporate-presentation---september-2013---rodman-conference---final-print...
Act corporate-presentation---september-2013---rodman-conference---final-print...
John Redaelli
 
Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013
Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013
Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013
Advanced Cell Technology, Inc.
 
헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?
Hyung Jin Choi
 
Precysion Glss Presentation Oct Pm2
Precysion  Glss Presentation Oct Pm2Precysion  Glss Presentation Oct Pm2
Precysion Glss Presentation Oct Pm2
pauljmcmahon
 
Taking Immunotherapy to the Next level
Taking Immunotherapy to the Next levelTaking Immunotherapy to the Next level
Taking Immunotherapy to the Next level
Company Spotlight
 
What's New in the Treatment of Gliomas: A Neuro-Oncologist's Perspective
What's New in the Treatment of Gliomas: A Neuro-Oncologist's PerspectiveWhat's New in the Treatment of Gliomas: A Neuro-Oncologist's Perspective
What's New in the Treatment of Gliomas: A Neuro-Oncologist's Perspective
Canadian Cancer Survivor Network
 
Act corporate-presentation14
Act corporate-presentation14Act corporate-presentation14
Act corporate-presentation14
Company Spotlight
 
2014 BIO CEO Corporate Presentation, NY, NY Feb 11, 2014
2014 BIO CEO Corporate Presentation, NY, NY Feb 11, 20142014 BIO CEO Corporate Presentation, NY, NY Feb 11, 2014
2014 BIO CEO Corporate Presentation, NY, NY Feb 11, 2014
Advanced Cell Technology, Inc.
 
Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...
Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...
Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...
Institute For Medical Education and Research (IMER)
 
NS BMS Supplement Jan 2017
NS BMS Supplement Jan 2017NS BMS Supplement Jan 2017
NS BMS Supplement Jan 2017
Dominic Rae LION (Open Networker)
 

Similar to Solving The N+1 Problem In Personalized Genomics (20)

Brain tumour patient forum Brinda Shivalingam keynote speech
Brain tumour patient forum Brinda Shivalingam keynote speechBrain tumour patient forum Brinda Shivalingam keynote speech
Brain tumour patient forum Brinda Shivalingam keynote speech
 
0207 2 Jan Geissler - Hot topics in oncology big data
0207 2 Jan Geissler - Hot topics in oncology big data0207 2 Jan Geissler - Hot topics in oncology big data
0207 2 Jan Geissler - Hot topics in oncology big data
 
O.O.Bogomolets National Medical University's Achivement
O.O.Bogomolets National Medical University's AchivementO.O.Bogomolets National Medical University's Achivement
O.O.Bogomolets National Medical University's Achivement
 
Beyond Eye Treatment
Beyond Eye TreatmentBeyond Eye Treatment
Beyond Eye Treatment
 
DISEASES OF GENETIC MUTATION, TREATMENTS(2).pptx
DISEASES OF GENETIC MUTATION, TREATMENTS(2).pptxDISEASES OF GENETIC MUTATION, TREATMENTS(2).pptx
DISEASES OF GENETIC MUTATION, TREATMENTS(2).pptx
 
2013 Aegis Capital Healthcare Conference, Sept 26th 2013
2013 Aegis Capital Healthcare Conference, Sept 26th 20132013 Aegis Capital Healthcare Conference, Sept 26th 2013
2013 Aegis Capital Healthcare Conference, Sept 26th 2013
 
EPC_September-15_pp.22-25 (EPC)
EPC_September-15_pp.22-25 (EPC)EPC_September-15_pp.22-25 (EPC)
EPC_September-15_pp.22-25 (EPC)
 
Treating Cutaneous Warts
Treating Cutaneous WartsTreating Cutaneous Warts
Treating Cutaneous Warts
 
Dr Patrick Treacy treating cutaneous warts
Dr Patrick Treacy treating cutaneous wartsDr Patrick Treacy treating cutaneous warts
Dr Patrick Treacy treating cutaneous warts
 
RADIATION TREATMENT DROPOUTS
RADIATION TREATMENT DROPOUTSRADIATION TREATMENT DROPOUTS
RADIATION TREATMENT DROPOUTS
 
Act corporate-presentation---september-2013---rodman-conference---final-print...
Act corporate-presentation---september-2013---rodman-conference---final-print...Act corporate-presentation---september-2013---rodman-conference---final-print...
Act corporate-presentation---september-2013---rodman-conference---final-print...
 
Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013
Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013
Rodman and Rendshaw 2013 Annual Global Investment Conference, NYC, Sept 2013
 
헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?
 
Precysion Glss Presentation Oct Pm2
Precysion  Glss Presentation Oct Pm2Precysion  Glss Presentation Oct Pm2
Precysion Glss Presentation Oct Pm2
 
Taking Immunotherapy to the Next level
Taking Immunotherapy to the Next levelTaking Immunotherapy to the Next level
Taking Immunotherapy to the Next level
 
What's New in the Treatment of Gliomas: A Neuro-Oncologist's Perspective
What's New in the Treatment of Gliomas: A Neuro-Oncologist's PerspectiveWhat's New in the Treatment of Gliomas: A Neuro-Oncologist's Perspective
What's New in the Treatment of Gliomas: A Neuro-Oncologist's Perspective
 
Act corporate-presentation14
Act corporate-presentation14Act corporate-presentation14
Act corporate-presentation14
 
2014 BIO CEO Corporate Presentation, NY, NY Feb 11, 2014
2014 BIO CEO Corporate Presentation, NY, NY Feb 11, 20142014 BIO CEO Corporate Presentation, NY, NY Feb 11, 2014
2014 BIO CEO Corporate Presentation, NY, NY Feb 11, 2014
 
Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...
Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...
Practical Navigation of a Changing Landscape: Keeping Current on Multiple Mye...
 
NS BMS Supplement Jan 2017
NS BMS Supplement Jan 2017NS BMS Supplement Jan 2017
NS BMS Supplement Jan 2017
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 

Recently uploaded (20)

06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 

Solving The N+1 Problem In Personalized Genomics

  • 1. ©2012 MFMER | slide-1 CENTER FOR INDIVIDUALIZED MEDICINE Key Problems in Addressing the N+1 problem in Patient Care. Daniel Quest Note: The case study in this slide deck should not be considered clinical advice. The goal is to understand the informatics challenges!
  • 2. ©2012 MFMER | slide-2 Center for INDIVIDUALIZED MEDICINE Introduction In 1900 David Hilbert in Paris at the International Congress of Mathematicians proposed 10 problems (later 23) that greatly influenced the progression of mathematics for the 20th century. This is a similar journey – but in ‘genomic medicine’ Today I will present 9 problems: AND SPARK IS AWESOME AT MOST OF THEM!
  • 3. ©2012 MFMER | slide-3 Center for INDIVIDUALIZED MEDICINE
  • 4. ©2012 MFMER | slide-4 Center for INDIVIDUALIZED MEDICINE
  • 5. ©2012 MFMER | slide-5 Center for INDIVIDUALIZED MEDICINE
  • 6. ©2012 MFMER | slide-6 Center for INDIVIDUALIZED MEDICINE Madelyn Shumaker Madelyn was 8 years old when diagnosed with DIPG (diffuse intrinsic pontine glioma), which is nearly always fatal and lacks an effective treatment. Nearly all die within two years. She underwent ‘personalized medicine’ in an attempt to target her cancer. January 26, 2015: Maddie is diagnosed with DIPG January 29, 2015: Maddie goes to St. Jude Children’s hospital for 6 weeks chemo/radiation as part of a DIPG clinical trial. March 30, 2015: Maddie returns home and starts school again. Maddie, has some slight hearing loss November 14, 2015: Maddie goes to Dr. Giselle Sholler Helen DeVos Children's Hospitalin Grand Rapids, Mich for genome and transcriptome sequencing. The Tumor board recommends treatment based on the findings April 15, 2015: Maddie returns to St. Jude’s for a follow up MRI. The results show the cancer virtually gone from the brainstem May 22, 2015: Maddie undergoes a second round of chemo June 12, 2015: Maddie returns to St. Jude’s for a follow up MRI. There is evidence of necrosis (cell death) in the brainstem. October 20, 2015: Maddie’s symptoms worsen and become persistent, the cancer is recurrent October 29, 2015: Maddie’s undergoes a revolutionary surgery at Sloan Kettering where they do a biopsy on the tumor for molecular and pathology analysis December 1, 2015: Maddie goes begins to have adverse reactions to chemo December 10, 2015: Maddie passes away from DIPG
  • 7. ©2012 MFMER | slide-7 Center for INDIVIDUALIZED MEDICINE
  • 8. ©2012 MFMER | slide-8 Center for INDIVIDUALIZED MEDICINE Data Types (Available to the Tumor Board) „NGS Whole Exome (Somatic and Germline) „RNAseq „PDFs Describing Drug/Gene Interactions
  • 9. ©2012 MFMER | slide-9 Center for INDIVIDUALIZED MEDICINE Raw Reads Mapping Pileup Variant Calling Problem 1: Variant Calling Genomic Variants https://github.com/bigdatagenomics/ - Variant Calling in Spark! ~119M Reads ‘Normal Tissue’Exome Seq ~302M Reads ‘Tumor’ Exome Seq 62,116 Variants‘Normal Tissue’ 4361 Variants ‘Tumor’
  • 10. ©2012 MFMER | slide-10 Center for INDIVIDUALIZED MEDICINE Problem 2: Annotation http://bioinformaticstools.mayo.edu/research/bior/
  • 11. ©2012 MFMER | slide-11 Center for INDIVIDUALIZED MEDICINE Problem 2: Annotation Information from data sources from other organizations and institutions that give important and actionable background. Our Data: Genome, Transcriptome, Epigenome, Microbiome, Proteome Drugs Gene Functions and Pathways Gene regulation Population Variation Functional Impact Oncogenes, tumor suppressors, epigenetic readers/writers, etc. Mayo Clinical Knowledge
  • 12. ©2012 MFMER | slide-12 Center for INDIVIDUALIZED MEDICINE Document Data Model Original Data (normalized syntax) Added Data (normalized syntax + semantics)
  • 13. ©2012 MFMER | slide-13 Center for INDIVIDUALIZED MEDICINE BioR Annotation Engine >90% of annotation queries can be handled by genomic position search OR by ID matching. This is a Map-Reduce Problem! Spark to the rescue!
  • 14. ©2012 MFMER | slide-14 Center for INDIVIDUALIZED MEDICINE Problem 3: Variant Filtering Annotations A1 A2 An… … Sample1 Sample2 Sample3
  • 15. ©2012 MFMER | slide-15 Center for INDIVIDUALIZED MEDICINE Problem 3: Variant Filtering Genetic Councilors at Mayo are Using VCF Miner in the Clinic to find the cause of disease http://bioinformaticstools.mayo.edu/research/vcf-miner/ Each case requires a different ‘schema’ because each disease is different.
  • 16. ©2012 MFMER | slide-16 Center for INDIVIDUALIZED MEDICINE VCF-Miner Stats for Madelyn # Filter Count 1 All Variants 61971 (whole exome – whole genome is usually ~3-5M) 2 Germline Mutations (Maddie relative to HG19) 54579 3 Somatic Mutations (tumor only) 7392 4 Variants in Cancer Genes (cosmic 595) && 3 202 5 SNPEFF Impact = HIGH || MODERATE && 3 257 6 Polyphen = possibly damaging, probably damaging, unknown && 3 93 7 SIFT_TERM && 3 69 8 4 && 5 8 9 Variants Filtered by Annovar 83 10 Variants in final report to Tumor Board 7 (1 RARG, 6 PIK3CA)
  • 17. ©2012 MFMER | slide-17 Center for INDIVIDUALIZED MEDICINE In Clinical Report In Raw Report In DIPG Literature – not in report Novel found by VCF-Miner! Missing in VCF-Miner ‘Damaging’Analysis - RARG (there are two if we consider all somatic variants)
  • 18. ©2012 MFMER | slide-18 Center for INDIVIDUALIZED MEDICINE Problem 4 Clinical Oncology Mutations in Genes Relevant to Cancer* Gene name Mut type Location Specific change COSMIC Gene description (NCBI) ACVR1 (activin A receptor type I) SNV chr2:158630626, Pfam Domain: Transforming growth factor beta type I GS-motif Missense, R206H, DNA allele ratio 0.46 150 coding mutations, 12 R206H Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I ( I and IB) and two type II (II and IIB) receptors. These receptors are all transmembrane proteins, composed of a ligand-binding extracellular domain with cysteine-rich region, a transmembrane domain, and a cytoplasmic domain with predicted serine/threonine specificity. Type I receptors are essential for signaling; and type II receptors are required for binding ligands and for expression of type I receptors. Type I and II receptors form a stable complex after ligand binding, resulting in phosphorylation of type I receptors by type II receptors. This gene encodes activin A type I receptor which signals a particular transcriptional response in concert with activin type II receptors. Mutations in this gene are associated with fibrodysplasia ossificans progressive. PIK3CA (phosphatidylinositol- 4,5,-bisphosphate 3- kinase, catalytic subunit alpha) SNV chr3:178936091, Pfam Domain: Phosphoinositide 3- kinase family, accessory domain (PIK) Missense, E545K, DNA allele ratio 0.14 8178 coding mutations, 1277 E545K Phosphatidylinositol 3-kinase is composed of an 85 kDa regulatory subunit and a 110 kDa catalytic subunit. The protein encoded by this gene represents the catalytic subunit, which uses ATP to phosphorylate PtdIns, PtdIns4P and PtdIns(4,5)P2. This gene has been found to be oncogenic and has been implicated in cervical cancers.
  • 19. ©2012 MFMER | slide-19 Center for INDIVIDUALIZED MEDICINE Molecular Guided Report
  • 20. ©2012 MFMER | slide-20 Center for INDIVIDUALIZED MEDICINE
  • 21. ©2012 MFMER | slide-21 Center for INDIVIDUALIZED MEDICINE Problem 5: Integrate RNA and other data to get improved accuracy. MAPRSeq
  • 22. ©2012 MFMER | slide-22 Center for INDIVIDUALIZED MEDICINE RNAseq Stats: Gene RPKM Average RPKM >1 16.2921 H3F3A 23.844392374 ACVR1 7.87217392506 PIK3CA 3.67493145428 FGFR3 27.3744042904 FIP1L1 11.8135796235 HLA-A 96.6034792985 STAG2 11.7003663476 Number of Transcripts RPKM > 1 17651 RPKM > 0 38320 ALL 57773
  • 23. ©2012 MFMER | slide-23 Center for INDIVIDUALIZED MEDICINE Problem 6: Metadata Diagnosis UC vs. IC Vs. CD Date of Current Diagnosis date listed in Month/Year format Initial Diagnosis for example if 1st dx as UC, then dx with CD Date of Past Diagnosis date listed in Month/Year format Gender Male / Female Birth date month/day/year Race Ethnicity Pyoderma Gangrenosum Yes or No Erythema Nodosum Yes or No Metastatic Crohn's disease Yes or No Uveitis/iritis Yes or No Episcleritis/scleritis Yes or No Primary Sclerosing Cholangitis Yes or No Arthritis - small joints (hot swollen joints) Yes or No Arthritis - large joints (hot swollen joints) Yes or No Amyloidosis Yes or No Ankylosing Spondylitis Yes or No Sacroiliitis Yes or No IBD-related mouth ulcers Yes or No Venous thrombosis Yes or No Arterial thrombosis Yes or No Kidney stones Yes or No Colon or rectal cancer Yes or No perianal procedure Yes or No abcess Yes or No fistula Yes or No stricture Yes or No seton placed Yes or No Crohn Phenotype B1 or B2 or B3 Crohn's Location Ileocolonic, Colonic, Ileum UC Location E1 or E2 or E3 Number of IBD related surgeries # value Number of resections # value Number of stricturoplasties # value Anti-TNF Ever Yes or No Any 1st degree family members with CD Yes or No Any 1st degree family members with UC Yes or No Ever Smoked Yes or No Smoke When Diagnosed Yes or No Currently Smoke Yes or No Current Smoke Amount packs per day, can be decimal, or pack per week Average Packs Per Day Over History packs per day, can be decimal, or pack per week Years Smoked # value Medication for IBD See separate list Example Study Collection Form • Studies on the research side mostly ‘managed’ in excel files – semi-structured/denormalized. • Each investigator collects the information they need to answer a specific question. • Can come from clinical notes in the EMR; free text – this requires NLP! • Limited information can come from the EDT for example ICD 10 codes, birthdate, ect. • Information is decentralized and not easy to query Samples and the metadata about them link genotypes and phenotypes
  • 24. ©2012 MFMER | slide-24 Center for INDIVIDUALIZED MEDICINE Review: Components Annotation VCF-Miner Filtering Sorting Sequencing Primary Analysis (read mapping) GenomeGPS Variant Calling CUSTOM Linux Tooling / R VCF-Export - Analytics Metadata Other Pipelines: • MAP-RSEQ • CHIP-SEQ • Methyl-SEQ • Microbiome • Biomarkers Variant Warehouse
  • 25. ©2012 MFMER | slide-25 Center for INDIVIDUALIZED MEDICINE Problem 7: Cohort Identification (N+1)
  • 26. ©2012 MFMER | slide-26 Center for INDIVIDUALIZED MEDICINE Metadata Query Builder select id from patient where ( gender = ‘f’, smoking = ‘false’, Contains(diagnosis, ‘DIPG’), age < 10 … )
  • 27. ©2012 MFMER | slide-27 Center for INDIVIDUALIZED MEDICINE
  • 28. ©2012 MFMER | slide-28 Center for INDIVIDUALIZED MEDICINE p1 1 0 1 1 1 1 0 1 1 p2 1 1 1 0 1 1 0 0 1 p3 1 0 0 1 1 1 0 1 1 … pN 0 0 0 1 0 0 0 0 0 smoking braincancer age<10 female PIC3CA RARG BRCA1 Genes w/mutations Phenotype observations Pathways Transcriptional misregulationin cancer … IGF2BP3 Up/Down Regulated Genes p1 p3 pN p2
  • 29. ©2012 MFMER | slide-29 Center for INDIVIDUALIZED MEDICINE Problem 8 Dynamic Recalculation and Analytics A large set of records (~70 Million) records each needs be touched in recalculating statistics. • Statistics need to be recalculated because we often are dealing with incomplete data or incompatable technologies (e.g. gene panels versus whole genome sequencing) After a user selects the cohort set, important statistics need to be recalculated based on the cohort set. Variant Database Samples Selected Based on the Metadata Bitwise OR used for calculating frequencies Resulting Allele Frequencies (sortable and dynamic)
  • 30. ©2012 MFMER | slide-30 Center for INDIVIDUALIZED MEDICINE Different Coverage Results in Errors! Sample 1 Sample 2 We have to normalize all of the data to make it comparable!
  • 31. ©2012 MFMER | slide-31 Center for INDIVIDUALIZED MEDICINE S1 S2 S3 V1 0/1 . . V2 1/1 1/1 0/0 V3 0/0 0/1 0/0 V4 1/0 0/1 . V5 . . 1/0 VCF1 VCF2 VCF3 S1 S2 S3 V1:0/1 2:1/1 V5:1/0 V2:1/1 3:0/1 V4:1/0 4:0/1 Each of these files come in one at a time, could be from different intuitions and could be years apart! The Cohort Analysis requires a complete table of the following form: f(quality) Variants GVCF1 GVCF2 GVCF3 S1 S2 S3 V1 V2 V2 V2 V3 V3 V3 V4 V5 V4 Coverage http://steven-n-hart.github.io/VariantDB_Challenge/ Different technologies have different coverage!
  • 32. ©2012 MFMER | slide-32 Center for INDIVIDUALIZED MEDICINE Problem 9: Data Sharing – ConsortiaDB SHARE
  • 33. ©2012 MFMER | slide-33 Center for INDIVIDUALIZED MEDICINE Problem Legacy Future State / Being Evaluated 1 – Variant Calling GATK GATK4 on Spark /Adam 2 - Annotation Linux Commands - SGE Spark 3 – Variant Filtering MongoDB – can only filter 10,000 samples Spark – being evaluated; could be a Spark SOLR hybrid. 4 – Clinical Oncology Oracle Spark / Hortonworks Stack 5 - Metadata Elastic Search + STORM + MapReduce Spark/SOLR Hybrid? 6 – RNA and Other Data Linux Commands - SGE Spark/Hadoop? 7 – Cohort Identification DB2 Spark/Hbase Hybrid 8 – Dynamic Recalculation Custom Distributed Java Spark 9 – Consortium and Data Sharing None Spark / Hortonworks Stack
  • 34. ©2012 MFMER | slide-34
  • 35. ©2012 MFMER | slide-35 Center for INDIVIDUALIZED MEDICINE Thanks Advanced Analytics „ Dan Blezek „ Yaxiong Lin „ Paul Bleimeyer Natural Language Proccessing Vinod Kaggal, Josh Pankratz, Sean Murphy, Pradip Kanjamala. UDP Mat Raveling, Brian Brownlow, Bob Domnick Bioinformatics Systems „ Iain Horton „ Patrick Duffy „ Mike Meiners „ David Rider „ Matt Bockol „ Mike Kalembach „ Greg Dougherty „ David Mead Adam Team! Michael Heuer Bioinformatics Core „ JP Koche „ Steve Hart „ Raymond Moore „ Mike Zimmerman „ Dan O’Brien „ Saurabh Baheti Advanced Analytics and Infrastructure Support „ Jason Ross Example Personalized Medicine Case „ Jesse Shumaker „ Giselle Sholler „ Jeffry Bond
  • 36. ©2012 MFMER | slide-36 Center for INDIVIDUALIZED MEDICINE Data (!Hypothesis) Driven Discovery What we do today – collect data to answer a question (GWAS): An illustration of a Manhattan plot depicting several strongly associated risk loci. Each dot represents a SNP, with the X-axis showing genomic location and Y-axis showing association level. This example is taken from a GWA study investigating microcirculation, so the tops indicates genetic variants that more often are found in individuals with constrictions in small blood vessels.[1]
  • 37. ©2012 MFMER | slide-37 Center for INDIVIDUALIZED MEDICINE Data (!Hypothesis) Driven Discovery What we want to do – have the data to answer a multitude of questions (PheWAS): An illustration of a Manhattan plot depicting several strongly associated risk variants for a variety of diseases. Each dot represents a SNP, with the X-axis showing conditions and Y-axis showing association level.
  • 38. ©2012 MFMER | slide-38 Center for INDIVIDUALIZED MEDICINE
  • 39. ©2012 MFMER | slide-39 Center for INDIVIDUALIZED MEDICINE