© 2009 Illumina, Inc. All rights reserved.
Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb,
iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
COMPANY CONFIDENTIAL – INTERNAL USE ONLY
Genome Informatics Alliance 2013
Defining Genomic Big Data and its Impact on Scientific Progress
From Whence We Came…
[Figure: sequence strings (ATGCCGTTT…, CCGGTTAAT…, GAATTGCAG…), variant labels (6:A2567C, 12:C123T, 20:T4678A), and data-size labels 30-40TB, ~5TB, 600GB, ~20GB]
Genomic Big Data
Large amounts of data generated in genomics; multiple samples, size of data, etc.
Integration of digital data to enrich context of samples; DNA, RNA, methylation, time courses, spatial distributions with samples, …
Fusion of digital data and categorical data; combination rules (categories), extraction from unstructured inputs, …
Tools and techniques appropriate for resultant data sets; visualization, model building, exploration, …
Advances require data mining rather than the one-at-a-time hypothesis testing approaches of today
Genomic Big Data and Personal Genome Information
PERSONAL SEQUENCE (owned by individual/doctor)
Issued: 01 MAR 07; Recommended next check: 28 FEB 10
PGI id: 5910322 – 61215923014
RISK VARIANTS (approved for clinical use)
Human Genome, Clinical studies, Populations, Sequencing, Functional annotation
[Figure: PPARg, axis 3: 12,300 to 3: 12,400 (kb)]
GENOMIC ANNOTATION (in public domain)
Variant: C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu
Medical consequence: Associated with severe insulin resistance, diabetes mellitus, hypertension
Pharmacological consequence: Resistant to thiazolidinediones
CLINICAL DECISION
Consultation, Consent, Clinical assessment, Selected risk information
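The variant line on this slide packs several fields into one colon-delimited string. As a minimal sketch, the parser below splits it into a record. The field meanings (chromosome, position, allele frequencies, gene, residue change) and the names `VariantRecord` and `parse_variant` are assumptions made for illustration; the slide does not define the format.

```python
from dataclasses import dataclass


@dataclass
class VariantRecord:
    # Field names are assumptions; the slide does not label the fields.
    chromosome: str
    position: int
    allele_freqs: dict
    gene: str
    residue_change: str


def parse_variant(text):
    """Parse a string such as 'C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu'."""
    # Split on colons and drop empty fields left by a trailing colon.
    fields = [f.strip() for f in text.split(":")]
    fields = [f for f in fields if f]
    chrom, pos, alleles, gene, change = fields
    # Assumed convention: 'C3' means chromosome 3.
    if chrom.startswith("C"):
        chrom = chrom[1:]
    # Assumed convention: 'T0.7' means allele T at frequency 0.7.
    freqs = {item[0]: float(item[1:]) for item in alleles.split("/")}
    return VariantRecord(chrom, int(pos.replace(",", "")), freqs, gene, change)


record = parse_variant("C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu :")
print(record)
```

A structured record like this is easier to join against annotation sources than the display string.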
Platinum Genomes as a Truth Reference
Creating a catalogue of highly accurate SNPs, indels & SVs
Sequencing a 17-member three-generation pedigree.
– Ultra-deep sequencing improves sensitivity
– Leveraging inheritance information improves accuracy
– Data and results made publicly available
Identifying ultra-accurate genomic variants is enabling rapid improvements in technology and software
This data will allow us to assess accuracy for many FDA submissions
We are collaborating with NIST & CDC to develop a public resource for quantifying sequencing accuracy
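One way inheritance information can improve accuracy is a Mendelian consistency check: a child's genotype must be formable from one allele of each parent, so calls that violate this are likely errors. The sketch below illustrates that idea only. It is not the Platinum Genomes pipeline, and the function names and the two-character genotype encoding are assumptions.

```python
from itertools import product


def mendelian_consistent(child, mother, father):
    """Return True if the child genotype can arise from the two parents.

    Genotypes are two-character strings such as 'AG' (assumed encoding).
    """
    # Every combination of one maternal and one paternal allele.
    possible = {"".join(sorted(m + f)) for m, f in product(mother, father)}
    return "".join(sorted(child)) in possible


def flag_inconsistent(calls):
    """Return the sites whose (child, mother, father) calls violate inheritance."""
    return [site for site, trio in calls.items()
            if not mendelian_consistent(*trio)]


# Hypothetical calls at two sites, for illustration only.
calls = {
    "site_1": ("AG", "AA", "GG"),  # consistent
    "site_2": ("AA", "AG", "GG"),  # child cannot inherit A from a GG father
}
print(flag_inconsistent(calls))
```

In a larger pedigree the same check can be applied to every parent-child trio, which gives more chances to catch an erroneous call.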
Reduction from 40 Q-scores to 8 Q-scores is becoming accepted
Sequencing output is still increasing exponentially, so further compression is likely to be required
Platinum genome work suggests ~95% of the genome is consistently called (this 95% is known as the platinum regions)
Regions which are reliably called may not need 8 Q-score resolution
– we can reduce “well sequenced” regions to 2 Q-scores
Start with 8 Q-score BAM file:
– Reduce the platinum regions to 2 Q-scores (keep non-platinum at 8 Q-scores)
– Reduce the platinum regions to 1 Q-score
– Whole genome 2 Q-score
– Reduce platinum region to 2 Q-scores but also keep original Q-scores of mismatches (MM) and anomalous reads
– ~40Gb (20Gb CRAM)
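The Q-score reduction described above can be sketched as a binning step: each quality score is replaced by a representative value for its bin, with coarser bins inside platinum regions. The bin boundaries, representative values, and function names below are illustrative assumptions; the slides do not state the actual scheme.

```python
from bisect import bisect_right

# Illustrative bins only (assumed, not taken from the slides).
# EIGHT_BIN_EDGES[i] is the lowest score that falls in bin i + 1.
EIGHT_BIN_EDGES = [2, 10, 20, 25, 30, 35, 40]
EIGHT_BIN_VALUES = [2, 6, 15, 22, 27, 33, 37, 40]
TWO_BIN_EDGES = [20]
TWO_BIN_VALUES = [10, 30]


def bin_q(q, edges, values):
    """Map a quality score to the representative value of its bin."""
    return values[bisect_right(edges, q)]


def reduce_quals(quals, in_platinum):
    """Use 2 Q-scores inside platinum regions and 8 Q-scores elsewhere.

    `in_platinum` holds one boolean per base (hypothetical input shape).
    """
    reduced = []
    for q, platinum in zip(quals, in_platinum):
        if platinum:
            reduced.append(bin_q(q, TWO_BIN_EDGES, TWO_BIN_VALUES))
        else:
            reduced.append(bin_q(q, EIGHT_BIN_EDGES, EIGHT_BIN_VALUES))
    return reduced


print(reduce_quals([38, 12, 5, 31], [True, True, False, False]))
```

Fewer distinct quality values make the quality string more repetitive, which is what lets a general-purpose compressor shrink the file.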
Data Reduction Via Vertical Compression (NA12882)
Values in parentheses are counts at >Q20.
Build | Total SNPs | SNPs diff genotype | Not called in Q-score compressed build | Not called in 8 Q-score build
8 Q-score | 3,735,575 (3,627,165) | - | - | -
8 Q-score technical replicate | 3,734,849 (3,626,485) | 45,584 (22,400) | 80,131 (29,211) | 79,405 (28,845)
Platinum Genome 2 Q-score | 3,732,568 (3,620,612) | 3,255 (161) | 3,417 (63) | 410 (127)
Platinum Genome 1 Q-score | 3,764,928 (3,626,468) | 4,002 (584) | 2,605 (75) | 31,958 (2,964)
Whole Genome 2 Q-score | 3,712,636 (3,598,400) | 25,175 (1,912) | 24,237 (166) | 1,298 (112)
Platinum 2 Q-score keep MM and anom. reads | 3,735,684 (3,627,226) | 197 (123) | 142 (35) | 251 (102)
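To put the table's genotype differences on a common scale, the short calculation below divides each build's "SNPs diff genotype" count by its own total SNP count. The counts are copied from the table; using each build's total as the denominator is an assumption made here, not something the slide specifies.

```python
# (total SNPs, SNPs with a different genotype), all-quality counts from the table.
BUILDS = {
    "8 Q-score technical replicate": (3_734_849, 45_584),
    "Platinum Genome 2 Q-score": (3_732_568, 3_255),
    "Platinum Genome 1 Q-score": (3_764_928, 4_002),
    "Whole Genome 2 Q-score": (3_712_636, 25_175),
    "Platinum 2 Q-score keep MM and anom. reads": (3_735_684, 197),
}


def discordance_rate(total, diff):
    """Fraction of a build's SNPs whose genotype differs (assumed denominator)."""
    return diff / total


for name, (total, diff) in BUILDS.items():
    rate = discordance_rate(total, diff)
    print(f"{name}: {100 * rate:.3f}% of SNPs differ in genotype")
```

On these figures every compressed build differs from the 8 Q-score build in fewer genotypes than the technical replicate does.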
Faster Data – DNA to Result in <2 Days
Fast turnaround is required for clinical applications
Workflow: Sample, Sequence, Analyze, Annotate
– PCR Free library: 4.5 hr
– HiSeq2500: 27 hr
– Isaac analysis overnight: 8 hr
– Total: 40 hr
Analysis hardware: 12 core server, 64Gb RAM
Importance of Non-coding Mutations – Bigger Data!
WGS reveals somatic mutations in the TERT gene promoter of melanoma patients
The mutations form a novel transcription factor binding motif
Recurrence in melanoma is as high as any known coding mutation
[Figure: TERT gene, axis -200 to +200]
Gene (mutation): incidence in melanoma
TERT (promoter) 52%
BRAF (V600E) 53%
CDKN2A 50%
NRAS (Q61R) 28%
TERT (coding) 1%
Horn et al. & Huang et al., Science 2013
Complexity of Data
Surveillance of Leukaemia (CLL) – More Data Complexity!
[Figure: event timeline with sequencing time points; events Birth, Diagnosis, Treatment, Treatment, Treatment, Death; axis labelled 0 and 62 to 66]
[Figure: abundance (0 to 250) of NORMAL and CLASS 1 to CLASS 4 populations at time points a to e]
Changing subclonal populations
“Remission” has disease
Schuh et al., Oxford
A Deeper Complexity of Genomic Data
Utility Requires Complex Composite Information
Annotate, Interpret, Disseminate
Annotation sources:
– Allele Frequency in populations: www.1000genomes.org
– Medical/Risk data (with expert review): HGMD, PharmGKB
– Genetic Variants: dbSNP
– Functional Effects: ensembl.org, genome.ucsc.edu, encode.org
– Disease association: genome.gov
ANNOTATED GENOME (gVCF), <1Gbyte
Ancestry, Tissue type, Risk, Carrier status, Diagnosis, Drug response
iPad, Plug and Play, Cloud
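The composite annotation idea on this slide can be sketched as a join of one variant key against several independent sources. In the sketch below the sources are small in-memory tables standing in for the public resources named above. The table names, the `annotate` function, and the key layout are hypothetical, and the example values are the PPARG figures quoted earlier in this deck.

```python
# Hypothetical in-memory stand-ins for external annotation sources.
# Keys are (chromosome, position); values reuse the PPARG example from this deck.
SOURCES = {
    "allele_frequency": {("3", 12450610): {"T": 0.7, "C": 0.3}},
    "medical_risk": {
        ("3", 12450610): "Associated with severe insulin resistance, "
                         "diabetes mellitus, hypertension",
    },
    "drug_response": {("3", 12450610): "Resistant to thiazolidinediones"},
}


def annotate(variant_key, sources):
    """Collect whatever each source knows about one variant into a single record."""
    record = {"variant": variant_key}
    for name, table in sources.items():
        if variant_key in table:
            record[name] = table[variant_key]
    return record


print(annotate(("3", 12450610), SOURCES))
```

A variant that no source recognises comes back with only its key, which keeps missing annotation explicit rather than silently defaulted.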
Genomic Big Data Ecosystems
Apps, Public Genomic Databases, Users, EMR, Support & Engineering, Instruments
Genomic Big Data Status
[Figure labels: Researcher, Clinician, Patient; Information, Knowledge, Treatment choice]
Challenges for this Meeting to Address
What data frameworks and models are required?
How will genomes (DNA, RNA, methylation states, etc.) be aggregated and compared?
How will collaboration and data sharing evolve?
Where will the technology go, and how must the community respond to leverage the benefits?
Brainstorming of ideas
Sessions from groups that have experience in many fields
Next steps!!
Actively participate and enjoy the entire experience!

UiPath Test Automation using UiPath Test Suite series, part 3
 

Scott Kahn Genomic Big Data.gia.052913

  • 1. Genome Informatics Alliance 2013. Defining Genomic Big Data and its Impact on Scientific Progress
  • 2. From Whence We Came…
      [Figure: sequence reads (ATGCCGTTT…, CCGGTTAAT…, GAATTGCAG…) and variant calls (6:A2567C, 12:C123T, 20:T4678A); data sizes shown: 30-40TB, ~5TB, 600GB, ~20GB]
  • 3. Genomic Big Data
      – Large amounts of data generated in genomics: multiple samples, size of data, etc.
      – Integration of digital data to enrich context of samples: DNA, RNA, methylation, time courses, spatial distributions with samples, …
      – Fusion of digital data and categorical data: combination rules (categories), extraction from unstructured inputs, …
      – Tools and techniques appropriate for resultant data sets: visualization, model building, exploration, …
      – Advances require data mining rather than the one-at-a-time hypothesis testing approaches of today
  • 4. Genomic Big Data and Personal Genome Information
      – PERSONAL SEQUENCE (owned by individual/doctor): Issued: 01 MAR 07; Recommended next check: 28 FEB 10; PGI id: 5910322 – 61215923014
      – RISK VARIANTS (approved for clinical use)
      – Figure labels: Human Genome; Clinical studies; Populations; Sequencing; Functional annotation; axis 3: 12,300 to 3: 12,400 (kb); PPARg
      – GENOMIC ANNOTATION (in public domain)
          Variant: C3 : 12,450,610 : T0.7/C0.3 : PPARG : Pro/Leu
          Medical consequence: Associated with severe insulin resistance, diabetes mellitus, hypertension
          Pharmacological consequence: Resistant to thiazolidinediones
      – CLINICAL DECISION: Consultation; Consent; Clinical assessment; Selected risk information
  • 5. Platinum Genomes as a Truth Reference
      Creating a catalogue of highly-accurate SNPs, indels & SVs
      – Sequencing a 17-member three-generation pedigree
          – Ultra deep sequencing improves sensitivity
          – Leveraging inheritance information improves accuracy
          – Data and results made publicly available
      – Identifying ultra accurate genomic variants is enabling rapid improvements in technology and software
      – This data will allow us to assess accuracy for many FDA submissions
      – We are collaborating with NIST & CDC to develop a public resource for quantifying sequencing accuracy
  • 6. Data Reduction Via Vertical Compression (NA12882)
      – Reduction from 40 Q-scores to 8 Q-scores is becoming accepted
      – Sequencing output is still increasing exponentially, so further compression is likely to be required
      – Platinum genome work suggests ~95% of the genome is consistently called (this 95% is known as the platinum regions)
      – Regions which are reliably called may not need 8 Q-score resolution: we can reduce “well sequenced” regions to 2 Q-scores
      – Start with 8 Q-score bam file:
          – Reduce the platinum regions to 2 Q-scores (keep non-platinum at 8 Q-scores)
          – Reduce the platinum regions to 1 Q-score
          – Whole genome 2 Q-score
          – Reduce platinum region to 2 Q-scores but also keep original Q-scores of mismatches (MM) and anomalous reads
          – ~40Gb (20Gb CRAM)
      Table (values in parentheses are counts at >Q20):
      Build | Total SNPs (>Q20) | SNPs diff genotype (>Q20) | Not called in Q-score compressed build (>Q20) | Not called in 8 Q-score build (>Q20)
      8 Q-score | 3,735,575 (3,627,165) | - | - | -
      8 Q-score technical replicate | 3,734,849 (3,626,485) | 45,584 (22,400) | 80,131 (29,211) | 79,405 (28,845)
      Platinum Genome 2 Q-score | 3,732,568 (3,620,612) | 3,255 (161) | 3,417 (63) | 410 (127)
      Platinum Genome 1 Q-score | 3,764,928 (3,626,468) | 4,002 (584) | 2,605 (75) | 31,958 (2,964)
      Whole Genome 2 Q-score | 3,712,636 (3,598,400) | 25,175 (1,912) | 24,237 (166) | 1,298 (112)
      Platinum 2 Q-score, keep MM and anom. reads | 3,735,684 (3,627,226) | 197 (123) | 142 (35) | 251 (102)
  • 7. Faster Data – DNA to Result in <2 Days
      – Fast turnaround is required for clinical applications
      – Workflow: Sample, Sequence, Analyze, Annotate
      – Figure labels: 4.5 hr PCR Free library; 27 hr; 8 hr; HiSeq2500; Isaac analysis overnight; 40 hr; 12 core server, 64Gb RAM
  • 8. Importance of Non-coding Mutations – Bigger Data!
      – WGS reveals somatic mutations in TERT gene promoter of melanoma patients
      – Form a novel transcription factor binding motif
      – Recurrence in melanoma is as high as any known coding mutation
      [Figure: TERT gene, positions -200 to +200]
      Gene (mutation) | Incidence in melanoma
      TERT (promoter) | 52%
      BRAF (V600E) | 53%
      CDKN2A | 50%
      NRAS (Q61R) | 28%
      TERT (coding) | 1%
      Horn et al. & Huang et al., Science 2013
  • 9. Complexity of Data
  • 10. Surveillance of Leukaemia (CLL) – More Data Complexity!
      [Figure: event timeline with ticks at 0 and 62 to 66, marking Birth, Diagnosis, Treatment (three times), Sequencing, and Death]
      [Figure: changing subclonal populations; abundance (0 to 250) against time points a to e for NORMAL and CLASS 1 to CLASS 4]
      – “Remission” has disease
      Schuh et al., Oxford
  • 11. A Deeper Complexity of Genomic Data
  • 12. Utility Requires Complex Composite Information
      – Stages: Annotate, Interpret, Disseminate
      – Genetic Variants: dbSNP
      – Allele Frequency in populations: www.1000genomes.org
      – Functional Effects: ensembl.org, genome.ucsc.edu, encode.org
      – Disease association: genome.gov
      – Medical/Risk data (with expert review): Hgmd, pharmgkb
      – ANNOTATED GENOME (gVCF), <1Gbyte
      – Ancestry; Tissue type; Risk; Carrier status; Diagnosis; Drug response
      – iPad; Plug and Play; Cloud
  • 13. Genomic Big Data Ecosystems
      [Figure labels: Apps; Public Genomic Databases; Users; EMR; Support & Engineering; Instruments]
  • 14. Genomic Big Data Status
      [Figure labels: Researcher; Clinician; Patient; Knowledge; Information; Treatment choice]
  • 15. Challenges for this Meeting to Address
      – What data frameworks and models are required?
      – How will genomes (DNA, RNA, methylation states, etc.) be aggregated and compared?
      – How will collaboration and data sharing evolve?
      – Where will the technology go, and how must the community respond to leverage the benefits?
      – Brainstorming of ideas
      – Sessions from groups that have experience in many fields
      – Next steps!!
      – Actively participate and enjoy the entire experience!
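
The region-dependent Q-score reduction described on slide 6 (coarse quality resolution inside the platinum regions, 8 levels elsewhere) can be sketched roughly as below. This is a minimal illustration, not the method used to produce the slide's numbers. The slides give no bin boundaries or representative values, so the edges and values here are assumptions, as are the helper names `bin_quality`, `in_regions`, and `compress_read`, the Q20 cut for the 2-level scheme, and the simplification that each read is an ungapped alignment on a single chromosome.

```python
from bisect import bisect_right

# Assumed 8-level scheme (not stated in the slides): bin edges plus one
# representative Phred value per bin.
EIGHT_EDGES = [2, 10, 20, 25, 30, 35, 40]
EIGHT_VALUES = [0, 6, 15, 22, 27, 33, 37, 40]

# Assumed 2-level scheme: a single cut at Q20 with hypothetical values.
TWO_EDGES = [20]
TWO_VALUES = [10, 30]


def bin_quality(q, edges, values):
    """Map a Phred score to the representative value of the bin it falls in."""
    return values[bisect_right(edges, q)]


def in_regions(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def compress_read(start, quals, regions):
    """Re-bin the per-base qualities of an ungapped read aligned at `start`.

    Bases inside `regions` (standing in for the platinum regions) get the
    2-level scheme; all other bases keep the 8-level scheme.
    """
    starts = [s for s, _ in regions]
    ends = [e for _, e in regions]
    out = []
    for offset, q in enumerate(quals):
        if in_regions(start + offset, starts, ends):
            out.append(bin_quality(q, TWO_EDGES, TWO_VALUES))
        else:
            out.append(bin_quality(q, EIGHT_EDGES, EIGHT_VALUES))
    return out


if __name__ == "__main__":
    platinum = [(100, 200)]  # hypothetical region coordinates
    # First two bases fall outside the region, last two inside it.
    print(compress_read(98, [38, 12, 38, 12], platinum))  # [37, 15, 30, 10]
```

Fewer distinct quality values produce longer runs of identical symbols, which is what lets a downstream compressor shrink the file. The variant of slide 6 that keeps the original Q-scores of mismatches and anomalous reads would need per-base alignment information and is not covered by this sketch.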