Trends in Genomics: An Engineer’s Perspective
Saul A. Kravitz, PhD
December 2009
Biggest Change: Sequencing is free
2000: Factory, AB3700 @ Celera
  - 1k 500bp reads/sequencer/day = 0.5 Mbp/day
  - Human genome = ~190 sequencer-years, ~$200M
2002: Factory, AB3730 @ JCVI
  - 10k 500bp reads/sequencer/day = 5 Mbp/day
  - Human genome = ~19 sequencer-years, ~$10M
2010: Benchtop, 454 GS Junior
  - 70M 500bp reads/day = 35 Gbp/day
  - Human genome = ~1 sequencer-day, ~$10k
2010: Service, Complete Genomics
  - Human genome = ~1 day, ~$1k
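The throughput figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: it takes the reads per day and sequencer time quoted above, and assumes a 365-day sequencer-year (an assumption, not stated on the slide). The function and variable names are illustrative.

```python
# Figures as quoted on the slide; 365 days per sequencer-year is an assumption.
READ_LENGTH_BP = 500
DAYS_PER_YEAR = 365

platforms = {
    "AB3700 (2000)": {"reads_per_day": 1_000, "sequencer_days": 190 * DAYS_PER_YEAR},
    "AB3730 (2002)": {"reads_per_day": 10_000, "sequencer_days": 19 * DAYS_PER_YEAR},
    "454 GS Junior (2010)": {"reads_per_day": 70_000_000, "sequencer_days": 1},
}


def bases_per_day(reads_per_day, read_length_bp=READ_LENGTH_BP):
    """Raw bases produced by one sequencer in one day."""
    return reads_per_day * read_length_bp


def total_bases(reads_per_day, sequencer_days):
    """Raw bases produced over the quoted sequencer time for one human genome."""
    return bases_per_day(reads_per_day) * sequencer_days


for name, p in platforms.items():
    daily = bases_per_day(p["reads_per_day"])
    total = total_bases(p["reads_per_day"], p["sequencer_days"])
    print(f"{name}: {daily / 1e6:,.1f} Mbp/day, {total / 1e9:.1f} Gbp per genome")
```

All three rows work out to roughly 35 Gbp of raw sequence per genome, so the slide's numbers are internally consistent. For a human genome of about 3 Gbp, that corresponds to something on the order of 11x to 12x coverage.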
New Bottlenecks
  - Generating sequence data: free
  - Data management
  - Data query
  - Data analysis
    - Breadth: communities
    - Depth: populations (e.g., flu, human)
  - Thinking is very pricey!
Same Thinking $, More Data
[Chart; axis label: Project Cost]
The Crux of the Problem
  - Genomic data is interpreted in context
    - How does my genome compare to all others?
    - Which other proteins are similar to mine?
  - Size of context is growing exponentially
    - Growth is faster than Moore’s law
    - Hard to fight an exponential
      - BLASTP against NCBI NR
      - All-against-all BLASTP of microbial proteins
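A rough way to see why it is "hard to fight an exponential" is to compare how fast the work grows with how fast compute grows. The sketch below is illustrative only: the doubling times are hypothetical placeholders, not figures from the talk, and it assumes that a search against a database scales linearly with database size while an all-against-all comparison scales quadratically.

```python
def growth_factor(years, doubling_time_years):
    """Multiplicative growth after `years` for a quantity with the given doubling time."""
    return 2.0 ** (years / doubling_time_years)


def relative_runtime(years, data_doubling_years, compute_doubling_years, work_exponent):
    """Runtime relative to today if work scales as (data size) ** work_exponent.

    work_exponent=1 models one query set searched against a growing database;
    work_exponent=2 models an all-against-all comparison.
    """
    work = growth_factor(years, data_doubling_years) ** work_exponent
    speed = growth_factor(years, compute_doubling_years)
    return work / speed


# Hypothetical doubling times: data every 12 months, compute every 18 months.
for label, exponent in [("search vs database", 1), ("all-against-all", 2)]:
    ratio = relative_runtime(6, 1.0, 1.5, exponent)
    print(f"{label}: {ratio:.0f}x slower after 6 years")
```

Under these assumed rates, the database search ends up about 4x slower after six years even on the faster hardware, and the all-against-all comparison about 256x slower. The exact numbers depend entirely on the assumed doubling times; the point is only that the gap itself grows exponentially.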
Bioinformatics Isn’t High Energy Physics
  - Data inputs are changing rapidly
    - CE chromatograms, 454 flowgrams, color space
    - Error models and read lengths are changing rapidly
  - Tools are evolving rapidly
    - Difficult to track many academic tools
    - High quality commercial platforms emerge
  - Even when “cooks” use shared “ingredients”, “recipes” vary widely
    - Faith based science
  - My dataset alone has limited value
  - Computations are (relatively) IO intensive
Some Solutions and Directions
  - Repeated processes must be automated
    - Even if labor is free, deviations from SOP are costly
  - Commercial tools
    - Market has expanded, quality improved
  - Tools for exploring human variation
    - The HuRef Browser
  - Metagenomics tools and challenges
    - Global Ocean Sampling Expedition
    - Visualization tools
    - Metagenomic annotation
    - Genomic Standards Consortium and M5
  - Clouds and grids
  - ScaaS: Science as a Service
Personal Genomics: The future is now (ca. 2008)
HuRef Browser: Accelerate thinking
  - Compare 2 published genomes
    - Craig Venter’s diploid genome
    - Composite NCBI-36
  - Are differences real?
    - Noisy data? Assembly errors? Analysis errors?
  - Methods development requires curation by biologists
  - As genomes accumulate, the challenge becomes more acute
HuRef Browser: http://huref.jcvi.org
Zinc Finger Protein, Chr19:57564487-57581356
[Browser screenshot; track labels: Transcript, Gene, Haplotype Blocks, Variations, NCBI-36, Assembly-Assembly Mapping, HuRef Assembly Structure]
Protein Truncated by 476 bp Insertion
[Browser screenshot; labels: Heterozygous SNP, Homozygous SNP, Insertion]
[Browser screenshot; labels: Assembly Structure, Insertion]
Genomics vs Metagenomics
Genomics: ‘Old School’
  - Study of a single organism's genome
  - Genome sequence determined using shotgun sequencing and assembly
  - >1300 microbes sequenced, first in 1995 (at TIGR)
  - DNA usually obtained from pure cultures (<1%) or amplification of DNA from single cells
Metagenomics
  - Use genomics tricks on communities, no culturing
  - Environmental shotgun sequencing of DNA or RNA
  - Metadata provides context
Metagenomic Questions
  - Within an environment
    - What biological functions are present (absent)?
    - What organisms are present (absent)?
  - Compare data from (dis)similar environments
    - What are the fundamental rules of microbial ecology?
    - Adapting to environmental conditions?
    - How do communities respond to stimuli?
    - How does community structure change?
  - Search for novel proteins and protein families
    - And diversity within known families
Global Ocean Sampling Expedition
Global Ocean Sampling Expedition
  - Pilot: 2.0M reads (4/04)
  - Phase 1: 7.7M reads, >6M proteins (3/07)
  - Phase 2-IO: 2.2M reads (3/08)
  - Phase 2: ~30M reads (2010?)
  - Diverse environments
    - Open ocean, estuary, embayment, upwelling, fringing reef, atoll…
GOS: Sequence Diversity in the Ocean
Rusch et al. (PLoS Biology 2007)
  - Most sequence reads are unique
  - Very limited assembly
  - Most sequences not taxonomically anchored
  - Reference genomes a basis set? Not really.
    - Several hundred isolates
  - Challenges
    - Relating shotgun data to reference genomes
    - Structural and functional annotation
Browsing Large Data Collections: Fragment Recruitment Viewer
  - Microbial communities vs reference genomes
    - Millions of sequence reads vs thousands of genomes
  - Definition: a read is recruited to a sequence if:
    - An end-to-end BLASTN alignment exists
  - Rapid hypothesis generation and exploration
    - How do cultured and wild-type genomes differ?
    - Insertions, deletions, translocations
    - Correlation with environmental factors
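The recruitment rule above can be sketched as a filter over BLASTN hits. This is a minimal illustration, not the viewer's actual implementation: it assumes hits are available in BLAST+ default tabular format (`-outfmt 6`), that read lengths are supplied separately in a hypothetical `read_lengths` dictionary, and that "end-to-end" means the alignment covers the read from its first to its last base, with an optional `end_slack` tolerance that is also an assumption.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with the default fields.
BLAST_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def recruited_reads(blast_tsv, read_lengths, end_slack=0):
    """Yield (read, reference, percent identity, reference position) for hits
    whose alignment spans the whole read, allowing `end_slack` bases at each end."""
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_FIELDS, delimiter="\t")
    for row in reader:
        read_len = read_lengths[row["qseqid"]]
        qstart, qend = int(row["qstart"]), int(row["qend"])
        covers_start = min(qstart, qend) <= 1 + end_slack
        covers_end = max(qstart, qend) >= read_len - end_slack
        if covers_start and covers_end:
            ref_pos = min(int(row["sstart"]), int(row["send"]))
            yield row["qseqid"], row["sseqid"], float(row["pident"]), ref_pos


# Made-up example hits: read1 aligns over its full 500 bp, read2 only over 101..400.
example = (
    "read1\tgenomeA\t98.50\t500\t7\t0\t1\t500\t1200\t1699\t0.0\t850\n"
    "read2\tgenomeA\t91.00\t300\t27\t0\t101\t400\t5000\t5299\t1e-100\t400\n"
)
lengths = {"read1": 500, "read2": 500}
hits = list(recruited_reads(io.StringIO(example), lengths))
print(hits)
```

With the strict default, only `read1` is recruited. Each recruited hit carries a percent identity and a reference coordinate, which is enough to place one point per read on a similarity-versus-position plot.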
Fragment Recruitment Viewer
[Plot; axes: Sequence Similarity vs Genomic Position. Credit: Doug Rusch, JCVI]
[Figures; credit: Doug Rusch and Michael Press]
GOS Protein Analysis
Yooseph et al. (PLoS Biology 2007)
  - Novel clustering process
    - Predict putative proteins and group into related clusters
    - Include GOS and all known proteins
  - Findings

Trends In Genomics

  • 1. Trends in Genomics: An Engineer’s Perspective Saul A. Kravitz, PhD December 2009
  • 2. Biggest Change: Sequencing is free 2000: Factory, AB3700 @ Celera - 1k 500bp reads/day/sequener = 0.5Mbp/day - Human Genome = ~ 190 sequencer yr, ~200M$ 2002 2002: Factory, AB3730 @ JCVI - 10k 500bp reads/sequencer/day = 5Mbp/day - Human Genome = ~ 19 sequencer yr, ~10M$ 2010 2010: Benchtop, 454 GS Junior - 70M 500bp reads/day = 35Gbp/day - Human genome = ~ 1 sequencer day, ~10k$ 2010: Service, Complete Genomics - Human genome = ~ 1 day, ~1k$
  • 3. New Bottlenecks Generating sequence data – free Data Management Data Query Data Analysis Breadth: Communities Depth: Populations (e.g., flu, human) Thinking is very pricy!
  • 4. Same Thinking $, More Data Project Cost
  • 5. The Crux of the Problem Genomic data interpreted in context How does my genome compare to all others Which other proteins are similar to mine Size of context is growing exponentially Growth is faster than Moore’s law Hard to fight an exponential BLASTP against NCBI NR All against all BLASTP of microbial proteins
  • 6. Bioinformatics Isn’t High Energy Physics Data inputs are changing rapidly CE Chromatograms, 454 Flowgrams, Color Space Error models and read lengths are changing rapidly Tools evolving rapidly Difficult to track many academic tools High quality commercial platforms emerge Even when “cooks” use shared “ingredients” “recipes” vary widely Faith based science My dataset alone has limited value Computations are (relatively) IO Intensive
  • 7. Some Solutions and Directions: Repeated process must be automated; Even if labor is free, deviations from SOP are costly; Commercial Tools: market has expanded, quality improved; Tools for exploring Human Variation: The HuRef Browser; Metagenomics Tools and Challenges: Global Ocean Sampling Expedition; Visualization tools; Metagenomic Annotation; Genomic Standards Consortium and M5; Clouds and Grids; ScaaS: Science as a Service
  • 8. Personal Genomics: The future is now (ca 2008)
  • 9. HuRef Browser: Accelerate thinking. Compare 2 published genomes: Craig Venter’s Diploid Genome; Composite NCBI-36. Are differences real? Noisy data? Assembly errors? Analysis errors? Methods development requires curation by biologists; As genomes accumulate, more acute challenge
  • 11. Zinc Finger Protein, Chr19:57564487-57581356; Transcript; Gene; Haplotype Blocks; Variations; NCBI-36; Assembly-Assembly Mapping; HuRef Assembly Structure
  • 12. Protein Truncated by 476 bp Insertion; Heterozygous SNP; Homozygous SNP; Insertion
  • 14. Genomics vs Metagenomics. Genomics – ‘Old School’: Study of a single organism's genome; Genome sequence determined using shotgun sequencing and assembly; >1300 microbes sequenced, first in 1995 (at TIGR); DNA usually obtained from pure cultures (<1%) or amplification of DNA from single cells. Metagenomics: Use genomics tricks on communities – no culturing; Environmental shotgun sequencing of DNA or RNA; Metadata provides context
  • 15. Metagenomic Questions. Within an environment: What biological functions are present (absent)? What organisms are present (absent)? Compare data from (dis)similar environments: What are the fundamental rules of microbial ecology? Adapting to environmental conditions? How do communities respond to stimuli? How does community structure change? Search for novel proteins and protein families, and diversity within known families
  • 16. Global Ocean Sampling Expedition
  • 18. Pilot: 2.0M reads (4/04)
  • 19. Phase 1: 7.7M reads, >6M proteins (3/07)
  • 20. Phase 2-IO: 2.2M reads (3/08)
  • 21. Phase 2: ~30M reads (2010?)
  • 23. Open ocean, estuary, embayment, upwelling, fringing reef, atoll… (timeline: 4/04, 3/07, 3/08)
  • 24. GOS: Sequence Diversity in the Ocean. Rusch et al. (PLoS Biology 2007). Most sequence reads are unique; Very limited assembly; Most sequences not taxonomically anchored; Reference genomes a basis set? Not really. Several hundred isolates. Challenges: Relating shotgun data to reference genomes; Structural and Functional Annotation
  • 25. Browsing Large Data Collections: Fragment Recruitment Viewer. Microbial Communities vs Reference Genomes: millions of sequence reads vs thousands of genomes. Definition: a read is recruited to a sequence if an end-to-end BLASTN alignment exists. Rapid Hypothesis Generation and Exploration: How do cultured and wild-type genomes differ? Insertions, deletions, translocations; Correlation with environmental factors
  • 26. Fragment Recruitment Viewer. Axes: Sequence Similarity, Genomic Position. Doug Rusch, JCVI
  • 27. Doug Rusch and Michael Press
  • 28. Doug Rusch and Michael Press
  • 30. Predict putative proteins and group into related clusters
  • 32. cover ~all existing prokaryotic families
  • 33. expands diversity of known protein families
  • 34. ~10% of large clusters are novel
  • 35. Many are of viral origin
  • 37. Annotation of Environmental Shotgun Data. Challenges: Lack of context; Protein fragments. Gene Finding: Yooseph’s Protein Clusters + Metagene. Functional Assignment: Variation of JCVI prok annotation pipeline*; Leverages protein cluster annotation -- soon. Result: Quality Nearly Comparable to Prokaryotic Genomic Annotation
  • 38. Protein Clusters: Advantages and Disadvantages. Weaknesses: Homology-based; Stateful (also a strength); Less sensitive (for now). Strengths: Exponential → Linear?; Learns over time; Easy to maintain
  • 39. Increasing the pressure. Nextgen + Metagenomics: Deeper collections; Short sequences → less informative. How should we annotate? When in doubt, use BLAST against NRAA, and other large and fast-growing collections. Annotation needs growing dramatically: 24x7 quality software; Special Hardware: FPGA? Graphics/CUDA? SIMD/SSE? New algorithms? Back to supercomputers? Sharing data and computes; Standardization of data, metadata, and computes. Folker Meyer, ANL
  • 40. Science as a Service (ScaaS): Standard tools as services; Service-Oriented Architecture; Supported by HPC as necessary; Grid workflow for integration; Maintain tools & data in scalable compute environment; Celera Assembler in the clouds
  • 41. Vision for High Throughput Science. Today: Scientist. Construction of the Ark, Nuremberg Chronicle (1493).
  • 42. Vision for High Throughput Science: Engineers + Scientist. http://freepages.genealogy.rootsweb.ancestry.com/~thegrove/gec2a.html; Rodin’s Thinker
  • 43. Credits: JCVI Informatics Team. Support: DOE; Gordon and Betty Moore Foundation; NIAID
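
The sequencer-time figures on slide 2 follow from simple division. Below is a minimal sketch, assuming about 35 Gbp of raw sequence per human genome; that total is not stated on the slide but is implied by its numbers (190 sequencer-years at 0.5 Mbp/day). The function and variable names are illustrative only.

```python
# Assumption: ~35 Gbp of raw reads per human genome. This total is inferred
# from the slide's own numbers, it is not stated on the slide.
RAW_BASES_PER_GENOME = 35e9


def daily_throughput_bp(reads_per_day: float, read_length_bp: int = 500) -> float:
    """Bases produced per sequencer per day."""
    return reads_per_day * read_length_bp


def sequencer_days(reads_per_day: float, read_length_bp: int = 500) -> float:
    """Sequencer-days needed to generate the assumed raw total for one genome."""
    return RAW_BASES_PER_GENOME / daily_throughput_bp(reads_per_day, read_length_bp)


# Reads per sequencer per day, as given on slide 2.
PLATFORMS = {
    "AB3700 (2000)": 1e3,
    "AB3730 (2002)": 1e4,
    "454 GS Junior (2010)": 70e6,
}

if __name__ == "__main__":
    for name, reads in PLATFORMS.items():
        days = sequencer_days(reads)
        print(
            f"{name}: {daily_throughput_bp(reads) / 1e6:,.1f} Mbp/day, "
            f"{days:,.0f} sequencer-days (~{days / 365:,.1f} sequencer-years)"
        )
```

Under that assumption the three platforms come out at roughly 190 sequencer-years, 19 sequencer-years, and 1 sequencer-day, matching the slide.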

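The recruitment rule on slide 25 (a read is recruited if an end-to-end BLASTN alignment exists) can be sketched as a filter over BLAST tabular output. This is a hedged illustration, not the Fragment Recruitment Viewer's actual code: it assumes the default 12-column tabular layout, a caller-supplied table of read lengths, and a hypothetical `slack` parameter for tolerating a few unaligned bases at the read ends.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Hit(NamedTuple):
    read_id: str
    ref_id: str
    pct_identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_tabular_hit(line: str) -> Hit:
    """Parse one line of BLAST tabular output (default 12-column layout)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def is_end_to_end(hit: Hit, read_length: int, slack: int = 0) -> bool:
    """True if the alignment spans the read from its first to its last base.

    `slack` is a hypothetical tolerance for unaligned bases at either end.
    """
    return hit.q_start <= 1 + slack and hit.q_end >= read_length - slack


def recruit(
    lines: Iterable[str], read_lengths: Dict[str, int], slack: int = 0
) -> Iterator[Tuple[str, int, float]]:
    """Yield (ref_id, genomic_position, pct_identity) for each recruited hit."""
    for line in lines:
        hit = parse_tabular_hit(line)
        if is_end_to_end(hit, read_lengths[hit.read_id], slack):
            # min() covers minus-strand hits, where s_start > s_end.
            yield hit.ref_id, min(hit.s_start, hit.s_end), hit.pct_identity
```

Each yielded tuple is one point for the viewer's plot: genomic position on one axis, sequence similarity on the other.
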
Editor's Notes

  1. With the publication of the genomes of Craig Venter and Jim Watson, and with many additional human genomes being sequenced, the era of personal genomics is here. We are going to need really good tools to take advantage of this flood of data. My goal today is to share our experience building tools to understand the variation within a single individual’s genome, and to try to extrapolate forward to what we will need to understand larger collections of genomes.
  2. * A chromosome or sequence id followed by a start position and region length, e.g., "chr19:450000+100000" to display the region from 450000-550000 on chromosome 19. * A dbSNP id, e.g., "rs2691286". * An Ensembl annotation identifier, e.g., "ENSG00000104783". * A gene name, e.g., "KLKB1", optionally followed by the amount of flanking sequence to display, e.g., "KLKB1^2000".
  3. Zinc Finger example: whole transcript ENST00000334564
  4. Insert is 467 bp → truncates the protein. VNTR; prob. heterozygous. Pink = non-synonymous; yellow = synonymous.
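
The four query formats listed in note 2 can be told apart with a few regular expressions. Below is a minimal sketch, not the HuRef Browser's actual parser: the `Query` type, the `parse_query` function, and the exact patterns (for example, which characters may appear in a gene name) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    kind: str                    # "region", "dbsnp", "ensembl", or "gene"
    name: str                    # sequence id, rs id, Ensembl id, or gene name
    start: Optional[int] = None  # region start (regions only)
    end: Optional[int] = None    # region end = start + length (regions only)
    flank: int = 0               # flanking bases to display (genes only)


# Assumed patterns; the real browser may accept more or fewer variants.
_REGION = re.compile(r"^(?P<seq>[^:\s]+):(?P<start>\d+)\+(?P<length>\d+)$")
_DBSNP = re.compile(r"^rs\d+$")
_ENSEMBL = re.compile(r"^ENS[A-Z]*\d+$")
_GENE = re.compile(r"^(?P<gene>[A-Za-z0-9_.-]+)(?:\^(?P<flank>\d+))?$")


def parse_query(text: str) -> Query:
    """Classify a search string and extract its coordinates or identifiers."""
    text = text.strip()
    m = _REGION.match(text)
    if m:
        start = int(m["start"])
        return Query("region", m["seq"], start, start + int(m["length"]))
    if _DBSNP.match(text):
        return Query("dbsnp", text)
    if _ENSEMBL.match(text):
        return Query("ensembl", text)
    m = _GENE.match(text)
    if m:
        return Query("gene", m["gene"], flank=int(m["flank"] or 0))
    raise ValueError(f"unrecognized query: {text!r}")
```

With these patterns, `parse_query("chr19:450000+100000")` yields the region 450000-550000 on chr19, and `parse_query("KLKB1^2000")` yields the gene KLKB1 with 2000 bases of flanking sequence.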