Trends in Genomics: An Engineer’s Perspective
Saul A. Kravitz, PhD
December 2009
Biggest Change: Sequencing is free
2000: Factory, AB3700 @ Celera
  - 1k 500bp reads/sequencer/day = 0.5 Mbp/day
  - Human genome = ~190 sequencer-years, ~$200M
2002: Factory, AB3730 @ JCVI
  - 10k 500bp reads/sequencer/day = 5 Mbp/day
  - Human genome = ~19 sequencer-years, ~$10M
2010: Benchtop, 454 GS Junior
  - 70M 500bp reads/day = 35 Gbp/day
  - Human genome = ~1 sequencer-day, ~$10k
2010: Service, Complete Genomics
  - Human genome = ~1 day, ~$1k
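The throughput figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are invented for this example, and the total number of bases is not stated on the slide but backed out from the 2000 row (190 sequencer-years at 0.5 Mbp/day).

```python
DAYS_PER_YEAR = 365


def daily_throughput_bp(reads_per_day: int, read_length_bp: int) -> int:
    """Bases produced by one sequencer in one day."""
    return reads_per_day * read_length_bp


def sequencer_days(total_bp: float, reads_per_day: int, read_length_bp: int) -> float:
    """Days one sequencer needs to produce total_bp bases."""
    return total_bp / daily_throughput_bp(reads_per_day, read_length_bp)


# Per-sequencer throughput as quoted on the slide.
ab3700 = daily_throughput_bp(1_000, 500)          # 0.5 Mbp/day
ab3730 = daily_throughput_bp(10_000, 500)         # 5 Mbp/day
gs_junior = daily_throughput_bp(70_000_000, 500)  # 35 Gbp/day

# Assumption: total bases implied by "~190 sequencer-years at 0.5 Mbp/day".
implied_total_bp = 190 * DAYS_PER_YEAR * ab3700

years_ab3730 = sequencer_days(implied_total_bp, 10_000, 500) / DAYS_PER_YEAR
days_gs_junior = sequencer_days(implied_total_bp, 70_000_000, 500)

print(f"implied total: {implied_total_bp / 1e9:.1f} Gbp")
print(f"AB3730: {years_ab3730:.1f} sequencer-years")
print(f"GS Junior (slide figures): {days_gs_junior:.2f} sequencer-days")
```

Under that assumption, the 2002 and 2010 rows come out at about 19 sequencer-years and about 1 sequencer-day, which agrees with the numbers quoted on the slide.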
New Bottlenecks
  - Generating sequence data – free
  - Data Management
  - Data Query
  - Data Analysis
  - Breadth: Communities
  - Depth: Populations (e.g., flu, human)
  - Thinking is very pricy!
Same Thinking $, More Data
[Chart: Project Cost]
The Crux of the Problem
  - Genomic data interpreted in context
  - How does my genome compare to all others?
  - Which other proteins are similar to mine?
  - Size of context is growing exponentially
  - Growth is faster than Moore’s law
  - Hard to fight an exponential
  - BLASTP against NCBI NR
  - All-against-all BLASTP of microbial proteins
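To see why an exponentially growing context is hard to fight, it helps to compare how database size, all-against-all work, and compute capacity scale over time. This is a rough sketch: the doubling times (12 months for data, 18 months for compute) are hypothetical placeholders rather than figures from the talk, and the helper names are invented here.

```python
# Hypothetical doubling times, chosen only to illustrate the argument.
DATA_DOUBLING_MONTHS = 12.0
COMPUTE_DOUBLING_MONTHS = 18.0


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Multiplicative growth after elapsed_months of steady doubling."""
    return 2.0 ** (elapsed_months / doubling_months)


def all_vs_all_pairs(n: int) -> int:
    """Number of distinct sequence pairs in an all-against-all comparison."""
    return n * (n - 1) // 2


def relative_runtime(elapsed_months: float, all_vs_all: bool = False) -> float:
    """Runtime relative to today, if both data and compute keep doubling.

    A single query against the database scales with the database size;
    an all-against-all comparison scales roughly with its square.
    """
    data = growth_factor(DATA_DOUBLING_MONTHS, elapsed_months)
    compute = growth_factor(COMPUTE_DOUBLING_MONTHS, elapsed_months)
    work = data ** 2 if all_vs_all else data
    return work / compute


print(all_vs_all_pairs(1_000))                # pairs among 1,000 proteins
print(relative_runtime(36))                   # one query vs. the database
print(relative_runtime(36, all_vs_all=True))  # all-against-all
```

With these placeholder rates, a single database search gets slower over three years despite faster hardware, and the all-against-all case gets slower much more quickly because its work grows with the square of the database size.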
Bioinformatics Isn’t High Energy Physics
  - Data inputs are changing rapidly
  - CE chromatograms, 454 flowgrams, color space
  - Error models and read lengths are changing rapidly
  - Tools evolving rapidly
  - Difficult to track many academic tools
  - High quality commercial platforms emerge
  - Even when “cooks” use shared “ingredients”, “recipes” vary widely
  - Faith based science
  - My dataset alone has limited value
  - Computations are (relatively) IO intensive
Some Solutions and Directions
  - Repeated processes must be automated
  - Even if labor is free, deviations from SOP are costly
  - Commercial Tools
  - Market has expanded, quality improved
  - Tools for exploring Human Variation
  - The HuRef Browser
  - Metagenomics Tools and Challenges
  - Global Ocean Sampling Expedition
  - Visualization tools
  - Metagenomic Annotation
  - Genomic Standards Consortium and M5
  - Clouds and Grids
  - ScaaS: Science as a Service
Personal Genomics:    The future is now  (ca 2008)
HuRef Browser: Accelerate thinking
  - Compare 2 published genomes
  - Craig Venter’s Diploid Genome
  - Composite NCBI-36
  - Are differences real?
  - Noisy data? Assembly errors? Analysis errors?
  - Methods development requires curation by biologists
  - As genomes accumulate, the challenge becomes more acute
HuRef Browser: http://huref.jcvi.org
Zinc Finger Protein, Chr19:57564487-57581356
[Screenshot labels: Transcript Gene Haplotype Blocks Variations NCBI-36 Assembly-Assembly Mapping HuRef Assembly Structure]
Protein Truncated by 476 bp Insertion
[Screenshot labels: Heterozygous SNP, Homozygous SNP, Insertion]
Assembly Structure Insertion
Genomics vs Metagenomics
Genomics – ‘Old School’
  - Study of a single organism's genome
  - Genome sequence determined using shotgun sequencing and assembly
  - >1300 microbes sequenced, first in 1995 (at TIGR)
  - DNA usually obtained from pure cultures (<1%) or amplification of DNA from single cells
Metagenomics
  - Use genomics tricks on communities – no culturing
  - Environmental shotgun sequencing of DNA or RNA
  - Metadata provides context
Metagenomic Questions
  - Within an environment
  - What biological functions are present (absent)?
  - What organisms are present (absent)?
  - Compare data from (dis)similar environments
  - What are the fundamental rules of microbial ecology?
  - Adapting to environmental conditions?
  - How do communities respond to stimuli?
  - How does community structure change?
  - Search for novel proteins and protein families
  - And diversity within known families
Global Ocean Sampling Expedition
Pilot: 2.0M reads (4/04)
Phase 1: 7.7M reads, >6M proteins (3/07)
Phase 2-IO: 2.2M reads (3/08)
Phase 2: ~30M reads (2010?)
Diverse Environments
Open ocean, estuary, embayment, upwelling, fringing reef, atoll…
[Map legend: 4/04, 3/07, 3/08]
GOS: Sequence Diversity in the Ocean
Rusch et al. (PLoS Biology 2007)
  - Most sequence reads are unique
  - Very limited assembly
  - Most sequences not taxonomically anchored
  - Reference genomes a basis set? Not really.
  - Several hundred isolates
  - Challenges
  - Relating shotgun data to reference genomes
  - Structural and Functional Annotation
Browsing Large Data Collections: Fragment Recruitment Viewer
  - Microbial Communities vs Reference Genomes
  - Millions of sequence reads vs thousands of genomes
  - Definition: a read is recruited to a sequence if an end-to-end blastN alignment exists
  - Rapid Hypothesis Generation and Exploration
  - How do cultured and wildtype genomes differ?
  - Insertions, deletions, translocations
  - Correlation with environmental factors
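The recruitment rule on this slide can be sketched as a filter over alignment hits. The slide does not say how "end-to-end" is tested, so the version below is only one plausible reading: a hit counts if it spans the read from its first to its last base, within an optional `slack` tolerance. The `Hit` record, the `slack` parameter, and the sample data are all hypothetical and are not taken from the viewer's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-vs-reference alignment (coordinates are 1-based, inclusive)."""
    read_id: str
    ref_id: str
    percent_identity: float
    qstart: int  # alignment start on the read
    qend: int    # alignment end on the read
    sstart: int  # alignment start on the reference
    send: int    # alignment end on the reference (may be < sstart on reverse strand)


def is_end_to_end(hit: Hit, read_length: int, slack: int = 0) -> bool:
    """True if the alignment covers the whole read, allowing `slack` bases per end."""
    first, last = sorted((hit.qstart, hit.qend))
    return first <= 1 + slack and last >= read_length - slack


def recruit(hits, read_lengths, slack: int = 0):
    """Return (ref_id, genomic position, percent identity) for each recruited hit."""
    points = []
    for hit in hits:
        if is_end_to_end(hit, read_lengths[hit.read_id], slack):
            position = min(hit.sstart, hit.send)  # leftmost reference coordinate
            points.append((hit.ref_id, position, hit.percent_identity))
    return points


# Made-up example data.
hits = [
    Hit("read1", "genomeA", 98.5, 1, 500, 10_000, 10_499),
    Hit("read2", "genomeA", 91.0, 40, 300, 52_000, 52_260),  # partial: not recruited
    Hit("read3", "genomeA", 76.2, 1, 480, 90_479, 90_000),   # reverse strand
]
read_lengths = {"read1": 500, "read2": 500, "read3": 480}

points = recruit(hits, read_lengths)
print(points)
```

Each returned tuple carries a genomic position and a percent identity, so the recruited reads could be drawn as a scatter plot of similarity against position along the reference.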
Fragment Recruitment Viewer
[Plot labels: Sequence Similarity, Genomic Position. Credit: Doug Rusch, JCVI]
Doug Rusch  and Michael Press
GOS Protein Analysis
Yooseph et al. (PLoS Biology 2007)
Novel clustering process
  - Predict putative proteins and group into related clusters
  - Include GOS and all known proteins
Findings

Trends In Genomics

  • 1. Trends in Genomics: An Engineer’s Perspective Saul A. Kravitz, PhD December 2009
  • 2. Biggest Change: Sequencing is free 2000: Factory, AB3700 @ Celera - 1k 500bp reads/day/sequener = 0.5Mbp/day - Human Genome = ~ 190 sequencer yr, ~200M$ 2002 2002: Factory, AB3730 @ JCVI - 10k 500bp reads/sequencer/day = 5Mbp/day - Human Genome = ~ 19 sequencer yr, ~10M$ 2010 2010: Benchtop, 454 GS Junior - 70M 500bp reads/day = 35Gbp/day - Human genome = ~ 1 sequencer day, ~10k$ 2010: Service, Complete Genomics - Human genome = ~ 1 day, ~1k$
  • 3. New Bottlenecks Generating sequence data – free Data Management Data Query Data Analysis Breadth: Communities Depth: Populations (e.g., flu, human) Thinking is very pricy!
  • 4. Same Thinking $, More Data Project Cost
  • 5. The Crux of the Problem Genomic data interpreted in context How does my genome compare to all others Which other proteins are similar to mine Size of context is growing exponentially Growth is faster than Moore’s law Hard to fight an exponential BLASTP against NCBI NR All against all BLASTP of microbial proteins
  • 6. Bioinformatics Isn’t High Energy Physics Data inputs are changing rapidly CE Chromatograms, 454 Flowgrams, Color Space Error models and read lengths are changing rapidly Tools evolving rapidly Difficult to track many academic tools High quality commercial platforms emerge Even when “cooks” use shared “ingredients” “recipes” vary widely Faith based science My dataset alone has limited value Computations are (relatively) IO Intensive
  • 7. Some Solutions and Directions Repeated process must be automated Even if labor is free, deviations from SOP costly Commercial Tools Market has expanded, quality improved Tools for exploring Human Variation The HuRef Browser Metagenomics Tools and Challenges Global Ocean Sampling Expedition Visualization tools Metagenomic Annotation Genome Standards Consortium and M5 Clouds and Grids ScaaS: Science as a Service
  • 8. Personal Genomics: The future is now (ca 2008)
  • 9. HuRef Browser: Accelerate thinking Compare 2 published genomes Craig Venter’s Diploid Genome Composite NCBI-36 Are differences real? Noisy data? Assembly errors? Analysis errors? Methods development requires curation by biologists As genomes accumulate, more acute challenge
  • 11. Zinc Finger Protein Chr19:57564487-57581356 Transcript Gene Haplotype Blocks Variations NCBI-36 Assembly-Assembly Mapping HuRef Assembly Structure
  • 12. Protein Truncated by 476 bp Insertion Heterozygous SNP Homozygous SNP Insertion
  • 14. Genomics vs Metagenomics Genomics – ‘Old School’ Study of a single organism's genome Genome sequence determined using shotgun sequencing and assembly >1300 microbes sequenced, first in 1995 (at TIGR) DNA usually obtained from pure cultures (<1%) or amplification of DNA from single cells Metagenomics Use genomics tricks on communities – no culturing Environmental shotgun sequencing of DNA or RNA Metadata provides context
  • 15. Metagenomic Questions Within an environment What biological functions are present (absent)? What organisms are present (absent)? Compare data from (dis)similar environments What are the fundamental rules of microbial ecology? Adapting to environmental conditions? How do communities respond to stimuli? How does community structure change? Search for novel proteins and protein families And diversity within known families
  • 16. Global Ocean Sampling Expedition
  • 18. Pilot: 2.0M reads 4/04
  • 19. Phase 1: 7.7M reads, >6M proteins 3/07
  • 20. Phase 2-IO: 2.2M reads 3/08
  • 21. Phase 2: ~30M reads 2010?
  • 23. Open ocean, estuary, embayment, upwelling, fringing reef, atoll…4/04 3/07 3/08
  • 24. GOS: Sequence Diversity in the Ocean. Rusch et al. (PLoS Biology 2007) Most sequence reads are unique Very limited assembly Most sequences not taxonomically anchored Reference genomes a basis set? Not really. Several hundred isolates Challenges Relating shotgun data to reference genomes Structural and Functional Annotation
  • 25. Browsing Large Data Collections: Fragment Recruitment Viewer Microbial Communities vs Reference Genomes Millions of sequence reads vs Thousands of genomes Definition: A read is recruited to a sequence if: End-to-end BLASTN alignment exists Rapid Hypothesis Generation and Exploration How do cultured and wild-type genomes differ? Insertions, deletions, translocations Correlation with environmental factors
  • 26. Fragment Recruitment Viewer Sequence Similarity Genomic Position Doug Rusch, JCVI
  • 27. Doug Rusch and Michael Press
  • 28. Doug Rusch and Michael Press
  • 30. Predict putative proteins and group into related clusters
  • 32. cover ~all existing prokaryotic families
  • 33. expands diversity of known protein families
  • 34. ~10% of large clusters are novel
  • 35. Many are of viral origin
  • 37. Annotation of Environmental Shotgun Data Challenges: Lack of context Protein fragments Gene Finding Yooseph’s Protein Clusters + Metagene Functional Assignment Variation of JCVI prok annotation pipeline* Leverages protein cluster annotation -- soon Result: Quality Nearly Comparable to Prokaryotic Genomic Annotation
  • 38. Protein Clusters: Advantages and Disadvantages Weaknesses Homology-based Stateful (also a strength) Less sensitive (for now) Strengths Exponential → Linear? Learns over time Easy to maintain
  • 39. Increasing the pressure Nextgen + Metagenomics Deeper collections Short sequences → less informative How should we annotate? When in doubt, use BLAST against NRAA, and other large and fast-growing collections Annotation needs growing dramatically 24x7 quality software Special Hardware: FPGA? Graphics/CUDA? SIMD/SSE? New algorithms? Back to supercomputers? Sharing data and computes Standardization of data, metadata, and computes Folker Meyer, ANL
  • 40. Science as a Service (ScaaS) Standard tools as services Service-Oriented Architecture Supported by HPC as necessary Grid workflow for integration Maintain tools & data in scalable compute environment Celera Assembler in the clouds
  • 41. Vision for High Throughput Science Today: Scientist Construction of the Ark. Nuremberg Chronicle (1493).
  • 42. Vision for High Throughput Science Engineers Scientist + http://freepages.genealogy.rootsweb.ancestry.com/~thegrove/gec2a.html Rodin’s Thinker
  • 43. Credits JCVI Informatics Team Support DOE Gordon and Betty Moore Foundation NIAID
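
The sequencer-year figures on slide 2 follow from simple throughput arithmetic, sketched below. The slide does not state how much raw sequence it assumes per human genome; the 35 Gbp constant (roughly 11-12x coverage of a ~3 Gbp genome) is an assumption inferred from the slide's own numbers, and the function names are illustrative only.

```python
# Assumption (not stated on the slide): ~35 Gbp of raw reads per human genome,
# i.e. roughly 11-12x coverage of a ~3 Gbp genome. Inferred from the slide's figures.
RAW_BASES_PER_GENOME = 35e9

# Per-sequencer throughput in bases/day, as quoted on slide 2 (reads/day * read length).
PLATFORMS = {
    "AB3700 @ Celera (2000)": 1_000 * 500,          # 0.5 Mbp/day
    "AB3730 @ JCVI (2002)": 10_000 * 500,           # 5 Mbp/day
    "454 GS Junior (2010, per slide)": 70_000_000 * 500,  # 35 Gbp/day
}


def sequencer_days(bases_per_day: float, total_bases: float = RAW_BASES_PER_GENOME) -> float:
    """Days one sequencer needs to produce total_bases of raw sequence."""
    return total_bases / bases_per_day


def sequencer_years(bases_per_day: float, total_bases: float = RAW_BASES_PER_GENOME) -> float:
    """Same quantity expressed in sequencer-years (365-day years)."""
    return sequencer_days(bases_per_day, total_bases) / 365.0


if __name__ == "__main__":
    for name, rate in PLATFORMS.items():
        print(f"{name}: {sequencer_days(rate):,.0f} sequencer-days "
              f"({sequencer_years(rate):.1f} sequencer-years)")
```

Under that assumption the three platforms come out at about 190 sequencer-years, about 19 sequencer-years, and one sequencer-day, which matches the slide's rounding.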

Editor's Notes

  1. With the publication of the genomes of Craig Venter and Jim Watson, and with many additional human genomes being sequenced, the era of personal genomics is here. We are going to need really good tools to take advantage of this flood of data. My goal today is to share our experience building tools to understand the variation within a single individual’s genome, and try to extrapolate forward to what we will need to understand larger collections of genomes.
  2. * A chromosome or sequence id followed by a start position and region length e.g., "chr19:450000+100000" to display the region from 450000-550000 on chromosome 19. * A dbSNP id e.g., "rs2691286" * An Ensembl annotation identifier e.g., "ENSG00000104783" * A gene name, e.g. "KLKB1", optionally followed by the amount of flanking sequence to display e.g., "KLKB1^2000"
  3. Zinc Finger example whole transcript ENST00000334564
  4. Insert is 467 bp; truncates the protein. VNTR; prob. heterozygous. Pink = non-synonymous; yellow = synonymous.
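
The four search forms listed in note 2 could be told apart with a few regular expressions. The sketch below is a hypothetical parser written only from that description; it is not the HuRef Browser's actual code, and the `Query` type, the `parse_query` name, and the exact patterns are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Query(NamedTuple):
    """Hypothetical parsed form of a browser search string."""
    kind: str                    # "region", "dbsnp", "ensembl", or "gene"
    value: str                   # sequence id, rs id, Ensembl id, or gene name
    start: Optional[int] = None  # region start (region queries only)
    end: Optional[int] = None    # region start + length (region queries only)
    flank: int = 0               # flanking bases requested (gene queries only)


# Patterns are assumptions based on the examples in the note.
_REGION = re.compile(r"^(?P<seq>[^:\s]+):(?P<start>\d+)\+(?P<length>\d+)$")
_DBSNP = re.compile(r"^rs\d+$")
_ENSEMBL = re.compile(r"^ENS[A-Z]*\d+$")
_GENE = re.compile(r"^(?P<name>[A-Za-z0-9_.-]+)(?:\^(?P<flank>\d+))?$")


def parse_query(text: str) -> Query:
    """Classify a search string into one of the four documented forms."""
    text = text.strip()
    m = _REGION.match(text)
    if m:
        start = int(m["start"])
        return Query("region", m["seq"], start, start + int(m["length"]))
    if _DBSNP.match(text):
        return Query("dbsnp", text)
    if _ENSEMBL.match(text):
        return Query("ensembl", text)
    m = _GENE.match(text)
    if m:
        # A missing "^N" suffix means no flanking sequence was requested.
        return Query("gene", m["name"], flank=int(m["flank"] or 0))
    raise ValueError(f"unrecognized query: {text!r}")


if __name__ == "__main__":
    for example in ("chr19:450000+100000", "rs2691286", "ENSG00000104783", "KLKB1^2000"):
        print(example, "->", parse_query(example))
```

The patterns are tried from most to least specific, so a string such as "rs2691286" is treated as a dbSNP id rather than a gene name; a real browser might resolve that ambiguity differently.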