Sequencing Genomics: The New Big Data Driver

Published on 11.12.07
IntermezzoTalk
SURFnet7, Part of GigaPort3
Title: Sequencing Genomics: The New Big Data Driver
Utrecht, Netherlands

Published in: Technology
  • This is a production cluster with its own Force10 E1200 switch. It is connected to Quartzite and is labeled “CAMERA Force10 E1200”. We built CAMERA this way because the same technology had been deployed successfully in Quartzite.
  • Transcript of "Sequencing Genomics: The New Big Data Driver"

    1. Sequencing Genomics: The New Big Data Driver
       IntermezzoTalk, SURFnet7, Part of GigaPort3
       Utrecht, Netherlands, December 7, 2011
       Dr. Larry Smarr
       Director, California Institute for Telecommunications and Information Technology
       Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
       http://lsmarr.calit2.net
    2. Cost Per Megabase in Sequencing DNA is Falling Much Faster Than Moore’s Law
       www.genome.gov/sequencingcosts/
    3. Genomic Sequencing is Driving Big Data
       November 30, 2011
    4. BGI (The Beijing Genome Institute) is the World’s Largest Genomic Institute
         • Main Facilities in Shenzhen and Hong Kong, China
             • Branch Facilities in Copenhagen, Boston, UC Davis
         • 137 Illumina HiSeq 2000 Next Generation Sequencing Systems
             • Each Illumina Next Gen Sequencer Generates 25 Gigabases/Day
         • Supported by Supercomputing ~160TF, 33TB Memory
             • Large-Scale (12PB) Storage
    5. Next Generation Genome Sequencers Produce Large Data Sets
       Source: Chris Misleh, SOM/Calit2 UCSD
    6. Needed: Interdisciplinary Teams Made From Computer Science, Data Analytics, and Genomics
       “We believe the field of bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years,” -- Isaac Ro, an analyst at Goldman Sachs
    7. Calit2 Brings Together Computer Science and Bioinformatics
       National Biomedical Computation Resource, an NIH-supported resource center
    8. Single Nucleotide Polymorphisms (SNPs): Human DNA Base Pairs May Differ At Some Points
       Person A, Person B
       http://en.wikipedia.org/wiki/File:Dna-SNP.svg
    9. Why We Study SNPs
       99.9% of One’s Individual DNA Sequence will be Identical to that of Another Person. Of the 0.1% Difference, Over 80% will be Single Nucleotide Polymorphisms (SNPs).
       http://shop.perkinelmer.com/content/snps/genotyping.asp
    10. Consumer Companies Provide Your SNPs
        www.23andme.com
    11. Cost of Sequencing Human Genome is Rapidly Becoming Affordable
    12. The Rise of Individual and Societal Genomic Testing: Promise and Concerns
        www.technologyreview.com/biomedicine/25218/
    13. Publicly Sharing Your Genome and Medical Records: Is it Crazy or the Future?
    14. From 10,000 Human Genomes Sequenced in 2011 to 1 Million by 2015 Out of Less Than 5,000 sq. ft.!
        4 Million Newborns / Year in U.S.
    15. But the Human Genome Contains Less Than 1% of the Body’s Genes
        http://commonfund.nih.gov/hmp/
        The Total Number of These Bacterial Cells is 10 Times the Number of Human Cells in Your Body
    16. The Human Microbiome is the Next Large NIH Drive to Understand Human Health and Disease
          • “A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms.”
          • “We discovered significant inter-subject variability.”
          • “Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease.”
        “Diversity of the Human Intestinal Microbial Flora,” Paul B. Eckburg et al., Science (10 June 2005)
        395 Phylotypes
    17. The New Science of Metagenomics
        “The emerging field of metagenomics, where the DNA of entire communities of microbes is studied simultaneously, presents the greatest opportunity -- perhaps since the invention of the microscope -- to revolutionize understanding of the microbial world.” -- National Research Council, March 27, 2007
        NRC Report: Metagenomic data should be made publicly available in international archives as rapidly as possible.
    18. Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis
        http://camera.calit2.net/
    19. Calit2 CAMERA: Over 4000 Registered Users From Over 80 Countries
    20. Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server
          • 4000 Users From 90 Countries
          • 512 Processors, ~5 Teraflops, ~200 Terabytes Storage
          • 1GbE and 10GbE Switched/Routed Core
          • ~200TB Sun X4500 Storage, 10GbE
        Source: Phil Papadopoulos, SDSC, Calit2
    21. UCSD Planned Optical Networked Biomedical Researchers and Instruments
          • Connects at 10 Gbps:
              • Microarrays
              • Genome Sequencers
              • Mass Spectrometry
              • Light and Electron Microscopes
              • Whole Body Imagers
              • Computing
              • Storage
        Cellular & Molecular Medicine West National Center for Microscopy & Imaging Biomedical Research Center for Molecular Genetics Pharmaceutical Sciences Building Cellular & Molecular Medicine East CryoElectron Microscopy Facility Radiology Imaging Lab Bioengineering [email_address] San Diego Supercomputer Center
    22. UCSD Campus Investment in Fiber Enables Big Data Science
        Source: Philip Papadopoulos, SDSC, UCSD
        OptIPortal Tiled Display Wall Campus Lab Cluster Digital Data Collections N x 10Gb/s Triton – Petascale Data Analysis Gordon – HPD System Cluster Condo WAN 10Gb: CENIC, NLR, I2 GLIF Scientific Instruments DataOasis (Central) Storage GreenLight Data Center
    23. SURFnet: a Global SuperNetwork Connecting to the Global Lambda Integrated Facility
        Visualization courtesy of Donna Cox, Bob Patterson, NCSA.
        www.glif.is
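
Several slides quote figures whose scale is easier to judge once multiplied out. The sketch below is not part of the talk. It is a rough back-of-envelope calculation using the numbers on slides 4, 9, 14 and 20. The genome length of about 3 billion base pairs, the decimal definition of a terabyte, and the idea of a single link running at full line rate with no overhead are assumptions added here for illustration, and the function names are hypothetical.

```python
"""Back-of-envelope arithmetic for figures quoted in the slide transcript."""

# Figures taken from the slides.
BGI_SEQUENCERS = 137                   # slide 4: Illumina HiSeq 2000 systems
GIGABASES_PER_SEQUENCER_PER_DAY = 25   # slide 4
DIFFERING_FRACTION = 0.001             # slide 9: 0.1% of the sequence differs
SNP_SHARE = 0.80                       # slide 9: "over 80%", so a lower bound
CAMERA_STORAGE_TB = 200                # slide 20: ~200 terabytes of storage
LINK_GBPS = 10                         # slide 20: 10GbE

# ASSUMPTION (not on the slides): a human genome of about 3 billion base pairs.
GENOME_BASE_PAIRS = 3_000_000_000


def bgi_daily_gigabases(sequencers=BGI_SEQUENCERS,
                        per_day=GIGABASES_PER_SEQUENCER_PER_DAY):
    """Aggregate output if every sequencer ran at the quoted daily rate."""
    return sequencers * per_day


def expected_snps(genome_bp=GENOME_BASE_PAIRS,
                  differing_fraction=DIFFERING_FRACTION,
                  snp_share=SNP_SHARE):
    """Return (differing base pairs, lower bound on SNPs) between two people."""
    differing = genome_bp * differing_fraction
    return differing, differing * snp_share


def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years)


def transfer_hours(terabytes, gbps):
    """Hours to move `terabytes` over one link at `gbps`.

    ASSUMPTION: decimal terabytes, full line rate, no protocol overhead.
    """
    bits = terabytes * 1e12 * 8
    return bits / (gbps * 1e9) / 3600


if __name__ == "__main__":
    differing, snps = expected_snps()
    print(f"BGI output: {bgi_daily_gigabases():,} gigabases/day")
    print(f"Differing base pairs: {differing:,.0f}; SNPs: at least {snps:,.0f}")
    print(f"Genomes per year, 2011 to 2015: "
          f"x{annual_growth_factor(10_000, 1_000_000, 4):.2f} each year")
    print(f"Moving {CAMERA_STORAGE_TB} TB at {LINK_GBPS} Gbps: "
          f"{transfer_hours(CAMERA_STORAGE_TB, LINK_GBPS):.1f} hours")
```

Under these assumptions, the slide 4 figures come to 3,425 gigabases per day across all sequencers, the slide 14 projection implies roughly a threefold increase in sequenced genomes every year, and moving the full 200 TB store over a single 10 Gbps link would take a little under two days.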