Using Supercomputers and Supernetworks to Explore the Ocean of Life

07.06.07 Director's Colloquium, Los Alamos National Laboratory, Los Alamos, NM

Transcript

  • 1. Using Supercomputers and Supernetworks to Explore the Ocean of Life. Director's Colloquium, Los Alamos National Laboratory, Los Alamos, New Mexico, June 7, 2007. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
  • 2. Abstract: Calit2, in partnership with the J. Craig Venter Institute in Rockville, MD, and UCSD's SDSC and Scripps Institution of Oceanography, is creating a Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA), funded by the Gordon and Betty Moore Foundation. CAMERA collaborates closely with DOE's Joint Genome Institute. The CAMERA computational and storage cluster containing the metagenomic data can be accessed via the web or over novel dedicated 10 Gb/s light pipes (termed "lambdas") through the National LambdaRail, providing direct connection to the scalable Linux clusters in individual user laboratories. These clusters are reconfigured as "OptIPortals," providing the end user with local scalable visualization, computing, and storage. Scientists will use CAMERA for metagenomics research: analyzing microbial genomic sequence data in the context of other microbial species, as well as in relation to the chemical and physical conditions in which microbes are sampled. The CAMERA project contains the results of the Venter Institute's Sorcerer II Expedition, which carried out the first large-scale genomic survey of microbial life in the world's oceans to produce the largest gene catalogue ever assembled, doubling the number of protein sequences currently available in GenBank. In addition to Sorcerer II's ecological genomic data, the CAMERA database will be augmented by the full genomes of more than 150 critical marine microbes, enabling new comparative genomics studies. Currently over 1000 users are registered from over 40 countries.
  • 3. Calit2: A Systems Approach to the Future of the Internet and Its Transformation of Our Society (www.calit2.net). Calit2 Has Assembled a Complex Social Network of Over 350 UC San Diego & UC Irvine Faculty Working in Multidisciplinary Teams With Staff, Students, Industry, and the Community; Over 130 Companies and 300 Federal Grants in Collaboration with Calit2
  • 4. Two New Calit2 Buildings Provide New Laboratories for “Living in the Future”
    • “Convergence” Laboratory Facilities
      • Nanotech, BioMEMS, Chips, Radio, Photonics
      • Virtual Reality, Digital Cinema, HDTV, Gaming
    • Over 1000 Researchers in Two Buildings
      • Linked via Dedicated Optical Networks
    Preparing for a World in Which Distance Is Eliminated… (UC Irvine and UC San Diego; www.calit2.net)
  • 5. Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers
    • Some Areas of Concentration:
      • Algorithmic and System Biology
      • Bioinformatics
      • Metagenomics
      • Cancer Genomics
      • Human Genomic Variation and Disease
      • Proteomics
      • Mitochondrial Evolution
      • Computational Biology
      • Multi-Scale Cellular Imaging
      • Information Theory and Biological Systems
      • Telemedicine
    UC Irvine Southern California Telemedicine Learning Center (TLC); National Biomedical Computation Resource, an NIH-supported resource center
  • 6. Calit2 Facilitated Formation of the Center for Algorithmic and Systems Biology http://casb.ucsd.edu/
  • 7. Most of Evolutionary Time Was in the Microbial World: Tree of Life Derived from 16S rRNA Sequences (“You Are Here”). Source: Carl Woese, et al.
  • 8. Joint Genome Institute is a Leading Microbial Genomic Source
  • 9. Moore Microbial Genome Sequencing Project: Selected Microbes Throughout the World’s Oceans, Nominated by Leading Ocean Microbial Biologists (www.moore.org/microgenome/worldmap.asp)
  • 10. Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 155 Marine Microbes. Phylogenetic Trees Created by Uli Stingl, Oregon State; Blue Means Contains One of the Moore 155 Genomes (www.moore.org/microgenome/trees.aspx)
  • 11. Moore 155 Marine Microbial Genomes Give Broad Coverage of Microbial “Tree of Life”. Phylogenetic Trees Created by Uli Stingl, Oregon State (www.moore.org/microgenome/alpha-proteobacteria.aspx)
  • 12. Full Genome Sequencing Is Exploding: Most Sequenced Genomes Are Bacterial (www.genomesonline.org). Chart Labels: First Genome, 1995; 6 Genomes/Year; 2000; 90 Metagenomes
  • 13. Metagenomic Data Sets Are Rapidly Being Accumulated
    • “A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms.”
    • “We discovered significant inter-subject variability.”
    • “Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease.”
    Source: “Diversity of the Human Intestinal Microbial Flora,” Paul B. Eckburg, et al., Science (10 June 2005); 395 Phylotypes
  • 14. Microbial Metagenomics Is a Rapidly Emerging Field of Research. “Despite their ubiquity, relatively little is known about the majority of environmental microorganisms, largely because of their resistance to culture under standard laboratory conditions.” “The application of high-throughput shotgun sequencing environmental samples has recently provided global views of those communities not obtainable from 16S rRNA or BAC clone–sequencing surveys.” Source: Comparative Metagenomics of Microbial Communities, Susannah Green Tringe, Christian von Mering, Arthur Kobayashi, Asaf A. Salamov, Kevin Chen, Hwai W. Chang, Mircea Podar, Jay M. Short, Eric J. Mathur, John C. Detter, Peer Bork, Philip Hugenholtz, Edward M. Rubin, Science, 22 April 2005
  • 15. Enormous Increase in Scale of Known Genes Over Last Decade (~3300x)
    • 1995, First Microbe Genome: 1.8 Million Bases, 1,749 Genes
    • 2001, Human Genome: 3 Billion Bases, 30,000 Genes
    • 2007, Ocean Metagenomics: 6.3 Billion Bases, 5.6 Million Genes
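The ~3300x figure on slide 15 is consistent with the slide's own numbers. Comparing 2007 ocean metagenomics with the 1995 first microbial genome gives roughly 3,200x by gene count and 3,500x by base count; this is simple arithmetic on the listed values, and the slide does not say which measure it used.

```latex
\frac{5.6\times10^{6}\ \text{genes}}{1749\ \text{genes}} \approx 3.2\times10^{3},
\qquad
\frac{6.3\times10^{9}\ \text{bases}}{1.8\times10^{6}\ \text{bases}} = 3.5\times10^{3}
```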
  • 16. Microbial Genomics Allows Us to Look Back Nearly 4 Billion Years in the Evolution of Life. Source: Falkowski and Vargas, Science 304 (5667), 2004
  • 17. The Sargasso Sea Experiment: The Power of Environmental Metagenomics
    • Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
    • Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms
    • Sequences from at Least 1800 Genomic Species, including 148 Previously Unknown
    • Identified over 1.2 Million Unknown Genes
    Image: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site, 22 February 2003. Source: J. Craig Venter, et al., Science, 2 April 2004, Vol. 304, pp. 66-74
  • 18. Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes. Sorcerer II Data Will Double Number of Proteins in GenBank! Specify Ocean Data; Each Sample ~2000 Microbial Species
  • 19. Environmental Metadata: Beyond Data Collected at Sampling Site. NASA AQUA-MODIS Images Covering GOS Sites #8-12, Mid-November 2003 (Panels: Sea Surface Temp, Chlorophyll)
  • 20. GOS Predicted Proteins Are Largely Bacterial: ~3 Million Previously Known Proteins (NCBI-nr, PG, TGI-EST, ENS); ~5.1 Million GOS Predicted Proteins. Source: Shibu Yooseph, et al. (PLOS Biology, March 2007)
  • 21. Current Universe of Medium/Large Protein Families: 17,067 Protein Family Clusters (Panels: Protein Families Conserved Across Tree of Life; Protein Families Unique to GOS). 1 Million CPU-Hour Computation! Source: Shibu Yooseph, et al. (PLOS Biology, March 2007)
  • 22. GOS Analysis -- Protein Families in Nature Have Been Poorly Explored Thus Far
    • Novel Sequence Similarity Clustering Process Predicts Proteins and Groups Related Sequences Into Clusters (Families)
    • GOS Proteins Increase Size / Diversity of Many Protein Families
    • 1,700 Novel GOS-Only Clusters Identified (>20 per Cluster)
      • 10% of 17,000 Clusters
    Chart Labels: NCBI_nr; GOS + NCBI_nr + Ensembl + TIGR Gene Indices + Prokaryotic Genomes. Source: Shibu Yooseph, et al. (PLOS Biology, March 2007)
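Slide 22 says related sequences are grouped into clusters (families) and that clusters with more than 20 members containing only GOS sequences count as novel. The slide does not specify the clustering algorithm, so the sketch below uses a generic swapped-in technique, single-linkage clustering with a union-find structure over precomputed similarity pairs. The input format (a dict of sequence id to dataset label, and a list of similar pairs) and the function name `cluster_families` are hypothetical; only the ">20 per cluster" threshold comes from the slide.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_families(sources, similar_pairs, min_size=20):
    """Single-linkage clustering of sequences joined by precomputed similarity edges.

    Hypothetical input format (not the GOS pipeline's):
      sources: dict of sequence id -> dataset label, e.g. "GOS" or "NCBI_nr".
      similar_pairs: iterable of (id_a, id_b) pairs judged similar upstream.
    Returns (clusters, gos_only): clusters with more than `min_size` members,
    and the subset whose members all come from the "GOS" dataset.
    """
    uf = UnionFind()
    for seq_id in sources:
        uf.find(seq_id)
    for a, b in similar_pairs:
        uf.union(a, b)
    groups = defaultdict(list)
    for seq_id in sources:
        groups[uf.find(seq_id)].append(seq_id)
    clusters = [sorted(m) for m in groups.values() if len(m) > min_size]
    gos_only = [c for c in clusters if all(sources[s] == "GOS" for s in c)]
    return clusters, gos_only
```

Single linkage is chosen here only because it is the simplest way to turn pairwise similarity into families; a real pipeline would also need to decide how similarity is scored and how to avoid chaining unrelated sequences through weak links.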
  • 23. Enormous Biodiversity: Very Little of GOS Metagenomic Data Assembles Well
    • Use Reference Genomes to Recruit Fragments
      • Compared 334 Finished and 250 Draft Microbial Genomes
    • Only 5 Microbial Genera Yielded Substantial and Uniform Recruitment
      • Prochlorococcus, Synechococcus, Pelagibacter, Shewanella, and Burkholderia
    Source: Douglas Rusch, et al. (PLOS Biology March 2007)
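Slide 23 describes recruiting metagenomic fragments to reference genomes and judging whether recruitment is substantial and uniform. A minimal sketch under stated assumptions: read-to-genome alignments are already computed (the slide does not name the alignment tool), each read is assigned to its best hit above a percent-identity threshold, and uniformity is approximated by the fraction of fixed-size genome windows that receive any read. The `Hit` record, the 80% identity cutoff, and the 1 kb window are hypothetical choices, not values from Rusch et al.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """One precomputed alignment of a metagenomic read to a reference genome (hypothetical format)."""
    read_id: str
    genome: str
    percent_identity: float
    ref_start: int  # alignment start position on the reference (bp)


def recruit(hits, min_identity=80.0):
    """Keep each read's best hit at or above `min_identity` (an assumed cutoff)."""
    best = {}
    for h in hits:
        if h.percent_identity < min_identity:
            continue
        current = best.get(h.read_id)
        if current is None or h.percent_identity > current.percent_identity:
            best[h.read_id] = h
    return best


def recruitment_counts(best):
    """Number of reads recruited to each reference genome."""
    return Counter(h.genome for h in best.values())


def window_coverage(best, genome, genome_length, window=1000):
    """Fraction of fixed-size windows along `genome` that received at least one read.

    A crude proxy for how evenly recruited reads are spread along the reference.
    """
    n_windows = (genome_length + window - 1) // window
    covered = {h.ref_start // window for h in best.values() if h.genome == genome}
    return len(covered) / n_windows
```

Assigning each read only to its best hit keeps a fragment from being counted against several closely related references at once; a genome with many recruited reads but low window coverage would be recruiting substantially but not uniformly.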
  • 24. Self-Organizing Maps Identify Species Using the Japanese Earth Simulator. SOM Created from an Unsupervised Neural Network Algorithm to Analyze Tetranucleotide Frequencies in a Wide Range of Genomes (10kb Moving Window); Map Panels: Human, Fugu, Arabidopsis, Rice, C. elegans, Drosophila. Source: T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23 (www.es.jamstec.go.jp/publication/journal/jes_vol.6/pdf/JES6_22-Abe.pdf)
  • 25. Using SOM, Sargasso Sea Metagenomic Data Yields 92 Microbial Genera! Map Classes: Eukaryotes, Prokaryotes, Viruses, Mitochondria, Chloroplasts. Input Genomes: 1500 Microbes, 40 Eukaryotes, 1065 Viruses, 642 Mitochondria, 42 Chloroplasts; 5kb Window. Source: T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23
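Slides 24 and 25 describe self-organizing maps built from tetranucleotide frequencies of genome windows. The toy, pure-Python sketch below shows the two ingredients: a 256-dimensional tetranucleotide-frequency vector per window, and a small SOM trained with the classic online (Kohonen) update. The map size, the learning-rate and neighborhood schedules, and the window handling are illustrative assumptions; the cited Earth Simulator work may use a different SOM variant and preprocessing, and ran at vastly larger scale.

```python
import itertools
import math
import random

# All 256 DNA tetranucleotides in a fixed order, mapped to vector positions.
TETRAMERS = ["".join(p) for p in itertools.product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(TETRAMERS)}


def tetranucleotide_profile(seq):
    """Relative frequencies of the 256 tetranucleotides in `seq`.

    Overlapping 4-mers are counted; any 4-mer containing a non-ACGT symbol is skipped.
    """
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(TETRAMERS)


def windows(seq, size=10_000, step=None):
    """Fixed-size windows along `seq`; non-overlapping unless `step` < `size` (assumed policy).

    A trailing window shorter than `size` is dropped.
    """
    step = step or size
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(nodes)),
               key=lambda n: sum((w - v) ** 2 for w, v in zip(nodes[n], x)))


def train_som(data, rows=4, cols=4, epochs=10, lr0=0.5, seed=0):
    """Train a rectangular SOM with the classic online update; returns node weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() / dim for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
            br, bc = divmod(best_matching_unit(nodes, x), cols)
            for n, w in enumerate(nodes):
                r, c = divmod(n, cols)
                # Gaussian neighborhood on the map grid around the winning node.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return nodes
```

Windows from compositionally distinct genomes tend to win different map nodes, which is what lets an SOM group fragments by likely source organism without alignment; a production version would use NumPy or a compiled kernel, since the inner loops here are quadratic in map size times vector length.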
  • 26. The Human Kinome: A Protein Family Implicated in Many Human Diseases. Over 500 Protein Kinases, 2% of the Human Genome, Many Splice Variants; Crystal Structures of EPKs. Species Shown: Yeast, Mouse, C. elegans, Drosophila, Arabidopsis, Sea Urchin, Dictyostelium, Tetrahymena. Sources: Manning, et al. (2002) Science 298:1912; Susan Taylor, SOM, UCSD
  • 27. From Microbial Genomes To Human Disease
    • Microbes Have a Much Simpler Genome Than Humans
    • However, Microbes Share Many of the Core Components of the Molecular Signaling Machinery Used by Humans
    • Understand Both the Evolution and Regulation of Signaling Systems, First in Microbes and Then in Humans
    • This Is a Rich Source for Mapping the Origins of Eukaryotic Protein Kinases (EPKs)
    >24,000 Kinases, Including 16,000 New Kinases, in Venter Global Ocean Sampling Data! Source: Susan Taylor, SOM, UCSD
  • 28. The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data. Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr, PI. Univ. Partners: SDSC, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST. Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent. $13.5M Over Five Years, Now in the Fifth Year. Picture Source: Mark Ellisman, David Lee, Jason Leigh
  • 29. Dedicated Optical Channels Make High Performance Cyberinfrastructure Possible. Parallel Lambdas Are Driving Optical Networking the Way Parallel Processors Drove 1990s Computing. “Lambdas” (WDM): 10 Gbps per User, ~200x Shared Internet Throughput. Source: Steve Wallach, Chiaro Networks
  • 30. National LambdaRail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers. NLR: 4 x 10Gb Lambdas Initially, Capable of 40 x 10Gb Wavelengths at Buildout; NSF’s TeraGrid Has 4 x 10Gb Lambda Backbone; Links Two Dozen State and Regional Optical Networks; DOE, NSF, & NASA Using NLR. Map Nodes: San Francisco; Pittsburgh; Cleveland; San Diego; Los Angeles; Portland; Seattle; Pensacola; Baton Rouge; Houston; San Antonio; Las Cruces/El Paso; Phoenix; New York City; Washington, DC; Raleigh; Jacksonville; Dallas; Tulsa; Atlanta; Kansas City; Denver; Ogden/Salt Lake City; Boise; Albuquerque; UC-TeraGrid; UIC/NW-Starlight Chicago; International Collaborators
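The capacities quoted on slides 29 and 30 follow from simple multiplication at 10 Gb/s per lambda. The last term is only what slide 29's ~200x ratio implies for per-user shared-Internet throughput; it is derived arithmetic, not a measured value.

```latex
4 \times 10\ \text{Gb/s} = 40\ \text{Gb/s}\ (\text{NLR initially}),
\qquad
40 \times 10\ \text{Gb/s} = 400\ \text{Gb/s}\ (\text{NLR at buildout}),
\qquad
\frac{10\ \text{Gb/s}}{200} = 50\ \text{Mb/s}\ (\text{implied shared Internet})
```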
  • 31. OptIPortal: Termination Device for the Dedicated Gigabit/sec Lightpaths. Collaborative Analysis of Large Scale Images of Cancer Cells; Integration of High Definition Video Streams with Large Scale Image Display Walls. Photo Source: David Lee, Mark Ellisman, NCMIR, UCSD
  • 32. My OptIPortal™: Affordable Termination Device for the OptIPuter Global Backplane
    • 20 Dual CPU Nodes, 20 24” Monitors, ~$50,000
    • 1/4 Teraflop, 5 Terabyte Storage, 45 Megapixels: Nice PC!
    • Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC
    Source: Phil Papadopoulos SDSC, Calit2
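Dividing the slide 32 OptIPortal totals across its 20 dual-CPU nodes (40 CPUs) and 20 monitors gives rough per-unit figures. These are averages derived from the slide's totals, not specifications of the actual hardware.

```latex
\frac{0.25\ \text{TFLOPS}}{40\ \text{CPUs}} = 6.25\ \text{GFLOPS/CPU},
\quad
\frac{5\ \text{TB}}{20\ \text{nodes}} = 250\ \text{GB/node},
\quad
\frac{45\ \text{MP}}{20\ \text{monitors}} = 2.25\ \text{MP/monitor},
\quad
\frac{\$50{,}000}{20\ \text{nodes}} = \$2{,}500/\text{node}
```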
  • 33. Announced January 17, 2006: $24.5M Over Seven Years. PI: Larry Smarr; Ex. Dir.: Paul Gilna
  • 34.  
  • 35. The Calit2 CAMERA Microbial Metagenomics Server is Open to the Community PLOS Biology March 2007
  • 36. CAMERA Builds on Cyberinfrastructure Grid, Workflow, and Portal Projects in a Service-Oriented Architecture. Cyberinfrastructure: Raw Resources, Middleware & Execution Environment; NBCR Rocks Clusters, Virtual Organizations, Web Services, KEPLER Workflow Management, Vision, Telescience Portal. Located in Calit2@UCSD Building; National Biomedical Computation Resource, an NIH-supported resource center
  • 37. Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server (Source: Phil Papadopoulos, SDSC, Calit2)
    • Data Sources:
      • Sargasso Sea Data
      • Sorcerer II Expedition (GOS)
      • JGI Community Sequencing Project
      • Moore Marine Microbial Project
      • NASA and NOAA Satellite Data
      • Community Microbial Metagenomics Data
    • Architecture Components (from Diagram):
      • Traditional User: Request/Response via Web Portal + Web Services
      • Flat File Server Farm and Database Farm on a 10 GigE Fabric
      • Dedicated Compute Farm (1000s of CPUs)
      • TeraGrid: Cyberinfrastructure Backplane for Scheduled Activities, e.g. All-by-All Comparison (10,000s of CPUs)
      • Local Environment: Web (Other Services), Local Cluster, Direct Access Lambda Connections
  • 38. Calit2 CAMERA Production Compute and Storage Complex: 512 Processors, ~5 Teraflops, ~200 Terabytes Storage
  • 39. The Calit2 CAMERA Metagenomics Site is Now Active http://camera.calit2.net/
  • 40. CAMERA Research Tools and Data
  • 41. Distribution of CAMERA User Registrations Nearly 1000 Registered Users From 45 Countries
  • 42. Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome: Acidobacteria bacterium Ellin345, Soil Bacterium, 5.6 Mb
  • 43. Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD
  • 44. Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD
  • 45. Interactive Exploration of Marine Genomes Using 100 Million Pixels. Ginger Armbrust (UW), Terry Gaasterland (UCSD SIO)
  • 46. Calit2 Is Now OptIPuter-Connecting Remote OptIPortals for Moore-Funded Microbial Researchers via NLR. Map Sites: NW!, CICESE, UW, JCVI, MIT, SIO, UCSD, SDSU, UIC EVL, UCI; OptIPortals and CAMERA Servers
  • 47. Countries Are Aggressively Creating Gigabit Services: Interactive Access to CAMERA Data System. GLIF (www.glif.is), Created in Reykjavik, Iceland, 2003. Visualization courtesy of Bob Patterson, NCSA.
