Using Supercomputers and Supernetworks to Explore the Ocean of Life

07.06.07
Director's Colloquium
Los Alamos National Laboratory
Title: Using Supercomputers and Supernetworks to Explore the Ocean of Life
Los Alamos, NM

Using Supercomputers and Supernetworks to Explore the Ocean of Life

  1. Using Supercomputers and Supernetworks to Explore the Ocean of Life. Director's Colloquium, Los Alamos National Laboratory, Los Alamos, New Mexico, June 7, 2007. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
  2. Abstract: Calit2, in partnership with the J. Craig Venter Institute in Rockville, MD, and UCSD's SDSC and Scripps Institution of Oceanography, is creating a Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA), funded by the Gordon and Betty Moore Foundation. CAMERA collaborates closely with DoE's Joint Genome Institute. The CAMERA computational and storage cluster containing the metagenomic data can be accessed via the web or over novel dedicated 10 Gb/s light pipes (termed "lambdas") through the National LambdaRail, providing direct connection to the scalable Linux clusters in individual user laboratories. These clusters are reconfigured as "OptIPortals," providing the end user with local scalable visualization, computing, and storage. Scientists will use CAMERA for metagenomics research -- analyzing microbial genomic sequence data in the context of other microbial species, as well as in relation to the chemical and physical conditions in which microbes are sampled. The CAMERA project contains the results of the Venter Institute's Sorcerer II Expedition, which carried out the first large-scale genomic survey of microbial life in the world's oceans to produce the largest gene catalogue ever assembled, doubling the number of protein sequences currently available in GenBank. In addition to Sorcerer II's ecological genomic data, the CAMERA database will be augmented by the full genomes of more than 150 critical marine microbes, enabling new comparative genomics studies. Currently over 1000 users are registered from over 40 countries.
  3. Calit2--A Systems Approach to the Future of the Internet and its Transformation of Our Society (www.calit2.net). Calit2 Has Assembled a Complex Social Network of Over 350 UC San Diego & UC Irvine Faculty Working in Multidisciplinary Teams With Staff, Students, Industry, and the Community. Over 130 Companies and 300 Federal Grants in Collaboration with Calit2
  4. Two New Calit2 Buildings Provide New Laboratories for “Living in the Future”
      • “Convergence” Laboratory Facilities
        • Nanotech, BioMEMS, Chips, Radio, Photonics
        • Virtual Reality, Digital Cinema, HDTV, Gaming
      • Over 1000 Researchers in Two Buildings
        • Linked via Dedicated Optical Networks
      UC Irvine and UC San Diego (www.calit2.net). Preparing for a World in Which Distance is Eliminated…
  5. Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers
      • Some Areas of Concentration:
        • Algorithmic and Systems Biology
        • Bioinformatics
        • Metagenomics
        • Cancer Genomics
        • Human Genomic Variation and Disease
        • Proteomics
        • Mitochondrial Evolution
        • Computational Biology
        • Multi-Scale Cellular Imaging
        • Information Theory and Biological Systems
        • Telemedicine
      UC Irvine Southern California Telemedicine Learning Center (TLC); National Biomedical Computation Resource, an NIH supported resource center
  6. Calit2 Facilitated Formation of the Center for Algorithmic and Systems Biology (http://casb.ucsd.edu/)
  7. Most of Evolutionary Time Was in the Microbial World. Tree of Life Derived from 16S rRNA Sequences (Source: Carl Woese, et al.). Figure label: You Are Here
  8. Joint Genome Institute is a Leading Microbial Genomic Source
  9. Moore Microbial Genome Sequencing Project: Selected Microbes Throughout the World’s Oceans (www.moore.org/microgenome/worldmap.asp). Microbes Nominated by Leading Ocean Microbial Biologists
  10. Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 155 Marine Microbes (www.moore.org/microgenome/trees.aspx). Phylogenetic Trees Created by Uli Stingl, Oregon State. Blue Means Contains One of the Moore 155 Genomes
  11. Moore 155 Marine Microbial Genomes Give Broad Coverage of Microbial “Tree of Life” (www.moore.org/microgenome/alpha-proteobacteria.aspx). Phylogenetic Trees Created by Uli Stingl, Oregon State
  12. Full Genome Sequencing is Exploding: Most Sequenced Genomes are Bacterial (www.genomesonline.org). Chart labels: 90 Metagenomes; First Genome 1995; 6 Genomes/Year 2000
  13. Metagenomic Data Sets Are Rapidly Being Accumulated
      • “A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms.”
      • “We discovered significant inter-subject variability.”
      • “Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease.”
      Source: “Diversity of the Human Intestinal Microbial Flora,” Paul B. Eckburg, et al., Science (10 June 2005). 395 Phylotypes
  14. Microbial Metagenomics is a Rapidly Emerging Field of Research. “Despite their ubiquity, relatively little is known about the majority of environmental microorganisms, largely because of their resistance to culture under standard laboratory conditions.” “The application of high-throughput shotgun sequencing environmental samples has recently provided global views of those communities not obtainable from 16S rRNA or BAC clone–sequencing surveys.” Source: “Comparative Metagenomics of Microbial Communities,” Susannah Green Tringe, Christian von Mering, Arthur Kobayashi, Asaf A. Salamov, Kevin Chen, Hwai W. Chang, Mircea Podar, Jay M. Short, Eric J. Mathur, John C. Detter, Peer Bork, Philip Hugenholtz, Edward M. Rubin, Science, 22 April 2005
  15. Enormous Increase in Scale of Known Genes Over Last Decade
      • 1995, First Microbe Genome: 1.8 Million Bases, 1749 Genes
      • 2001, Human Genome: 3 Billion Bases, 30,000 Genes
      • 2007, Ocean Metagenomics: 6.3 Billion Bases, 5.6 Million Genes
      • Increase: ~3300x
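The "~3300x" figure on this slide can be sanity-checked with a few lines of arithmetic. The sketch below assumes the slide's numbers pair up as 1.8 million bases / 1,749 genes (1995), 3 billion bases / 30,000 genes (2001), and 6.3 billion bases / 5.6 million genes (2007); that pairing is a reading of the flattened slide text, not something the transcript states explicitly. The dictionary keys and the `growth` helper are illustrative names only.

```python
# Figures as quoted on the slide: (bases, genes) per milestone.
# The pairing of numbers to years is an assumption about the slide layout.
milestones = {
    "1995 first microbe genome": (1.8e6, 1_749),
    "2001 human genome": (3.0e9, 30_000),
    "2007 ocean metagenomics": (6.3e9, 5_600_000),
}


def growth(start: str, end: str) -> tuple[float, float]:
    """Return (base ratio, gene ratio) between two milestones."""
    bases_start, genes_start = milestones[start]
    bases_end, genes_end = milestones[end]
    return bases_end / bases_start, genes_end / genes_start


base_ratio, gene_ratio = growth(
    "1995 first microbe genome", "2007 ocean metagenomics"
)
print(f"bases: ~{base_ratio:.0f}x, genes: ~{gene_ratio:.0f}x")
```

Under these assumptions the base count grows by roughly 3,500x and the gene count by roughly 3,200x, which brackets the slide's rounded "~3300x".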
  16. Microbial Genomics Allows Us to Look Back Nearly 4 Billion Years in the Evolution of Life. Source: Falkowski and Vargas, Science 304 (5667), 2004
  17. The Sargasso Sea Experiment: The Power of Environmental Metagenomics
      • Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
      • Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms
      • Sequences from at Least 1800 Genomic Species, including 148 Previously Unknown
      • Identified over 1.2 Million Unknown Genes
      Image: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. Source: J. Craig Venter, et al., Science, 2 April 2004, Vol. 304, pp. 66-74
  18. Marine Genome Sequencing Project – Measuring the Genetic Diversity of Ocean Microbes. Sorcerer II Data Will Double Number of Proteins in GenBank! Specify Ocean Data. Each Sample ~2000 Microbial Species
  19. Environmental Metadata: Beyond Data Collected at Sampling Site. NASA AQUA-MODIS Images covering GOS sites #8 – 12, mid November, 2003: Sea Surface Temp, Chlorophyll
  20. GOS Predicted Proteins are Largely Bacterial. ~3 Million Previously Known Proteins (NCBI-nr, PG, TGI-EST, ENS); ~5.1 Million GOS Predicted Proteins. Source: Shibu Yooseph, et al. (PLOS Biology March 2007)
  21. Current Universe of Medium/Large Protein Families: 17,067 Protein Family Clusters. Legend: Protein Families Conserved Across Tree of Life; Protein Families Unique to GOS. 1 Million CPU-Hour Computation! Source: Shibu Yooseph, et al. (PLOS Biology March 2007)
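To put "1 Million CPU-Hour Computation" in perspective, the sketch below converts CPU-hours into wall-clock days for a given processor count. The 512-processor figure is borrowed from the CAMERA compute complex described later in the talk purely as an illustration; the transcript does not say where the clustering was actually run. The `efficiency` parameter is a hypothetical parallel-efficiency factor.

```python
def wall_clock_days(cpu_hours: float, processors: int, efficiency: float = 1.0) -> float:
    """Wall-clock days needed to spend `cpu_hours` on `processors` CPUs.

    `efficiency` is a hypothetical parallel-efficiency factor in (0, 1];
    1.0 means perfect scaling, which real workloads rarely achieve.
    """
    if processors <= 0:
        raise ValueError("processors must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_hours / (processors * efficiency) / 24


# Illustrative only: 1 million CPU-hours on a 512-processor cluster.
days = wall_clock_days(1_000_000, 512)
print(f"about {days:.0f} days at perfect scaling")
```

With perfect scaling this comes to a little under three months of continuous use of such a cluster, which suggests why large shared resources matter for this kind of analysis.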
  22. GOS Analysis -- Protein Families in Nature Have Been Poorly Explored Thus Far
      • Novel Sequence Similarity Clustering Process Predicts Proteins and Groups Related Sequences Into Clusters (Families)
      • GOS Proteins Increase Size / Diversity of Many Protein Families
      • 1,700 Novel GOS-Only Clusters Identified (>20 per Cluster)
        • 10% of 17,000 Clusters
      Chart labels: NCBI_nr; GOS + NCBI_nr + Ensembl + TIGR Gene Indices + Prokaryotic Genomes. Source: Shibu Yooseph, et al. (PLOS Biology March 2007)
  23. Enormous Biodiversity: Very Little of GOS Metagenomic Data Assembles Well
      • Use Reference Genomes to Recruit Fragments
        • Compared 334 Finished and 250 Draft Microbial Genomes
      • Only 5 Microbial Genera Yielded Substantial and Uniform Recruitment
        • Prochlorococcus, Synechococcus, Pelagibacter, Shewanella, and Burkholderia
      Source: Douglas Rusch, et al. (PLOS Biology March 2007)
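The idea of using reference genomes to "recruit" fragments can be illustrated with a toy example. The slide does not say how recruitment was scored, so the sketch below swaps in a very simple stand-in: a read is assigned to the reference that shares the largest fraction of its k-mers, provided that fraction clears a threshold. The function names, the k-mer size, the threshold, and the sequences are all hypothetical; a real pipeline would use a proper alignment tool.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit(reads, references, k=8, min_shared=0.5):
    """Toy fragment recruitment by shared k-mer fraction (not the published method).

    Returns (recruited, unrecruited): a dict mapping reference name to the
    reads assigned to it, and a list of reads that matched no reference.
    """
    ref_index = {name: kmers(seq, k) for name, seq in references.items()}
    recruited = {name: [] for name in references}
    unrecruited = []
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_frac = None, 0.0
        for name, ref_kmers in ref_index.items():
            if not read_kmers:
                break  # read shorter than k: nothing to compare
            frac = len(read_kmers & ref_kmers) / len(read_kmers)
            if frac > best_frac:
                best_name, best_frac = name, frac
        if best_name is not None and best_frac >= min_shared:
            recruited[best_name].append(read)
        else:
            unrecruited.append(read)
    return recruited, unrecruited


# Made-up sequences for illustration only.
references = {
    "ref_a": "ACGTACGGTTCAGGCTAAGCTTGACCGTA",
    "ref_b": "TTTTGGGGCCCCAAAATTGGCCAATTGGCC",
}
reads = ["GGTTCAGGCTAAGC", "CCCCAAAATTGG", "GAGAGAGAGAGAGA"]
recruited, unrecruited = recruit(reads, references)
print(recruited, unrecruited)
```

In this toy run the first two reads are recruited to `ref_a` and `ref_b` respectively and the third is left unrecruited, mirroring the slide's point that most environmental fragments find no close reference genome.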
  24. Self Organizing Maps Identify Species Using Japanese Earth Simulator. SOM Created from an Unsupervised Neural Network Algorithm to Analyze Tetranucleotide Frequencies in a Wide Range of Genomes (10kb Moving Window). Genomes shown: Human, Fugu, Arabidopsis, Rice, C. elegans, Drosophila. Source: T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23 (www.es.jamstec.go.jp/publication/journal/jes_vol.6/pdf/JES6_22-Abe.pdf)
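The input to such a self-organizing map is a tetranucleotide frequency vector per sequence window. The sketch below shows one way those 256-dimensional vectors might be computed; it covers only the feature-extraction step, not the SOM training itself. The function names, the non-overlapping step size, and the handling of ambiguous bases are assumptions, since the slide specifies only a 10kb moving window.

```python
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def tetra_freq(window: str) -> list[float]:
    """Relative frequency of each of the 256 tetranucleotides in a window."""
    counts = [0] * len(TETRAMERS)
    total = 0
    for i in range(len(window) - 3):
        idx = INDEX.get(window[i:i + 4])
        if idx is not None:  # skip tetramers containing N or other symbols
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(TETRAMERS)


def windowed_profiles(genome: str, window: int = 10_000, step: int = 10_000):
    """One frequency vector per window; the step size is an assumed parameter."""
    genome = genome.upper()
    return [
        tetra_freq(genome[start:start + window])
        for start in range(0, len(genome) - window + 1, step)
    ]


# Synthetic 20 kb sequence, for illustration only.
profiles = windowed_profiles("ACGT" * 5000)
print(len(profiles), len(profiles[0]))
```

Each resulting vector could then be fed to any SOM implementation; windows from the same organism tend to have similar tetranucleotide signatures, which is what lets the map separate species.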
  25. Using SOM, Sargasso Sea Metagenomic Data Yields 92 Microbial Genera! Categories: Eukaryotes, Prokaryotes, Viruses, Mitochondria, Chloroplasts. Input Genomes: 1500 Microbes, 40 Eukaryotes, 1065 Viruses, 642 Mitochondria, 42 Chloroplasts (5kb Window). Source: T. Abe, H. Sugawara, S. Kanaya, T. Ikemura, Journal of the Earth Simulator, Volume 6, October 2006, 17–23
  26. The Human Kinome: A Protein Family Implicated in Many Human Diseases. Over 500 Protein Kinases; 2% of the Human Genome; Many splice variants. Figure labels: Crystal Structures; EPKs; Yeast, Mouse, C. elegans, Drosoph., Arabid., Sea Urchin, Dicty., Tetrahy. Manning, et al. (2002) Science 298:1912. Source: Susan Taylor, SOM, UCSD
  27. From Microbial Genomes To Human Disease
      • Microbes Have a Much Simpler Genome Than Humans
      • However, Microbes Share Many of the Core Components of the Molecular Signaling Machinery Used by Humans
      • Understand Both the Evolution and Regulation of Signaling Systems, First in Microbes and Then in Humans
      • This is a Rich Source for Mapping the Origins of EPKs
      >24,000 Kinases, Including 16,000 New Kinases, in Venter Global Ocean Sampling Data! Source: Susan Taylor, SOM, UCSD
  28. The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data. Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr, PI. Univ. Partners: SDSC, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST. Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent. $13.5M Over Five Years; Now in the Fifth Year. Picture Source: Mark Ellisman, David Lee, Jason Leigh
  29. Dedicated Optical Channels Make High Performance Cyberinfrastructure Possible. Parallel Lambdas are Driving Optical Networking the Way Parallel Processors Drove 1990s Computing. 10 Gbps per User, ~200x Shared Internet Throughput. “Lambdas” (WDM). Source: Steve Wallach, Chiaro Networks
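The practical meaning of "10 Gbps per user, ~200x shared Internet throughput" is easiest to see as transfer times. The sketch below assumes an ideal link with no protocol overhead, takes the shared-Internet rate to be 10 Gb/s divided by 200 (about 50 Mb/s, as implied by the slide), and uses a hypothetical 1 TB dataset; none of these specifics come from the talk beyond the two quoted figures.

```python
def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours to move size_bytes over an ideal link (no protocol overhead)."""
    return size_bytes * 8 / rate_bits_per_s / 3600


LAMBDA_BPS = 10e9              # dedicated lambda, as quoted on the slide
SHARED_BPS = LAMBDA_BPS / 200  # ~50 Mb/s, implied by the slide's "~200x"
DATASET_BYTES = 1e12           # hypothetical 1 TB dataset

lambda_hours = transfer_hours(DATASET_BYTES, LAMBDA_BPS)
shared_hours = transfer_hours(DATASET_BYTES, SHARED_BPS)
print(f"lambda: {lambda_hours * 60:.0f} min, shared: {shared_hours:.0f} h")
```

Under these assumptions a terabyte moves in under a quarter of an hour on a dedicated lambda versus nearly two days on the shared path, which is the difference between interactive and overnight use.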
  30. National LambdaRail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers. NLR: 4 x 10Gb Lambdas Initially, Capable of 40 x 10Gb Wavelengths at Buildout. NSF’s TeraGrid Has 4 x 10Gb Lambda Backbone. Links Two Dozen State and Regional Optical Networks. DOE, NSF, & NASA Using NLR. Map labels: San Francisco, Pittsburgh, Cleveland, San Diego, Los Angeles, Portland, Seattle, Pensacola, Baton Rouge, Houston, San Antonio, Las Cruces / El Paso, Phoenix, New York City, Washington, DC, Raleigh, Jacksonville, Dallas, Tulsa, Atlanta, Kansas City, Denver, Ogden / Salt Lake City, Boise, Albuquerque, UC-TeraGrid, UIC/NW-Starlight, Chicago, International Collaborators
  31. OptIPortal – Termination Device for the Dedicated Gigabit/sec Lightpaths. Collaborative Analysis of Large Scale Images of Cancer Cells. Integration of High Definition Video Streams with Large Scale Image Display Walls. Photo Source: David Lee, Mark Ellisman, NCMIR, UCSD
  32. My OptIPortal™ – Affordable Termination Device for the OptIPuter Global Backplane
      • 20 Dual CPU Nodes, 20 24” Monitors, ~$50,000
      • 1/4 Teraflop, 5 Terabyte Storage, 45 Mega Pixels--Nice PC!
      • Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC
      Source: Phil Papadopoulos, SDSC, Calit2
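The "45 Mega Pixels" figure for a 20-monitor wall can be checked with simple arithmetic. The sketch below assumes each 24-inch monitor has a 1920 x 1200 panel, a common resolution for that size at the time; the slide does not state the resolution, so treat it as a guess. The per-tile cost simply divides the quoted ~$50,000 by the number of node and monitor pairs.

```python
MONITORS = 20              # from the slide
WIDTH, HEIGHT = 1920, 1200 # assumed per-monitor resolution (not on the slide)
TOTAL_COST_USD = 50_000    # approximate figure from the slide

total_megapixels = MONITORS * WIDTH * HEIGHT / 1e6
cost_per_tile = TOTAL_COST_USD / MONITORS

print(f"{total_megapixels:.2f} megapixels, ${cost_per_tile:,.0f} per node and monitor")
```

With that assumed resolution the wall comes to about 46 megapixels, consistent with the slide's rounded figure, at roughly $2,500 per node and monitor pair.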
  33. PI Larry Smarr; Paul Gilna, Ex. Dir. Announced January 17, 2006. $24.5M Over Seven Years
  34. The Calit2 CAMERA Microbial Metagenomics Server is Open to the Community (PLOS Biology March 2007)
  35. CAMERA Builds on Cyberinfrastructure Grid, Workflow, and Portal Projects in a Service Oriented Architecture. Cyberinfrastructure: Raw Resources, Middleware & Execution Environment. Diagram labels: NBCR Rocks Clusters, Virtual Organizations, Web Services, KEPLER, Workflow Management, Vision, Telescience Portal. Located in Calit2@UCSD Building. National Biomedical Computation Resource, an NIH supported resource center
  36. Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server. Source: Phil Papadopoulos, SDSC, Calit2
      • Sargasso Sea Data
      • Sorcerer II Expedition (GOS)
      • JGI Community Sequencing Project
      • Moore Marine Microbial Project
      • NASA and NOAA Satellite Data
      • Community Microbial Metagenomics Data
      Diagram labels: Traditional User (Request, Response); + Web Services; Flat File Server Farm; Web Portal; Dedicated Compute Farm (1000s of CPUs); TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10,000s of CPUs); Web (other service); Local Cluster; Local Environment; Direct Access Lambda Cnxns; Database Farm; 10 GigE Fabric
  37. Calit2 CAMERA Production Compute and Storage Complex: 512 Processors, ~5 Teraflops, ~200 Terabytes Storage
  38. The Calit2 CAMERA Metagenomics Site is Now Active (http://camera.calit2.net/)
  39. CAMERA Research Tools and Data
  40. Distribution of CAMERA User Registrations: Nearly 1000 Registered Users From 45 Countries
  41. Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome: Acidobacteria bacterium Ellin345, Soil Bacterium, 5.6 Mb
  42. Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome. Source: Raj Singh, UCSD
  43. Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome. Source: Raj Singh, UCSD
  44. Interactive Exploration of Marine Genomes Using 100 Million Pixels. Ginger Armbrust (UW), Terry Gaasterland (UCSD SIO)
  45. Calit2 is Now OptIPuter Connecting Remote OptIPortals for Moore-Funded Microbial Researchers via NLR. Map labels: NW! CICESE UW JCVI MIT SIO UCSD SDSU UIC EVL UCI; OptIPortals; OptIPortal; CAMERA Servers
  46. Countries are Aggressively Creating Gigabit Services: Interactive Access to CAMERA Data System. www.glif.is: Created in Reykjavik, Iceland, 2003. Visualization courtesy of Bob Patterson, NCSA.