Bionimbus Cambridge Workshop (3-28-11, v7)


This is a talk that I gave on March 28, 2011 at a workshop at the Center for Mathematical Sciences in Cambridge, England.

Published in: Technology

  1. Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data
      March 29, 2011
      Robert Grossman
      Institute for Genomics & Systems Biology
      Computation Institute, University of Chicago
      and
      Open Cloud Consortium
  2. Part 1. Biology, Big Data & Clouds
      Two of the 14 high-throughput sequencers at the Ontario Institute for Cancer Research (OICR).
  3. Source: Lincoln Stein
  4. The Challenge is to Support Cubes of Next Gen Sequence Data
      Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set.
      Different developmental stages
      Different pathologies
      Perturb the environment
  5. Genomics as a Big Data Science
  6. What is new about clouds?
  7. Scale is New
  8. Elastic, On-Demand Computing with Usage-Based Pricing Is New
      1 computer in a rack for 120 hours
      costs the same as
      120 computers in three racks for 1 hour
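The equivalence on this slide follows from usage-based pricing being linear in node-hours. A minimal sketch, assuming a flat hourly rate per node; the function name and the rate value are hypothetical placeholders, since the talk does not quote a price:

```python
def usage_cost(nodes: int, hours: int, rate_per_node_hour: float) -> float:
    """Cost under flat usage-based pricing: pay only for node-hours consumed."""
    node_hours = nodes * hours
    return node_hours * rate_per_node_hour


# Hypothetical rate, chosen only for illustration.
RATE = 0.10

one_node_long = usage_cost(nodes=1, hours=120, rate_per_node_hour=RATE)
many_nodes_short = usage_cost(nodes=120, hours=1, rate_per_node_hour=RATE)

# Both jobs consume 120 node-hours, so they cost the same.
print(one_node_long == many_nodes_short)
```

The practical difference is wall-clock time: the second job finishes in 1 hour rather than 120, provided the work parallelizes across the 120 nodes.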
  9. Part 2. What is Bionimbus?
  10. Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
  11. Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc.
      Step 2. Send sample to be sequenced.
      Step 3a. Return raw reads.
      Step 3b. Return variant calls, CNV, annotation…
      Step 4. Secure data routing to appropriate cloud based upon BID.
      Step 5. Cloud-based analysis using IGSB and 3rd party tools and applications.
      Diagram labels: BID Generator, IGSB Sequencers, External Sequencers, Bionimbus Private Cloud UC, Bionimbus Private Cloud XY, Bionimbus Community Cloud, Amazon, dbGaP
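Steps 1 and 4 of this workflow (issue a BID, then route returned data by BID) can be sketched as a small registry lookup. This is an illustration under stated assumptions, not the Bionimbus implementation: the class names, the BID string format, and the in-memory dictionary are all hypothetical. The destination names are taken from the diagram; dbGaP is left out because the slide does not spell out its role.

```python
from dataclasses import dataclass
from enum import Enum


class Cloud(Enum):
    # Destination names as they appear on the slide.
    PRIVATE_UC = "Bionimbus Private Cloud UC"
    PRIVATE_XY = "Bionimbus Private Cloud XY"
    COMMUNITY = "Bionimbus Community Cloud"
    AMAZON = "Amazon"


@dataclass(frozen=True)
class BidRecord:
    bid: str            # hypothetical identifier format
    project: str
    destination: Cloud


class BidRegistry:
    """Hypothetical stand-in for the 'BID Generator' box in the diagram."""

    def __init__(self) -> None:
        self._records: dict[str, BidRecord] = {}
        self._counter = 0

    def issue(self, project: str, destination: Cloud) -> BidRecord:
        # Step 1: assign an ID, a project, and a target cloud up front.
        self._counter += 1
        record = BidRecord(f"BID-{self._counter:06d}", project, destination)
        self._records[record.bid] = record
        return record

    def route(self, bid: str) -> Cloud:
        # Step 4: returned data is routed purely by looking up its BID.
        return self._records[bid].destination


registry = BidRegistry()
record = registry.issue("example-project", Cloud.COMMUNITY)
print(record.bid, "->", registry.route(record.bid).value)
```

Fixing the destination when the BID is issued means that reads and variant calls coming back from a sequencer need to carry only the BID for the routing decision to be made.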
  12. What is a good unit to understand data-intensive computing of biological data?
  13. Bionimbus & OSDC Today
      The NIH in the U.S. currently makes available for download approximately 2 PB of data.
      Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
      Bionimbus is part of the POC Open Science Data Cloud that consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.
  14. GWT-based Front End
      Elastic Cloud Services
      Database Services
      Analysis Pipelines & Re-analysis Services
      Intercloud Services
      Large Data Cloud Services
      Data Ingestion Services
  15. Bionimbus Deployment Options
      Bionimbus Community
      Bionimbus AMIs & Amazon hosted applications
      Bionimbus Private Clouds
  16. Part 3. Some Bionimbus Case Studies
  17. Case Study: Public Datasets in Bionimbus
  18. Case Study: modENCODE
      Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
      Bionimbus VMs were used for some of the integrative analysis.
      Bionimbus is used as a backup for the modENCODE DCC.
  19. Case Study: IGSB
      All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
  20. Bionimbus Virtual Machine Releases
  21. Part 4. What is the OSDC?
  22. Open Science Data Cloud
      Astronomical data
      Biological data (Bionimbus)
      NSF-PIRE OSDC Data Challenge
      Earth science data (& disaster relief)
  23. U.S.-based not-for-profit corporation.
  24. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
  25. Manages cloud computing testbeds: Open Cloud Testbed.
  26. Develops reference implementations, benchmarks and standards.
  27. OCC Members
      Companies: Cisco, Citrix, Yahoo!, …
      Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, …
      Federal agencies: NASA
      Other: National Lambda Rail
      Adding international partners in 2011.
  28. Infrastructure
      2010 Proof-of-Concept Infrastructure
      450+ nodes
      3000+ cores
      3+ PB
      Four data centers (two more to come in 2011)
      Data centers have 10G network connections (some 100G links in 2011)
      Plan to add approximately 1 PB of data in 2011.
      With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.
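To put the network figures on this slide in perspective, here is a rough back-of-the-envelope calculation of how long moving 1 PB would take over a 10G and a 100G link. It is a sketch under idealized assumptions that are not from the talk: decimal units (1 PB = 10^15 bytes), a fully utilized link, and no protocol or disk overhead. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
PB = 10**15          # bytes, decimal petabyte (assumption)
GBPS = 10**9         # bits per second


def transfer_days(data_bytes: int, link_bits_per_second: int) -> float:
    """Idealized transfer time in days at full, sustained link rate."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


days_10g = transfer_days(1 * PB, 10 * GBPS)
days_100g = transfer_days(1 * PB, 100 * GBPS)

print(f"1 PB over 10G:  {days_10g:.1f} days")
print(f"1 PB over 100G: {days_100g:.1f} days")
```

Even in this best case, 1 PB needs roughly 9.3 days on a 10G link and a little under one day on a 100G link; real transfers would take longer.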
  29. Towards a Long Term, Sustainable Model
      Cap Exp about $1M/year
      Op Exp about $1M/year
      Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.
  30. Figure: variety of analysis versus data size
      Variety of analysis: Wide, Med, Low
      Data size: Small, Medium to Large, Very Large
      Infrastructure: No infrastructure, General infrastructure, Dedicated infrastructure
      Plotted: Scientist with laptop; Open Science Data Cloud; Sequencing centers, LHC, LSST
  31. Figure: persistent data versus cycles
      Persistent data: Large, Med, Small
      Cycles: Single workstations, Small to medium clusters, Large & spec. clusters
      Plotted: data clouds, databases, HPC
  32. Bionimbus Team*
      David Hanley, Nicolas Negre, Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L Grossman, A. Jason Grundstad, Xiangjun Liu, Michal Sabala, Parantu K Shah, Kevin P White
      Institute for Genomics & Systems Biology, University of Chicago
      Jia Chen, Yunhong Gu and Damian Roqueiro
      University of Illinois at Chicago
      Lincoln Stein and Zheng Zha
      Ontario Institute for Cancer Research
      *In alphabetical order
  33. Acknowledgements
  34. Questions?
  35. Thank You
      For more information: