Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data<br />March 29, 2011<br />Robert Grossman<br />Institute for Genomics & Systems Biology<br />Computation Institute<br />University of Chicago<br />and<br />Open Cloud Consortium<br />
Part 1. Biology, Big Data & Clouds<br />Two of the 14 high-throughput sequencers at the Ontario Institute for Cancer Research (OICR).<br />
The Challenge Is to Support Cubes of Next Gen Sequence Data<br />Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set.<br />Different developmental stages<br />Different pathologies<br />Perturb the environment<br />
Elastic, On-Demand Computing with Usage-Based Pricing Is New<br />1 computer in a rack for 120 hours<br />costs the same as<br />120 computers in three racks for 1 hour<br />
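The equivalence on this slide is machine-hour arithmetic. A minimal sketch, assuming a flat hourly rate per machine and no per-job overhead; the rate below is a made-up placeholder, not a figure from the talk:

```python
def usage_cost(machines: int, hours: int, rate_per_machine_hour: float) -> float:
    """Cost under usage-based pricing: pay only for machine-hours consumed."""
    return machines * hours * rate_per_machine_hour


# Hypothetical price per machine-hour (assumption for illustration only).
RATE = 0.10

one_machine_long = usage_cost(machines=1, hours=120, rate_per_machine_hour=RATE)
many_machines_short = usage_cost(machines=120, hours=1, rate_per_machine_hour=RATE)

# Both runs consume 120 machine-hours, so the bill is identical;
# only the wall-clock time to the result differs (120 hours vs. 1 hour).
print(one_machine_long == many_machines_short)
```

Under these assumptions the only thing that changes between the two runs is the time to result, which is the point of elastic, on-demand capacity.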
Part 2. What is Bionimbus?<br />www.bionimbus.org<br />
Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.<br />
Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc.<br />Step 2. Send sample to be sequenced.<br />Step 3a. Return raw reads.<br />Step 3b. Return variant calls, CNV, annotation…<br />Step 4. Secure data routing to appropriate cloud based upon BID.<br />Step 5. Cloud-based analysis using IGSB and 3rd party tools and applications.<br />Diagram components: BID Generator, IGSB Sequencers, External Sequencers, Bionimbus Private Cloud UC, Bionimbus Private Cloud XY, Bionimbus Community Cloud, Amazon, dbGaP<br />
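The role of the BID in Steps 1 and 4 can be sketched as a registry lookup. This is an illustration only, not Bionimbus code: the class names, the cloud identifiers, and the use of a random hex string as the BID are all assumptions.

```python
import uuid
from dataclasses import dataclass

# Hypothetical identifiers for some destination clouds named in the diagram.
CLOUDS = {"private-uc", "private-xy", "community", "amazon"}


@dataclass(frozen=True)
class Registration:
    bid: str      # Bionimbus ID issued in Step 1
    project: str  # project the sample belongs to
    cloud: str    # destination cloud chosen at registration time


class BidRegistry:
    """Toy stand-in for the BID Generator plus the Step 4 routing table."""

    def __init__(self) -> None:
        self._by_bid: dict[str, Registration] = {}

    def register(self, project: str, cloud: str) -> Registration:
        # Step 1: issue a BID and record the project and destination cloud.
        if cloud not in CLOUDS:
            raise ValueError(f"unknown cloud: {cloud}")
        bid = uuid.uuid4().hex  # assumed ID format, for illustration only
        registration = Registration(bid, project, cloud)
        self._by_bid[bid] = registration
        return registration

    def route(self, bid: str) -> str:
        # Step 4: data tagged with a BID goes to the cloud recorded for it.
        if bid not in self._by_bid:
            raise KeyError(f"unknown BID: {bid}")
        return self._by_bid[bid].cloud


registry = BidRegistry()
reg = registry.register(project="demo-project", cloud="community")
print(registry.route(reg.bid))
```

Fixing the destination when the BID is issued means the sequencers only need to attach the BID to their output; the routing decision never has to be repeated downstream.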
What is a good unit to understand data intensive computing of biological data?<br />
Bionimbus & OSDC Today<br />The NIH in the U.S. currently makes available for download approximately 2 PB of data.<br />Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.<br />Bionimbus is part of the proof-of-concept (POC) Open Science Data Cloud, which consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.<br />
Case Study: Public Datasets in Bionimbus<br />
Case Study: modENCODE<br />Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).<br />Bionimbus VMs were used for some of the integrative analysis.<br />Bionimbus is used as a backup for the modENCODE DCC.<br />
Case Study: IGSB<br />All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.<br />
Open Science Data Cloud<br />Astronomical data<br />Biological data (Bionimbus)<br />NSF-PIRE OSDC Data Challenge<br />Earth science data (& disaster relief)<br />
<ul><li>U.S.-based not-for-profit corporation.</li>
<li>Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.</li>
<li>Manages cloud computing testbeds: Open Cloud Testbed.</li>
<li>Develops reference implementations, benchmarks and standards.</li></ul>www.opencloudconsortium.org<br />
OCC Members<br />Companies: Cisco, Citrix, Yahoo!, …<br />Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, …<br />Federal agencies: NASA<br />Other: National Lambda Rail<br />Adding international partners in 2011.<br />
Infrastructure<br />2010 Proof-of-Concept Infrastructure<br />450+ nodes<br />3000+ cores<br />3+ PB<br />Four data centers (two more to come in 2011)<br />Data centers have 10G network connections (some 100G links in 2011)<br />Plan to add approximately 1 PB of data in 2011.<br />With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.<br />
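To put the storage and network figures on this slide side by side, here is a back-of-the-envelope estimate of how long it takes to move a petabyte between data centers. It is a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), a single link, and an `efficiency` parameter (hypothetical, defaulting to full line rate) standing in for protocol overhead and contention.

```python
def transfer_days(petabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `petabytes` of data over one link of `link_gbps` Gb/s."""
    bits = petabytes * 1e15 * 8                        # decimal PB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)    # Gb/s -> bits per second
    return seconds / 86_400                            # seconds per day


# 1 PB over a 10G link at full line rate: about 9.3 days.
print(round(transfer_days(1, 10), 1))

# The same petabyte over a 100G link: under one day.
print(round(transfer_days(1, 100), 1))
```

Real transfers rarely sustain line rate, so these numbers are lower bounds; lowering `efficiency` lengthens the estimate proportionally.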
Towards a Long-Term, Sustainable Model<br />Capital expenses (CapEx) about $1M/year<br />Operating expenses (OpEx) about $1M/year<br />The Moore Foundation is providing $1M/year for 2011 and 2012 to support the capital expenses.<br />
[Figure: variety of analysis (wide, medium, low) versus data size (small, medium to large, very large)]<br />Scientist with laptop: wide variety of analysis, small data, no infrastructure<br />Open Science Data Cloud: medium variety of analysis, medium to large data, general infrastructure<br />Sequencing centers, LHC, LSST: low variety of analysis, very large data, dedicated infrastructure<br />
Bionimbus Team*<br />David Hanley, Nicolas Negre, Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L Grossman, A. Jason Grundstad, Xiangjun Liu, Michal Sabala, Parantu K Shah, Kevin P White<br />Institute for Genomics & Systems Biology<br />University of Chicago<br />Jia Chen, Yunhong Gu and Damian Roqueiro<br />University of Illinois at Chicago<br />Lincoln Stein and Zheng Zha<br />Ontario Institute for Cancer Research<br />*In alphabetical order<br />