Bionimbus Cambridge Workshop (3-28-11, v7)

Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data. March 29, 2011. Robert Grossman, Institute for Genomics & Systems Biology and Computation Institute, University of Chicago, and Open Cloud Consortium

Part 1. Biology, Big Data & Clouds

Two of the 14 high throughput sequencers at the Ontario Institute for Cancer Research (OICR).

The Challenge is to Support Cubes of Next Gen Sequence Data. Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set. Different developmental stages. Different pathologies. Perturb the environment.

Genomics as a Big Data Science

Elastic, On-Demand Computing with Usage-Based Pricing Is New. 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
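The equivalence on this slide follows from paying per machine-hour. A minimal sketch of that arithmetic, assuming a flat hourly rate; the function name `usage_cost` and the value of `RATE` are hypothetical and not taken from the talk:

```python
def usage_cost(machines: int, hours: int, rate_per_machine_hour: float) -> float:
    """Cost under pure usage-based pricing: pay only for machine-hours consumed."""
    return machines * hours * rate_per_machine_hour


# Hypothetical flat price per machine-hour, chosen only for illustration.
RATE = 0.25

one_machine_long = usage_cost(machines=1, hours=120, rate_per_machine_hour=RATE)
many_machines_short = usage_cost(machines=120, hours=1, rate_per_machine_hour=RATE)

# Both runs consume 120 machine-hours, so they are billed the same amount.
print(one_machine_long == many_machines_short)
```

Under this assumption the only difference between the two runs is the wall-clock time to the result: 120 hours versus 1 hour.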

Part 2. What is Bionimbus? www.bionimbus.org

Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.

Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc. (BID Generator)
Step 2. Send sample to be sequenced (IGSB sequencers or external sequencers).
Step 3a. Return raw reads.
Step 3b. Return variant calls, CNV, annotation…
Step 4. Secure data routing to appropriate cloud based upon BID (Bionimbus Private Cloud UC, Bionimbus Community Cloud, Bionimbus Private Cloud XY, Amazon, dbGaP).
Step 5. Cloud based analysis using IGSB and 3rd party tools and applications.
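The routing step (Step 4) can be pictured as a lookup keyed on the BID. This is only an illustrative sketch, not the Bionimbus implementation: the record fields, the registry, the destination keys and the example BID values are all hypothetical. Only the destination labels come from the slide.

```python
from dataclasses import dataclass

# Destination labels as they appear on the slide; the short keys are hypothetical.
DESTINATIONS = {
    "private-uc": "Bionimbus Private Cloud UC",
    "private-xy": "Bionimbus Private Cloud XY",
    "community": "Bionimbus Community Cloud",
    "amazon": "Amazon",
    "dbgap": "dbGaP",
}


@dataclass(frozen=True)
class BidRecord:
    """Hypothetical record created when a BID is issued in Step 1."""
    bid: str
    project: str
    destination: str  # one of the keys in DESTINATIONS


def route(bid: str, registry: dict) -> str:
    """Return the destination cloud for a BID, refusing unknown BIDs."""
    record = registry.get(bid)
    if record is None:
        # Fail closed: data without a registered BID is not routed anywhere.
        raise KeyError(f"unknown BID: {bid}")
    return DESTINATIONS[record.destination]


# Hypothetical registry contents, for illustration only.
registry = {
    "BID-0001": BidRecord("BID-0001", "example-project-a", "community"),
    "BID-0002": BidRecord("BID-0002", "example-project-b", "private-uc"),
}

print(route("BID-0001", registry))
print(route("BID-0002", registry))
```

Assigning the destination when the BID is created means that data returned in Steps 3a and 3b needs to carry only the BID for Step 4 to place it.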

What is a good unit to understand data intensive computing of biological data?

Bionimbus & OSDC Today The NIH in the U.S. currently makes available for download approximately 2PB of data. Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage. Bionimbus is part of the POC Open Science Data Cloud that consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.
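To put these figures side by side, the short sketch below derives cores per node and the Bionimbus share of OSDC nodes from the numbers quoted above. The dictionary layout and function names are illustrative assumptions. The slide gives OSDC storage only as "3+ PB", so storage is left out of the comparison.

```python
# Figures quoted on the slide.
bionimbus = {"racks": 6, "nodes": 212, "cores": 1568}
osdc = {"racks": 14, "nodes": 472, "cores": 3776}


def cores_per_node(cluster: dict) -> float:
    """Average number of cores per node in a cluster."""
    return cluster["cores"] / cluster["nodes"]


def share(part: dict, whole: dict, key: str) -> float:
    """Fraction of the whole's resource `key` contributed by `part`."""
    return part[key] / whole[key]


print(f"Bionimbus cores per node: {cores_per_node(bionimbus):.2f}")
print(f"OSDC cores per node: {cores_per_node(osdc):.2f}")
print(f"Bionimbus share of OSDC nodes: {share(bionimbus, osdc, 'nodes'):.0%}")
```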

GWT-based Front End; Elastic Cloud Services; Database Services; Analysis Pipelines & Re-analysis Services; Intercloud Services; Large Data Cloud Services; Data Ingestion Services

Bionimbus Deployment Options: Bionimbus Community Cloud (www.bionimbus.org); Bionimbus AMIs & Amazon hosted applications; Bionimbus Private Clouds

Case Study: Public Datasets in Bionimbus

Case Study: modENCODE. Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments). Bionimbus VMs were used for some of the integrative analysis. Bionimbus is used as a backup for the modENCODE DCC.

Case Study: IGSB All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.

Bionimbus Virtual Machine Releases

Open Science Data Cloud: Astronomical data; Biological data (Bionimbus); NSF-PIRE OSDC Data Challenge; Earth science data (& disaster relief)

Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.

Manages cloud computing testbeds: Open Cloud Testbed.

Develops reference implementations, benchmarks and standards. www.opencloudconsortium.org

OCC Members. Companies: Cisco, Citrix, Yahoo!, … Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, … Federal agencies: NASA. Other: National Lambda Rail. Adding international partners in 2011.

Infrastructure: 2010 Proof-of-Concept Infrastructure. 450+ nodes; 3000+ cores; 3+ PB. Four data centers (two more to come in 2011). Data centers have 10G network connections (some 100G links in 2011). Plan to add approximately 1 PB of data in 2011. With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.

Towards a Long Term, Sustainable Model. Cap Exp about $1M/year. Op Exp about $1M/year. Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.

[Figure: variety of analysis (Low, Med, Wide) versus data size (Small, Medium to Large, Very Large). Labels: Scientist with laptop (no infrastructure); Open Science Data Cloud (general infrastructure); Sequencing centers, LHC, LSST (dedicated infrastructure).]

[Figure: persistent data (Small, Med, Large) versus cycles (Small to Large). Labels: Large data clouds; databases; HPC; Large & spec. clusters; Small to medium clusters; Single workstations.]

Bionimbus Team*. David Hanley, Nicolas Negre, Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L Grossman, A. Jason Grundstad, Xiangjun Liu, Michal Sabala, Parantu K Shah, Kevin P White (Institute for Genomics & Systems Biology, University of Chicago). Jia Chen, Yunhong Gu and Damian Roqueiro (University of Illinois at Chicago). Lincoln Stein and Zheng Zha (Ontario Institute for Cancer Research). *In alphabetical order
