Bionimbus Cambridge Workshop (3-28-11, v7)

This is a talk that I gave on March 28, 2011 at a workshop at the Center for Mathematical Sciences in Cambridge, England.

Transcript

  • 1. Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data
    March 29, 2011
    Robert Grossman
    Institute for Genomics & Systems Biology
    Computation Institute, University of Chicago
    and
    Open Cloud Consortium
  • 2. Part 1. Biology, Big Data & Clouds
    Two of the 14 high-throughput sequencers at the Ontario Institute for Cancer Research (OICR).
  • 3. Source: Lincoln Stein
  • 4. The Challenge is to Support Cubes of Next Gen Sequence Data
    Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie or other data set.
    Different developmental stages
    Different pathologies
    Perturb the environment
  • 5. Genomics as a Big Data Science
  • 6. What is new about clouds?
  • 7. Scale is New
  • 8. Elastic, On-Demand Computing with Usage-Based Pricing Is New
    1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
  • 9. Part 2. What is Bionimbus?
    www.bionimbus.org
  • 10. Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
  • 11. Step 1. Get a Bionimbus ID (BID); assign project, private/community, public cloud, etc.
    Step 2. Send sample to be sequenced.
    Step 3a. Return raw reads.
    Step 3b. Return variant calls, CNV, annotation…
    Step 4. Secure data routing to the appropriate cloud based upon the BID.
    Step 5. Cloud-based analysis using IGSB and 3rd party tools and applications.
    Diagram components: BID Generator, IGSB Sequencers, External Sequencers, Bionimbus Private Cloud UC, Bionimbus Private Cloud XY, Bionimbus Community Cloud, Amazon, dbGaP
  • 12. What is a good unit for understanding data-intensive computing on biological data?
  • 13. Bionimbus & OSDC Today
    The NIH in the U.S. currently makes available for download approximately 2PB of data.
    Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
    Bionimbus is part of the POC Open Science Data Cloud that consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.
  • 14. GWT-based Front End
    Elastic Cloud Services
    Database Services
    Analysis Pipelines & Re-analysis Services
    Intercloud Services
    Large Data Cloud Services
    Data Ingestion Services
  • 15. Bionimbus Deployment Options
    Bionimbus Community Cloud (www.bionimbus.org)
    Bionimbus AMIs & Amazon hosted applications
    Bionimbus Private Clouds
  • 16. Part 3. Some Bionimbus Case Studies
  • 17. Case Study: Public Datasets in Bionimbus
  • 18. Case Study: modENCODE
    Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
    Bionimbus VMs were used for some of the integrative analysis.
    Bionimbus is used as a backup for the modENCODE DCC.
  • 19. Case Study: IGSB
    All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
  • 20. Bionimbus Virtual Machine Releases
  • 21. Part 4. What is the OSDC?
  • 22. Open Science Data Cloud
    Astronomical data
    Biological data (Bionimbus)
    NSF-PIRE OSDC Data Challenge
    Earth science data (& disaster relief)
  • 23. U.S.-based not-for-profit corporation.
    • 24. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
    • 25. Manages cloud computing testbeds: Open Cloud Testbed.
    • 26. Develops reference implementations, benchmarks and standards.
    www.opencloudconsortium.org
  • 27. OCC Members
    Companies: Cisco, Citrix, Yahoo!, …
    Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, …
    Federal agencies: NASA
    Other: National Lambda Rail
    Adding international partners in 2011.
  • 28. Infrastructure
    2010 Proof-of-Concept Infrastructure
    450+ nodes
    3000+ cores
    3+ PB
    Four data centers (two more to come in 2011)
    Data centers have 10G network connections (some 100G links in 2011)
    Plan to add approximately 1 PB of data in 2011.
    With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.
  • 29. Towards a Long Term, Sustainable Model
    Cap Exp about $1M/year
    Op Exp about $1M/year
    Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.
  • 30. Variety of analysis vs. data size [chart]
    Scientist with laptop: wide variety of analysis, small data, no infrastructure
    Open Science Data Cloud: medium variety, medium to large data, general infrastructure
    Sequencing centers, LHC, LSST: low variety, very large data, dedicated infrastructure
  • 31. Persistent data vs. cycles [chart]
    Persistent data axis: small, medium, large
    Cycles axis: single workstations; small to medium clusters; large & specialized clusters
    Labeled regions: large data clouds; medium databases; HPC
  • 32. Bionimbus Team*
    Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L. Grossman, A. Jason Grundstad, David Hanley, Xiangjun Liu, Nicolas Negre, Michal Sabala, Parantu K. Shah, Kevin P. White
    Institute for Genomics & Systems Biology, University of Chicago
    Jia Chen, Yunhong Gu and Damian Roqueiro
    University of Illinois at Chicago
    Lincoln Stein and Zheng Zha
    Ontario Institute for Cancer Research
    *In alphabetical order
  • 33. Acknowledgements
  • 34. Questions?
  • 35. Thank You
    For more information: www.bionimbus.org
    www.opencloudconsortium.org
    www.igsb.org
    rgrossman.com
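Slide 4 describes genomics data as a cube: each cell is one data set (ChIP-chip, ChIP-seq, RNA-seq, movie, etc.) and the axes are conditions such as developmental stage, pathology and environmental perturbation. A minimal Python sketch of that idea follows, assuming a simple in-memory index. The class names, field names and example values are hypothetical illustrations, not part of Bionimbus.

```python
from collections import defaultdict
from typing import NamedTuple


class CellKey(NamedTuple):
    """One cell of the cube; each field is one dimension (hypothetical names)."""
    assay: str         # e.g. "ChIP-seq", "RNA-seq"
    stage: str         # developmental stage
    pathology: str
    perturbation: str  # environmental perturbation


class DataCube:
    """Hypothetical in-memory index from cube cells to data set identifiers."""

    def __init__(self) -> None:
        self._cells: dict[CellKey, list[str]] = defaultdict(list)

    def add(self, key: CellKey, dataset_id: str) -> None:
        self._cells[key].append(dataset_id)

    def slice(self, **fixed: str) -> dict[CellKey, list[str]]:
        """Return the cells matching every fixed dimension, e.g. slice(assay="RNA-seq")."""
        return {
            key: ids
            for key, ids in self._cells.items()
            if all(getattr(key, dim) == value for dim, value in fixed.items())
        }


# Illustrative values only.
cube = DataCube()
cube.add(CellKey("RNA-seq", "larva", "none", "none"), "ds-001")
cube.add(CellKey("ChIP-seq", "larva", "none", "none"), "ds-002")
cube.add(CellKey("RNA-seq", "adult", "tumor", "heat shock"), "ds-003")
rna = cube.slice(assay="RNA-seq")
print(len(rna))
```

Keying cells by a named tuple makes slicing along any dimension a simple filter. At the data volumes on these slides, such an index would more plausibly live in a database, with the data sets themselves in cloud storage.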
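Slide 8's point is that under usage-based pricing the bill depends only on node-hours consumed, so spreading the same work across more machines costs the same but finishes sooner. The sketch below assumes a flat, hypothetical per-node-hour rate (the slide gives no price) and work that parallelizes perfectly.

```python
# Hypothetical flat rate in dollars per node-hour; the slide does not state a price.
RATE_PER_NODE_HOUR = 0.10


def usage_cost(nodes: int, hours: float, rate: float = RATE_PER_NODE_HOUR) -> float:
    """Usage-based pricing: the bill is node-hours consumed times the rate."""
    return nodes * hours * rate


# The same 120 node-hours of work, run two ways (assumes perfect parallelism).
serial = usage_cost(nodes=1, hours=120)    # 1 computer for 120 hours
parallel = usage_cost(nodes=120, hours=1)  # 120 computers for 1 hour

assert serial == parallel
print(f"1 x 120 h: ${serial:.2f}; 120 x 1 h: ${parallel:.2f}")
```

The equal cost only holds when the work splits cleanly across machines; any coordination overhead adds node-hours to the parallel run.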
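Slide 11's workflow assigns every sample a Bionimbus ID (BID) up front (Step 1) and later routes its data to a cloud chosen from the BID's metadata (Step 4). The following is a minimal Python sketch of that lookup. It assumes a hypothetical BID format and a fixed, hypothetical mapping from access level to the destination labels in the slide's diagram, and it models only the routing decision, not the transport security the slide refers to.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    PRIVATE = "private"
    COMMUNITY = "community"
    PUBLIC = "public"


# Hypothetical mapping from access level to destination (labels from the slide's diagram).
DESTINATIONS = {
    Access.PRIVATE: "Bionimbus Private Cloud UC",
    Access.COMMUNITY: "Bionimbus Community Cloud",
    Access.PUBLIC: "Amazon",
}


@dataclass(frozen=True)
class BidRecord:
    bid: str
    project: str
    access: Access


class BidRegistry:
    """Hypothetical BID generator and lookup table."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._records: dict[str, BidRecord] = {}

    def issue(self, project: str, access: Access) -> BidRecord:
        """Step 1: issue a BID and record its project and access level."""
        bid = f"BID-{next(self._counter):06d}"  # hypothetical ID format
        record = BidRecord(bid, project, access)
        self._records[bid] = record
        return record

    def route(self, bid: str) -> str:
        """Step 4: choose the destination cloud from the metadata registered for the BID."""
        record = self._records[bid]  # raises KeyError, so unregistered data is never routed
        return DESTINATIONS[record.access]


registry = BidRegistry()
sample = registry.issue(project="example-project", access=Access.PRIVATE)
print(sample.bid, "->", registry.route(sample.bid))
```

Because the destination is decided from metadata fixed at Step 1, data returned from an external sequencer can be routed without that sequencer knowing anything about the project's access policy.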
