An Overview of Bionimbus and the Open Cloud Consortium. Robert Grossman. Open Cloud Consortium; Institute for Genomics & Systems Biology, University of Chicago; Laboratory for Advanced Computing, University of Illinois at Chicago.
Web Portal & Widgets; Elastic Cloud Services; Database Services; Analysis Pipelines & Re-analysis Services; Scalable Data Transport; Large Data Cloud Services; Data Ingestion Services.
Case Study 1: Cistrack. A resource for cis-regulatory data. Integrates databases and large data clouds. Open source. Contains raw, intermediate, and analyzed data from approximately 300 experiments on Agilent, Affy and Solexa platforms.
Case Study 2. 71 rare, deleterious SNP genotypes were validated by Sequenom. SNP concordance: alignment against gene models, 46%; TopHat alignment, 91%. Ran TopHat in Bionimbus using Cube-based VMs. Total time went from 25 days to 1 day.
Case Study 3. modENCODE Worm/Fly peak calling reanalysis. [Diagram labels: ssh; ftp; Virtual Machines; Working Space; Simple Persistent Storage (glusterfs); Hypervisors; App / OS stacks; Racks of Hardware; Private cloud (Eucalyptus & Cube).]
Hybrid Clouds. [Diagram labels: ami-efa24c86; Virtual Machines; Bionimbus virtual machine images; Hypervisors; App / OS stacks; Hardware Cluster; Public cloud; Private / Community cloud.]
Bionimbus Delivery Mechanisms. Log in and use the Bionimbus cloud. Use Bionimbus Virtual Machine Images in a) your private cloud; b) the Bionimbus cloud; c) public clouds such as Amazon. Bionimbus is open source, so you can build your own cloud and interoperate with ours (first release of the integrated system: 3Q 2010). Bionimbus data services for genomic data, even for large datasets.
Elastic Clouds. Goal: minimize the cost of virtualized machines and provide them on demand. Large Data Clouds. Goal: maximize data (with matching compute) and control cost. HPC. Goal: minimize latency and control heat.
A successful cloud will… provide a Web 2.0/3.0 user interface; offer compute services at the scale of a data center; use a high-speed network to move and share the data; persist and refresh data over the long term.
The Open Cloud Consortium is a 501(c)(3) not-for-profit corporation. Develops standards, interoperability frameworks, and reference implementations. Operates clouds. Develops benchmarks. One area of focus: the bridge between private and public clouds. www.opencloudconsortium.org
Operates Clouds. 500 nodes; 3000 cores; 1.5+ PB; four data centers; 10 Gbps; target to refresh 1/3 each year.
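The figures on this slide imply some per-node averages and a yearly refresh volume. The sketch below works them out in Python. It is illustrative arithmetic only: the function name `fleet_summary` is hypothetical, "1.5+ PB" is treated as exactly 1.5 PB (a lower bound), decimal units (1 PB = 1000 TB) are assumed, and storage and cores are assumed to be spread evenly across nodes, which the slide does not state.

```python
def fleet_summary(nodes, cores, storage_pb, refresh_fraction):
    """Derive simple averages from headline fleet figures.

    Assumptions (not from the slide): resources are evenly spread
    across nodes, and 1 PB = 1000 TB.
    """
    tb_per_pb = 1000
    return {
        # Average number of cores on each node.
        "cores_per_node": cores / nodes,
        # Average storage per node, in terabytes.
        "storage_tb_per_node": storage_pb * tb_per_pb / nodes,
        # Nodes replaced per year at the target refresh rate, rounded.
        "nodes_refreshed_per_year": round(nodes * refresh_fraction),
        # Years until the whole fleet has been replaced once.
        "years_for_full_turnover": 1 / refresh_fraction,
    }


if __name__ == "__main__":
    # Slide figures: 500 nodes, 3000 cores, 1.5+ PB, refresh 1/3 each year.
    for name, value in fleet_summary(500, 3000, 1.5, 1 / 3).items():
        print(f"{name}: {value}")
```

Under these assumptions the fleet averages 6 cores and about 3 TB per node, and refreshing one third each year means replacing roughly 167 nodes annually, i.e. about a three-year hardware lifecycle.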
OCC Members. Companies: Yahoo, Cisco, Aerospace Corp., Booz Allen Hamilton, InfoBlox, Open Data Group, Raytheon. Universities: CalIT2, Johns Hopkins, Northwestern University, University of Chicago, University of Illinois at Chicago. Government agencies: NASA.
Open Cloud Consortium Perspective. Vendor neutral. Open, interoperable architecture. Experiment at scale. Operate infrastructure at the scale of a small data center. Long-term point of view (think like a library, not a cloud service provider). Think public, private & hybrid clouds.