Open Science Data Cloud
April 13, 2011
Robert Grossman
Open Cloud Consortium
University of Chicago
Open Data Group
Open Science Data Cloud
Astronomical data
Biological data (Bionimbus)
NSF-PIRE OSDC Data Challenge
Earth science data (& disaster relief)
Who are we?
U.S.-based not-for-profit corporation.
Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
Manages cloud computing testbeds: Open Cloud Testbed.
Develops reference implementations, benchmarks and standards.
www.opencloudconsortium.org
OCC Members
Companies: Cisco, Citrix, Yahoo!, …
Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
Federal agencies: NASA
International Partners: AIST (Japan)
Other: National Lambda Rail
Beginning to add international partners in 2011.
Proof of Concept: 2008 - 2010
Phase 1: 2011 - 2014
Phase 2: 2015 - 2020
4 locations
10G networks
450+ nodes
3000 cores
2 PB
Build a data center for science
6-10 locations
100G networks
$1M - $2M hardware per year
Why Another Cloud Project?
Figure: variety of analysis (wide, medium, low) plotted against data size (small, medium to large, very large). Labeled regions: scientist with laptop; Open Science Data Cloud; high energy physics, astronomy. Infrastructure labels: no infrastructure, general infrastructure, dedicated infrastructure.
Figure: persistent data (large, medium, small) plotted against cycles (single workstations, small to medium clusters, large & specialized clusters). Labeled regions: data clouds, databases, HPC.
What is the Open Science Data Cloud?
Hosted, managed, distributed facility to:
Manage & archive your medium and large datasets
Provide computational resources to analyze them
Provide networking to share them with your colleagues and the public
Long-Term Goal
Build a (small) data center for science.
And preserve your data the same way that libraries preserve books & museums preserve art.
OSDC Perspective
Take a long-term point of view (think like a library, not a cloud service provider)
Operate infrastructure at the scale of a small data center
Interoperate with public clouds
Open, interoperable architecture
Experiment at scale
Vendor neutral
OSDC Projects
Project 1. Bionimbus
www.bionimbus.org
Case Study: Public Datasets in Bionimbus
What Could You Do With 1 PB of Genomics Data?
The NIH in the U.S. currently makes available for download approximately 2 PB of data.
Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
Case Study: modENCODE
Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
Bionimbus VMs were used for some of the integrative analysis.
Bionimbus is used as a backup for the modENCODE DCC.

Open Science Data Cloud - CCA 11