Open Science Data Cloud - CCA 11

1.
Open Science Data Cloud
April 13, 2011
Robert Grossman
Open Cloud Consortium, University of Chicago, Open Data Group

2.
Open Science Data Cloud
Astronomical data
Biological data (Bionimbus)
NSF-PIRE OSDC Data Challenge
Earth science data (& disaster relief)

3.
Who are we?

4.
U.S.-based not-for-profit corporation.

5.
Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.

6.
Manages cloud computing testbeds: Open Cloud Testbed.

7.
Develops reference implementations, benchmarks and standards.
www.opencloudconsortium.org

8.
OCC Members
Companies: Cisco, Citrix, Yahoo!, …
Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
Federal agencies: NASA
International Partners: AIST (Japan)
Other: National Lambda Rail
Beginning to add international partners in 2011.

9.
Proof of Concept: 2008 - 2010
Phase 1: 2011 - 2014
Phase 2: 2015 - 2020
4 locations

10.
10G networks

11.
450+ nodes

12.
3000 cores

13.
2 PB

14.
Build a datacenter for science

15.
6-10 locations

16.
100G networks

17.
$1M - $2M hardware per year
Why Another Cloud Project?

18.
[Figure: variety of analysis (Low, Med, Wide) versus data size (Small, Medium to Large, Very Large). Labels: Scientist with laptop; Open Science Data Cloud; High energy physics, astronomy; No infrastructure; General infrastructure; Dedicated infrastructure.]

19.
[Figure: persistent data (Small, Med, Large) versus cycles (Single workstations; Small to medium clusters; Large & spec. clusters). Labels: data clouds; databases; HPC.]

20.
What is the Open Science Data Cloud?

21.
Hosted, managed, distributed facility to:
Manage & archive your medium and large datasets
Provide computational resources to analyze them
Provide networking to share them with your colleagues and the public.

22.
Long Term Goal
Build a (small) data center for science.

23.
And preserve your data the same way that libraries preserve books & museums preserve art.

24.
OSDC Perspective
Take a long term point of view (think like a library, not a cloud service provider)

25.
Operate infrastructure at the scale of a small data center

26.
Interoperate with public clouds

27.
Open, interoperable architecture

28.
Experiment at scale

29.
Vendor neutral
OSDC Projects

30.
Project 1. Bionimbus
www.bionimbus.org

31.
Case Study: Public Datasets in Bionimbus

32.
What Could You Do With 1 PB of Genomics Data?
The NIH in the U.S. currently makes available for download approximately 2 PB of data.
Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
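To put the Bionimbus figures above in per-node terms, here is a minimal sketch that computes simple averages from the numbers on the slide. The function name is hypothetical, and decimal units (1 PB = 1000 TB) are an assumption; the slide does not say how capacity is counted or how cores and disks are actually distributed across nodes.

```python
# Figures quoted on the slide.
NODES = 212
CORES = 1568
STORAGE_PB = 0.9

def per_node_averages(nodes, cores, storage_pb):
    """Return (cores per node, TB per node) as plain averages.

    Assumes 1 PB = 1000 TB; real nodes may be configured unevenly.
    """
    return cores / nodes, storage_pb * 1000 / nodes

cores_per_node, tb_per_node = per_node_averages(NODES, CORES, STORAGE_PB)
print(f"{cores_per_node:.1f} cores/node, {tb_per_node:.2f} TB/node")
```

These are averages only and say nothing about the actual hardware mix.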

33.
Case Study: modENCODE
Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
Bionimbus VMs were used for some of the integrative analysis.
Bionimbus is used as a backup for the modENCODE DCC.

34.
Project Matsu 2: An Elastic Cloud For Disaster Response
Daniel Mandl - NASA/GSFC, Lead

35.
Provide Fire / Flood Data to Rescue Workers
Flood Dashboard: Short Term Pilot for 2011
Note blue bars indicating a surge of rainfall upstream.
Then a flood wave appears downstream at Rundu river gauge days later.
Colored areas represent catchments where rainfall collects and drains to river basins.

36.
River gauges displayed as small circles

37.
Detailed measurements are available on the display by clicking on the river gauge stations.
Zambezi basin consisting of upper, middle and lower catchments

38.
Project 3: OSDC PIRE Project

39.
OSDC PIRE Project Overview
Research

40.
Cloud middleware for data intensive computing

41.
Wide area clouds

42.
Training and education workshops

43.
Data intensive computing using the OSDC

44.
Cloud computing for scientific computing

45.
Outreach

46.
OSDC Data Challenge
Foreign Partners
National Institute of Advanced Industrial Science and Technology (AIST), Japan

47.
Beijing Institute of Genomics (BIG)

48.
Edinburgh University

49.
Korea Institute of Science & Technology

50.
São Paulo State University

51.
Universidade Federal Fluminense, Brazil

52.
University of Amsterdam
OSDC Data Challenge
Annual contest to select 3 to 4 datasets each year to add to the OSDC.

53.
Looking for the most interesting datasets to add.
Research Focus
Cloud architectures for data intensive computing

54.
Wide area clouds

55.
Continuous learning

56.
Scanning queries
Ways to Participate
Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners

57.
Send one of your graduate students to hands-on workshops, such as Introduction to Data Intensive Computing

58.
Submit your most impressive dataset to the OSDC Data Challenge

59.
Buy a container of computers and join the OSDC
Open Science Data Cloud Sustainability Model

60.
Towards a Long Term, Sustainable Model
Capital Exp about $1M/year
Operating Exp about $1M/year
Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.

61.
Who do you most trust to manage your data for 100 years?
Companies may not be here tomorrow.
Government agencies have a role, but are not always easy to use.
Think of a not-for-profit with that mission.

62.
Buy A Container and Join the OCC
Use 2/3 of the container for your own purposes.
Provide 1/3 of the container to the OCC for a shared replica space.
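The 2/3 to 1/3 split above can be illustrated with a short sketch. The function name, the node counts and the round-down rule for the OCC share are all hypothetical; the slide states only the fractions, not how a container's capacity is measured or divided in practice.

```python
from fractions import Fraction

OCC_FRACTION = Fraction(1, 3)  # share provided to the OCC, per the slide

def split_container(total_nodes):
    """Return (owner_nodes, occ_nodes) for a container of total_nodes.

    Assumption: the OCC share is rounded down to a whole node and the
    owner keeps the remainder.
    """
    occ_nodes = int(total_nodes * OCC_FRACTION)
    owner_nodes = total_nodes - occ_nodes
    return owner_nodes, occ_nodes

# Hypothetical container sizes, for illustration only.
print(split_container(120))
print(split_container(100))
```

When the node count is not divisible by three, this sketch gives the spare node to the owner; a real agreement could just as well split by storage or by racks.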

63.
To Get Involved
Join the Open Cloud Consortium: www.opencloudconsortium.org

64.
Questions?
