Open Science Data Cloud - CCA 11

Open Science Data Cloud - CCA 11



This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.

This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.



Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    Open Science Data Cloud - CCA 11 Open Science Data Cloud - CCA 11 Presentation Transcript

    • Open Science Data Cloud
      April 13, 2011
      Robert Grossman
      Open Cloud Consortium
      University of ChicagoOpen Data Group
    • Open Science Data Cloud
      Astronomical data
      Biological data (Bionimbus)
      NSF-PIRE OSDC Data Challenge
      Earth science data (& disaster relief)
    • Who are we?
    • 4
      • U.S based not-for-profit corporation.
      • Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
      • Manages cloud computing testbeds: Open Cloud Testbed.
      • Develop reference implementations, benchmarks and standards.
    • OCC Members
      Companies: Cisco, Citrix, Yahoo!, …
      Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
      Federal agencies: NASA
      International Partners: AIST (Japan)
      Other: National Lambda Rail
      Beginning to add international partnersin 2011.
    • Phase 12011 - 2014
      Proof of Concept2008 - 2010
      Phase 2
      • 4 locations
      • 10G networks
      • 450+ nodes
      • 3000 cores
      • 2 PB
      • Build a data center for science
      • 6-10 locations
      • 100G networks
      • $1M - $2M hardwareper year
    • Why Another Cloud Project?
    • Variety of analysis
      Scientist with laptop
      Open Science Data Cloud
      High energy physics, astronomy
      Data Size
      Medium to Large
      Very Large
      Dedicated infrastructure
      No infrastructure
      General infrastructure
    • Persistent data
      data clouds
      Large & spec. clusters
      Small to medium clusters
      Single workstations
    • What is the Open Science Data Cloud?
    • Hosted, managed, distributed facility to:
      Manage & archive your medium and large datasets
      Provide computational resources to analyze it
      Provide networking to share it with your colleagues and the public.
    • Long Time Goal
      Build a (small) data center for science.
    • And preserve your data the same way that libraries preserve books & museums preserve art.
    • OSDC Perspective
      • Take a long term point of view (think like a library not a cloud service provider)
      • Operate infrastructure at the scale of a small data center
      • Interoperate with public clouds
      • Open, interoperable architecture
      • Experiment at scale
      • Vendor neutral
    • OSDC Projects
    • Project 1. Bionimbus
    • Case Study: Public Datasets in Bionimbus
    • What Could You Do With 1 PB of Genomics Data?
      The NIH in the U.S. currently makes available for download approximately 2PB of data.
      Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
      We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
    • Case Study: ModENCODE
      Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
      BionimbusVMs were used for some of the integrative analysis.
      Bionimbus is used as a backup for the modENCODE DCC
    • Project Matsu 2: An Elastic Cloud For Disaster Response Daniel Mandl - NASA/GSFC, Lead
    • Provide Fire / Flood Data to Rescue Workers
      Note blue bars indicating
      a surge of rainfall upstream
      Flood Dashboard
      Then a flood wave appears
      downstream at Rundu river
      gauge days later
      Short Term Pilot for 2011
      • Colored areas represent catchments where rainfall collects and drains to river basins
      • River gauges displayed as small circles
      • Detailed measurements are available on the display by clicking on the river gauge stations.
      Zambezi basin consisting of
      upper, middle and lower catchments
    • Project 3: OSDC PIRE Project
    • OSDC PIRE Project Overview
      • Research
      • Cloud middleware for data intensive computing
      • Wide area clouds
      • Training and education workshops
      • Data intensive computing using the OSDC
      • Cloud computing for scientific computing
      • Outreach
      • OSDC Data Challenge
    • Foreign Partners
      • National Institute of Advanced Industrial Science and Technology (AIST), Japan
      • Beijing Institute of Genomics (BIG)
      • Edinburgh University
      • Korea Institute of Science & Technology
      • San Paulo State University
      • Universidade Federal Fluminense, Brasil
      • University of Amsterdam
    • OSDC Data Challenge
      • Annual contest to select 3 to 4 datasets each year to add to the OSDC.
      • Looking for the most interesting datasets to add.
    • Research Focus
      • Cloud architectures for data intensive computing
      • Wide area clouds
      • Continuous learning
      • Scanning queries
    • Ways to Participate
      • Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners
      • Send one of your graduate students to hands-on Workshops, such as Introduction to Data Intensive Computing
      • Submit your most impressive dataset to the OSDC Data Challenge
      • Buy a container of computers and join the OSDC
    • Open Science Data Cloud Sustainability Model
    • Towards a Long Term, Sustainable Model
      Capital Exp about $1M/year
      Operating Exp about $1M/year
      Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.
    • Who do you most trust to manage your data for 100 years?
      Companies may not be here tomorrow.
      Government agencies have a role, but not always easy to use.
      Think of a not for profit with that mission.
    • Buy A Container and Join the OCC
      Use 2/3 of the container for your own purposes.
      Provide 1/3 of the container to the OCC for a share replica space.
    • To Get Involved
      Join the Open Cloud Consortium:
    • Questions?