Open Science Data Cloud - CCA 11
 


This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.


    Presentation Transcript

    • Open Science Data Cloud
      April 13, 2011
      Robert Grossman
      Open Cloud Consortium
      University of Chicago / Open Data Group
    • Open Science Data Cloud
      Astronomical data
      Biological data (Bionimbus)
      NSF-PIRE OSDC Data Challenge
      Earth science data (& disaster relief)
    • Who are we?
    • 4
      • U.S.-based not-for-profit corporation.
      • Manages cloud computing infrastructure to support scientific research: the Open Science Data Cloud.
      • Manages cloud computing testbeds: the Open Cloud Testbed.
      • Develops reference implementations, benchmarks, and standards.
      www.opencloudconsortium.org
    • OCC Members
      Companies: Cisco, Citrix, Yahoo!, …
      Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
      Federal agencies: NASA
      International Partners: AIST (Japan)
      Other: National Lambda Rail
      Beginning to add international partners in 2011.
    • Proof of Concept (2008 - 2010)
      • Build a data center for science
      Phase 1 (2011 - 2014)
      • 4 locations
      • 10G networks
      • 450+ nodes
      • 3000 cores
      • 2 PB
      Phase 2 (2015 - 2020)
      • 6-10 locations
      • 100G networks
      • $1M - $2M hardware per year
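A quick back-of-the-envelope check on the Phase 1 figures above; the per-node averages are my own arithmetic, not from the talk, and assume SI units and the lower bound of "450+ nodes":

```python
# Average per-node figures implied by the Phase 1 build-out
# (450+ nodes, 3000 cores, 2 PB across 4 locations).
nodes = 450            # lower bound from the slide ("450+ nodes")
cores = 3000
storage_tb = 2 * 1000  # 2 PB expressed in TB (SI units assumed)

cores_per_node = cores / nodes
tb_per_node = storage_tb / nodes

print(f"~{cores_per_node:.1f} cores/node, ~{tb_per_node:.1f} TB/node")
# → ~6.7 cores/node, ~4.4 TB/node
```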
    • Why Another Cloud Project?
    • Variety of analysis vs. data size
      • Wide variety of analysis, small data: scientist with laptop (no infrastructure)
      • Medium variety, medium to large data: Open Science Data Cloud (general infrastructure)
      • Low variety, very large data: high energy physics and astronomy (dedicated infrastructure)
    • Persistent data vs. compute cycles
      • Large persistent data: data clouds (small to medium clusters)
      • Medium persistent data: databases (single workstations)
      • Small persistent data: HPC (large & specialized clusters)
    • What is the Open Science Data Cloud?
    • A hosted, managed, distributed facility to:
      Manage & archive your medium and large datasets
      Provide computational resources to analyze them
      Provide networking to share them with your colleagues and the public.
    • Long-Term Goal
      Build a (small) data center for science.
    • And preserve your data the same way that libraries preserve books & museums preserve art.
    • OSDC Perspective
      • Take a long-term point of view (think like a library, not a cloud service provider)
      • Operate infrastructure at the scale of a small data center
      • Interoperate with public clouds
      • Open, interoperable architecture
      • Experiment at scale
      • Vendor neutral
    • OSDC Projects
    • Project 1. Bionimbus
      www.bionimbus.org
    • Case Study: Public Datasets in Bionimbus
    • What Could You Do With 1 PB of Genomics Data?
      The NIH in the U.S. currently makes available for download approximately 2PB of data.
      Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
      We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
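To see why hosting compute next to a petabyte matters, here is a rough sketch of how long 1 PB would take to move over a fully utilized 10 Gbps link (the link speed matches the OSDC's 10G networks; the line-rate assumption is mine, and real throughput would be lower):

```python
# Rough transfer time for 1 PB over a saturated 10 Gbps link.
size_bits = 1 * 10**15 * 8   # 1 PB (SI) in bits
link_bps = 10 * 10**9        # 10 Gbps

seconds = size_bits / link_bps
days = seconds / 86400

print(f"{days:.1f} days")    # → 9.3 days at line rate
```

Even at line rate the download takes over a week, which is the basic argument for moving the analysis to the data rather than the data to the analyst.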
    • Case Study: ModENCODE
      Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
      Bionimbus VMs were used for some of the integrative analysis.
      Bionimbus is used as a backup for the modENCODE DCC.
    • Project 2: Matsu - An Elastic Cloud for Disaster Response
      Daniel Mandl, NASA/GSFC, Lead
    • Provide Fire / Flood Data to Rescue Workers
      Short Term Pilot for 2011: a flood dashboard for the Zambezi basin, which consists of upper, middle, and lower catchments.
      • Colored areas represent catchments where rainfall collects and drains to river basins.
      • River gauges are displayed as small circles; detailed measurements are available by clicking on the gauge stations.
      • Blue bars indicate a surge of rainfall upstream; a flood wave then appears downstream at the Rundu river gauge days later.
    • Project 3: OSDC PIRE Project
    • OSDC PIRE Project Overview
      • Research
        • Cloud middleware for data intensive computing
        • Wide area clouds
      • Training and education workshops
        • Data intensive computing using the OSDC
        • Cloud computing for scientific computing
      • Outreach
        • OSDC Data Challenge
    • Foreign Partners
      • National Institute of Advanced Industrial Science and Technology (AIST), Japan
      • Beijing Institute of Genomics (BIG)
      • Edinburgh University
      • Korea Institute of Science & Technology
      • São Paulo State University
      • Universidade Federal Fluminense, Brazil
      • University of Amsterdam
    • OSDC Data Challenge
      • Annual contest to select 3 to 4 datasets each year to add to the OSDC.
      • Looking for the most interesting datasets to add.
    • Research Focus
      • Cloud architectures for data intensive computing
      • Wide area clouds
      • Continuous learning
      • Scanning queries
    • Ways to Participate
      • Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners
      • Send one of your graduate students to hands-on workshops, such as Introduction to Data Intensive Computing
      • Submit your most impressive dataset to the OSDC Data Challenge
      • Buy a container of computers and join the OSDC
    • Open Science Data Cloud Sustainability Model
    • Towards a Long Term, Sustainable Model
      Capital expenses: about $1M/year
      Operating expenses: about $1M/year
      The Moore Foundation is providing $1M/year in 2011 and 2012 to support the capital expenses.
    • Who do you most trust to manage your data for 100 years?
      Companies may not be here tomorrow.
      Government agencies have a role, but are not always easy to use.
      Think of a not-for-profit with that mission.
    • Buy A Container and Join the OCC
      Use 2/3 of the container for your own purposes.
      Provide 1/3 of the container to the OCC as shared replica space.
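The 2/3 : 1/3 split above can be sketched for a hypothetical container; the 30-node container size is an assumption for illustration, only the split itself is from the slide:

```python
# Hypothetical example: container_nodes = 30 is an assumption,
# not a figure from the talk; only the 2/3 / 1/3 split is.
container_nodes = 30
own_use = container_nodes * 2 // 3       # member's own purposes
occ_replica = container_nodes - own_use  # shared replica space for the OCC

print(own_use, occ_replica)  # → 20 10
```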
    • To Get Involved
      Join the Open Cloud Consortium: www.opencloudconsortium.org
    • Questions?