Open Science Data Cloud - CCA 11

This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.

Transcript

  • 1. Open Science Data Cloud
    April 13, 2011
    Robert Grossman
    Open Cloud Consortium
    University of Chicago
    Open Data Group
  • 2. Open Science Data Cloud
    Astronomical data
    Biological data (Bionimbus)
    NSF-PIRE OSDC Data Challenge
    Earth science data (& disaster relief)
  • 3. Who are we?
  • 4. Open Cloud Consortium (OCC)
    • U.S.-based not-for-profit corporation.
    • Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
    • Manages cloud computing testbeds: Open Cloud Testbed.
    • Develops reference implementations, benchmarks and standards.
    www.opencloudconsortium.org
  • 8. OCC Members
    Companies: Cisco, Citrix, Yahoo!, …
    Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
    Federal agencies: NASA
    International Partners: AIST (Japan)
    Other: National Lambda Rail
    Beginning to add international partners in 2011.
  • 9. Proof of Concept: 2008 - 2010
    Phase 1: 2011 - 2014
    Phase 2: 2015 - 2020
  • Why Another Cloud Project?
  • 18. [Figure: variety of analysis (wide, med, low) plotted against data size (small, medium to large, very large). Labels: scientist with laptop; Open Science Data Cloud; high energy physics, astronomy. Infrastructure labels: no infrastructure, general infrastructure, dedicated infrastructure.]
  • 19. [Figure: persistent data (small, med, large) plotted against cycles (single workstations, small to medium clusters, large & spec. clusters). Labels: databases, HPC, data clouds.]
  • 20. What is the Open Science Data Cloud?
  • 21. Hosted, managed, distributed facility to:
    Manage & archive your medium and large datasets
    Provide computational resources to analyze them
    Provide networking to share them with your colleagues and the public.
  • 22. Long-Term Goal
    Build a (small) data center for science.
  • 23. And preserve your data the same way that libraries preserve books & museums preserve art.
  • 24. OSDC Perspective
    • Take a long term point of view (think like a library not a cloud service provider)
    • Operate infrastructure at the scale of a small data center
    • Interoperate with public clouds
    • Open, interoperable architecture
    • Experiment at scale
    • Vendor neutral
  • OSDC Projects
  • 30. Project 1. Bionimbus
    www.bionimbus.org
  • 31. Case Study: Public Datasets in Bionimbus
  • 32. What Could You Do With 1 PB of Genomics Data?
    The NIH in the U.S. currently makes available for download approximately 2 PB of data.
    Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
    We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
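To put the Bionimbus figures above in proportion, the following minimal sketch derives a few averages from the numbers quoted on the slide. The decimal conversion (1 PB = 1000 TB) is an assumption, and the function name is hypothetical; the averages say nothing about how nodes or storage are actually distributed across racks.

```python
# Figures quoted on the slide for Bionimbus (April 2011).
RACKS = 6
NODES = 212
CORES = 1568
STORAGE_PB = 0.9
PLANNED_DATA_PB = 1.0  # "approximately 1 PB" planned for 2011

TB_PER_PB = 1000  # assumption: decimal units (1 PB = 1000 TB)


def bionimbus_summary():
    """Return simple averages derived from the quoted figures."""
    return {
        "nodes_per_rack": NODES / RACKS,
        "cores_per_node": CORES / NODES,
        "storage_tb_per_node": STORAGE_PB * TB_PER_PB / NODES,
        # Ratio of the planned new data to the storage quoted today,
        # ignoring replication and data already stored.
        "planned_data_vs_storage": PLANNED_DATA_PB / STORAGE_PB,
    }


if __name__ == "__main__":
    for name, value in bionimbus_summary().items():
        print(f"{name}: {value:.2f}")
```

On these figures the planned 1 PB is slightly larger than the 0.9 PB of storage quoted for the current system.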
  • 33. Case Study: modENCODE
    Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
    Bionimbus VMs were used for some of the integrative analysis.
    Bionimbus is used as a backup for the modENCODE DCC.
  • 34. Project Matsu 2: An Elastic Cloud For Disaster Response
    Daniel Mandl - NASA/GSFC, Lead
  • 35. Provide Fire / Flood Data to Rescue Workers
    Short Term Pilot for 2011
    Flood Dashboard
    Note blue bars indicating a surge of rainfall upstream. Then a flood wave appears downstream at Rundu river gauge days later.
    Zambezi basin consisting of upper, middle and lower catchments
    • Colored areas represent catchments where rainfall collects and drains to river basins
    • River gauges displayed as small circles
    • Detailed measurements are available on the display by clicking on the river gauge stations.
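The dashboard slide describes a rainfall surge upstream followed, days later, by a flood wave at a downstream gauge. As a rough illustration of how such a delay could be estimated from two daily series, here is a minimal sketch that picks the lag with the largest mean-centred covariance. This is not the method used by Project Matsu; the function name and both series are made up for illustration and are not data from the talk.

```python
from statistics import mean


def best_lag(upstream, downstream, max_lag):
    """Return the lag (in samples) at which downstream best tracks upstream.

    For each candidate lag from 0 to max_lag, compute the covariance of the
    mean-centred series over their overlapping window (normalised by the
    overlap length) and keep the lag with the highest score.
    """
    if len(upstream) != len(downstream):
        raise ValueError("series must have the same length")
    mu_u, mu_d = mean(upstream), mean(downstream)
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = len(upstream) - lag
        if n <= 0:
            break
        score = sum(
            (upstream[i] - mu_u) * (downstream[i + lag] - mu_d)
            for i in range(n)
        ) / n
        if score > best_score:
            best, best_score = lag, score
    return best


# Made-up daily values for illustration only (hypothetical, not real gauge data).
rainfall_mm = [0, 0, 5, 40, 60, 10, 0, 0, 0, 0, 0, 0]
gauge_level_m = [2, 2, 2, 2, 2, 2, 3, 6, 7, 3, 2, 2]

if __name__ == "__main__":
    print("estimated lag in days:", best_lag(rainfall_mm, gauge_level_m, max_lag=6))
```

The toy series are built so that the gauge peak trails the rainfall peak by four days. Real catchment data would need handling for gaps, multiple rain events and differing sampling intervals.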
  • 38. Project 3: OSDC PIRE Project
  • 39. OSDC PIRE Project Overview
    • Research
    • Cloud middleware for data intensive computing
    • Wide area clouds
    • Training and education workshops
    • Data intensive computing using the OSDC
    • Cloud computing for scientific computing
    • Outreach
    • OSDC Data Challenge
  • Foreign Partners
    • National Institute of Advanced Industrial Science and Technology (AIST), Japan
    • Beijing Institute of Genomics (BIG)
    • Edinburgh University
    • Korea Institute of Science & Technology
    • São Paulo State University
    • Universidade Federal Fluminense, Brazil
    • University of Amsterdam
  • OSDC Data Challenge
    • Annual contest to select 3 to 4 datasets each year to add to the OSDC.
    • Looking for the most interesting datasets to add.
  • Research Focus
    • Cloud architectures for data intensive computing
    • Wide area clouds
    • Continuous learning
    • Scanning queries
  • Ways to Participate
    • Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners
    • Send one of your graduate students to hands-on workshops, such as Introduction to Data Intensive Computing
    • Submit your most impressive dataset to the OSDC Data Challenge
    • Buy a container of computers and join the OSDC
  • Open Science Data Cloud Sustainability Model
  • 60. Towards a Long Term, Sustainable Model
    Capital expenses: about $1M/year
    Operating expenses: about $1M/year
    The Moore Foundation is providing $1M/year for 2011 and 2012 to support the capital expenses.
  • 61. Who do you most trust to manage your data for 100 years?
    Companies may not be here tomorrow.
    Government agencies have a role, but are not always easy to use.
    Think of a not-for-profit with that mission.
  • 62. Buy A Container and Join the OCC
    Use 2/3 of the container for your own purposes.
    Provide 1/3 of the container to the OCC for a shared replica space.
  • 63. To Get Involved
    Join the Open Cloud Consortium: www.opencloudconsortium.org
  • 64. Questions?