Your SlideShare is downloading. ×
0
Open Science Data Cloud<br />April 13, 2011<br />Robert Grossman<br />Open Cloud Consortium<br />University of ChicagoOpen...
Open Science Data Cloud<br />Astronomical data<br />Biological data (Bionimbus)<br />NSF-PIRE OSDC Data Challenge<br />Ear...
Who are we?<br />
4<br /><ul><li>U.S based not-for-profit corporation.
Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
Manages cloud computing testbeds: Open Cloud Testbed.
Develop reference implementations, benchmarks and standards.</li></ul>www.opencloudconsortium.org<br />
OCC Members<br />Companies: Cisco, Citrix, Yahoo!, …<br />Universities:  University of Chicago, Northwestern Univ., Johns ...
Phase 12011 - 2014<br />Proof of Concept2008 - 2010<br />Phase 2<br />2015-2020<br /><ul><li>4 locations
10G networks
450+ nodes
3000 cores
2 PB
Build a data center for science
6-10 locations
100G networks
$1M - $2M hardwareper year</li></li></ul><li>Why Another Cloud Project?<br />
Variety of analysis<br />Scientist with laptop<br />Wide<br />Open Science Data Cloud<br />Med<br />High energy physics, a...
Persistent data<br />Large<br />data clouds<br />Med<br />databases<br />HPC<br />Small<br />Cycles<br />Large & spec. clu...
What is the Open Science Data Cloud?<br />
Hosted, managed, distributed facility to:<br />Manage & archive your medium and large datasets<br />Provide computational ...
Long Time Goal<br />Build a (small) data center for science.<br />
And preserve your data the same way that libraries preserve books & museums preserve art.<br />
OSDC Perspective<br /><ul><li>Take a long term point of view (think like a library not a cloud service provider)
Operate infrastructure at the scale of a small data center
Interoperate with public clouds
Open, interoperable architecture
Experiment at scale
Vendor neutral</li></li></ul><li>OSDC Projects<br />
Project 1.  Bionimbus<br />www.bionimbus.org<br />
Case Study: Public Datasets in Bionimbus<br />
What Could You Do With 1 PB of Genomics Data?<br />The NIH in the U.S. currently makes available for download approximatel...
Case Study:  ModENCODE<br />Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).<br...
Upcoming SlideShare
Loading in...5
×

Open Science Data Cloud - CCA 11

1,362

Published on

This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,362
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
27
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Open Science Data Cloud - CCA 11"

  1. 1. Open Science Data Cloud<br />April 13, 2011<br />Robert Grossman<br />Open Cloud Consortium<br />University of ChicagoOpen Data Group<br />
  2. 2. Open Science Data Cloud<br />Astronomical data<br />Biological data (Bionimbus)<br />NSF-PIRE OSDC Data Challenge<br />Earth science data (& disaster relief)<br />
  3. 3. Who are we?<br />
  4. 4. 4<br /><ul><li>U.S based not-for-profit corporation.
  5. 5. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
  6. 6. Manages cloud computing testbeds: Open Cloud Testbed.
  7. 7. Develop reference implementations, benchmarks and standards.</li></ul>www.opencloudconsortium.org<br />
  8. 8. OCC Members<br />Companies: Cisco, Citrix, Yahoo!, …<br />Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …<br />Federal agencies: NASA<br />International Partners: AIST (Japan)<br />Other: National Lambda Rail<br />Beginning to add international partnersin 2011.<br />5<br />
  9. 9. Phase 12011 - 2014<br />Proof of Concept2008 - 2010<br />Phase 2<br />2015-2020<br /><ul><li>4 locations
  10. 10. 10G networks
  11. 11. 450+ nodes
  12. 12. 3000 cores
  13. 13. 2 PB
  14. 14. Build a data center for science
  15. 15. 6-10 locations
  16. 16. 100G networks
  17. 17. $1M - $2M hardwareper year</li></li></ul><li>Why Another Cloud Project?<br />
  18. 18. Variety of analysis<br />Scientist with laptop<br />Wide<br />Open Science Data Cloud<br />Med<br />High energy physics, astronomy<br />Low<br />Data Size<br />Medium to Large <br />Small<br />Very Large<br />Dedicated infrastructure<br />No infrastructure<br />General infrastructure<br />
  19. 19. Persistent data<br />Large<br />data clouds<br />Med<br />databases<br />HPC<br />Small<br />Cycles<br />Large & spec. clusters<br />Small to medium clusters<br />Single workstations<br />
  20. 20. What is the Open Science Data Cloud?<br />
  21. 21. Hosted, managed, distributed facility to:<br />Manage & archive your medium and large datasets<br />Provide computational resources to analyze it<br />Provide networking to share it with your colleagues and the public.<br />
  22. 22. Long Time Goal<br />Build a (small) data center for science.<br />
  23. 23. And preserve your data the same way that libraries preserve books & museums preserve art.<br />
  24. 24. OSDC Perspective<br /><ul><li>Take a long term point of view (think like a library not a cloud service provider)
  25. 25. Operate infrastructure at the scale of a small data center
  26. 26. Interoperate with public clouds
  27. 27. Open, interoperable architecture
  28. 28. Experiment at scale
  29. 29. Vendor neutral</li></li></ul><li>OSDC Projects<br />
  30. 30. Project 1. Bionimbus<br />www.bionimbus.org<br />
  31. 31. Case Study: Public Datasets in Bionimbus<br />
  32. 32. What Could You Do With 1 PB of Genomics Data?<br />The NIH in the U.S. currently makes available for download approximately 2PB of data.<br />Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.<br />We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011. <br />
  33. 33. Case Study: ModENCODE<br />Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).<br />BionimbusVMs were used for some of the integrative analysis.<br />Bionimbus is used as a backup for the modENCODE DCC<br />
  34. 34. Project Matsu 2: An Elastic Cloud For Disaster Response Daniel Mandl - NASA/GSFC, Lead<br />20<br />
  35. 35. Provide Fire / Flood Data to Rescue Workers<br />Note blue bars indicating<br /> a surge of rainfall upstream<br />Flood Dashboard<br />Then a flood wave appears<br /> downstream at Rundu river<br /> gauge days later<br />Short Term Pilot for 2011<br /><ul><li> Colored areas represent catchments where rainfall collects and drains to river basins
  36. 36. River gauges displayed as small circles
  37. 37. Detailed measurements are available on the display by clicking on the river gauge stations.</li></ul>Zambezi basin consisting of<br /> upper, middle and lower catchments<br />21<br />
  38. 38. Project 3: OSDC PIRE Project<br />
  39. 39. OSDC PIRE Project Overview<br /><ul><li>Research
  40. 40. Cloud middleware for data intensive computing
  41. 41. Wide area clouds
  42. 42. Training and education workshops
  43. 43. Data intensive computing using the OSDC
  44. 44. Cloud computing for scientific computing
  45. 45. Outreach
  46. 46. OSDC Data Challenge</li></li></ul><li>Foreign Partners<br /><ul><li>National Institute of Advanced Industrial Science and Technology (AIST), Japan
  47. 47. Beijing Institute of Genomics (BIG)
  48. 48. Edinburgh University
  49. 49. Korea Institute of Science & Technology
  50. 50. San Paulo State University
  51. 51. Universidade Federal Fluminense, Brasil
  52. 52. University of Amsterdam</li></li></ul><li>OSDC Data Challenge<br /><ul><li>Annual contest to select 3 to 4 datasets each year to add to the OSDC.
  53. 53. Looking for the most interesting datasets to add.</li></li></ul><li>Research Focus<br /><ul><li>Cloud architectures for data intensive computing
  54. 54. Wide area clouds
  55. 55. Continuous learning
  56. 56. Scanning queries</li></li></ul><li>Ways to Participate<br /><ul><li>Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners
  57. 57. Send one of your graduate students to hands-on Workshops, such as Introduction to Data Intensive Computing
  58. 58. Submit your most impressive dataset to the OSDC Data Challenge
  59. 59. Buy a container of computers and join the OSDC</li></li></ul><li>Open Science Data Cloud Sustainability Model<br />
  60. 60. Towards a Long Term, Sustainable Model<br />Capital Exp about $1M/year<br />Operating Exp about $1M/year<br />Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.<br />
  61. 61. Who do you most trust to manage your data for 100 years?<br />Companies may not be here tomorrow.<br />Government agencies have a role, but not always easy to use.<br />Think of a not for profit with that mission.<br />
  62. 62. Buy A Container and Join the OCC<br />Use 2/3 of the container for your own purposes.<br />Provide 1/3 of the container to the OCC for a share replica space.<br />
  63. 63. To Get Involved<br />Join the Open Cloud Consortium: www.opencloudconsortium.org<br />
  64. 64. Questions?<br />
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×