
Open Science Data Cloud - CCA 11



This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.



  1. Open Science Data Cloud. April 13, 2011. Robert Grossman, Open Cloud Consortium, University of Chicago, and Open Data Group
  2. Open Science Data Cloud
     • Astronomical data
     • Biological data (Bionimbus)
     • NSF-PIRE OSDC Data Challenge
     • Earth science data (and disaster relief)
  3. Who are we?
  4. The Open Cloud Consortium (OCC)
     • U.S.-based not-for-profit corporation
     • Manages cloud computing infrastructure to support scientific research: the Open Science Data Cloud
     • Manages cloud computing testbeds: the Open Cloud Testbed
     • Develops reference implementations, benchmarks, and standards
  5. OCC Members
     • Companies: Cisco, Citrix, Yahoo!, …
     • Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
     • Federal agencies: NASA
     • International partners: AIST (Japan)
     • Other: National Lambda Rail
     • Beginning to add international partners in 2011
  6. Roadmap: Proof of Concept (2008-2010), Phase 1 (2011-2014), Phase 2 (2015-2020)
     • Phase 1: 4 locations, 10G networks, 450+ nodes, 3,000 cores, 2 PB; build a data center for science
     • Phase 2: 6-10 locations, 100G networks, $1M-$2M of hardware per year
  7. Why Another Cloud Project?
  8. [Chart: variety of analysis vs. data size] Scientists with laptops run a wide variety of analyses on small data with no infrastructure; high energy physics and astronomy run a narrow set of analyses on very large data with dedicated infrastructure; the Open Science Data Cloud targets the middle: a medium variety of analyses on medium-to-large data with general infrastructure
  9. [Chart: persistent data vs. cycles] Single workstations hold small persistent data with few cycles; databases hold medium data on small-to-medium clusters; HPC offers large and specialized clusters but little persistent data; large data clouds combine large persistent data with substantial cycles
 10. What is the Open Science Data Cloud?
 11. A hosted, managed, distributed facility to:
     • manage and archive your medium and large datasets
     • provide computational resources to analyze them
     • provide networking to share them with your colleagues and the public
 12. Long-Term Goal: build a (small) data center for science
 13. …and preserve your data the same way that libraries preserve books and museums preserve art
 14. OSDC Perspective
     • Take a long-term point of view (think like a library, not a cloud service provider)
     • Operate infrastructure at the scale of a small data center
     • Interoperate with public clouds
     • Open, interoperable architecture
     • Experiment at scale
     • Vendor neutral
 15. OSDC Projects
 16. Project 1: Bionimbus
 17. Case Study: Public Datasets in Bionimbus
 18. What Could You Do With 1 PB of Genomics Data?
     • The NIH in the U.S. currently makes approximately 2 PB of data available for download
     • Bionimbus today consists of 6 racks, 212 nodes, 1,568 cores, and 0.9 PB of storage
     • We plan to add approximately 1 PB of genomics and other biological sciences data to Bionimbus in 2011
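The data volumes on this slide carry a practical implication: at the OSDC's 10G network speeds, simply downloading petabyte-scale data takes weeks, which is an argument for bringing computation to the data. A back-of-envelope sketch (illustrative arithmetic only; sustaining the full line rate is an optimistic assumption):

```python
# Back-of-envelope: time to move genomics data over the network.
# 10 Gbps matches the OSDC's 10G links; we optimistically assume
# the full line rate is sustained end to end.

def transfer_days(petabytes: float, gbps: float) -> float:
    """Days needed to move `petabytes` of data at `gbps` (decimal units)."""
    bits = petabytes * 1e15 * 8      # total bits to move
    seconds = bits / (gbps * 1e9)    # sustained line rate
    return seconds / 86_400          # seconds per day

# Moving the NIH's ~2 PB at a sustained 10 Gbps takes roughly 18.5 days.
print(round(transfer_days(2, 10), 1))  # -> 18.5
```

Real transfers would be slower still (protocol overhead, shared links), which only strengthens the case for hosting analysis next to the archive.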
 19. Case Study: modENCODE
     • Bionimbus is used to process the modENCODE data from the White lab (over 1,000 experiments)
     • Bionimbus VMs were used for some of the integrative analysis
     • Bionimbus serves as a backup for the modENCODE DCC
 20. Project Matsu 2: An Elastic Cloud for Disaster Response (Daniel Mandl, NASA/GSFC, lead)
 21. Provide Fire and Flood Data to Rescue Workers: a flood dashboard (short-term pilot for 2011)
     • Colored areas represent catchments where rainfall collects and drains to river basins; the Zambezi basin consists of upper, middle, and lower catchments
     • River gauges are displayed as small circles; clicking a river gauge station brings up detailed measurements
     • Blue bars indicate a surge of rainfall upstream; a flood wave then appears downstream at the Rundu river gauge days later
 22. Project 3: OSDC PIRE Project
 23. OSDC PIRE Project Overview
     • Research: cloud middleware for data intensive computing; wide area clouds
     • Training and education workshops: data intensive computing using the OSDC; cloud computing for scientific computing
     • Outreach: the OSDC Data Challenge
 24. Foreign Partners
     • National Institute of Advanced Industrial Science and Technology (AIST), Japan
     • Beijing Institute of Genomics (BIG)
     • Edinburgh University
     • Korea Institute of Science & Technology
     • São Paulo State University
     • Universidade Federal Fluminense, Brazil
     • University of Amsterdam
 25. OSDC Data Challenge
     • An annual contest to select 3 to 4 datasets each year to add to the OSDC
     • We are looking for the most interesting datasets to add
 26. Research Focus
     • Cloud architectures for data intensive computing
     • Wide area clouds
     • Continuous learning
     • Scanning queries
 27. Ways to Participate
     • Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE foreign partners
     • Send one of your graduate students to a hands-on workshop, such as Introduction to Data Intensive Computing
     • Submit your most impressive dataset to the OSDC Data Challenge
     • Buy a container of computers and join the OSDC
 28. Open Science Data Cloud Sustainability Model
 29. Towards a Long-Term, Sustainable Model
     • Capital expenses: about $1M per year
     • Operating expenses: about $1M per year
     • The Moore Foundation is providing $1M per year for 2011 and 2012 to support the capital expenses
 30. Who do you most trust to manage your data for 100 years?
     • Companies may not be here tomorrow
     • Government agencies have a role, but are not always easy to use
     • Think of a not-for-profit with that mission
 31. Buy a Container and Join the OCC
     • Use 2/3 of the container for your own purposes
     • Provide 1/3 of the container to the OCC for shared replica space
 32. To Get Involved: join the Open Cloud Consortium
 33. Questions?