SC13 BoF: RDA and HPC


A 5-minute presentation given during the SC13 Birds of a Feather session on the relationship between the Research Data Alliance (RDA) and High Performance Computing (HPC).

  1. Research Data Alliance (RDA) for HPC
     SC13 Birds of a Feather session, November 20, 2013, 17:30-19:00 MST, Colorado Convention Center, Denver, Colorado
     Contribution of John W. Cobb, Oak Ridge National Laboratory, DataONE Project
  2. Why am I here? From what perspectives do I speak?
     • Discipline scientist
     • HPC application evangelist
     • Cyberinfrastructure leverage for experimental facilities
     • Cyberinfrastructure/HPC center operations
     • Cyberinfrastructure for data-intensive science
     Without data there is no science.
  3. HPC centers and archives have different service objectives
     • HPC centers: cycles not used are lost
     • Archives: data management involves a long-term commitment of resources
  4. Comparing HPC centers and data archives
     Simulations                                   | Experiment/Observation
     ----------------------------------------------+--------------------------------------------------------
     Generate data at will                         | Collect data from physical events
     Can programmatically control data quality     | Data quality may be limited by collection methods
     Can be reproduced more easily                 | May be difficult, expensive, or impossible to reproduce
     ==> Can be copious                            | ==> May be more limited
     Weaker tradition of metadata and data quality | Long-term focus on metadata and data quality
  5. Consequently, different challenges
     • HPC centers excel at:
       – Volume and velocity
       – Analysis at scale
     • Archives excel at:
       – Variety
       – Metadata capture
       – Data quality
  6. Convergence of data and HPC: some DataONE experience
  7. eBird pilot project: exploration and visualization
     Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using High Performance Computing resources
     [Figure: model results showing occurrence of the Indigo Bunting (2008) for Jan, Apr, Jun, Sep, and Dec, alongside land cover, meteorology, and MODIS remote-sensing data panels]
     • Examine patterns of migration: a Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
     • Infer how climate change may affect bird migration
  8. [Figure-only slide; no text recoverable]
  9. Exploration, Visualization, and Analysis
     • Workflows for hypothesis development, testing, and exploration
     • Interactive maps and plots for multidimensional data exploration and analysis
     [Diagram: benchmark observations, terrestrial biosphere model output, and model structure information, connected through a provenance framework]
  10. DataONE experience
     • Cyberinfrastructure created: interoperable data service functional interfaces
     • 4 reference interface implementations completed
     • 8 client-side “investigator toolkit” tools released, 4 more in development
     • 16 collaborating Member Node repositories (international)
     • More than 100,000 data objects published
     • Conducted 81 workshops on data management
     • Published 65 data management “best practices”
     • Completed several baseline and follow-up surveys on the state of data management among scientists, libraries, librarians, …
  11. DataONE experience (cont.)
     About half the effort has gone into education, training, and outreach on data management practices.
  12. “Data = Human” (Genevieve Bell, SC13 keynote)