Research Data
Alliance (RDA) for
HPC
SC13
Birds of a Feather session
November 20, 2013
17:30-19:00 MST
Colorado Convention...
SC13 BoF: RDA and HPC

Five-minute presentation given during the SC13 Birds of a Feather session on the relationship between the Research Data Alliance and High Performance Computing.

Transcript of "SC13 BoF: RDA and HPC"

  1. Research Data Alliance (RDA) for HPC. SC13 Birds of a Feather session, November 20, 2013, 17:30-19:00 MST, Colorado Convention Center, Denver, Colorado. Contribution of John W. Cobb, Oak Ridge National Lab., DataONE Project.
  2. Why am I here? From what perspectives do I speak?
     •  Discipline scientist
     •  HPC application evangelist
     •  Cyberinfrastructure leverage for experimental facilities
     •  Cyberinfrastructure/HPC center operations
     •  Cyberinfrastructure efforts for data-intensive science
     Without data there is no science
  3. HPC centers and archives have different service objectives
     •  Cycles not used are lost
     •  Data management involves a long-term commitment of resources
  4. Comparing HPC centers and data archives
     Simulations
     •  Generate data at will
     •  Can programmatically control data quality
     •  Can be reproduced more easily
     •  ==> Can be copious
     •  Weaker tradition of metadata and data quality
     Experiment/Observation
     •  Collect data from physical events
     •  Data quality may be limited by collection methods
     •  May be difficult, expensive, or impossible to reproduce
     •  ==> May be more limited
     •  Long-term focus on metadata and data quality
  5. Consequently different challenges
     •  HPC centers excel at:
        –  Volume and velocity
        –  Analysis at scale
     •  Archives excel at:
        –  Variety
        –  Metadata capture
        –  Data quality
  6. Convergence of data and HPC: Some DataONE experience
  7. eBird pilot project: exploration and visualization
     Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources
     Figure labels: Model results; Occurrence of Indigo Bunting (2008), shown for Jan, Apr, Jun, Sep, Dec; Land Cover; Meteorology; MODIS – Remote sensing data
     Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
     •  Examine patterns of migration
     •  Infer how climate change may affect bird migration
  8. (No text on this slide.)
  9. Exploration, Visualization, and Analysis
     •  Workflows for hypothesis development, testing, and exploration
     •  Interactive maps and plots for multidimensional data exploration and analysis
     Diagram labels: Benchmark Observations; Terrestrial Biosphere Model Output; Model Structure Information; Provenance Framework
  10. DataONE experience
     •  CI created: interoperable data service functional interfaces
     •  4 reference interface implementations completed
     •  8 client-side “investigator toolkit” tools released, 4 more in development
     •  16 collaborating Member Node repositories (internationally)
     •  > 100,000 data objects published
     •  Conducted 81 workshops on data management
     •  Published 65 data management “best practices”
     •  Completed several baseline and follow-up surveys on state of data management with scientists, libraries, librarians, …
  11. DataONE experience (cont.)
     About half the effort has been on education, training and outreach about data management practices
  12. “Data = Human” - Genevieve Bell, SC13 Keynote