ESI Supplemental Webinar 2 - DataONE presentation slides

Presented by William Michener on 11-15-2012

  1. DuraSpace/ARL/DLF E-Science Institute. DataONE: Tools and Approaches for Supporting the Data Life Cycle. Supplemental Webinar, Thursday, November 15, 2012, 1:00-2:30 pm EDT
  2. DataONE: Tools and Approaches for Supporting the Data Life Cycle. Presented by William Michener, University of New Mexico, Professor and Director of e-Science Initiatives for University Libraries
  4. Three Key Challenges
     Diagram labels: Innovation; data life cycle stages Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
  5. 1. Data Preservation and Planning
  6. The Long Tail of Orphan Data
     "Most of the bytes are at the high end, but most of the datasets are at the low end" (Jim Gray)
     Figure labels: Volume; Rank frequency of datatype; Specialized repositories (e.g. GenBank, PDB); Orphan data (B. Heidorn)
  7. Planning? Metadata standard? Data repository?
  8. DataONE and the DMPTool Support Data Preservation
     Three major components for a flexible, scalable, sustainable network
     Member Nodes
     • diverse institutions
     • serve local community
     • provide resources for managing their data
     • retain copies of data
  9. DataONE and the DMPTool Support Data Preservation
     Three major components for a flexible, scalable, sustainable network
     Member Nodes
     • diverse institutions
     • serve local community
     • provide resources for managing their data
     • retain copies of data
     Coordinating Nodes
     • retain complete metadata catalog
     • indexing for search
     • network-wide services
     • ensure content availability (preservation)
     • replication services
  10. DataONE and the DMPTool Support Data Preservation
     Three major components for a flexible, scalable, sustainable network
     Member Nodes
     • diverse institutions
     • serve local community
     • provide resources for managing their data
     • retain copies of data
     Coordinating Nodes
     • retain complete metadata catalog
     • indexing for search
     • network-wide services
     • ensure content availability (preservation)
     • replication services
     Investigator Toolkit
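The division of labor between Member Nodes and Coordinating Nodes listed above can be sketched as a toy in-memory model. This is not DataONE code: the class names, fields, and functions (`MemberNode`, `CoordinatingNode`, `register`, `search`, `replicate_object`) are hypothetical and only illustrate that Member Nodes hold copies of data while a Coordinating Node holds the metadata catalog, supports search, and tracks replicas.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical stand-in for a repository that retains copies of data."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> bytes

    def store(self, identifier: str, data: bytes) -> None:
        self.objects[identifier] = data


@dataclass
class CoordinatingNode:
    """Hypothetical stand-in for a node that retains the metadata catalog."""
    catalog: dict = field(default_factory=dict)   # identifier -> description
    replicas: dict = field(default_factory=dict)  # identifier -> node names

    def register(self, node: MemberNode, identifier: str, description: str) -> None:
        # Record the metadata and note which Member Node holds a copy.
        self.catalog[identifier] = description
        self.replicas.setdefault(identifier, []).append(node.name)

    def search(self, term: str) -> list:
        # Naive case-insensitive substring search over the catalog.
        term = term.lower()
        return sorted(i for i, d in self.catalog.items() if term in d.lower())


def replicate_object(cn: CoordinatingNode, source: MemberNode,
                     target: MemberNode, identifier: str) -> None:
    """Copy an object to a second Member Node and record the new replica."""
    target.store(identifier, source.objects[identifier])
    cn.replicas[identifier].append(target.name)
```

In this sketch a search goes to the catalog rather than to each repository, which is the point of keeping complete metadata in one place.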
  11. Dryad (>3,000 data products)
     • Coordinated submission of articles and underlying data
     • Handshaking with specialized repositories
     • Promotion of reuse and incentives for deposit
  12. Knowledge Network for Biocomplexity (20,000+ data packages)
     Data Types
     • Ecological
     • Environmental
     • Demographic
     • Social/Legal/Economic
     Contributors
     • Individual investigators
     • Field stations and networks
     • Government agencies
     • Non-profit partnerships
     • Synthesis centers
     Chart: Data Sizes (%, axis 0 to 60) by size class in MB: < 1, 1-10, 10-200, >200
  13. ✔ Check for best practices
     ✔ Create metadata
     ✔ Connect to ONEShare
     Data & Metadata (EML)
  26. 2. Data Discovery
  27. Data Silos
  28. The DataONE Federation
  29. Member Node Functional Tiers
     • Tier 1: Read only, public content: ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed()
     • Tier 2: Read only, with access control: isAuthorized(), setAccessPolicy()
     • Tier 3: Read/Write using client tools: create(), update(), delete()
     • Tier 4: Able to operate as a replication target: replicate(), getReplica()
     • http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
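The tier list above can be read as a lookup table. The sketch below copies the method names from the slide and adds two hypothetical helpers, `methods_up_to` and `minimum_tier`, which are not part of any DataONE library. It assumes tiers are cumulative (a Tier 3 node also offers the Tier 1 and Tier 2 methods); the slide does not state this outright.

```python
# Method names per tier, copied from the slide.
TIER_METHODS = {
    1: ["ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed"],
    2: ["isAuthorized", "setAccessPolicy"],
    3: ["create", "update", "delete"],
    4: ["replicate", "getReplica"],
}


def methods_up_to(tier: int) -> list:
    """Return every method a node at `tier` would expose.

    Assumption: tiers are cumulative, so higher tiers include
    all methods of the lower tiers.
    """
    if tier not in TIER_METHODS:
        raise ValueError(f"unknown tier: {tier}")
    return [m for t in sorted(TIER_METHODS) if t <= tier
            for m in TIER_METHODS[t]]


def minimum_tier(method: str) -> int:
    """Return the lowest tier at which `method` appears on the slide."""
    for t in sorted(TIER_METHODS):
        if method in TIER_METHODS[t]:
            return t
    raise KeyError(method)
```

For example, a repository that only needs to expose public data for reading would look at `methods_up_to(1)`, while one that wants to accept replicas would need everything through Tier 4.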
  30. ORNL DAAC as a DataONE Member Node
     Diagram labels: NASA collectors; DAAC Users (UWG); Investigator Toolkit; DataONE Users
  36. 3. Innovation
     The Fourth Paradigm:
     1. Observational and experimental
     2. Theoretical research
     3. Computer simulations of natural phenomena
     4. Data-intensive research
        • new tools, techniques, and ways of working
  37. "Data Intensive Science" and the "80:20 Rule"
     Figure labels: Increasing Process Knowledge; Decreasing Spatial Coverage; Intensive science sites and experiments; Extensive science sites; Volunteer & education networks; Remote sensing
     Adapted from CENR-OSTP
  38. Investigator Toolkit Support
     Diagram labels: data life cycle stages (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) with tools DMP-Tool and Kepler
  39. Exploration, Visualization, and Analysis
     Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources
     Inputs: Land Cover; Meteorology; MODIS remote sensing data
     Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
     Model results: Occurrence of Indigo Bunting (2008), panels for Jan, Apr, Jun, Sep, Dec
     • Examine patterns of migration
     • Infer how climate change may affect bird migration
  40. Scientific workflows
  41. Workflows Evolution with VisTrails
  42. Collaboration environments
  43. Taverna, MyExperiment
  44. Community Engagement
  45. User Assessments
     Chart rows: Scientists (BL, FU); Library Policies (BL, FU); Librarians (BL, FU); Policy Makers (BL, FU); Educators (BL, FU)
     Timeline: Year 1, Year 2, Year 3, Year 4, Year 5
  46. Results
     • "More than half of the respondents (56%) reported that they did not use any metadata standard and about 22% of respondents indicated they used their own lab metadata standard."
     • Less than 6% of scientists are making "All" of their data available via some mechanism.
  47. Community Engagement
  48. Best Practices and Software Tools
  49. Best Practices and Software Tools
  50. June 3-21, 2013, University of New Mexico
  51. DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation
  52. Recommendations
     • 9 areas where you can help researchers
  53. 1. Plan: https://dmp.cdlib.org
  54. 2. Collect and assure the data: http://www.dataone.org/best-practices
  55. 3. Describe and document the data: http://metavist2.codeplex.com/ and http://knb.ecoinformatics.org/morphoportal.jsp
  56. 4. Select a repository for the data: http://databib.org/, http://www.dataone.org/best-practices, http://www.opendoar.org/
  57. 5. Preserve the data: http://daac.ornl.gov/PI/BestPractices-2010.pdf
  58. 6. Use the data: http://www.nutnet.umn.edu/
  59. 7. Budget for it: 10 to 25% of total budget
  60. 8. Communicate (early and often): meetings, web portals, newsletters, phone and video conferences
  61. 9. Train (in-person and/or virtually)
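As a worked illustration of recommendation 7 above, the sketch below turns the 10 to 25% guideline into a range of amounts. The function name `data_management_budget` and the example project total are hypothetical; only the two percentages come from the slide.

```python
def data_management_budget(total_budget: int,
                           low_pct: int = 10,
                           high_pct: int = 25) -> tuple:
    """Return the (low, high) amounts to set aside for data management.

    The 10 and 25 percent defaults come from the slide. Amounts are in
    whole currency units and are rounded down by integer division.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    low = total_budget * low_pct // 100
    high = total_budget * high_pct // 100
    return low, high


# Hypothetical project total of 500,000 currency units.
low, high = data_management_budget(500_000)
print(f"Set aside between {low:,} and {high:,}")
```

With the hypothetical 500,000 total, the guideline works out to between 50,000 and 125,000.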
  62. DataONE.org
  63. DataONE Team and Sponsors
     • Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla
     • Dave Vieglais
     • Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson
     • John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanismy, Line Pouchard
     • Patricia Cruse, John Kunze
     • Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly
     • Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin
     • Paul Allen, Rick Bonney, Steve Kelling
     • Ryan Scherle, Todd Vision
     • Randy Butler
     • Ewa Deelman
     • Deborah McGuinness
     • Jeff Horsburgh
     • Robert Sandusky
     • Bertram Ludaescher
     • Peter Honeyman
     • Cliff Duke
     • Carole Goble
     • Donald Hobern
     • David DeRoure
     LEON LEVY FOUNDATION
  64. Questions?