ESI Supplemental Webinar 2 - DataONE presentation slides

Presented by William Michener on 11-15-2012

Usage Rights

CC Attribution-ShareAlike License

    Presentation Transcript

    • DuraSpace/ARL/DLF E-Science Institute. DataONE: Tools and Approaches for Supporting the Data Life Cycle. Supplemental Webinar, Thursday, November 15, 2012, 1:00-2:30 pm EDT 1
    • DataONE: Tools and Approaches for Supporting the Data Life Cycle. Presented by William Michener, University of New Mexico, Professor and Director of e-Science Initiatives for University Libraries 2
    • Three Key Challenges [figure: data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze); label: Innovation] 4
    • 1. Data Preservation and Planning 5
    • The Long Tail of Orphan Data. “Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray [figure: Volume vs. rank frequency of datatype; labels: Specialized repositories (e.g. GenBank, PDB); Orphan data (B. Heidorn)] 6
    • Planning? Metadata standard? Data repository? 7
    • DataONE and the DMPTool Support Data Preservation. Three major components for a flexible, scalable, sustainable network: 8-10
        • Member Nodes: diverse institutions; serve local community; provide resources for managing their data; retain copies of data
        • Coordinating Nodes: retain complete metadata catalog; indexing for search; network-wide services; ensure content availability (preservation); replication services
        • Investigator Toolkit
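The division of labor between Member Nodes and Coordinating Nodes described on these slides can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class names, method names (`register`, `synchronize`, `search`, `replicate`), and identifiers such as `mn-a` are hypothetical and are not the DataONE API. The sketch shows metadata being harvested into a central catalog, searched, and used to drive replication.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Toy stand-in for a Member Node: holds data objects and their metadata."""
    name: str
    objects: dict = field(default_factory=dict)  # pid -> (metadata dict, data bytes)

    def add(self, pid, title, data):
        self.objects[pid] = ({"title": title}, data)


@dataclass
class CoordinatingNode:
    """Toy stand-in for a Coordinating Node: catalog, search, replication."""
    members: dict = field(default_factory=dict)  # node name -> MemberNode
    catalog: dict = field(default_factory=dict)  # pid -> metadata
    holders: dict = field(default_factory=dict)  # pid -> set of node names

    def register(self, node):
        self.members[node.name] = node

    def synchronize(self):
        """Harvest metadata from every registered Member Node into the catalog."""
        for node in self.members.values():
            for pid, (metadata, _data) in node.objects.items():
                self.catalog[pid] = metadata
                self.holders.setdefault(pid, set()).add(node.name)

    def search(self, keyword):
        """Return pids whose title contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return sorted(pid for pid, md in self.catalog.items()
                      if needle in md["title"].lower())

    def replicate(self, pid, copies=2):
        """Copy an object to other Member Nodes until `copies` nodes hold it."""
        source = self.members[next(iter(self.holders[pid]))]
        for node in self.members.values():
            if len(self.holders[pid]) >= copies:
                break
            if node.name not in self.holders[pid]:
                node.objects[pid] = source.objects[pid]
                self.holders[pid].add(node.name)
        return sorted(self.holders[pid])


# Hypothetical usage: two member nodes, one data package.
cn = CoordinatingNode()
node_a, node_b = MemberNode("mn-a"), MemberNode("mn-b")
cn.register(node_a)
cn.register(node_b)
node_a.add("pid:1", "Bird observations 2008", b"raw,data")
cn.synchronize()
print(cn.search("bird"))      # pids matching the keyword
print(cn.replicate("pid:1"))  # names of nodes now holding a copy
```

In this model the Coordinating Node never stores the data itself, only the metadata catalog and a record of which nodes hold each object, which mirrors the split of responsibilities listed on the slide.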
    • Dryad (>3,000 data products): coordinated submission of articles and underlying data; handshaking with specialized repositories; promotion of reuse and incentives for deposit 11
    • Knowledge Network for Biocomplexity (20,000+ data packages). Data types: ecological; environmental; demographic; social/legal/economic. Contributors: individual investigators; field stations and networks; government agencies; non-profit partnerships; synthesis centers [chart: Data Sizes; y-axis % (0, 15, 30, 45, 60); x-axis MB (< 1, 1-10, 10-200, >200)] 12
    • ✔ Check for best practices ✔ Create metadata ✔ Connect to ONEShare. Data & Metadata (EML) 13
    • 2. Data Discovery 26
    • Data Silos 27
    • The DataONE Federation 28
    • Member Node Functional Tiers 29
        • Tier 1: Read only, public content: ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed()
        • Tier 2: Read only, with access control: isAuthorized(), setAccessPolicy()
        • Tier 3: Read/Write using client tools: create(), update(), delete()
        • Tier 4: Able to operate as a replication target: replicate(), getReplica()
        • http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
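The Member Node Functional Tiers slide can be expressed as a small lookup table. The sketch below copies the method names from the slide. It assumes the tiers are cumulative, so that a node at tier N also supports every lower tier; the slide does not state this explicitly. The helper names `methods_for_tier` and `minimum_tier` are hypothetical and are not part of any DataONE library.

```python
# Method names per tier, copied from the slide.
TIER_METHODS = {
    1: ("ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed"),
    2: ("isAuthorized", "setAccessPolicy"),
    3: ("create", "update", "delete"),
    4: ("replicate", "getReplica"),
}


def methods_for_tier(tier):
    """Return all methods a node at `tier` supports, assuming cumulative tiers."""
    if tier not in TIER_METHODS:
        raise ValueError(f"unknown tier: {tier}")
    return [name
            for level in sorted(TIER_METHODS)
            if level <= tier
            for name in TIER_METHODS[level]]


def minimum_tier(method):
    """Return the lowest tier at which `method` becomes available."""
    for level in sorted(TIER_METHODS):
        if method in TIER_METHODS[level]:
            return level
    raise KeyError(method)


print(minimum_tier("create"))                 # 3
print("replicate" in methods_for_tier(3))     # False
```

A repository that only wants to expose public, read-only content would need the eight Tier 1 methods and nothing else, which is why the tiering lowers the barrier to joining the federation.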
    • ORNL DAAC as a DataONE Member Node [diagram labels: NASA collectors; DAAC Users (UWG); Investigator Toolkit; DataONE Users] 30
    • 3. Innovation. The Fourth Paradigm: 36
        1. Observational and experimental
        2. Theoretical research
        3. Computer simulations of natural phenomena
        4. Data-intensive research: new tools, techniques, and ways of working
    • “Data Intensive Science” and the “80:20 Rule” [figure, adapted from CENR-OSTP; axes: Increasing Process Knowledge, Decreasing Spatial Coverage; levels: Intensive science sites and experiments; Extensive science sites; Volunteer & education networks; Remote sensing] 37
    • Investigator Toolkit Support [figure: data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) annotated with tools, including DMP-Tool and Kepler] 38
    • Exploration, Visualization, and Analysis. Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources. Inputs: Land Cover; Meteorology; MODIS – Remote sensing data. Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration. Model results: Occurrence of Indigo Bunting (2008) [panels: Jan, Apr, Jun, Sep, Dec]. Uses: examine patterns of migration; infer how climate change may affect bird migration 39
    • Scientific workflows 40
    • Workflows Evolution with VisTrails 41
    • Collaboration environments 42
    • Taverna, MyExperiment 43
    • Community Engagement 44
    • User Assessments [timeline, Year 1 to Year 5: BL and FU assessments for Scientists, Library Policies, Librarians, Policy Makers, and Educators] 45
    • Results 46
        • “More than half of the respondents (56%) reported that they did not use any metadata standard and about 22% of respondents indicated they used their own lab metadata standard.”
        • Less than 6% of scientists are making “All” of their data available via some mechanism.
    • Community Engagement 47
    • Best Practices and Software Tools 48
    • Best Practices and Software Tools 49
    • June 3-21, 2013, University of New Mexico 50
    • DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation  51
    • Recommendations: 9 areas where you can help researchers 52
    • 1. Plan ‐ https://dmp.cdlib.org 53
    • 2. Collect and assure the data http://www.dataone.org/best-practices 54
    • 3. Describe and document the data http://metavist2.codeplex.com/ http://knb.ecoinformatics.org/morphoportal.jsp 55
    • 4. Select a repository for the data http://databib.org/ http://www.dataone.org/best-practices http://www.opendoar.org/ 56
    • 5. Preserve the data http://daac.ornl.gov/PI/BestPractices-2010.pdf 57
    • 6. Use the data http://www.nutnet.umn.edu/ 58
    • 7. Budget for it: 10 to 25% of total budget 59
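As a rough worked example of the budgeting recommendation, the snippet below computes the 10% and 25% bounds for a given project total. The function name and the sample total of 500,000 are illustrative assumptions and do not come from the slides.

```python
def data_management_budget(total_budget, low_pct=10, high_pct=25):
    """Return the (low, high) data management allocation for a project total.

    The default percentages follow the 10 to 25% range on the slide.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    return total_budget * low_pct / 100, total_budget * high_pct / 100


# Hypothetical project total of 500,000 (any currency).
low, high = data_management_budget(500_000)
print(low, high)  # 50000.0 125000.0
```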
    • 8. Communicate (early and often): meetings, web portals, newsletters, phone and video conferences 60
    • 9. Train (in‐person and/or virtually) 61
    • DataONE.org 62
    • DataONE Team and Sponsors 63
        • Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla
        • Dave Vieglais
        • Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson
        • John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanismy, Line Pouchard
        • Patricia Cruse, John Kunze
        • Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly
        • Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin
        • Paul Allen, Rick Bonney, Steve Kelling
        • Ryan Scherle, Todd Vision
        • Randy Butler
        • Ewa Deelman
        • Deborah McGuinness
        • Jeff Horsburgh
        • Robert Sandusky
        • Bertram Ludaescher
        • Peter Honeyman
        • Cliff Duke
        • Carole Goble
        • Donald Hobern
        • David DeRoure
        • Leon Levy Foundation
    • Questions? 64