CSU-ACADIS_dataManagement101-20120217
Usage Rights: CC Attribution-NonCommercial License

  • How about physical samples? Read your DMP guidelines carefully!
  • NSF OPP meeting – open to including programmer time in budget requests to help with this kind of work
  • CADIS was funded initially in 2007, and this is the 3rd AMS IIPS presentation I’ve given on it. For the Arctic Observing Network: mostly field observations. Serve NSF-funded AON investigators by archiving AON data; not so much the wider community. Assumptions starting out: the AON data portal would support full integration of a diverse collection (scientists could archive their data AND find all data relevant to a location or process), and informatics and cyberinfrastructure would play a large role. Implications were that all data have browse imagery and complete documentation; time series or fields can be plotted online; and all metadata are in a relational database.
  • For all NSF programs that collect Arctic data: Office of Polar Programs (OPP), Division of Arctic Sciences (ARC); AON, Arctic System Sciences (ARCSS), Arctic Natural Sciences (ANS) and the Arctic Social Sciences Program (ASSP) within OPP/ARC. Serve NSF-funded Arctic investigators by archiving data from many field programs. Still little or no remote sensing data. Emphasis is still on serving those contributing data first, but will begin to shift to making the ACADIS portal more useful for those who need to use the data held or cataloged by ACADIS.

CSU-ACADIS_dataManagement101-20120217 Presentation Transcript

  • 1. Data Literacy For the Arctic and Below: Help your data help you (and satisfy NSF requirements in the process!) Lynn Yarmey and Liz Schlagel – National Snow and Ice Data Center
  • 2. Where we are going today: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond) EVERYTHING we talk about will be able to go into your Data Management Plan (DMP)
  • 3. Why Care – Big Picture Photo from: http://www.mediafuturist.com/2010/11/gues-ddb-blog-future-of-marketing-media-data-is-new-oil.html
  • 4. Why Care – Big Picture These days, Dr. Hodes said, “the old model in which researchers jealously guarded their data is no longer applicable.” http://www.nytimes.com/2011/04/04/health/04alzheimer.html Image courtesy of: http://www.sciencemag.org/content/331/6018.cover-expansion
  • 5. Why Care – Your work http://www.phdcomics.com/comics/archive.php?comicid=382 You are a Data Manager
  • 6. Data Management is Important! Because…… … Reproducibility is the foundation of science … Journals are starting to require data deposit … You want to get credit for producing data (data citations) … Others can use and build on your work (data reuse)
  • 7. Data Management is Important! Because…… … Reproducibility is the foundation of science … Journals are starting to require data deposit … You want to get credit for producing it (data citations) … Others can use and build on your work (data reuse) … Your new instruments collect a LOT more data than older ones … Recreating a figure from a 2006 paper shouldn’t be painful … Funders tell us so (See NSF, NIH, NOAA, etc) … Students graduate!
  • 8. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
  • 9. Data Types What types of data do you collect or generate?
  • 11. Data Stages Raw Organized Standardized Transformed Processed Quality Controlled Analyzed Summarized Presented/Published Photo courtesy of Zillow Database gurus: http://www.zillow.com/blog/2007-11-02/we-know-how-to-celebrate-halloween/
  • 13. Data Storage http://chronicle.texterity.com/chronicle/20110318a?pg=16#pg16
  • 14. Data Storage Tips: - 1 working copy on your computer - 1 copy on infrastructure near you - 1 copy on infrastructure far away - ‘Final’ copy with a data center/archive - Get help! (CSS, CSU Libraries, etc.) (Note: These won’t work well in all cases, ex. For Very Large Data, but are a good start for coming up with a storage plan)
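Keeping several copies only helps if the copies are actually identical. As a minimal sketch (not something from the slides), the Python below compares SHA-256 checksums of a working copy against its backups. The function names `sha256_of` and `copies_match` and the example paths are hypothetical, and for very large data a dedicated transfer or backup tool may be a better fit.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(working_copy, *other_copies):
    """Map each backup path to True if its checksum equals the working copy's."""
    reference = sha256_of(working_copy)
    return {str(path): sha256_of(path) == reference for path in other_copies}
```

A call such as `copies_match("data.csv", "/mnt/lab_server/data.csv")` (paths are placeholders) returns a dictionary that flags any copy that has drifted from the working file.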
  • 16. Versioning Tips: - communicate with your lab/research group and agree on a versioning system (file names, what makes a new version) - WRITE IT DOWN and post/save to a shared space.
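Once a group has agreed on a versioning system, a small helper can keep people from mistyping it. The sketch below assumes one possible convention, a `_vNN` tag just before the file extension; that convention, the pattern, and the name `next_version_name` are illustrative assumptions, not a rule from the slides.

```python
import re

# Assumed convention: a "_v" tag followed by digits, right before the extension.
VERSION_PATTERN = re.compile(r"_v(\d+)(?=\.[^.]+$)")


def next_version_name(filename):
    """Return the file name with its _vNN counter incremented, keeping zero padding."""
    match = VERSION_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no _vNN version tag found in {filename!r}")
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return filename[:match.start(1)] + bumped + filename[match.end(1):]
```

For example, `next_version_name("soiltemp_qc_v01.csv")` gives `"soiltemp_qc_v02.csv"`. What counts as a new version is still a decision for the group to write down.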
  • 18. File naming conventions – Discuss
  • 19. File naming conventions – Better example Make names unique! Include (as appropriate): - Project name or acronym - Study title - Location - Data type - Researcher initials - Date - Data stage - Version number - File type DO – Use_underscores-or-dashes DO NOT – Use spaces &/or special characters! For more info - https://www.dataone.org/content/assign-descriptive-file-names
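The naming checklist above can be turned into a small function so that every file in a project follows the same pattern. This is a sketch under assumptions: the order of the parts, the `YYYYMMDD` date form, the two-digit version, and the name `build_file_name` are all illustrative choices, and the example values are made up.

```python
import re
from datetime import date

# Letters, digits and dashes only: no spaces or special characters inside a part.
SAFE_PART = re.compile(r"[A-Za-z0-9-]+")


def build_file_name(project, location, data_type, initials, day, stage, version, extension):
    """Join the naming parts with underscores after checking each one is safe."""
    parts = [
        project,
        location,
        data_type,
        initials,
        day.strftime("%Y%m%d"),  # sortable date, an assumed choice
        stage,
        f"v{version:02d}",
    ]
    for part in parts:
        if not SAFE_PART.fullmatch(part):
            raise ValueError(f"unsafe characters in file name part: {part!r}")
    return "_".join(parts) + "." + extension.lstrip(".")
```

With hypothetical values, `build_file_name("ACADIS", "siteA", "soiltemp", "AB", date(2012, 2, 17), "raw", 1, "csv")` yields `ACADIS_siteA_soiltemp_AB_20120217_raw_v01.csv`, while a part containing a space raises an error instead of producing a bad name.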
  • 21. Metadata “Data about Data” But what does that MEAN?!
  • 22. Metadata – The bottom line What would someone* unfamiliar with your data (and possibly your research) need in order to find, evaluate, understand, and reuse them? *How about someone: - who works in your lab? - from a different lab in your field? - who is in a related interdisciplinary field? - who researches a completely different area? - who works for a newspaper? Congress?
  • 23. Metadata – Example Temperature 31.5
  • 24. Metadata – Example Temperature 31.5 For what purpose? Instrument precision/accuracy? When was the sensor last cleaned/calibrated? AKA – T, Temp, degC, C, °F… lots of different names!
  • 25. Metadata Just like file names, metadata does its job best when it is: - consistent - documented - for people - such that computers are happy Enter Metadata Standards
  • 26. Metadata Standards – Examples Local (people -> people) Naming Conventions Standard Operating Procedures Beyond (people -> computers -> people) ISO 19115 (http://www.fgdc.gov/metadata/geospatial-metadata-standards#nap) GCMD DIF (http://gcmd.nasa.gov/User/difguide/difman.html) EML (http://knb.ecoinformatics.org/software/eml/)
  • 27. Metadata Standards – Example Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011). http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
  • 28. Metadata Standards – Example (XML)
    <attributeName>Sea Surface Temperature</attributeName>
    <attributeDefinition>temperature measurement</attributeDefinition>
    <measurementScale>
      <unit>celsius</unit>
      <numericDomain><numberType>real</numberType></numericDomain>
    </measurementScale>
    <missingValueCode><code>-99</code> <codeExplanation>missing value</codeExplanation> </missingValueCode>
    <missingValueCode><code>-999</code> <codeExplanation>missing value</codeExplanation> </missingValueCode>
    <missingValueCode><code>-99999</code> <codeExplanation>missing value</codeExplanation> </missingValueCode>
    <methods>
      <description> subject { seaSurface } </description>
      <description> calculationType { calculated }; calculationTypeDetail { average }; calculationInterval { day }; </description>
    </methods>
    Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011). http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
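One payoff of machine-readable metadata like the XML example is that a script can read the unit and the missing-value codes instead of a person hard-coding them. The sketch below is illustrative only: it wraps a shortened copy of the fragment in an `<attribute>` root element so that it parses as a single document (that wrapper is an assumption here), and `read_attribute` and `mask_missing` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's fragment; the <attribute> wrapper is assumed
# so the text has a single root element.
FRAGMENT = """
<attribute>
  <attributeName>Sea Surface Temperature</attributeName>
  <measurementScale><unit>celsius</unit></measurementScale>
  <missingValueCode><code>-99</code></missingValueCode>
  <missingValueCode><code>-999</code></missingValueCode>
  <missingValueCode><code>-99999</code></missingValueCode>
</attribute>
"""


def read_attribute(xml_text):
    """Pull the name, unit and missing-value codes out of the metadata."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("attributeName"),
        "unit": root.findtext("measurementScale/unit"),
        "missing": {float(code.text) for code in root.iter("code")},
    }


def mask_missing(values, missing_codes):
    """Replace any value that is a declared missing code with None."""
    return [None if value in missing_codes else value for value in values]
```

Reading the codes from the metadata means a series such as `[14.2, -99, 15.1, -99999]` is cleaned the same way by everyone who uses the dataset.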
  • 30. Metadata – Standards They exist! If everyone used them, you could do very cool science! Compliance is often a lot of work. There are lots. HOWEVER, there are baby steps to get started.
  • 31. Metadata – Yikes and/or Yay! Tips for the short-term: - Get help! - support@aoncadis.org, librarians, standards groups, data centers, domain communities, tools - Get your own house in order - use common date formats, codes, smart file names - WRITE EVERYTHING DOWN! (keep good readme files) - Put in the time early on to implement a standard - most have minimum compliance levels with options to get more detailed - Stay flexible Tips for the long-term: - Get help! - Watch for Best Practices and standards in your field
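As one concrete way to "use common date formats", dates typed in several styles can be normalized to ISO 8601 (`YYYY-MM-DD`) before data are shared. This is a minimal sketch: the list of accepted input formats and the name `to_iso_date` are assumptions, and slash dates are read as month/day/year, which a project would need to confirm for its own records.

```python
from datetime import datetime

# Input formats this sketch accepts; slash dates are assumed to be month/day/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, trying each known input format in turn."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")
```

Unrecognized input raises an error rather than being guessed at, which is usually the safer behavior for research data. The accepted formats belong in the readme file alongside the data.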
  • 33. Sharing and Access Levels [axis label: Funder Happiness, from Low to High]: Low - Not sharing your data (note: appropriate in a few cases) - Emailing your data to a researcher who asks for it - Posting your data on your project or lab website - Posting your data AND METADATA on your website - Submitting your metadata to an online catalog (ex. ACADIS) - Submitting your data and metadata to an appropriate repository and getting a permanent ID (DOI, EZID, etc) - Data Repositories (ex. ACADIS, GenBank, Dryad) - CSU Digital Repository High
  • 35. Archiving Terminology Fuzziness in the data world: Archival = Preservation (close enough) Archival ≠ Storage! Tips for the short-term: - Leave yourself time at the end of a project to clean up - Choose open, non-proprietary formats when you can (ex. CSV > XLS) Tips for the long-term: - Work with NREL data experts: IBIS team, LTER, Computer Systems Support
  • 37. Data Management Plans (DMPs) NSF Data Management Plan - General Requirement (as of 2011-10-10)
    1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
    2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
    3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
    4. policies and provisions for re-use, re-distribution, and the production of derivatives; and
    5. plans for archiving data, samples, and other research products, and for preservation of access to them.
    From http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
  • 39. Data Management Plans (DMPs) Tips for the short-term: - Check your Directorate/Agency policy before every proposal - Keep it real(istic), you will need to include your actions in your project report and next proposal. Tips for the long-term: - Keep working on implementing metadata standards - Watch out for emerging trends, repositories, tools - Partner with data people (data centers, libraries, etc)
  • 41. CADIS - Data Support for NSF-Arctic Program • Cooperative Arctic Data and Information System • The Mandate: – Develop advanced data management system for the Arctic Observing Network (AON) – Preserve metadata and data – Serve NSF-funded AON investigators
  • 42. Transition to Advanced CADIS [Figure: NSF Arctic Field Sites] • A new mandate – For all NSF programs that collect Arctic data – Serve NSF-funded Arctic investigators by archiving data from many field programs • Other changes: – An advisory group – Value-added products – Two full time Data Curators – New data types – biological, social, terrestrial, ecological
  • 43. Transition to Advanced CADIS (ACADIS) • A new mandate – For all NSF/ARC programs that collect Arctic data – Serve NSF-funded Arctic investigators by archiving data from field programs and individual investigators • Other changes: – An advisory group – Value-added products – Two full time Data Curators – New data types – biological, social, terrestrial, ecological – Expanded metadata tool for diverse disciplines
  • 44. ACADIS – Metadata and Standards Metadata Profile Supports established standards Based on IPY-DIS profile. Compatible with GCMD, FGDC, ISO… Profile driven interface validates fields NASA GCMD vocabulary used where possible
  • 45. ACADIS Data Management Plan Template Example guidance from the ACADIS DMP template: • Assists investigators in developing the DMP now required for all NSF proposals • Linked from aoncadis.org
  • 46. Beyond ACADIS - Other Resources Local: NREL • IBIS Local: CSU • CSU Digital Repository Remote: Centralized and Domain Specific • Knowledge Network for Biocomplexity • ESA Ecological Archives • DAAC at ORNL • Advanced Cooperative Arctic Data and Information Service Remote: Federated and distributed • Data Conservancy • DataONE
  • 47. Beyond ACADIS – Other Resources General Info and help - Earth Science Information Partners (ESIP): http://wiki.esipfed.org/ UVA Libraries: http://www2.lib.virginia.edu/brown/data/ Data Management Plan and other tools – DMP Tool: https://dmp.cdlib.org/ DataOne: https://www.dataone.org/cattools/Data%20and%20Metadata%20Management Metadata - Excel Plug-in tool (in development): http://www.cdlib.org/cdlinfo/2011/09/01/facilitating-data-management-dcxl/ Lists of Standards (not complete!) for bio, climate, ecology, oceanography - http://marinemetadata.org/conventions Stanford-based portal for medical/bio - http://bioportal.bioontology.org/resources
  • 48. Questions? Contact me: Lynn.yarmey@nsidc.org For questions, help, or to submit Arctic data: support@aoncadis.org Visit ACADIS: www.aoncadis.org Special thanks for pilfered slides and content approaches: Florence Fetterer, Carly Strasser, and Dorothea Salo