CSU-ACADIS_dataManagement101-20120217

Published in: Technology, Education
  • How about physical samples? Read your DMP guidelines carefully!
  • NSF OPP meeting – open to including programmer time in budget requests to help with this kind of work
  • CADIS was funded initially in 2007, and this is the 3rd AMS IIPS presentation I’ve given on it. For the Arctic Observing Network: mostly field observations. Serve NSF-funded AON investigators by archiving AON data; not so much the wider community. Assumptions starting out: the AON data portal would support full integration of a diverse collection, so scientists could archive their data AND find all data relevant to a location or process; informatics and cyberinfrastructure would play a large role. Implications were that all data have browse imagery and complete documentation; time series or fields can be plotted online; and all metadata are in a relational database.
  • For all NSF programs that collect Arctic data: Office of Polar Programs (OPP), Division of Arctic Sciences (ARC); AON, Arctic System Sciences (ARCSS), Arctic Natural Sciences (ANS) and the Arctic Social Sciences Program (ASSP) within OPP/ARC. Serve NSF-funded Arctic investigators by archiving data from many field programs. Still little or no remote sensing data. Emphasis is still on serving those contributing data first, but will begin to shift to making the ACADIS portal more useful for those who need to use the data held or cataloged by ACADIS.
  • Transcript of "CSU-ACADIS_dataManagement101-20120217"

    1. 1. Data Literacy For the Arctic and Below: Help your data help you (and satisfy NSF requirements in the process!) Lynn Yarmey and Liz Schlagel – National Snow and Ice Data Center
    2. 2. Where we are going today: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond) EVERYTHING we talk about will be able to go into your Data Management Plan (DMP)
    3. 3. Why Care – Big Picture Photo from: http://www.mediafuturist.com/2010/11/gues-ddb-blog-future-of-marketing-media-data-is-new-oil.html
    4. 4. Why Care – Big Picture These days, Dr. Hodes said, “the old model in which researchers jealously guarded their data is no longer applicable.” http://www.nytimes.com/2011/04/04/health/04alzheimer.html Image courtesy of: http://www.sciencemag.org/content/331/6018.cover-expansion
    5. 5. Why Care – Your work http://www.phdcomics.com/comics/archive.php?comicid=382 You are a Data Manager
    6. 6. Data Management is Important! Because…… … Reproducibility is the foundation of science … Journals are starting to require data deposit … You want to get credit for producing data (data citations) … Others can use and build on your work (data reuse)
    7. 7. Data Management is Important! Because…… … Reproducibility is the foundation of science … Journals are starting to require data deposit … You want to get credit for producing it (data citations) … Others can use and build on your work (data reuse) … Your new instruments collect a LOT more data than older ones … Recreating a figure from a 2006 paper shouldn’t be painful … Funders tell us so (See NSF, NIH, NOAA, etc) … Students graduate!
    8. 8. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    9. 9. Data Types What types of data do you collect or generate?
    10. 10. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    11. 11. Data Stages Raw Organized Standardized Transformed Processed Quality Controlled Analyzed Summarized Presented/Published Photo courtesy of Zillow Database gurus: http://www.zillow.com/blog/2007-11-02/we-know-how-to-celebrate-halloween/
    12. 12. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    13. 13. Data Storage http://chronicle.texterity.com/chronicle/20110318a?pg=16#pg16
    14. 14. Data Storage Tips: - 1 working copy on your computer - 1 copy on infrastructure near you - 1 copy on infrastructure far away - ‘Final’ copy with a data center/archive - Get help! (CSS, CSU Libraries, etc.) (Note: These won’t work well in all cases, e.g. for Very Large Data, but they are a good start for coming up with a storage plan)
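One way to check that the working, nearby, and far-away copies still hold the same bytes is to compare checksums. The sketch below is only an illustration using Python's standard library; the function names `sha256_of` and `copies_match` are hypothetical and not part of any ACADIS tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: list[Path]) -> bool:
    """True when every listed copy (at least one) has the same checksum."""
    digests = {sha256_of(path) for path in paths}
    return len(digests) == 1
```

Running `copies_match` on the three copies after each backup is a cheap way to notice a stale or corrupted copy before it becomes the only one left.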
    15. 15. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    16. 16. Versioning Tips: - communicate with your lab/research group and agree on a versioning system (file names, what makes a new version) - WRITE IT DOWN and post/save to a shared space.
    17. 17. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    18. 18. File naming conventions – Discuss
    19. 19. File naming conventions – Better example Make names unique! Include (as appropriate): - Project name or acronym - Study title - Location - Data type - Researcher initials - Date - Data stage - Version number - File type DO – Use_underscores-or-dashes DO NOT – Use spaces &/or special characters! For more info - https://www.dataone.org/content/assign-descriptive-file-names
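The naming advice above can be sketched as a small helper that assembles a name from agreed components and rejects spaces or special characters. This is a minimal sketch, not a convention from the slides: the field order, the `YYYYMMDD` date form, the `v01` version style, and the function name `build_file_name` are all assumptions a lab would replace with its own agreed rules.

```python
import re
from datetime import date

# Assumed rule: each component may contain only letters, digits, and dashes;
# underscores are reserved as the separator between components.
_SAFE = re.compile(r"[A-Za-z0-9-]+")


def build_file_name(project: str, location: str, data_type: str,
                    initials: str, collected: date, stage: str,
                    version: int, extension: str) -> str:
    """Join the agreed components with underscores and validate each one."""
    parts = [project, location, data_type, initials,
             collected.strftime("%Y%m%d"), stage, f"v{version:02d}"]
    for part in parts + [extension]:
        if not _SAFE.fullmatch(part):
            raise ValueError(f"unsafe file name component: {part!r}")
    return "_".join(parts) + "." + extension
```

With hypothetical inputs, `build_file_name("CCELTER", "SIOPier", "temp", "LY", date(2011, 2, 28), "raw", 1, "csv")` gives `CCELTER_SIOPier_temp_LY_20110228_raw_v01.csv`, while a location such as `"SIO Pier"` raises `ValueError`.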
    20. 20. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    21. 21. Metadata “Data about Data” But what does that MEAN?!
    22. 22. Metadata – The bottom line What would someone* unfamiliar with your data (and possibly your research) need in order to find, evaluate, understand, and reuse them? *How about someone: - who works in your lab? - from a different lab in your field? - who is in a related interdisciplinary field? - who researches a completely different area? - who works for a newspaper? Congress?
    23. 23. Metadata – Example Temperature 31.5
    24. 24. Metadata – Example Temperature 31.5 For what purpose? Instrument precision/accuracy? When was the sensor last cleaned/calibrated? AKA – T, Temp, degC, C, oF… lots of different names!
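To see why the unit alone matters, here is a small illustrative conversion. The unit labels `degC`, `degF`, and `K` and the function name `to_celsius` are assumptions made for this sketch, not names from any standard cited in the slides.

```python
def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to degrees Celsius given an explicit unit label."""
    if unit == "degC":
        return value
    if unit == "degF":
        return (value - 32.0) * 5.0 / 9.0
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown temperature unit: {unit!r}")
```

A bare 31.5 read as degrees Fahrenheit is just below freezing (about -0.28 degrees Celsius), while the same number read as degrees Celsius is a hot day. Without the unit recorded in the metadata, a reuser cannot tell which was meant.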
    25. 25. Metadata Just like file names, metadata does its job best when it is: - consistent - documented - for people - such that computers are happy Enter Metadata Standards
    26. 26. Metadata Standards – Examples Local (people -> people) Naming Conventions Standard Operating Procedures Beyond (people -> computers -> people) ISO 19115 (http://www.fgdc.gov/metadata/geospatial-metadata-standards#nap) GCMD DIF (http://gcmd.nasa.gov/User/difguide/difman.html) EML (http://knb.ecoinformatics.org/software/eml/)
    27. 27. Metadata Standards – Example Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011). http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
    28. 28. Metadata Standards – Example (XML)
        <attributeName>Sea Surface Temperature</attributeName>
        <attributeDefinition>temperature measurement</attributeDefinition>
        <measurementScale>
          <unit>celsius</unit>
          <numericDomain><numberType>real</numberType></numericDomain>
        </measurementScale>
        <missingValueCode>
          <code>-99</code>
          <codeExplanation>missing value</codeExplanation>
        </missingValueCode>
        <missingValueCode>
          <code>-999</code>
          <codeExplanation>missing value</codeExplanation>
        </missingValueCode>
        <missingValueCode>
          <code>-99999</code>
          <codeExplanation>missing value</codeExplanation>
        </missingValueCode>
        <methods>
          <description> subject { seaSurface } </description>
          <description> calculationType { calculated }; calculationTypeDetail { average }; calculationInterval { day }; </description>
        </methods>
        Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011). http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
    29. 29. Metadata Standards – Example (XML)
        <attributeName>Sea Surface Temperature</attributeName>
        <attributeDefinition>temperature measurement</attributeDefinition>
        <measurementScale>
          <unit>celsius</unit>
          <numericDomain><numberType>real</numberType></numericDomain>
        </measurementScale>
        <missingValueCode>
          <code>-99</code>
          <codeExplanation>missing value</codeExplanation>
        </missingValueCode>
        <missingValueCode>
          <code>-999</code>
          <codeExplanation>missing value</codeExplanation>
        </missingValueCode>
        <missingValueCode>
          <code>-99999</code>
          <codeExplanation>missing value</codeExplanation>
        </missingValueCode>
        <methods>
          <description> subject { seaSurface } </description>
          <description> calculationType { calculated }; calculationTypeDetail { average }; calculationInterval { day }; </description>
        </methods>
        Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011). http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
    30. 30. Metadata – Standards They exist! If everyone used them, you could do very cool science! Compliance is often a lot of work. There are lots of them. HOWEVER, there are baby steps to get started
    31. 31. Metadata – Yikes and/or Yay! Tips for the short-term: - Get help! - support@aoncadis.org, librarians, standards groups, data centers, domain communities, tools - Get your own house in order - use common date formats, codes, smart file names - WRITE EVERYTHING DOWN! (keep good readme files) - Put in the time early on to implement a standard - most have minimum compliance levels with options to get more detailed - Stay flexible Tips for the long-term: - Get help! - Watch for Best Practices and standards in your field
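For the "use common date formats" tip, one option is to normalize whatever date strings a field sheet contains to ISO 8601 (`YYYY-MM-DD`). The sketch below assumes a fixed, lab-agreed list of input formats; the list `KNOWN_FORMATS` and the name `to_iso_date` are hypothetical.

```python
from datetime import datetime

# Assumed input formats; a lab would list the ones it actually uses.
# Note that "%m/%d/%Y" treats 02/03/2012 as February 3, so day-first
# sources need their own entry instead of this one.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def to_iso_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in order."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

Raising an error on anything unrecognized, rather than guessing, keeps ambiguous dates from slipping silently into the dataset; the accepted formats themselves belong in the readme file.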
    32. 32. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    33. 33. Sharing and Access Levels, ordered by Funder Happiness from Low to High: Low - Not sharing your data (note: appropriate in a few cases) - Emailing your data to a researcher who asks for it - Posting your data on your project or lab website - Posting your data AND METADATA on your website - Submitting your metadata to an online catalog (ex. ACADIS) - Submitting your data and metadata to an appropriate repository and getting a permanent ID (DOI, EZID, etc) - Data Repositories (ex. ACADIS, GenBank, Dryad) - CSU Digital Repository High
    34. 34. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    35. 35. Archiving Terminology Fuzziness in the data world: Archival = Preservation (close enough) Archival ≠ Storage! Tips for the short-term: - Leave yourself time at the end of a project to clean up - Choose open, non-proprietary formats when you can (ex. CSV > XLS) Tips for the long-term: - Work with NREL data experts: IBIS team, LTER, Computer Systems Support
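As a small illustration of the CSV tip, tabular data can be written as plain text with Python's standard `csv` module, so no particular spreadsheet program is needed to read it later. The column names and the value in the usage note are made up, and `rows_to_csv` is a hypothetical helper.

```python
import csv
import io


def rows_to_csv(header: list[str], rows: list[list]) -> str:
    """Serialize a header row and data rows as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv(["date", "sst_degC"], [["2011-02-28", 15.2]])` yields two lines of text, `date,sst_degC` and `2011-02-28,15.2`. Putting the unit in the column name is one simple way to carry a little metadata inside the file itself.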
    36. 36. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    37. 37. Data Management Plans (DMPs) NSF Data Management Plan - General Requirement (as of 2011-10-10) 1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project; 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements; 4. policies and provisions for re-use, re-distribution, and the production of derivatives; and 5. plans for archiving data, samples, and other research products, and for preservation of access to them. From http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
    38. 38. Data Management Plans (DMPs) NSF Data Management Plan - General Requirement (as of 2011-10-10) 1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project; 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements; 4. policies and provisions for re-use, re-distribution, and the production of derivatives; and 5. plans for archiving data, samples, and other research products, and for preservation of access to them. From http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
    39. 39. Data Management Plans (DMPs) Tips for the short-term: - Check your Directorate/Agency policy before every proposal - Keep it real(istic), you will need to include your actions in your project report and next proposal. Tips for the long-term: - Keep working on implementing metadata standards - Watch out for emerging trends, repositories, tools - Partner with data people (data centers, libraries, etc)
    40. 40. Where we are: • Why care about data management? • “What the heck is metadata?” and other jargon Data Types Data Stages Storage Versioning Naming Conventions Metadata and Standards Data Sharing and Access Archiving and Preservation • Pulling this all together – Data Management Plans • From the lab to ACADIS (and beyond)
    41. 41. CADIS - Data Support for NSF-Arctic Program • Cooperative Arctic Data and Information System • The Mandate: – Develop advanced data management system for the Arctic Observing Network (AON) – Preserve metadata and data – Serve NSF-funded AON investigators
    42. 42. Transition to Advanced CADIS [Image label: NSF Arctic Field Sites] • A new mandate – For all NSF programs that collect Arctic data – Serve NSF-funded Arctic investigators by archiving data from many field programs • Other changes: – An advisory group – Value-added products – Two full time Data Curators – New data types – biological, social, terrestrial, ecological
    43. 43. Transition to Advanced CADIS (ACADIS) • A new mandate – For all NSF/ARC programs that collect Arctic data – Serve NSF-funded Arctic investigators by archiving data from field programs and individual investigators • Other changes: – An advisory group – Value-added products – Two full time Data Curators – New data types – biological, social, terrestrial, ecological – Expanded metadata tool for diverse disciplines
    44. 44. ACADIS – Metadata and Standards Metadata Profile Supports established standards Based on IPY-DIS profile. Compatible with GCMD, FGDC, ISO… Profile driven interface validates fields NASA GCMD vocabulary used where possible
    45. 45. ACADIS Data Management Plan Template Example guidance from the ACADIS DMP template: • Assists investigators in developing the DMP now required for all NSF proposals • Linked from aoncadis.org
    46. 46. Beyond ACADIS - Other Resources • Local (NREL): IBIS • Local (CSU): CSU Digital Repository • Remote, Centralized and Domain Specific: Knowledge Network for Biocomplexity, ESA Ecological Archives, DAAC at ORNL, Advanced Cooperative Arctic Data and Information Service • Remote, Federated and distributed: Data Conservancy, DataONE
    47. 47. Beyond ACADIS – Other Resources General Info and help - Earth Science Information Partners (ESIP): http://wiki.esipfed.org/ UVA Libraries: http://www2.lib.virginia.edu/brown/data/ Data Management Plan and other tools – DMP Tool: https://dmp.cdlib.org/ DataOne: https://www.dataone.org/cattools/Data%20and%20Metadata%20Management Metadata - Excel Plug-in tool (in development): http://www.cdlib.org/cdlinfo/2011/09/01/facilitating-data-management-dcxl/ Lists of Standards (not complete!) for bio, climate, ecology, oceanography - http://marinemetadata.org/conventions Stanford-based portal for medical/bio - http://bioportal.bioontology.org/resources
    48. 48. Questions? Contact me: Lynn.yarmey@nsidc.org For questions, help, or to submit Arctic data: support@aoncadis.org Visit ACADIS: www.aoncadis.org Special thanks for pilfered slides and content approaches: Florence Fetterer, Carly Strasser, and Dorothea Salo
