Data Stewardship for SPATIAL/IsoCamp 2014
Overview of open science and best practices for data management for 2014 SPATIAL and IsoCamp, University of Utah. 13 June 2014.

Transcript

  • 1. Data Stewardship: Tips & Tools. Carly Strasser, California Digital Library, carlystrasser@gmail.com. SPATIAL / IsoCamp, June 2014.
  • 2. From Flickr via librarianinsta.tumblr.com. I am not a librarian. But I do work at a library.
  • 3. Enable data sharing • Encourage new incentives • Think about code sharing • Work with libraries, publishers and researchers • Explore new tools to help change the system • Build tools
  • 4. Why are you here? Science: you’re (probably) doing it wrong
  • 5. Back in the day… Da Vinci Curie Newton classicalschool.blogspot.com Darwin
  • 6. Research has changed Better
  • 7. From wikimedia Such Internet! So many tools! From Flickr by John Jobby So much data!
  • 8. Research has changed Worse
  • 9. Digital data. From Flickr by Flickmor; From Flickr by USArmyEnvironmentalCommand; From Flickr by DW0825; C. Strasser, courtesy of WHOI; From Flickr by deltaMike
  • 10. Digital data + Complex workflows
  • 11. Scientists are bad at data management.
  • 12. An embarrassing example… From Flickr by lincolnblues
  • 13. ?
  • 14. From Flickr by ransomtech Didn’t share the data Didn’t document the data (metadata) Didn’t document provenance/workflow
  • 15. From Flickr by ransomtech Reproducibility Transparency Reuse NO
  • 16. From Flickr by johntrainor Why should I care?
  • 17. Because reproducibility* is one of the fundamental tenets of science. *Reproducibility here means being able to go from the data to the figures/results, not independently verifying results by following the same techniques.
  • 18. Because reproducibility is one of the fundamental tenets of science. Because we need to be credible.
  • 19. Because reproducibility is one of the fundamental tenets of science. Because we need to be credible. Because Fox News, creationism, and the war on science.
  • 20. “Help us identify grants that are wasteful or that you don’t think are a good use of taxpayer dollars.” Rep. Adrian Smith (R-Nebraska), a member of the House Committee on Science and Technology
  • 21. Because reproducibility is one of the fundamental tenets of science. Because we need to be credible. Because Fox News, creationism, and the war on science. Because it means faster progress.
  • 22. Because you are a good person.
  • 23. From Flickr by Redden-McAllister From Flickr by Ken Cowell From Flickr Brandi Jordan
  • 24. Open Science: making data, research, and dissemination available to all
  • 25. flowingdata.com Map of Scientific Collaborations
  • 26. Because you have to.
  • 27. Journals Institutions Funders From Flickr by Eva Rinaldi Celebrity and Live Music Photographer
  • 28. … “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.” Feb 2013
  • 29. 1.  Maximize free public access 2.  Ensure researchers create data management plans 3.  Allow costs for data preservation and access in proposal budgets 4.  Ensure evaluation of data management plan merits 5.  Ensure researchers comply with their data management plans 6.  Promote data deposition into public repositories 7.  Develop approaches for identification and attribution of datasets 8.  Educate folks about data stewardship From Flickr by Joe Crimmings Photography
  • 30. From Flickr by Michael Tinkler
  • 31. Data management best practices. From Flickr by BigSwedeGuy
  • 32. From Flickr by Mark Sardella Plan before data collection
  • 33. Planning: Design a sample naming scheme • Create a key (data dictionary) • Make sure names are unique • Define codes. From Flickr by zebbie
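A naming scheme is easier to keep consistent when a script checks it. The Python sketch below assumes a hypothetical SITE-ORGANISM-NUMBER pattern (e.g. "NAN-EA-001") and a hypothetical data dictionary; it reports sample IDs that are not unique and IDs that use codes the dictionary does not define.

```python
# Hypothetical data dictionary (the "key"): every code used in a sample ID is defined here.
DATA_DICTIONARY = {
    "NAN": "Nanaimo field site",
    "VIC": "Victoria field site",
    "EA": "E. affinis (study organism)",
}


def check_sample_ids(sample_ids, dictionary):
    """Return (duplicate IDs, IDs that use undefined codes).

    Assumes a hypothetical SITE-ORGANISM-NUMBER scheme such as "NAN-EA-001";
    an ID without exactly three parts raises ValueError when unpacked.
    """
    seen = set()
    duplicates = []
    undefined = []
    for sample_id in sample_ids:
        if sample_id in seen:
            duplicates.append(sample_id)  # names must be unique
        seen.add(sample_id)
        site, organism, _number = sample_id.split("-")
        if site not in dictionary or organism not in dictionary:
            undefined.append(sample_id)  # every code must be defined
    return duplicates, undefined


ids = ["NAN-EA-001", "NAN-EA-002", "NAN-EA-001", "XYZ-EA-003"]
print(check_sample_ids(ids, DATA_DICTIONARY))  # (['NAN-EA-001'], ['XYZ-EA-003'])
```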
  • 34. PhDcomics.com Planning Design file naming scheme
  • 35. Planning: Design a file naming scheme. Use descriptive file names • Unique • Reflect contents. Bad: Mydata.xls, 2001_data.csv, best version.txt. Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured). *Not for everyone. From R Cook, ESA Best Practices Workshop 2010
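The "better" name above is built from fixed parts in a fixed order, so it can be generated instead of typed. The helper `descriptive_filename` is hypothetical, a sketch of that idea; its allowed-character rule is an assumption, chosen because spaces and punctuation (as in "best version.txt") are awkward to handle in scripts.

```python
import re


def descriptive_filename(organism, site, year, measured, extension):
    """Join name parts in the slide's order: organism_site_year_measurement.ext.

    Hypothetical helper: each part may contain only letters, digits, and hyphens,
    so underscores stay unambiguous as separators.
    """
    parts = [organism, site, str(year), measured]
    for part in parts:
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"unsafe file name component: {part!r}")
    return "_".join(parts) + "." + extension


print(descriptive_filename("Eaffinis", "nanaimo", 2010, "counts", "xls"))
# Eaffinis_nanaimo_2010_counts.xls
```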
  • 36. Biodiversity Lake Experiments Field work Grassland Biodiv_H20_heatExp_2005to2008.csv Biodiv_H20_predatorExp_2001to2003.csv … Biodiv_H20_PlanktonCount_2001toActive.csv Biodiv_H20_ChlAprofiles_2003.csv … From S. Hampton Planning Design file organization Consider… •  Dependencies? •  File formats? •  Time of collection? •  Order of analysis?
  • 37. Planning Constrain entries Atomize Break down spreadsheets Design your spreadsheet
  • 38. A relational database is • a set of tables • relationships among the tables • a language to specify & query the tables. An RDB provides • scalability: millions+ records • features for sub-setting, querying, sorting • reduced redundancy & entry errors. From Mark Schildhauer. Planning: Consider a database
  • 39. You should invest time in learning databases if your data sets are large or complex. Consider investing time in learning databases if • your data are small and humble • you ever intend to share your data • you are < 30 years old. Planning: Consider a database. From Mark Schildhauer
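To make the database slides concrete, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. The two tables (`sites`, `counts`), their columns, and the values are invented for illustration. It shows a set of tables, a relationship between them (a foreign key), a query language, and constraints that reject entry errors instead of storing them.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE sites (
    site_code TEXT PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE counts (
    sample_id   TEXT PRIMARY KEY,
    site_code   TEXT NOT NULL REFERENCES sites(site_code),
    sample_date TEXT NOT NULL,
    abundance   INTEGER NOT NULL CHECK (abundance >= 0)
);
""")
# Hypothetical example rows: site details are stored once, counts refer to them by code.
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [("NAN", "Nanaimo"), ("VIC", "Victoria")])
conn.executemany("INSERT INTO counts VALUES (?, ?, ?, ?)", [
    ("NAN-001", "NAN", "2010-06-13", 42),
    ("NAN-002", "NAN", "2010-06-20", 35),
    ("VIC-001", "VIC", "2010-06-13", 12),
])

# Join the two tables and summarise per site.
summary = conn.execute("""
    SELECT s.site_name, COUNT(*) AS n_samples, SUM(c.abundance) AS total
    FROM counts AS c JOIN sites AS s ON c.site_code = s.site_code
    GROUP BY s.site_name
    ORDER BY s.site_name
""").fetchall()
print(summary)  # [('Nanaimo', 2, 77), ('Victoria', 1, 12)]

# Entry errors are rejected: a negative count and an undefined site code.
for bad_row in [("NAN-003", "NAN", "2010-06-27", -5),
                ("XYZ-001", "XYZ", "2010-06-27", 3)]:
    try:
        conn.execute("INSERT INTO counts VALUES (?, ?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as err:
        print("rejected", bad_row[0], "-", err)
```

SQLite is used here only because it ships with Python and needs no server; the same schema ideas carry over to other relational databases.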
  • 40. Store your data in a repository Institutional archive Discipline/specialty archive Pick a data repository From Flickr by torkildr Ask a librarian Repos of repos: databib.org re3data.org Planning
  • 41. Planning: Decide on preservation/backup. What software? What hardware? What personnel? How often? Set up reminders! Test system. From Flickr by sepasynod; From Flickr by taberandrew; From Flickr by withassociates
  • 42. …document that describes what you will do with your data throughout the research project From Flickr by Barbies Land Write a data management plan! Planning
  • 43. DMP components But they all have different requirements and express them in different ways •  What will be collected •  Methods •  Standards •  Metadata •  Sharing/access •  Long-term storage Planning From Flickr by Barbies Land
  • 44. Step-by-step wizard for generating DMP create | edit | re-use | share Free & open to community dmptool.org Planning
  • 45. During Data Collection & Entry From Flickr by Julia Manzerova
  • 46. Realistically: •  Archive .csv version of raw data •  Make a “raw” tab in working data file •  Do all work on other tabs During collection Keep raw data raw
  • 47. Raw data as .csv R script for processing & analysis During collection Ideally: •  Use scripts to process data •  Save them with data Keep raw data raw
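A minimal Python sketch of the "ideal" setup, assuming a raw CSV with hypothetical `sample_id` and `temp_c` columns: the raw file is made read-only, and every derived value is written by the script to a new file, so the raw data stay raw and the processing can be rerun at any time. The same pattern works in R.

```python
import csv
import os
import stat


def protect_raw(path):
    """Clear the write bits so the raw file is not edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def process(raw_path, out_path):
    """Read the raw CSV and write derived values to a separate file.

    Hypothetical columns: sample_id and temp_c (degrees Celsius).
    """
    with open(raw_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sample_id", "temp_c", "temp_f"])
        writer.writeheader()
        for row in rows:
            temp_c = float(row["temp_c"])
            writer.writerow({
                "sample_id": row["sample_id"],
                "temp_c": temp_c,
                "temp_f": temp_c * 9 / 5 + 32,  # derived column, never typed in by hand
            })
```

For example, `protect_raw("raw/counts_2010.csv")` followed by `process("raw/counts_2010.csv", "derived/counts_2010_temps.csv")`; both paths are placeholders. Clearing the write bits guards against accidents; it is not access control.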
  • 48. During collection: Document your workflow. Workflow: how you get from the raw data to the final products of your research. Simple workflow: flow chart (boxes: Temperature data; Salinity data; Data import into Excel; Data in spreadsheet; Quality control & data cleaning; “Clean” T & S data; Analysis: mean, SD; Summary statistics; Graph production)
  • 49. During collection Workflow: how you get from the raw data to the final products of your research Simple workflow: commented script •  R, SAS, MATLAB… •  Well-documented code is Easier to review Easier to share Easier to use for repeat analysis # % $ & Document your workflow
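The temperature and salinity flow chart from the previous slide (raw data, quality control and cleaning, summary statistics) could be written as a commented script along these lines. The readings and the plausible ranges used for quality control are made up for the example; real limits depend on the instrument and site.

```python
import statistics

# Hypothetical raw readings from two sensors; None marks a missing value.
raw = {
    "temperature_c": [12.1, 12.4, None, 99.9, 12.0],
    "salinity_psu": [30.2, 30.5, 30.1, None, 29.9],
}

# Plausible ranges for quality control (assumed values for this example).
VALID_RANGE = {"temperature_c": (-2.0, 35.0), "salinity_psu": (0.0, 42.0)}


def clean(values, low, high):
    """Quality control: drop missing values and readings outside [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


def summarise(values):
    """Summary statistics: mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


for variable, values in raw.items():
    low, high = VALID_RANGE[variable]
    cleaned = clean(values, low, high)  # "clean" T & S data
    mean, sd = summarise(cleaned)       # summary statistics
    print(f"{variable}: n={len(cleaned)} mean={mean:.2f} sd={sd:.2f}")
```

Because each step is a named function with a comment, a reviewer can see exactly which readings were dropped and why, which is the point of the slide.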
  • 50. Fancy schmancy workflows Resulting output https://kepler-project.org During collection Document your workflow
  • 51. Workflows enable •  Reproducibility •  Transparency •  Reuse From Flickr by merlinprincesse During collection Document your workflow
  • 52. Constrain data entries •  Excel lists •  Data validation •  Google docs forms Modified from K. Vanderbilt During collection
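Spreadsheet drop-down lists and data validation have a direct equivalent in code when data arrive as CSV files or form submissions. The site list, the sex codes, and the length range below are hypothetical; the point is that each rule is written down once and applied to every row.

```python
# Hypothetical rules, analogous to a spreadsheet drop-down list plus a numeric range check.
ALLOWED_SITES = {"NAN", "VIC"}
ALLOWED_SEX = {"F", "M", "U"}  # U = unknown; define every code in the data dictionary
LENGTH_RANGE_MM = (0.0, 500.0)


def validate_row(row):
    """Return a list of problems in one data-entry row (empty list if valid)."""
    problems = []
    if row.get("site") not in ALLOWED_SITES:
        problems.append(f"site {row.get('site')!r} is not in the list")
    if row.get("sex") not in ALLOWED_SEX:
        problems.append(f"sex {row.get('sex')!r} is not in the list")
    try:
        length = float(row.get("length_mm"))
    except (TypeError, ValueError):  # missing (None) or non-numeric text
        problems.append(f"length_mm {row.get('length_mm')!r} is not a number")
    else:
        low, high = LENGTH_RANGE_MM
        if not low < length < high:
            problems.append(f"length_mm {length} is outside {low}-{high}")
    return problems


print(validate_row({"site": "NAN", "sex": "F", "length_mm": "12.5"}))  # []
print(validate_row({"site": "nanaimo", "sex": "X", "length_mm": "abc"}))
```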
  • 53. Atomize During collection One piece of information per cell
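Atomizing often means splitting a cell that mixes several pieces of information. This sketch assumes two hypothetical cases, a measurement typed together with its unit ("12.5 mm") and a location typed as "Site, Region"; each becomes separate fields.

```python
import re

# A number (optionally negative or decimal) followed by a unit such as mm, psu, or %.
MEASUREMENT = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%]+)\s*")


def atomize_measurement(cell):
    """Split a cell such as '12.5 mm' into a number and a unit."""
    match = MEASUREMENT.fullmatch(cell)
    if match is None:
        raise ValueError(f"cannot split {cell!r} into value and unit")
    return float(match.group(1)), match.group(2)


def atomize_location(cell):
    """Split a cell such as 'Nanaimo, BC' into site and region."""
    site, region = (part.strip() for part in cell.split(",", 1))
    return site, region


row = {"sample_id": "NAN-EA-001", "length": "12.5 mm", "location": "Nanaimo, BC"}
value, unit = atomize_measurement(row["length"])
site, region = atomize_location(row["location"])
atomized = {"sample_id": row["sample_id"], "length": value, "length_unit": unit,
            "site": site, "region": region}
print(atomized)
```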
  • 54. Create parameter table From doi:10.3334/ORNLDAAC/777 From doi:10.3334/ORNLDAAC/777 From R Cook, ESA Best Practices Workshop 2010 During collection Break down spreadsheets Fake a relational database Create a site table
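A parameter table documents every column of the data table once (name, unit, description), which also makes it checkable. The table contents and the `undocumented_columns` helper below are hypothetical; the sketch flags data columns that the parameter table does not describe.

```python
import csv
import io

# Hypothetical parameter table: one row per column in the data table.
PARAMETER_TABLE = """parameter,unit,description
sample_id,,Unique sample identifier
site_code,,Code defined in the site table
temp_c,degrees Celsius,Water temperature
salinity,PSU,Practical salinity
"""


def load_parameters(text):
    """Map each parameter name to its row in the parameter table."""
    return {row["parameter"]: row for row in csv.DictReader(io.StringIO(text))}


def undocumented_columns(data_header, parameters):
    """Columns present in the data table but missing from the parameter table."""
    return [col for col in data_header if col not in parameters]


params = load_parameters(PARAMETER_TABLE)
header = ["sample_id", "site_code", "temp_c", "chl_a"]
print(undocumented_columns(header, params))  # ['chl_a']
print(params["temp_c"]["unit"])              # degrees Celsius
```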
  • 55. Why are you promoting Excel? During collection Create metadata
  • 56. Metadata: data reporting. WHO created the data? WHAT is the content of the data set? WHEN was it created? WHERE was it collected? HOW was it developed? WHY was it developed? From Flickr by MichaelPatrick. During collection: Create metadata
  • 57. Digital context •  Name of the data set •  The name(s) of the data file(s) in the data set •  Date the data set was last modified •  Example data file records for each data type file •  Pertinent companion files •  List of related or ancillary data sets •  Software (including version number) used to prepare/read the data set •  Data processing that was performed Personnel & stakeholders •  Who collected •  Who to contact with questions •  Funders Scientific context •  Scientific reason why the data were collected •  What data were collected •  What instruments (including model & serial number) were used •  Environmental conditions during collection •  Temporal & spatial resolution •  Standards or calibrations used Information about parameters •  How each was measured or produced •  Units of measure •  Format used in the data set •  Precision & accuracy if known Information about data •  Definitions of codes used •  Quality assurance & control measures •  Known problems that limit data use (e.g. uncertainty, sampling problems) During collection Create metadata
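The six reporting questions can double as a checklist. This sketch stores a metadata record as JSON with one field per question and lists any that are empty. The field names and angle-bracket placeholders are assumptions; for deposit in an archive the record would normally follow a standard such as EML or ISO 19115.

```python
import json

# The six questions from the slide, used as required fields (illustrative names only).
REQUIRED_FIELDS = ["who", "what", "when", "where", "how", "why"]


def missing_metadata(record):
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "who": {"collected_by": "<name>", "contact": "<email>", "funder": "<funder>"},
    "what": "<what was measured, units, file names>",
    "when": "<collection dates and date last modified>",
    "where": "<site names and coordinates>",
    "how": "<instruments, methods, processing software and version>",
    "why": "<scientific reason the data were collected>",
}

print(missing_metadata(record))  # []
print(json.dumps(record, indent=2))
```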
  • 58. What is metadata? Metadata standards… • Provide structure to describe data: common terms | definitions | language | structure • Come in many flavors: EML, FGDC, ISO19115, DarwinCore,… • Can be met using software tools: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM). During collection: Create metadata
  • 59. Back up daily During collection From Flickr by lippo From Flickr by see phar Original Near Far
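Copying is only half of a backup; each copy also has to be verified. The sketch below copies one file into a "near" and a "far" folder and compares SHA-256 checksums with the original. The folder names are placeholders, and running it daily would be left to a scheduler such as cron.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Checksum a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, destinations):
    """Copy `original` into each destination folder and verify every copy.

    `destinations` stands in for a "near" copy (e.g. an external drive) and a
    "far" copy (e.g. off-site storage); the paths are placeholders.
    """
    expected = sha256(original)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / Path(original).name
        shutil.copy2(original, copy)  # copy2 also preserves timestamps
        if sha256(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

Storing the checksums alongside the data also lets you test the system later, as slide 41 suggests, by re-hashing the copies and comparing.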
  • 60. During collection From Flickr by Barbies Land Remember that data management plan? Revisit Review Revise
  • 61. During collection Schedule a time each week or month Revisit Review Revise From Flickr by purplemattfish
  • 62. Where to start? From Flickr by celikins
  • 63. From Flickr by Andy Graulund Make a resolution • Triage on current projects • Get advisor, lab mates, collaborators on board • Do better next time
  • 64. Start working online. From Flickr by karindalziel
  • 65. http://datapub.cdlib.org Reproducibility, E-notebooks, Online science
  • 66. Write a DMP. Step-by-step wizard for generating DMP: create | edit | re-use | share. Free & open to community. dmptool.org
  • 67. databib.org Where should I put my data? Find a repository
  • 68. Get help. From Flickr by thewmatt
  • 69. Get help from your library. From Flickr by North Carolina Digital Heritage Center; From Flickr by Madison Guy
  • 70. Learn new skills software carpentry www.software-carpentry.org
  • 71. From Flickr by Micah Taylor Other Fun Stuff
  • 72. Altmetrics? Impact Factors + Citation Counts Credit in academia…
  • 73. Altmetrics Article-level metrics Altmetrics for alt-products Data Code Slides Blogs Downloads Tweets Mentions Views From Flickr by Skakerman
  • 74. Altmetrics Article-level metrics Altmetrics for alt-products
  • 75. Researcher Identification
  • 76. BIG initiatives…
  • 77. NSF-funded DataNet project, Office of Cyberinfrastructure. www.dataone.org
  • 78. New partners…
  • 79. Better methods…
  • 80. Better methods…
  • 81. Science is changing. Embrace it.
  • 82. From Flickr by dotpolka Manage & share your data!
  • 83. Website: carlystrasser.net | Email: carlystrasser@gmail.com | Twitter: @carlystrasser | Slides: slideshare.net/carlystrasser
