Demography pro sem
1. It’s 2015.
Do You Know
Where Your Data
Are?
Professional
Development Seminar
Demography 590
Penn State University
22 October 2015
This presentation is licensed CC BY 4.0.
2. Patricia Hswe | University Libraries
Co-department Head, Publishing and Curation Services
Digital Content Strategist and Head, ScholarSphere User Services
http://www.libraries.psu.edu/psul/pubcur.html
phswe@psu.edu | 867-3702
7. What we’ll talk about
• What’s the future of
your data?
• Tips, tools, resources
for managing data
• DMPs – What are they?
• Discussion: questions,
comments, concerns?
8. WHAT’S THE FUTURE OF YOUR
DATA?
“The Availability of Research Data Declines Rapidly with Article Age.”
(Title of a 2014 article by Vines et al.)
9. “The major cause of the
reduced data availability
for older papers was the
rapid increase in the
proportion of data sets
reported as either lost
or on inaccessible
storage media.”
Forty years of removable storage by
David Smith via Flickr CC BY
10. “The odds that we
were able to find an
apparently working e-mail
address (either in
the paper or by
searching online) for
any of the contacted
authors did decrease
by about 7% per
year.”
e-mail symbol by Micky Aldridge via Flickr CC BY
11. “Unfortunately, many of these missing data sets
could be retrieved only with considerable effort by
the authors, and others are completely lost to
science.”
• The implications are apparent.
• What can researchers begin doing
differently?
13. NIH Data Sharing Policy
(required for proposed projects > $500K)
• When will you make the data available?
• What file formats will you use for your data, and why?
• What transformations will be necessary to prepare
data for preservation/data sharing?
• What metadata/documentation will be submitted
alongside the data?
• Will a data-sharing agreement be required? What
will the agreement state?
• What are your plans for providing access to your data?
• Which archive/repository/central database have you
identified as a place to deposit data?
14. Quick tips and best practices
• Lifecycle mindset for
research and data
• File-naming
conventions
• Standards for
description
• File formats
• Storage
Tool library by takomabibelot
via Flickr CC BY
15. From DataONE Best Practices
https://www.dataone.org/best-practices
Reflect on the “during” & end
of research data at the beginning
16. File-naming conventions
• Consistency
– Patterns
• Descriptiveness
– Keywords
– “Aboutness” / content
• Versions
– Which versions need to
be saved, tracked?
• Major components (will
depend on type of
research)
– Project name
– Content of the file
– Date
– Version number
– Location
– Instrument name /
number
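The components listed above can be assembled mechanically so that every file follows the same pattern. The following is a minimal sketch, not a prescribed convention: the function name `build_file_name`, the component order, the underscore separator, and the example project are all assumptions made for illustration.

```python
import re
from datetime import date


def build_file_name(project, content, on, version, ext):
    """Return a name of the form project_content_YYYYMMDD_vNN.ext.

    The component order and separators are illustrative assumptions;
    choose whatever pattern suits the research, then apply it consistently.
    """
    def slug(text):
        # Lowercase, and collapse every run of other characters into one hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(content)}_{on:%Y%m%d}"
        f"_v{version:02d}.{ext.lstrip('.')}"
    )


# Hypothetical project and content names, for illustration only
name = build_file_name(
    "Fertility Survey", "household interviews", date(2015, 10, 22), 3, "csv"
)
print(name)  # fertility-survey_household-interviews_20151022_v03.csv
```

Writing the date as YYYYMMDD and zero-padding the version number keeps files in chronological and version order when sorted by name.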
18. Data description for access/use
• What standards does your
discipline use to describe
information?
– Darwin Core
– DDI (Data Documentation
Initiative)
• README.TXT
• Consult librarians to assist
with describing/documenting data
Old Standard Fireworks
Poster by Epic Fireworks
via Flickr CC BY
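Where no disciplinary standard applies, a plain README file can still carry the basic description. The sketch below generates one from a few fields. The function name `readme_text`, the field names (Title, Creator, Contact, Description, Files), and the sample values are assumptions chosen for illustration, not a formal metadata standard.

```python
def readme_text(title, creator, contact, description, files):
    """Return the text of a plain README describing a data set.

    `files` maps each file name to a one-line description.
    The field names are illustrative, not taken from any standard.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Contact: {contact}",
        "",
        "Description",
        description,
        "",
        "Files",
    ]
    # List the files alphabetically so the README is stable between runs
    for name, about in sorted(files.items()):
        lines.append(f"  {name}: {about}")
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only
text = readme_text(
    title="Household interviews, wave 3",
    creator="A. Researcher",
    contact="researcher@example.edu",
    description="Cleaned interview responses, one row per household.",
    files={"interviews.csv": "response data", "codebook.pdf": "variable definitions"},
)
print(text)
```

The returned text can be saved next to the data, for example with `pathlib.Path("README.txt").write_text(text)`.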
19. File formats –
be intentional about them
• Open rather than proprietary
–Interoperable, usable across platforms
• What’s commonly used in your
community / discipline?
• Formats for use vs. formats for archiving
–PNG or JPG vs. TIFF
–Word vs. PDF
20. Storage – spread / repeat / copy
• Distribution and redundancy
– Keep the same files in more than one place
– Local options: internal (computer, laptop) hard drive;
external hard drive; college/department servers
– Campus enterprise services: Box, Tivoli Storage
Manager, High Performance Computing (may cost)
– Cloud services: Dropbox, Box, Spideroak, Amazon Web
Services
• At least 3 copies
• Have master files from which copies get made
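The "master file plus copies" practice above can be scripted so that each copy is checked against the master. This is a minimal sketch under stated assumptions: the function names `sha256_of` and `replicate` are hypothetical, the destinations are ordinary folders (for example a mounted external drive or a synced cloud folder), and a SHA-256 checksum is used here as one reasonable way to confirm that a copy matches.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master, destinations):
    """Copy the master file into each destination folder and verify each copy.

    Returns the paths of the verified copies. Raises OSError if any copy's
    checksum differs from the master's.
    """
    master = Path(master)
    expected = sha256_of(master)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / master.name
        shutil.copy2(master, target)  # copy2 also preserves file timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `replicate("survey.csv", ["/mnt/external/data", "/home/me/Box/data"])` (hypothetical paths) would leave the master plus two verified copies, which meets the "at least 3 copies" guideline.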
22. NIH Data Sharing Policy
(required for proposed projects > $500K)
• When will you make the data available?
• What file formats will you use for your data, and why?
• What transformations will be necessary to prepare
data for preservation/data sharing?
• What metadata/documentation will be submitted
alongside the data?
• Will a data-sharing agreement be required? What
will the agreement state?
• What are your plans for providing access to your data?
• Which archive/repository/central database have you
identified as a place to deposit data?
23. Each funding agency seemingly has its
own DMP requirements
But commonalities exist:
• Expected data?
• Data retention?
• Data formats?
• Dissemination of data?
• Data preservation?
• Access to data?
• Whose responsibility in
the project?
Snowflake-017 by yellowcloud via
Flickr CC BY
24. Restricted data and DMPs
• Security measures to protect data?
• How will data be anonymized? Deidentified?
• Consent forms? Will possibility of sharing be
addressed in consent forms?
• Policy for sharing parts of the data?
Conditions of use?
• Embargoes?
• Where will data be kept? For how long?
25. Restricted data guidance
• “Restricted Use Data Management at ICPSR”
• “Managing sensitive research data” – U.
Bristol, U.K.
• Review what our institution states in Research
Administration Guidelines / Policies.
• Evaluate for sensitivity.
• Comply, if relevant – e.g., HIPAA, FERPA.
• Enable restricted use / access, if possible.
27. Tools / Resources / Services
• Training
– MANTRA: http://datalib.edina.ac.uk/mantra/
– Penn State’s DMP Tutorial: https://www.e-education.psu.edu/dmpt/
• Resources
– DMPTool: https://dmp.cdlib.org/
– re3data - data repository index: http://www.re3data.org/
– PSU resources: Penn State boilerplate language and Penn
State DMP local guidance
• Services
– ScholarSphere: https://scholarsphere.psu.edu/
• Sandbox environment: https://scholarsphere-demo.dlt.psu.edu/
– Libraries also consult, teach, review DMPs
28. Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman,
Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth,
Margaret Hedstrom, David W. Hogg, Vinay Kashyap, Ashish Mahabal, Aneta
Siemiginowska, Aleksandra Slavkovic. 2014.
“Ten Simple Rules for the Care and Feeding of Scientific Data.”
PLoS Comput Biol 10 (4): e1003542. doi:10.1371/journal.pcbi.1003542.
29. A few of the rules
• Practice science with a
certain level of reuse in
mind
• Publish workflow as
context
• Link your data to your
publications
• Publish your code
• Say how you want to be
credited for your data
• Foster and use data
repositories as much as
possible.
Reuse by GotCredit via Flickr CC BY
The authors of the article were able to obtain only 19.5% of the data sets they requested – and only 11% for articles published before 2000.
What does your discipline use to describe information?
Biology uses Darwin Core
Ecology has Ecological Metadata Language
Social sciences have DDI (Data Documentation Initiative)
Consult with librarians for help with standards for describing and documenting data.
README.TXT – or some file providing guidance
- M E T A D A T A -
Get used to seeing this term!
Expected data: be able to describe the data you’ll be collecting
Data retention – how long?