It’s 2015.
Do You Know
Where Your Data
Are?
Professional
Development Seminar
Demography 590
Penn State University
22 October 2015
This presentation is licensed CC BY 4.0.
Patricia Hswe | University Libraries
Co-department Head, Publishing and Curation Services
Digital Content Strategist and Head, ScholarSphere User Services
http://www.libraries.psu.edu/psul/pubcur.html
phswe@psu.edu | 867-3702
Data accountability—or
lack thereof—keeps
making the news.
This is . . .
data?
I’m confused by Brian Moore via Flickr CC BY-SA
1108845-godzilla_facepalm_godzilla_facepalm_face_palm_epic_fail_demotivational_poster_1245384435_super by Patty Marvel via Flickr CC BY-NC-ND
What we’ll talk about
• What’s the future of
your data?
• Tips, tools, resources
for managing data
• DMPs – What are they?
• Discussion: questions,
comments, concerns?
WHAT’S THE FUTURE OF YOUR
DATA?
“The Availability of Research Data Declines Rapidly with Article Age.”
(Title of a 2014 article by Vines et al.)
“The major cause of the
reduced data availability
for older papers was the
rapid increase in the
proportion of data sets
reported as either lost
or on inaccessible
storage media.”
Forty years of removable storage by
David Smith via Flickr CC BY
“The odds that we
were able to find an
apparently working e-mail
address (either in
the paper or by
searching online) for
any of the contacted
authors did decrease
by about 7% per
year.”
e-mail symbol by Micky Aldridge via Flickr CC BY
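To make the "about 7% per year" figure concrete, here is a minimal arithmetic sketch. It assumes the decline compounds at a constant rate every year, which is a simplification; the function name `odds_multiplier` is hypothetical and is not taken from the article.

```python
def odds_multiplier(years, annual_decline=0.07):
    """Factor by which the odds shrink after `years`, assuming a constant yearly decline."""
    return (1.0 - annual_decline) ** years


# Print the compounded factor for a few article ages.
for years in (5, 10, 20):
    print(f"{years:>2} years: odds multiplied by {odds_multiplier(years):.2f}")
```

Under that assumption, the odds of finding a working address fall to a bit under half of their starting value after ten years. Note that this is a statement about odds, not about probabilities.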
“Unfortunately, many of these missing data sets
could be retrieved only with considerable effort by
the authors, and others are completely lost to
science.”
• The implications are apparent.
• What can researchers begin doing
differently?
MANAGE YOUR RESEARCH DATA
NOW
Be proactive!
NIH Data Sharing Policy
(required for proposed projects > $500K)
• When will you make the data available?
• What file formats will you use for your data, and why?
• What transformations will be necessary to prepare
data for preservation/data sharing?
• What metadata/documentation will be submitted
alongside the data?
• Will a data-sharing agreement be required? What
will the agreement state?
• What are your plans for providing access to your data?
• Which archive/repository/central database have you
identified as a place to deposit data?
Quick tips and best practices
• Lifecycle mindset for
research and data
• File-naming
conventions
• Standards for
description
• File formats
• Storage
Tool library by takomabibelot
via Flickr CC BY
From DataONE Best Practices
https://www.dataone.org/best-practices
Reflect on the “during” & end
of research data at the beginning
File-naming conventions
• Consistency
– Patterns
• Descriptiveness
– Keywords
– “Aboutness” / content
• Versions
– Which versions need to
be saved, tracked?
• Major components (will
depend on type of
research)
– Project name
– Content of the file
– Date
– Version number
– Location
– Instrument name /
number
1108845-godzilla_facepalm_godzilla_facepalm_face_palm_epic_fail_demotivational_poster_1245384435_super - NOT A USEFUL FILE NAME!
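One way to apply the components listed above is to generate file names from them rather than typing names by hand. The sketch below is only an illustration: the function `build_file_name`, the order of the components, and the separators are all assumptions, and a project may reasonably choose a different pattern.

```python
import re
from datetime import date


def build_file_name(project, content, when, version, extension):
    """Join the components into one predictable, sortable file name."""

    def slug(text):
        # Lowercase, then replace every run of non-alphanumeric characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [
        slug(project),
        slug(content),
        when.strftime("%Y%m%d"),  # year-month-day sorts chronologically
        f"v{version:02d}",        # zero-padded so v02 sorts before v10
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


name = build_file_name("Fertility Survey", "Interview transcripts", date(2015, 10, 22), 3, "txt")
print(name)
```

Whatever pattern is chosen, the point is consistency: every file in the project follows the same order of components, so names can be sorted, searched, and understood later.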
Data description for access/use
• What standards does your
discipline use to describe
information?
– Darwin Core
– DDI (Data Documentation Initiative)
• README.TXT
• Consult librarians to assist
with describing/documenting
Old Standard Fireworks
Poster by Epic Fireworks
via Flickr CC BY
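A README can be started from a simple template so that every data folder carries the same basic description. The sketch below is a minimal illustration; the field names and the `write_readme` function are assumptions, not part of Darwin Core, DDI, or any other standard, and a discipline's own standard should take precedence where one exists.

```python
from datetime import date
from pathlib import Path

# Hypothetical template; adjust the fields to what your discipline expects.
README_TEMPLATE = """\
Title: {title}
Creator: {creator}
Contact: {contact}
Date written: {written}

Description:
{description}

Files:
{file_list}
"""


def write_readme(folder, title, creator, contact, description):
    """Write README.txt into `folder`, listing the files already stored there."""
    folder = Path(folder)
    names = sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.name != "README.txt"
    )
    file_list = "\n".join(f"  - {name}" for name in names) or "  (none yet)"
    text = README_TEMPLATE.format(
        title=title,
        creator=creator,
        contact=contact,
        written=date.today().isoformat(),
        description=description,
        file_list=file_list,
    )
    readme = folder / "README.txt"
    readme.write_text(text, encoding="utf-8")
    return readme
```

A call such as `write_readme("survey_2015", "Survey data", "A. Researcher", "a.researcher@example.org", "Raw responses.")` would then create the file next to the data it describes; the folder name and contact details here are placeholders.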
File formats –
be intentional about them
• Open rather than proprietary
–Interoperable, usable across platforms
• What’s commonly used in your
community / discipline?
• Formats for use vs. formats for archiving
–PNG or JPG vs. TIFF
–Word vs. PDF
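Being intentional about formats starts with knowing which formats a project actually holds. The sketch below counts files by extension under a folder; the function name `format_inventory` is hypothetical, and deciding which of the listed formats are suitable for archiving remains a judgment for the researcher and the discipline.

```python
from collections import Counter
from pathlib import Path


def format_inventory(folder):
    """Count the files under `folder` (including subfolders) by lowercase extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Report the formats found in the current directory, most common first.
    for extension, count in format_inventory(".").most_common():
        print(f"{extension:>16}  {count}")
```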
Storage – spread / repeat / copy
• Distribution and redundancy
– Keep the same files in more than one place
– Local options: internal (computer, laptop) hard drive;
external hard drive; college/department servers
– Campus enterprise services: Box, Tivoli Storage
Manager, High Performance Computing (may cost)
– Cloud services: Dropbox, Box, Spideroak, Amazon Web
Services
• At least 3 copies
• Have master files from which copies get made
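The "master file plus copies" idea can be sketched with a short script that copies a master file to several locations and checks that each copy is identical. This is a minimal sketch under stated assumptions: the function names `sha256_of` and `replicate` are hypothetical, the destinations are ordinary folders (for example, an external drive or a synced folder), and it is not a substitute for a managed backup service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master, destinations):
    """Copy the master file into each destination folder and verify every copy."""
    master = Path(master)
    expected = sha256_of(master)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / master.name
        shutil.copy2(master, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Copies are always made from the master, never from another copy, so an error in one copy cannot spread to the others. With the master and two destinations, this gives the three copies recommended above.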
DATA MANAGEMENT PLANS
What funding agencies expect
NIH Data Sharing Policy
(required for proposed projects > $500K)
• When will you make the data available?
• What file formats will you use for your data, and why?
• What transformations will be necessary to prepare
data for preservation/data sharing?
• What metadata/documentation will be submitted
alongside the data?
• Will a data-sharing agreement be required? What
will the agreement state?
• What are your plans for providing access to your data?
• Which archive/repository/central database have you
identified as a place to deposit data?
Each funding agency seemingly has its
own DMP requirements
But commonalities exist:
• Expected data?
• Data retention?
• Data formats?
• Dissemination of data?
• Data preservation?
• Access to data?
• Whose responsibility in
the project?
Snowflake-017 by yellowcloud via
Flickr CC BY
Restricted data and DMPs
• Security measures to protect data?
• How will data be anonymized? Deidentified?
• Consent forms? Will possibility of sharing be
addressed in consent forms?
• Policy for sharing parts of the data?
Conditions of use?
• Embargoes?
• Where will data be kept? For how long?
Restricted data guidance
• “Restricted Use Data Management at ICPSR”
• “Managing sensitive research data” – U.
Bristol, U.K.
• Review what our institution states in Research
Administration Guidelines / Policies.
• Evaluate for sensitivity.
• Comply, if relevant – e.g., HIPAA, FERPA.
• Enable restricted use / access, if possible.
DEMOS OF
TOOLS/RESOURCES/SERVICES
Tools / Resources / Services
• Training
– MANTRA: http://datalib.edina.ac.uk/mantra/
– Penn State’s DMP Tutorial: https://www.e-education.psu.edu/dmpt/
• Resources
– DMPTool: https://dmp.cdlib.org/
– re3data - data repository index: http://www.re3data.org/
– PSU resources: Penn State boilerplate language and Penn
State DMP local guidance
• Services
– ScholarSphere: https://scholarsphere.psu.edu/
• Sandbox environment: https://scholarsphere-demo.dlt.psu.edu/
– Libraries also consult, teach, review DMPs
Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman,
Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth,
Margaret Hedstrom, David W. Hogg, Vinay Kashyap, Ashish Mahabal, Aneta
Siemiginowska, Aleksandra Slavkovic. 2014.
“Ten Simple Rules for the Care and Feeding of Scientific Data.”
PLoS Comput Biol 10 (4): e1003542. doi:10.1371/journal.pcbi.1003542.
A few of the rules
• Practice science with a
certain level of reuse in
mind
• Publish workflow as
context
• Link your data to your
publications
• Publish your code
• Say how you want to be
credited for your data
• Foster and use data
repositories as much as
possible.
Reuse by GotCredit via Flickr CC BY
So,
plan
for
the
future
of
your
data.
Questions? Comments? Feedback? Words of wisdom?
Keep in touch: Patricia Hswe | phswe@psu.edu
future soon by krupp via Flickr

Editor's Notes

  • #12 The authors of the article were able to obtain only 19.5% of the data sets they requested – and only 11% for articles published before 2000.
  • #19 What does your discipline use to describe information? Biology uses Darwin Core. Ecology has Ecological Metadata Language. The social sciences have DDI (Data Documentation Initiative). Consult with librarians for help with standards for describing and documenting data. Include a README.TXT, or some file providing guidance. METADATA: get used to seeing this term!
  • #24 Expected data: be able to describe the data you’ll be collecting. Data retention: how long?