Data Literacy:
Creating and
Managing
Research Data
Research Resources Forum 2014
Cunera Buys
Kelsey Rydland
Claire Stewart
Data nightmares
Tweeted in 2012 by Gail Steinhart,
Head of Research Services, Mann
Library, Cornell University
Data nightmares
Science Staff. 2011. “Challenges and Opportunities.”
Science 331 (6018) (February 11): 692 -693. doi:10.1126/science.331.6018.692.
What are data?
Data
Digital Curation Center (UK): “Data, any
information in binary digital form, is at the
centre of the Curation Lifecycle.”
Office of Management and Budget: “Research
data means the recorded factual material
commonly accepted in the scientific
community as necessary to validate research
findings”
BICEP2 (South Pole telescope): BICEP2 Collaboration, 2014
Performativity, Place, Space: Burgess and Hamming, 2011
Data in the sciences, humanities
Every discipline has data!
• Spreadsheets
• Scanned books and images
• Instrument data
Managing data well from the start of the
project is critical: make a plan
What is Data?
Types of data include:
• observational data
• laboratory experimental data
• computer simulation
• textual analysis
• physical artifacts or relics
Examples of data include:
• Audio and video files
• Code or scripts
• Digital text
• Lab notebooks
• Geospatial images
• Photographs
• Rock samples
• Survey results
• Scanned documents
• Spreadsheets
• Video games
Data management is important because…
FUNDING AGENCIES
Why do funders and broader science
community want to share and preserve
data?
Prevent Data Loss
Scientific Reproducibility
Recognition
Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the
“Publications” section to “Products” and amend terminology and instructions
accordingly. This change makes clear that products may include, but are not
limited to, publications, data sets, software, patents, and copyrights.
Journal Requirements
7. Sharing of Data, Materials, and Software
Publication is conditional upon the agreement of the authors to make freely available any materials and
information described in their publication that may be reasonably requested by others.
Data Availability
PLOS journals require authors to make all data underlying the findings described in their manuscript fully
available without restriction, with rare exception.
When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance
with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as
part of the final article.
Refusal to share data and related metadata and methods in accordance with this policy will be grounds for
rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining
data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we
reserve the right to post a correction, to contact the authors' institutions and funders, or in extreme cases to
retract the publication.
Deposit on publication of article
• Some journal publishers require or recommend that supporting data for articles
be made publicly available.
• The Joint Data Archiving Policy (JDAP) requires data sharing in a public archive as
a condition of publication.
– Journals that have adopted JDAP include: Science, Nature and Genetics
• The author is usually responsible for making data available in a repository or archive.
• Check data archiving policies of journals before submitting articles.
Why share data? Why make it open?
• Clearly documents and provides evidence for research in conjunction with
published results.
• Meets copyright and ethical compliance requirements (e.g., HIPAA).
• Increases the impact of research through data citation.
• Preserves data for long-term access and prevents loss of data.
• Describes and shares data with others to further new discoveries and research.
• Prevents duplication of research.
• Accelerates the pace of research.
• Promotes reproducibility of research.
Start with a plan…
Common Data
Lifecycle Stages
From: Fary, Michael and Owen, Kim, Developing an
Institutional Research Data Management Plan
Service, Educause ACTI white paper, January 2013,
http://net.educause.edu/ir/library/pdf/ACTI1301.pdf
Data management
planning
Points to address in your
DMP
• Types of data to be produced.
• Standards or descriptions that would be used with the data (metadata).
• How these data will be accessed and shared.
• Policies and provisions for data sharing and reuse.
• Provisions for archiving and preservation.
flickr.com/photos/inl/5097547405
Thoughts on naming stuff…
• File naming
• Versioning
• Directory structures
• Metadata
Why should you care?
• Find your files more easily
• Creates uniformity
• Allows for sorting
• Understand what is “under the hood”
• Allows for versioning
Source: http://library.stanford.edu/spc/university-archives/managing-university-records/file-naming-guidelines
File naming – Part I
• Create names that allow for useful sorting
YES: 20130909_RogersParkAnalysis
NO: Kelseys Rogers Park Files
• Keep names short and easy to read
YES: 2014_RogersParkStudy
NO: Rogers Park Demographic Analysis of..
• Use camel case
YES: 2014_RogersParkAnalysis
NO: 2014 Rogers Park Analysis
Source: http://library.stanford.edu/spc/university-archives/managing-university-records/file-naming-guidelines
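The naming rules above can be applied automatically. This is a minimal sketch, not part of the original slides: the function name `sortable_name` is hypothetical, and it assumes you want the full date (YYYYMMDD) as the prefix, as in the first YES example.

```python
import re
from datetime import date


def sortable_name(when: date, description: str) -> str:
    """Build a name such as 20130909_RogersParkAnalysis (hypothetical helper)."""
    # Keep only runs of letters and digits; spaces and symbols are dropped.
    words = re.findall(r"[A-Za-z0-9]+", description)
    # Camel case: upper-case the first character of each word, keep the rest.
    camel = "".join(word[0].upper() + word[1:] for word in words)
    # A YYYYMMDD prefix makes alphabetical order match chronological order.
    return f"{when:%Y%m%d}_{camel}"


print(sortable_name(date(2013, 9, 9), "Rogers Park analysis"))
```

Because the date comes first and is zero-padded, a plain alphabetical sort of these names lists the files from oldest to newest.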
File naming – Part II
• Avoid spaces, symbols, abbreviations
• OK to use underscores _ and hyphens -
• DATES! Use them!
– Enhances sorting
– Should be: YEAR_MONTH_DAY (19791203 or
1979_12_03)
• File name as version control
– (e.g. KelseyPartyPolicy_rev2013_02_20.docx)
Source: http://library.stanford.edu/spc/university-archives/managing-university-records/file-naming-guidelines
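A quick way to audit existing files against these rules is sketched below. It is an illustration only: `check_name` and the two patterns are hypothetical, the date check follows the YEAR_MONTH_DAY bullet strictly (so a year-only name would be flagged), and it does not verify that the digits form a real calendar date.

```python
import re

# Letters, digits, underscores and hyphens, then at most one extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")
# A date written as YYYYMMDD or YYYY_MM_DD, not embedded in a longer number.
DATE_PART = re.compile(r"(?<!\d)\d{4}_?\d{2}_?\d{2}(?!\d)")


def check_name(filename: str) -> list[str]:
    """Return a list of problems found in a file name (empty list = OK)."""
    problems = []
    if not SAFE_NAME.match(filename):
        problems.append("contains spaces or symbols")
    if not DATE_PART.search(filename):
        problems.append("no YYYYMMDD or YYYY_MM_DD date")
    return problems


for name in ["KelseyPartyPolicy_rev2013_02_20.docx", "2014 Rogers Park Analysis"]:
    print(name, "->", check_name(name) or "OK")
```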
Some thoughts on directories…
• Folders should be major functions/activities
• Subfolders by year
• Make folder names explanatory
• Avoid personal names
• Avoid duplication
• Keep it simple
Source: http://bentley.umich.edu/dchome/resources/filenaming.php
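One possible reading of the folder advice above (activity folders, with subfolders by year) is sketched here. The function name and the activity names `DataCollection` and `Analysis` are assumptions for illustration; the demo runs in a temporary directory so it does not touch real files.

```python
import tempfile
from pathlib import Path


def make_project_tree(root: Path, activities: list[str], years: list[int]) -> list[Path]:
    """Create root/<activity>/<year> folders and return the paths created."""
    created = []
    for activity in activities:
        for year in years:
            folder = root / activity / str(year)
            # parents=True builds the activity folder too; exist_ok=True
            # makes the function safe to run again on the same project.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# Demo in a throwaway directory (hypothetical activity names).
with tempfile.TemporaryDirectory() as tmp:
    folders = make_project_tree(Path(tmp), ["DataCollection", "Analysis"], [2013, 2014])
    relative = sorted(f.relative_to(tmp).as_posix() for f in folders)
    all_exist = all(f.is_dir() for f in folders)

print(relative)
```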
Don’t lose your…
…DATA
• Store at least 3 copies
– In three different places (e.g. USB, personal
computer, Northwestern Box)
– box.northwestern.edu: 30 GB of FREE storage
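Keeping several copies only helps if the copies are intact. The sketch below copies a file to more than one location and compares checksums. It is an assumption-laden illustration: the folder names `usb` and `box_sync` are stand-ins for a USB drive and a locally synced cloud folder, the file content is placeholder text, and everything runs inside a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum of a file's contents (reads the whole file into memory)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_and_verify(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and confirm each copy matches."""
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps and returns the new file's path.
        target = Path(shutil.copy2(source, folder))
        if sha256(target) != expected:
            raise IOError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    original = base / "computer" / "20130909_RogersParkAnalysis.txt"
    original.parent.mkdir()
    original.write_text("placeholder contents\n")
    copies = copy_and_verify(original, [base / "usb", base / "box_sync"])
    copy_count = 1 + len(copies)  # the original plus the verified copies
    names_match = all(c.name == original.name for c in copies)

print(copy_count)
```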
Northwestern Box Demo!
Do you have a repository?
• Project repository?
• Funder repository?
• Open data?
• Who knows?!
See Databib (http://databib.org/)
Metadata
• Metadata (metacontent) is defined as the data
providing information about one or more aspects
of the data, such as:
– Means of creation of the data
– Purpose of the data
– Time and date of creation
– Creator or author of the data
– Location on a computer network where the data were
created
– Standards used
• Data about data...
“Data about data?”
Metadata
• Data about data
– Information that describes the data
• Two types:
– Structural metadata
– Descriptive metadata
• Should explain the data to somebody who knows
nothing about your research
Metadata according to ICPSR…
• A number of elements should be included in metadata, including, but not
limited to:
• Principal investigator
• Funding sources
• Data collector/producer
• Project description
• Sample and sampling procedures
• Weighting
• Substantive, temporal, and geographic coverage of the data collection
• Data source(s)
• Unit(s) of analysis/observation
• Variables
• Technical information on files
• Data collection instruments
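One lightweight way to capture these elements is a small record saved alongside the data, with a check for anything still missing. This is a sketch under stated assumptions: the key names are hypothetical shorthand for the elements listed above (not an ICPSR schema), and the values are placeholders.

```python
import json

# Hypothetical key names, one per element in the list above.
EXPECTED_ELEMENTS = [
    "principal_investigator",
    "funding_sources",
    "data_collector",
    "project_description",
    "sampling",
    "weighting",
    "coverage",
    "data_sources",
    "unit_of_analysis",
    "variables",
    "file_information",
    "collection_instruments",
]


def missing_elements(record: dict) -> list[str]:
    """List the expected elements that are absent or empty in a record."""
    return [key for key in EXPECTED_ELEMENTS if not record.get(key)]


# A partially completed record with placeholder values.
record = {
    "principal_investigator": "PLACEHOLDER NAME",
    "project_description": "PLACEHOLDER DESCRIPTION",
    "variables": ["placeholder_variable"],
}

print(json.dumps(record, indent=2))  # text you could save next to the data
print(missing_elements(record))      # elements still to be filled in
```

Writing the record as JSON keeps it readable by both people and software; a plain README file listing the same elements would serve the same purpose.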
RESOURCES
• Northwestern University Library Data Management Web Page:
http://www.library.northwestern.edu/dmp
• DMPTool: https://dmp.org/
• Northwestern University's Research Data: Ownership, Retention and Access Policy:
http://www.research.northwestern.edu/policies/documents/research_data.pdf
• Northwestern University Library's Center for Scholarly Communication & Digital
Curation: http://www.library.northwestern.edu/services/faculty-graduate-students/scholarly-communication
Contact information
Data Management Support
• Cunera Buys, e-science librarian:
c-buys@northwestern.edu
• Kelsey Rydland, GIS/Data Analyst:
kelsey.rydland@northwestern.edu
• Claire Stewart, Head Digital Collections &
Scholarly Communication Services:
claire-stewart@northwestern.edu
