Good data practices for graduate students

  • Be aware of the research process so that you have some context for your experience. This can also help you organize your thoughts about carrying out your projects.
  • Goal: help you translate your research protocol into a practical plan for carrying out your project or study. Although these steps take some extra time at the beginning of your project, they will make analysis and writing much easier because you will be clear about what was done.
  • - data model: map out relationships between data, especially aggregated or calculated variables; translate research questions into analyses, then map those to the data to be used; this can be particularly important if you are integrating data from multiple sources or have large quantitative datasets
    - data organization strategy: it should be part of the planning process and answer where, when, and how (more on this in the next slide)
    - software: IUWare, IUanyWare, StatMath, RFS, SDA (links on handout)
    - ethical & legal issues: confidentiality, privacy, HIPAA, intellectual property, and copyright issues may arise; discuss these potential problems with your advisor (links for further information on handout)
  • Although facts cannot be copyrighted, specific instances of them (such as a database) can be.
  • - research project: one option is to write a structured abstract (see handout)
    - dataset organization: use your plan and update it as things change (more on the next slide)
    - describe your data files: what do you need to know to interpret the data? Parameters, units, definitions of coded values, definitions of missing values
    - methods:
    - standards: don’t deviate from the standards in your discipline or research community unless you have a good reason for doing so; these standards reflect a common understanding and help to make data interoperable
    - citation: if you use someone else’s data, document and cite it: source, URL/DOI, detailed title of dataset, version information, date retrieved, authors/creators, brief description
    - timeframe: document this clearly, particularly if you’re using data from multiple sources or collecting data over a period of time
  • - data typing: use the appropriate field type for each value (for example, a date field for dates); put comments in a separate column
    - document your folder structure & file naming system
        - don’t rely on the computer’s time and date metadata; it’s not reliable and can be manipulated
        - keep file names short but descriptive; use a coding system to include project name, file contents, date, etc.
    - QA & data integrity: minimize opportunities to introduce human error, automate processing, check and verify periodically
    - version control & authenticity: especially important if multiple people are working on the same dataset; keep copies of your data before and after each major processing step; this saves you a lot of work if errors creep in, because you won’t have to start over from the raw data; document how this is done
  • - backup strategy: a quick and dirty way to verify a backup is to check file quantity and file size, and to randomly check values in the original and the copies
    - if you need to share or transfer files, use Slashtmp instead of a flash drive, especially if the data involve human subjects
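The quick backup check described in the note above (file quantity, file size, random spot checks) could be automated roughly as follows. This is a minimal sketch, not a procedure from the slides: the function names `verify_backup` and `sha256_of` are hypothetical, and comparing SHA-256 checksums of a random sample of files is an assumed stand-in for "randomly check values".

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original, backup, sample_size=5, seed=None):
    """Compare a backup folder with the original and return a list of problems.

    Checks file quantity, then file sizes, then the contents of a random
    sample of files (hypothetical helper, assumed layout: two folder trees).
    """
    problems = []
    originals = {p.relative_to(original): p for p in original.rglob("*") if p.is_file()}
    copies = {p.relative_to(backup): p for p in backup.rglob("*") if p.is_file()}

    # 1. File quantity
    if len(originals) != len(copies):
        problems.append(
            f"file count differs: {len(originals)} original vs {len(copies)} backup"
        )
    for rel in sorted(originals.keys() - copies.keys()):
        problems.append(f"missing from backup: {rel}")

    # 2. File size, for every file present in both places
    shared = sorted(originals.keys() & copies.keys())
    for rel in shared:
        if originals[rel].stat().st_size != copies[rel].stat().st_size:
            problems.append(f"size differs: {rel}")

    # 3. Random spot check of contents via checksums
    rng = random.Random(seed)
    for rel in rng.sample(shared, min(sample_size, len(shared))):
        if sha256_of(originals[rel]) != sha256_of(copies[rel]):
            problems.append(f"content differs: {rel}")

    return problems
```

An empty list means no problem was detected by these three checks. Because the content check only samples files, it can miss a corrupted file; setting `sample_size` to the number of files checks everything at the cost of more reading.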
  • Good data practices for graduate students

    1. Graduate Office Student Success Series: GOOD DATA PRACTICES FOR RESEARCH. January 12, 2012. Heather Coates, MLS, MS | Digital Scholarship & Data Management Librarian
    2. CONTEXT: DATA LIFECYCLE. Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008. <http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf>.
    3. OVERVIEW
        - Planning
        - Describing the data
        - Handling data files
        - Storage & backup
    4. PLAN AHEAD: BEYOND THE PROTOCOL
        - Plan early, before data collection
        - Identify ethical and legal issues
        - Define the data model
        - Think about a data organization strategy
        - Identify the most appropriate tools: instruments & software
    5. ETHICAL & LEGAL ISSUES
        - Privacy
            - Are there people (human subjects) involved in your project? Animals?
            - Does the study involve personal or health information? Can it be used to identify an individual?
        - Copyright
            - Are you using copyrighted data?
            - Have you sought permission?
        - Intellectual Property
            - You should cite any product that you use for your project: data, publications, software, etc.
    6. DESCRIBING YOUR DATA
        - Describe the research project
        - Describe overall organization of your dataset
        - Describe your data files
        - Describe the methods used to create your data
            - Describe measurement techniques (protocols, instruments)
            - Data processing: why, how, assumptions
            - Sensor network, taxonomic information, spatial location
        - Choose & use standard terminology (concepts, methods, tools)
            - Identify and use relevant metadata standards
        - Data citation
        - Describe the timeframe
    7. HANDLING DATA FILES
        - Create, manage, and document your data storage system
            - Use descriptive file names
            - Define
                - Formats for date and time
                - Units of measurement
                - Parameters
                - Missing code values
                - Values that are estimated
            - Use consistent codes
            - Use appropriate field delimiters
            - Store data values separately from data annotations or notes
            - Store data at the right level of precision
        - Quality assurance & data integrity
        - Version control & authenticity
    8. STORAGE & BACKUP
        - Back up your data: regular intervals, 3 copies
            - Local
            - Semi-local
            - Remote
        - Document your backup strategy
        - Make sure backup locations are secure and accessible
        - Use standard file formats
            - Non-proprietary, open format
            - Commonly used in your community
            - Unencrypted*
            - Uncompressed*
    9. PROCESSING & ANALYSIS
        - Defining your research questions and documenting your data are iterative processes
            - Inform each other
            - Are never done, until the project is complete
            - Developing good documentation will make analysis easier and more efficient
        - Having good documentation will make writing your paper/thesis/dissertation much easier
            - Use your readme or codebook files as source documents for your methods sections
        - Having good documentation will identify problems sooner, when it may be possible to resolve them or minimize the damage to your data
    10. RESOURCES
        - @IU
            - IUWare
            - IUanyWARE
            - StatMath
            - ITTraining
            - RFS & SDA
        - Open access/public use data sets
            - DataCite
            - ICPSR
            - Data.gov
        - Subject liaison librarians can assist in locating data on your topic
    11. THANK YOU. Find us at http://ulib.iupui.edu/digitalscholarship. Heather Coates, MLS, MS, Digital Scholarship & Data Management Librarian, hcoates@iupui.edu, 317-278-7125
