Data Management for Education Research


PowerPoint for a data management panel, led by Libbie Stephenson and Rebekah Cummings, in the Urban Schooling division of UCLA's education department.

Published in: Education

  • Introduce selves
  • This is the intro slide (Libbie). Welcome, everyone, and thank you for attending. These days most of the research you conduct, and most of the materials you gather for it, will be in digital format. Whether you conduct surveys, run focus groups, videotape, analyze transcribed interviews, or create spreadsheets with data points from a variety of sources, most of the time you will be doing so in a digital environment. What we hope to do today is give you a brief introduction and perhaps provide you with ways to do research that you can use now and in your future careers. And as some of you already know, as researchers it will be important for you to be able to use your data whenever you want to, share it with others, or publish it in some form. Many funding agencies urge or even require you to do this. Today we will talk about how you can best organize and manage your data, and we will let you know about the tools and support available to help you do it. But some kinds of data have identifiers that make them difficult to share or re-use. Today we will provide you with some information on the steps you can take to protect privacy and confidentiality. And finally, we are here for you to ask us questions and, we hope, get answers that help you move forward with your work.
  • I thought I would show you an interactive version of the data life cycle. You can use it when you plan your research, and we will be going through these sections in our talk today. Then we will begin with Rebekah discussing data management plans. (Go to the demo at UKDA.)
  • One of the reasons that we are here today is to learn how to create a data management plan. By a show of hands, how many of you know how to create a data management plan? You guys are researchers. You collect data, you analyze it, you write about it, hopefully you publish the results of that analysis. Where does a data management plan fit into all that? In this part of the presentation we’re going to talk about what a data management plan is, why you need to know how to write one, and the tools that are available to help you do that.
  • First, what is a data management plan? “A document that describes what you will do with your data during your research and after you complete your research.” Basically, it is a document that describes the lifecycle of your data. An important thing to remember, though, is that a data management plan is a live document that is never finished. You should review your plan regularly throughout your project and make adjustments when necessary.
  • So, what is in a data management plan? What do the granting agencies want to see? We’ll talk more about these individual elements throughout the presentation, but some of the questions you need to think about when writing a data management plan include (read slide). These are important questions for a number of reasons, and if you don’t consider them at the beginning of your project, it may be too late to go back and fix things later.
  • Funding agency requirements: The most compelling reason is that several funding agencies, including the National Science Foundation, now require a data management plan as part of your research proposal. This is not a daunting requirement; it is a two-page explanation of the lifecycle of your data, and we are here to help you. Journal requirements: The second reason is that some journals now require data along with the submission of your publication, the most notable example being the journal Nature. Essential skill for researchers: Good data management ensures the integrity and reproducibility of your research results. If someone accuses you three years after publication of falsifying your research, what are you going to show them to prove you did your research honestly and accurately? Good data management and preservation will allow you to show them exactly how you moved through your research, what steps you took, what tools you used, and what decisions you made. It allows for reproducibility, which is the gold standard of science. Additionally, good data management protects you from data loss and enhances your data security. In a way, we should be grateful for these new requirements because they encourage us to do better, more transparent research and perhaps even save us time and money in the long run.
  • Data collection is usually the most expensive part of research. Funding agencies are hoping to maximize their investments by making data available for reanalysis and secondary use. Unfortunately, current science practices make it difficult, if not impossible, to reuse data. Data gets lost, computers crash, and researchers don’t document their data well enough for others to use it or replicate findings. The goal of these data management plans is to manage data so that it can be shared and ultimately reused for future research. When you share your data and other people can use it, you get credit for your data through data citation. New metrics are being developed for impact factors. The NSF biographical sketch has recently changed to include “Research Products,” not just “Research Publications.” Studies in astronomy and physics have already shown that when data is released with a publication, the publication is cited more often. It can be trusted.
  • All of these tools are free and available to UCLA graduate students. The DMP Tool from the California Digital Library gives step-by-step instruction and guidance in writing a DMP and shows examples of data management plans. It is useful for very general information. ICPSR is a trusted resource for data management and may be an option for a digital repository. Not only will they work with you in depositing your data, but their website is a great resource for learning more about data management plans. The UCLA Social Science Data Archive: that’s us! Libbie and I are in Rolfe Hall, and we are a free and personal resource for the UCLA community. Libbie can help direct you to the best formats, the best repositories, and best practices. Feel free to use us when you need to put a DMP together. It could be the element of your proposal that helps you stand out. Next, Libbie is going to walk you through best practices for your data lifecycle and the best ways to manage your research data.
  • So, as you probably know from other aspects of your life, the more you plan up front and the better organized you are, the easier things go. This is true in research as well. Notice that these questions are similar to those you should address in a data management plan. Reminder: talk to the Archive from the very beginning of the project and when preparing the data management plan. The Archive can advise on what steps to take and what resources there are for help, whether it is in finding software tools, organizing data, deciding how to manage the data during the project and after, or working out what you would need to do to be sure the data can be preserved for the long term. As we go through these, and as you think of your own projects, consider these questions: Could the files be useful as a long-term resource? Will the files need to be accessed at a later date? Do the files have any significant value (intellectual or financial)? If the files have little or no value at present, could this change in time?
  • So now, let’s get specific. We can’t cover all the how-tos today, but we wanted to give you some specifics on managing data for some key file formats many of you will use, and links to resources. And remember that we are here to help you as you go along with your project. Each kind of data you produce has particular requirements to keep in mind when you are first collecting your data and when you are organizing your data during a project. If you do this, you will have materials that will be much easier to preserve for the long term. So, we’ll cover features of video, audio, qualitative, and quantitative files.
  • So let’s talk about video. Here are some key items to prepare at the beginning of your project about collecting your data and describing what you will do with it once you are finished. You will need these pieces of information when you prepare a DMP. There are video archives at some research universities, and some social science archives also handle video materials. The best place I know of for managing education research videos, such as video of classroom settings, teachers, children’s activities, etc., is the Data Research and Development Center at the University of Chicago. They have published a fantastic guide that is also on their site.
  • You should plan to get in the habit of documenting all aspects of your data collection and organization. The term for this kind of information is “metadata”. So one of the most important aspects of managing your video files is to ensure you have kept track of the details listed here, the metadata. You will need to describe these in your data management plan. You can use a spreadsheet to write down this kind of detail. If you can, use care in choosing equipment and record in formats that can be easily maintained for the long term. Any control you can assert over the process will be important. The JISC guides are a big help and provide lots of detail. Metadata schemas: the Gateway to Educational Materials (GEM) instructional topics hierarchy. Sharing: a resource list of archives is available at the Open Video Project. At times a researcher will choose to share data via a website they have designed. This is fine as long as the researcher can make a long-term commitment to keep the website up. Some decide to store files on a departmental server. This approach will not suffice in a DMP. You have to be able to specify a preservation approach and a place where the files will be maintained over the long term. YouTube is not an archival, long-term preservation site; prefer Vimeo. Best practice is to make several copies and keep them in more than one place.
  • Now let’s turn to audio. Audio materials would most likely be recorded interviews, oral histories, recordings of focus groups, discussions, etc. As with video, you need a project overview, and it should contain the following details for each audio file that is part of a project. A free, open source audio management tool is one option; if you are just recording interviews, you can usually go with the cheapest or simplest tools, such as Adobe Audition 3.0. In case they ask: the advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any lossless method while still meeting the requirements of the application. Lossy methods are most often used for compressing sound, images, or videos.
  • These are the kinds of information that you need to document for your recordings, so that an archive, or a future user with whom you share the data, can understand what you did and how to use the material. See the ALA document on audio. As with video, the metadata needs to be kept for each recording, and you can do this in a spreadsheet or with a text processing tool. The Internet Archive and the Library of Congress have archives of audio materials; archives may also be available at research institutions.
  • Qualitative data is somewhat difficult to characterize, and the standards for collecting and organizing it vary depending on the project. Here are some examples, taken from the ICPSR and UKDA websites, of the types of data considered “qualitative”. Two recognized archives for qualitative data are Qualidata in the UK and the Henry Murray Center (HMC) in the US. Audio and even video files can be transcribed and analyzed using qualitative software tools. When you are preparing a data management plan, you can specify the archive into which you will deposit your data. Both Qualidata and the HMC are recognized by funding agencies as employing best practices for long-term data management.
  • I am mentioning these because you need to work with a tool that is robust enough to handle your data and can also output formats that are archivable. There are several techniques used to analyze qualitative data. Try to save your data in a format that does not depend on particular software; all of these packages will output a migratable format for preservation, including raw data, coding tree, coded data, and associated memos and notes. We are interested to know if people would like to have workshops on how to use these tools.
  • ESDS Qualidata is working to encourage the development of data documentation standards using XML. The Data Exchange Tools and Conversion Utilities (DExT) project proposed an XML schema, QuDEx, to represent annotated and complex multimedia data. This is just an example of the kinds of information you need to record, and much of this will be in your audio or video files already. Each transcript needs to be well described and documented separately from the transcript itself. Again, use consistent file naming conventions.
  • Quantitative data is usually thought of as survey data, but it can also refer to spreadsheets or other numerically coded material, for example, administrative records. There are a number of ways that surveys are carried out, and lots of people like to use free or nearly free tools for web surveys. One key consideration is that these free tools rarely let you output datasets in a format that you can use for preservation; this is often only available from the paid versions. If you think you will do this kind of work frequently, it is worth investing in the paid version of these tools. In preparing your data management plan, you should consider the way you will be collecting the data.
  • These are the recommended points to address when you are managing your survey data. (Ask how many people use surveys as a data gathering method.) Survey data or numerical statistical files need a huge amount of documentation if you plan to share them, and in your data management plan you need to address each of these areas. As Rebekah said at the beginning of the session, ICPSR is the best place to go for help in developing a data management plan for survey data. (Bring up the web site again if needed.)
  • This will be my section on data protection, including confidentiality and intellectual property rights. DMPs are written prior to IRB review. This gives you a chance to think about how you will treat the confidentiality of your subjects, how you will obtain informed consent, and how you will protect their identity after data collection and publication. Some repositories, such as the UCLA SSDA and ICPSR, offer dark archives as well as open access. Make sure that the repository you choose supports restricted access if you need it. Some archives will put the identified data in a dark archive and make a de-identified version of the dataset available to other researchers. Think about whether or not this is an option for you. Keep in mind that each file format you use (audio, video, statistical file) and each kind of data collection (oral history, recorded interview) will pose different challenges as you protect privacy and confidentiality.
  • There are best practices for keeping identifiable data secure. Note: be careful about indirect identifiers. Some data repositories will allow you to keep a copy of identifiable data in a dark archive while making available a copy of your data stripped of identifiers.
  • Intellectual property rights are always murky, but even more so with data. The most important thing to remember is that facts are not covered by copyright, and most discussions of IP law as it relates to data treat data as synonymous with facts. However, data that are selected and arranged in certain ways may be covered. It is difficult at times to know what data you can use, for what purposes, and how you should cite the data. Similarly, when you create a data set you want to make it clear to others how they can cite your data and what they can do with the dataset. If you are working for the University of California, they own your data. However, as a data collector you have rights to say how your data should be shared, used, and cited, assuming those rights weren’t already established in your grant. As previously mentioned, some granting agencies have data sharing requirements. There are three legal mechanisms for sharing your data: licenses, contracts, and waivers. Discuss terms of use. Creative Commons.
  • We have covered a huge amount of information in a very short time, and we want you to know that we are here to help you as you proceed with your work. You can meet with us one-on-one, and we encourage you to do this as you begin your work. On this page are the key resources we have used in our presentation, and you can refer back to them at any time. (Explain why each is included.) And now we can take any questions you may have.
  • Thanks, etc.
  • Data Management for Education Research
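The notes on video and audio metadata suggest keeping per-recording details in a spreadsheet and using consistent file naming conventions. The following is only a minimal Python sketch of that idea, not a recommended standard: the column set `FIELDS`, the naming pattern, and the helper names `build_file_name` and `metadata_to_csv` are all assumptions made for illustration, and a real project should follow whatever fields its chosen archive asks for.

```python
import csv
import io
from datetime import date

# Hypothetical column set, loosely based on the video/audio metadata slides.
FIELDS = ["file_name", "title", "producer", "recorded_on", "location",
          "run_time", "file_format"]


def build_file_name(project, kind, recorded_on, seq, ext):
    """Return a consistent name such as 'classobs_video_2008-03-28_003.mov'."""
    return f"{project}_{kind}_{recorded_on.isoformat()}_{seq:03d}.{ext.lower()}"


def metadata_to_csv(records):
    """Serialize one row per recording so the log opens in any spreadsheet tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative values only; none of these come from a real project.
name = build_file_name("classobs", "video", date(2008, 3, 28), 3, "MOV")
log = metadata_to_csv([{
    "file_name": name,
    "title": "Third-grade math lesson",
    "producer": "Interviewer A",
    "recorded_on": "2008-03-28",
    "location": "Los Angeles, CA",
    "run_time": "00:42:10",
    "file_format": "mov",
}])
print(log)
```

Plain CSV is used here because it is not tied to any one software package, which matches the advice in the notes to prefer formats that can be maintained over the long term.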

    2. Goals for today's session
        In this session you will:
        • Understand opportunities and reasons for data management planning
        • Learn about best practices for collecting and organizing research data
        • Learn about best practices for protecting privacy and confidentiality in data
        • Learn about resources and support available to researchers at UCLA
        GSE&IS 3/28/2008
    3. Data Life-cycle
    4. Data management plans
        • What is a data management plan?
        • Why do I need to write one?
        • What tools are available to help me?
    5. What is a data management plan?
        A data management plan is a document that describes what you will do with your data during your research and after you complete your research.
        From Carly Strasser, California Digital Library
    6. What is a data management plan, part 2
        Elements of a data management plan:
        • What are your data?
        • What formats will you be using?
        • How will you describe this data?
        • What intellectual property and privacy rights are associated with this data?
        • How will you share this data? If you don’t plan on sharing it, why not?
        • How much will your data management cost?
    7. Why create a data management plan?
        • Fulfill requirements from funding agencies
        • Fulfill requirements from journals
        • Regardless of the requirements, good data management is an essential skill for researchers.
    8. Why all these new requirements?
    9. Tools for creating a DMP
        • DMP Tool
        • ICPSR - nt/datamanagement/dmp/elements.html
        • The UCLA Social Science Data Archive (that’s us!)
    10. Planning & organizing data collection
        In planning your research you should think about these issues:
        • What kind of data is being collected?
        • What methods will you use to collect the data?
        • How would you describe your data so that others can use it without your help?
        • Where do you plan to store your data? If you plan to share your data with others, how do you plan to do this?
        • Where can you get training?
        • Where can you get help?
    11. Different file types = different data management
        • Video
        • Audio
        • Qualitative
        • Quantitative

        • Project overview
        • Participants
        • Privacy
        • Intended use now and in the future
        • Equipment and Software
        • Metadata
    12. Data collection – Video
        • Create a project overview
          * Participants and events, main point of the video, structure, participant/observer/interviewer relationship
          * Specific problems you hope the video will solve
          * Intended uses; whether or not publicly sharable
          * Consent of participants; protection of privacy/confidentiality
        • Choose pre- and post-production or analysis software
          * Open source vs proprietary standards
        • Keep detailed metadata; keep lots of copies
    13. Metadata – Video
        • Type/Format
          * DVD, HD, mpeg, mov
        • Run time
          * hours, min, sec
        • Title
        • Producer/author
        • Date(s)
          * real time video was made
        • Location(s)
          * place of production
          * geographical areas
        • Content
          * annotations
        • System req. for access
          * Windows media
          * QuickTime
          * RealPlayer
        • Download req.
          * size of file
          * software needed
        • Contact info
        • Persistent identifier
        • Other documentation
          * annotations, docs
        • Video clips (if app.)
    14. Data collection – Audio
        • As with video, begin with a project overview
        • Equipment - portable audio recorder most useful (pre- and post-recording)
        • Software for recording and editing
          * Computing specification needed and what is available to you
          * Open source vs proprietary software
        • Plan how you will manage during and after project
          * Always keep original file and edited file copies; use highest quality possible
          * Web hosting/streaming, CD or DVD storage
          * Licensing and privacy issues
        • Maintain your metadata from the very beginning
          * File naming conventions
          * File formats – MP3, WAV, AIFF or lossless FLAC (compressed or uncompressed)
    15. Metadata – Audio
        Key pieces of information needed:
        • Structural
          * Relationship to other audio files in same project
          * Time period
          * Geographic and location details
        • Descriptive
          * Title, Creator, Subject, Description of project, content and Coverage
        • Administrative
          * Rights, licensing, who can use
        • Technical schemas (most important for re-use and preservation)
          * AudioMD
          * Adobe's XMP (Extensible Metadata Platform)
          * MPEG-7
        • Embedded
          * Do not rely on this as a metadata schema or preservation resource
    16. Data collection – Qualitative
        Examples:
        • In-depth/unstructured interviews, including video
        • Semi-structured interviews
        • Structured interview questionnaires containing substantial open comments
        • Focus groups
        • Unstructured or semi-structured diaries
        • Observation field notes/technical fieldwork notes
        • Case study notes
        • Minutes of meetings
        • Press clippings
        • Court transcripts
        File format: text, ascii, rtf, etc.
    17. Data analysis – Qualitative
        • Dedoose
          * Cross-platform app for analyzing text, video, and spreadsheet data (analyzing qualitative, quantitative, and mixed methods research)
        • NVivo, ATLAS.ti, and MAXQDA
          * Organizes projects into raw data, coding tree, coded data, and associated memos and notes to be saved
        • NUD*IST – no longer really used, prefer NVivo
    18. Metadata - Qualitative
        • Interviews, transcriptions, oral histories, etc.
          * Unique identifier, a name or number, uniform layout, numbered pages
          * Note date, place, interviewer name and interviewee details
          * Use speaker tags, have line breaks between turn-takes
          * Use pseudonyms to anonymize personal identifying information
        • Metadata schema QuDEx
          * Analysis software can output to this schema
          * Covers: creator, method, date/time, place, size, unique identifier, codes, coding structure
    19. Data collection - Quantitative
        • Surveys, numerically coded records, spreadsheets
        • Variety of methods: face-to-face, phone, mail, web
          * SurveyMonkey, Qualtrics
        • Documented with questionnaire, codebook
          * Question wording, universe, sampling, weighting, unit of analysis, geography, time period, coding format/structure
        • Consent of participants; protection of privacy/confidentiality
        • Choose pre- and post-production or analysis software
          * Open source vs proprietary standards
        • Keep detailed metadata
    20. Metadata - Quantitative
        • Hypothesis, data collection method
        • Names of those involved in the project
        • File structure: question text, variables, variable names/labels, value names/labels, values, frequencies, missing data, recodes, branching, interviewer instructions
        • Disclosure analysis
        • Storage formats, distribution formats, persistent identifier
        • Source of data if not from survey
        • Source of funding, if any
        • Data Documentation Initiative (DDI)
    21. Data protection - Confidentiality
        • If your research involves human subjects, you will need to consider both legal and ethical obligations in managing and sharing your data.
        • Confidentiality refers to the agreement between the researcher and the participant about how the participant's identifiable private information will be handled, managed, and disseminated.
        • As a researcher, you need a clear view about how to protect the privacy of your research subjects.
        From: MANTRA: Research Data Management Training
    22. Confidentiality, part 2
        How to minimize risk of disclosure:
        • If possible, collect the necessary data without using personally identifying information.
        • If personally identifying information is required, de-identify your data upon collection or as soon as possible thereafter.
        • Avoid transmitting unencrypted personal data electronically.
        • Be careful with indirect identifiers.
    23. Data Management Rollout Survey
        JISC Data Management Rollout Project Survey Results, 2012
    24. Intellectual property
        • Know your rights as a data producer and data consumer.
        • Ownership of data
        • Three legal mechanisms for sharing data:
          1. Contracts
          2. Licenses
          3. Waivers
    25. Resources
        • UKDA tools for creating and managing data:
        • MANTRA – online learning tool/tutorial
        • JISC Digital Media Advice
        • UCLA SSDA resources for data management ml
        • ICPSR resources for documenting and preserving mework.html
        • Qualidata
    26. Questions and discussion
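The confidentiality slides recommend de-identifying data as soon as possible after collection, and the qualitative metadata slide recommends pseudonyms. As a rough illustration only, the Python sketch below replaces a direct identifier with a stable pseudonym and drops other direct identifiers. The function name `pseudonymize`, the field names, and the `P001`-style codes are hypothetical choices for this example. It does not handle indirect identifiers, and the returned key would itself be identifiable data that needs to be stored securely and separately (for example, in a dark archive).

```python
def pseudonymize(rows, id_field="name", drop_fields=("email",)):
    """Return (de-identified rows, key mapping each original ID to a pseudonym)."""
    key = {}
    cleaned = []
    for row in rows:
        original = row[id_field]
        # Reuse the same pseudonym whenever the same participant appears again.
        if original not in key:
            key[original] = f"P{len(key) + 1:03d}"
        # Copy every field except the direct identifiers.
        safe = {
            field: value
            for field, value in row.items()
            if field != id_field and field not in drop_fields
        }
        safe["participant_id"] = key[original]
        cleaned.append(safe)
    return cleaned, key


# Made-up records for illustration.
interviews = [
    {"name": "Ana Ruiz", "email": "ana@example.org", "grade_taught": 3},
    {"name": "Ben Lee", "email": "ben@example.org", "grade_taught": 5},
    {"name": "Ana Ruiz", "email": "ana@example.org", "grade_taught": 3},
]
shareable, key = pseudonymize(interviews)
for row in shareable:
    print(row)
```

Keeping the mapping in a separate structure mirrors the arrangement described in the notes, where an archive holds the identified data in restricted storage and releases only the de-identified version.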