The title of my talk is “What is the research life cycle?” but I don’t think that quite captures it.
How about this?
Or better yet this! I like this. What is the research data life cycle?
We talk about life cycles when we want to describe processes that can repeat but don’t necessarily, can be linear but aren’t necessarily, and can start at a certain point but don’t necessarily, yet that do follow certain patterns. Many organizations and groups use this approach; let’s take a quick look.
Image credit: http://www.flickr.com/photos/learnscope/3025020871/ by robynejay
DataONE is an NSF-funded virtual data center for biology, ecology, and environmental sciences. DataONE has the overarching goal of building a new culture of data access and data sharing. This is an international collaboration working with scientists and librarians, as well as other stakeholders. This depiction takes the scientist’s perspective; we see an interconnected “bi-cycle”: a complex scientific process on the right and a more streamlined funding and publication cycle on the left.
Source: Carly Strasser & DataONE
Let’s look at another one, this time from the University of Virginia. Here, the data cycle is encompassed by the proposal and project phases. This is a more library-centric view than DataONE’s model: it looks for the points where the library’s services might be engaged. It reflects a recognition that getting involved earlier in the research life cycle makes the products of research easier to manage over the long term.
Credit: Sherry Lake, http://www.slideshare.net/shlake/managing-the-research-life-cycle
This depiction comes from Dr. Liz Lyon of UKOLN (formerly the Office of Library and Information Networking) at the University of Bath. She has done a great deal of thinking and speaking on what she calls the “Informatics Transform”, or re-engineering libraries for the data environment. This depiction of the research data life cycle pulls together the researcher and library views, and it adds the context of community, of stakeholders. So we’ll use this one to take a closer look at the life cycle.
Based on: http://blogs.bath.ac.uk/research360/
We start with Plan & Design.
Source: http://blogs.bath.ac.uk/research360/
The DMPTool “walks” scientists through the process of developing a concise but comprehensive data management plan that can enable good stewardship of data and meet the requirements of sponsors and home institutions.
After planning, assess what it takes to fulfill the terms of the plan.
Credit: Dr. Liz Lyon, http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/liz-lyon-vala2012-informatics-transform-final.pdf
Moving on to Collect & Capture.
Source: http://blogs.bath.ac.uk/research360/
You saw a history of data collection from the sciences in Carly’s talk this morning. I’m going to show you just a couple of examples from the digital humanities. I’m sure you know about many more. Here’s a topic cloud from the blog of a digital humanist.
Source: https://dhs.stanford.edu/
And here we have a Los Angeles neighborhood with an overlay mapping the railroads of 1900, from HyperCities.
Source: http://hypercities.ats.ucla.edu/
Interpret & Analyze
Source: http://blogs.bath.ac.uk/research360/
Carly covered data analysis and visualization at some length this morning, so I’ll spend most of my time talking about quality.
Analysis tools for scientific data come in 3 categories: programming languages, statistics and analysis tools, and workflow tools. DataONE gives the scientists who participate in its network an Investigator Toolkit, which standardizes the toolset and promotes the use of tools that follow good practice.
Researchers following good data management practices have to engage in two kinds of quality-related activities:
• Quality assurance: activities to ensure data quality before collection
• Quality control: monitoring and maintaining data quality during the study
Image source: http://www.flickr.com/photos/debaird/2761296354/ by debaird™
When they do this work, they are looking for two kinds of errors:
• Errors of commission: incorrect or inaccurate data entered. Examples: a malfunctioning instrument, mistyped data.
• Errors of omission: data or metadata not recorded. Examples: inadequate documentation, human error, anomalies in the field.
Image source: http://www.flickr.com/photos/nickwebb/3016498475/ by Nick J Webb
Credit: Bill Michener & Carol Tenopir, “Summit on Fundamental Concepts in Data Training”
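To make the two error types concrete, here is a minimal sketch of an automated screen, assuming tabular field records stored as Python dicts. The field names, the required-field list, and the plausible ranges are all hypothetical, chosen only for illustration. A blank or missing value is flagged as a possible error of omission; a value that was recorded but falls outside its plausible range is flagged as a likely error of commission.

```python
# Hypothetical plausible ranges per measured field (illustrative assumption only).
PLAUSIBLE_RANGES = {
    "air_temp_c": (-60.0, 60.0),
    "rel_humidity_pct": (0.0, 100.0),
}
# Hypothetical fields every record must carry, including minimal metadata.
REQUIRED_FIELDS = ["site_id", "timestamp", "air_temp_c", "rel_humidity_pct"]


def flag_errors(record: dict) -> list:
    """Return (error_kind, field) pairs for one record."""
    flags = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            # Nothing was recorded: a possible error of omission.
            flags.append(("omission", field))
            continue
        if field in PLAUSIBLE_RANGES:
            low, high = PLAUSIBLE_RANGES[field]
            if not (low <= float(value) <= high):
                # Something was recorded, but it cannot be right:
                # a likely error of commission (e.g. mistyped data).
                flags.append(("commission", field))
    return flags


records = [
    {"site_id": "A1", "timestamp": "2012-06-01T10:00", "air_temp_c": 21.5, "rel_humidity_pct": 48},
    {"site_id": "A1", "timestamp": "2012-06-01T11:00", "air_temp_c": 215, "rel_humidity_pct": 47},
    {"site_id": "A2", "timestamp": "", "air_temp_c": 19.0, "rel_humidity_pct": None},
]

for i, rec in enumerate(records):
    print(i, flag_errors(rec))
# 0 []
# 1 [('commission', 'air_temp_c')]
# 2 [('omission', 'timestamp'), ('omission', 'rel_humidity_pct')]
```

A range check only catches commission errors that produce impossible values; a plausible but mistyped reading (12.5 instead of 21.5) passes, which is why techniques such as double entry remain useful.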
Before data entry:
1. Define and enforce standards.
2. Assign responsibility for data quality.
During data entry: use techniques to reduce errors, such as double entry, text-to-speech, databases, and careful documentation of any changes.
After data entry: check that there are no missing, impossible, or anomalous values; ensure that data line up in columns; and perform statistical and graphical summaries.
Credit: Bill Michener & Carol Tenopir, “Summit on Fundamental Concepts in Data Training”
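Double entry and post-entry summaries are easy to automate. The sketch below is one possible approach, not a prescribed procedure: it assumes the same paper sheet was keyed twice by different people (the values are hypothetical), reports the rows where the two passes disagree, and then produces a simple statistical summary once the discrepancies have been resolved. A graphical summary, such as a histogram or time plot, would complement the numbers.

```python
import statistics


def compare_double_entry(first_pass: list, second_pass: list) -> list:
    """Return the row indices where two independent keyings of the same sheet disagree."""
    if len(first_pass) != len(second_pass):
        raise ValueError("Both entry passes must cover the same rows")
    return [i for i, (a, b) in enumerate(zip(first_pass, second_pass)) if a.strip() != b.strip()]


def summarize(values: list) -> dict:
    """Basic statistical summary to review after data entry."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Hypothetical field sheet keyed twice; row 2 has a transposed digit in the second pass.
entry_a = ["12.1", "12.4", "13.0", "12.8"]
entry_b = ["12.1", "12.4", "31.0", "12.8"]

mismatches = compare_double_entry(entry_a, entry_b)
print("Rows to re-check against the paper record:", mismatches)  # [2]

# Assumption: checking the paper record confirmed entry_a, so summarize those values.
verified = [float(v) for v in entry_a]
print(summarize(verified))
```

Comparing the two passes does not say which one is right; it only narrows the re-check to the rows that disagree.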
Manage and Preserve is a very large topic. It is an area in which researchers hope to engage only in small part before asking for assistance with the rest. I would argue that we’ve been talking about good data MANAGEMENT practices all along, as Carly alluded to this morning. One topic needs a bit more focus…
Source: http://blogs.bath.ac.uk/research360/
The researcher can use metadata to communicate with other scientists who may re-use the data; working with metadata standards is important.
Image credit: http://www.flickr.com/photos/leia/29147578/ by Leia Scofield
Credit: Bill Michener & Carol Tenopir, “Summit on Fundamental Concepts in Data Training”
Credit: Bill Michener & Carol Tenopir, “Summit on Fundamental Concepts in Data Training”
Image credit: http://www.flickr.com/photos/lwr/5122938862/in/photostream/ by Leo Reynolds
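A data dictionary is one of the simplest metadata tools a researcher can keep. The sketch below assumes a small tabular dataset; the column names, descriptions, and units are hypothetical. It records what each column means and checks a data file’s header for columns that nobody has documented yet, a common source of the “inadequate documentation” problem.

```python
from dataclasses import dataclass


@dataclass
class ColumnDefinition:
    name: str
    description: str
    units: str
    data_type: str


# Hypothetical data dictionary for a small field dataset.
DATA_DICTIONARY = [
    ColumnDefinition("site_id", "Identifier of the sampling site", "none", "string"),
    ColumnDefinition("timestamp", "Local time of observation (ISO 8601)", "none", "string"),
    ColumnDefinition("air_temp_c", "Air temperature 2 m above ground", "degrees Celsius", "float"),
]


def undocumented_columns(header: list, dictionary: list) -> list:
    """Columns present in the data file but missing from the data dictionary."""
    documented = {col.name for col in dictionary}
    return [name for name in header if name not in documented]


def render_dictionary(dictionary: list) -> str:
    """Tab-separated data dictionary that can be deposited alongside the dataset."""
    lines = ["column\ttype\tunits\tdescription"]
    for col in dictionary:
        lines.append(f"{col.name}\t{col.data_type}\t{col.units}\t{col.description}")
    return "\n".join(lines)


header = ["site_id", "timestamp", "air_temp_c", "soil_moist"]
print(undocumented_columns(header, DATA_DICTIONARY))  # ['soil_moist']
print(render_dictionary(DATA_DICTIONARY))
```

In practice a community metadata standard would dictate which fields to record; the four used here are only a starting point.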
Rather than talk about preservation per se, we use the term “curation” now, because it bakes in this sense of life cycle. This image illustrates all of the concepts that we think are included in this term.
Release and Publish, or we could call this SHARE and PUBLISH.
Source: http://blogs.bath.ac.uk/research360/
Carly talked this morning about the barriers to data sharing. Let me talk a bit about the key enabler: data citation. The backbone of data sharing is researchers actually citing the data, in scholarly writing, in social networking, and in other data papers. What constitutes a data citation? This is a matter of some discussion right now. Research fields differ on the specific elements they consider essential, often because of the requirements of their collection methodology. For example, if the source of data is a continuously running sensor, then the timestamp of capture is a critical piece of information. Bill Michener, the PI for DataONE, admonishes researchers to assign a meaningful title to all data elements, but other researchers insist that a title doesn’t make sense in the world of data. One thing we know: a persistent or long-term identifier is an absolute must.
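As a rough illustration of these points, the sketch below assembles a citation string from a handful of elements. The element set and the output layout are assumptions, not a formal citation standard: the title is required here (following Michener’s advice), the capture timestamp is optional for fields that collect sensor data, and the persistent identifier is the one element the code refuses to do without. The creators, repository name, and DOI in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    creators: list
    year: int
    title: str
    publisher: str                     # e.g. the repository holding the data
    identifier: str                    # persistent identifier, e.g. a DOI or ARK
    captured_at: Optional[str] = None  # capture period, for continuously running sensors

    def format(self) -> str:
        if not self.identifier.strip():
            # The one element everyone agrees on: no persistent identifier, no citation.
            raise ValueError("A data citation needs a persistent identifier")
        parts = [f"{'; '.join(self.creators)} ({self.year}). {self.title}."]
        if self.captured_at:
            parts.append(f"Data captured {self.captured_at}.")
        parts.append(f"{self.publisher}. {self.identifier}")
        return " ".join(parts)


citation = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2012,
    title="Stream temperature readings, Site 4",
    publisher="Example Data Repository",
    identifier="doi:10.5072/EXAMPLE",  # hypothetical DOI
    captured_at="2012-06-01/2012-06-30",
)
print(citation.format())
```

A field that does not use titles for data could make `title` optional; the point of the sketch is only that each community decides which elements are mandatory.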
The KEY IDEA here is to put something unfamiliar (a dataset) in a familiar wrapper (a citable paper).
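The “familiar wrapper” can be as simple as a cover sheet. The sketch below renders one from the elements listed on the Data Publication slide (title, date, authors, abstract, persistent identifier). The identifier check uses simplified patterns that are assumptions, not full validators: a DOI is taken to start with “10.” plus a numeric registrant code and a slash, and an ARK to start with “ark:” plus a numeric name-assigning authority number. All example values are hypothetical.

```python
import re

# Simplified syntax checks (assumptions, not complete DOI/ARK validators).
DOI_PATTERN = re.compile(r"^(doi:)?10\.\d{4,9}/\S+$", re.IGNORECASE)
ARK_PATTERN = re.compile(r"^ark:/?\d+/\S+$", re.IGNORECASE)


def identifier_scheme(identifier: str) -> str:
    """Classify a persistent identifier, rejecting anything unrecognized."""
    if DOI_PATTERN.match(identifier):
        return "DOI"
    if ARK_PATTERN.match(identifier):
        return "ARK"
    raise ValueError(f"Not a recognized persistent identifier: {identifier!r}")


def cover_sheet(title: str, date: str, authors: list, abstract: str, identifier: str) -> str:
    """Plain-text cover sheet that wraps a dataset like a citable paper."""
    scheme = identifier_scheme(identifier)
    return "\n".join([
        title,
        ", ".join(authors),
        date,
        f"{scheme}: {identifier}",
        "",
        "Abstract",
        abstract,
    ])


print(cover_sheet(
    title="Stream temperature readings, Site 4",
    date="2012-07-15",
    authors=["J. Doe", "R. Roe"],
    abstract="Hourly water temperature from one hypothetical monitoring site.",
    identifier="ark:/99999/fk4example",  # hypothetical ARK
))
```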
And finally, Discover and Reuse. REUSE is a recognition of the feedback loop.
Source: http://blogs.bath.ac.uk/research360/
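The feedback loop can be expressed as a cycle in which the stage after Discover and Reuse is Plan & Design again. This is a minimal sketch using the six stage names from this talk; it models only the default ordering, whereas real projects may start at any stage, skip stages, or repeat them, as noted at the start of the talk.

```python
from enum import Enum


class Stage(Enum):
    PLAN_AND_DESIGN = "Plan & Design"
    COLLECT_AND_CAPTURE = "Collect & Capture"
    INTERPRET_AND_ANALYZE = "Interpret & Analyze"
    MANAGE_AND_PRESERVE = "Manage and Preserve"
    RELEASE_AND_PUBLISH = "Release and Publish"
    DISCOVER_AND_REUSE = "Discover and Reuse"


ORDER = list(Stage)  # Enum iteration follows definition order


def next_stage(stage: Stage) -> Stage:
    """Default successor; Discover and Reuse wraps around to Plan & Design."""
    return ORDER[(ORDER.index(stage) + 1) % len(ORDER)]


def walk(start: Stage, steps: int) -> list:
    """List the stage names visited from any starting point."""
    visited, current = [], start
    for _ in range(steps):
        visited.append(current.value)
        current = next_stage(current)
    return visited


print(walk(Stage.RELEASE_AND_PUBLISH, 4))
# ['Release and Publish', 'Discover and Reuse', 'Plan & Design', 'Collect & Capture']
```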
Data use agreements (DUAs) require anyone downloading the data to first agree to a statement about using it. Before downloading an object with a DUA, the end user is first presented with additional information about using the data, including any restrictions. In some systems, the data owner can predefine DUAs at some level to fit their requirements. This is functionality that Merritt has just built out for the DataShare project, so you’ll want to ask about that.
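The gating behavior described above can be sketched in a few lines. This is a hypothetical model, not Merritt’s implementation or API: the `Repository`, `Dataset`, and `DataUseAgreement` classes and their methods are invented for illustration. A dataset may carry an owner-defined DUA; its terms and restrictions are shown before download, and the download is refused until the user has accepted them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataUseAgreement:
    """Terms the data owner defines for a dataset (hypothetical model)."""
    statement: str
    restrictions: list = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    content: bytes
    dua: Optional[DataUseAgreement] = None


class AgreementRequired(Exception):
    """Raised when a user tries to download before accepting the DUA."""


class Repository:
    def __init__(self) -> None:
        self._datasets = {}
        self._accepted = set()  # (user, dataset name) pairs

    def add(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def terms_for(self, name: str) -> Optional[str]:
        """Text presented to the end user before download, or None if no DUA applies."""
        dua = self._datasets[name].dua
        if dua is None:
            return None
        return "\n".join([dua.statement] + [f"Restriction: {r}" for r in dua.restrictions])

    def accept(self, user: str, name: str) -> None:
        self._accepted.add((user, name))

    def download(self, user: str, name: str) -> bytes:
        dataset = self._datasets[name]
        if dataset.dua is not None and (user, name) not in self._accepted:
            raise AgreementRequired(f"{user} must accept the DUA for {name} first")
        return dataset.content


repo = Repository()
repo.add(Dataset("survey-2012", b"id,response\n1,yes\n",
                 DataUseAgreement("Cite the dataset in any publication.",
                                  ["No attempt to re-identify respondents"])))
print(repo.terms_for("survey-2012"))
try:
    repo.download("alice", "survey-2012")
except AgreementRequired as err:
    print(err)
repo.accept("alice", "survey-2012")
print(repo.download("alice", "survey-2012"))
```

Recording acceptance per user and per dataset keeps the agreement tied to a specific object, which matches the idea of presenting the DUA before each download.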
According to Liz Lyon, one of the keys to success is implementing integrated services across multiple institutional stakeholders, with joint planning and shared service development.
Image source: http://www.flickr.com/photos/ausnahmezustand/4752989186/ by ausnahmezustand
1. What is the research life cycle? Data Curation for Practitioners Workshop
2. What is the research [data] life cycle?
3. What is the research data life cycle?
4. Many models to choose from
5. DataONE (figure: DataONE life cycle diagram; labels: Ideas, Proposal writing, Research, Publication, Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze)
6. University of Virginia (figure: Data Life Cycle diagram; labels: Proposal Planning & Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Data Deposit, Data Archive, Data Discovery, Data Re-Use, Re-Purpose)
7. Research360
8. Research Data Life Cycle
9. What is a data management plan? A document that describes what you will do with your data during and after you complete your research
10.
11. Going from plan to reality • Infrastructure • Staff skills and resources • Management support
12. Research Data Life Cycle
13. Collection, an evolving story
14. Collection, an evolving story
15. Research Data Life Cycle
16. Components • Data analysis • Data visualization • Quality assurance/control
17. Analysis & Visualization
18. Quality
19. 2 kinds of errors • Commission • Omission
20. QA/QC Activities • Before data entry • During… • After…
21. Research Data Life Cycle
22. Metadata for researchers • Promotes data sharing and re-use • Preserves institutional memory and investment in data • Promotes partnerships and “advertises” data collections • Creates efficiency by avoiding duplication of data collection efforts • Gives credit to dataset creator(s)
23. Metadata for researchers • Standards & templates • Metadata tools • Re-use documentation • Team approach • Data dictionaries • Get help!
24. Data curation for practitioners, MacKenzie Smith – IDCC 2010
25. Research Data Life Cycle
26. Data Sharing 1. Data citation 2. Data citation 3. Data citation
27. Data Publication 1. Cover sheet with citation data 2. Title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
28. Research Data Life Cycle
29. Discovery ≥ Google • A&I indexes • Academic/domain portals • Social media
30. Data (re-)use agreements
31. Stakeholders • Industry • Publishers • Public • Academics • Funders • Others
32. For more information • The Informatics Transform: Re-Engineering Libraries for the Data Decade – http://ijdc.net/index.php/ijdc/article/view/210 • DataONE Education Modules – http://www.dataone.org/education-modules • UC3 Data Management Planning Resources – http://www.cdlib.org/services/uc3/dmp/index.html