6. Develop Solutions for Research
7. The DataONE Data Life Cycle
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
8. 1. Plan:
Create and Follow a Data Management Plan
Michener WK (2015) Ten Simple Rules
for Creating a Good Data Management Plan.
PLoS Comput Biol 11(10): e1004525.
doi:10.1371/journal.pcbi.1004525
27. 2. Collect and Organize:
Logically Structure the Data to Support Use
CC image by Justin See on Flickr
Jones et al. 2007
28. 2. Collect and Organize
• Columns of data are consistent:
only numbers, dates, or text
• Consistent Names, Codes, Formats (date) used in each column
• Data are all in one table, which is much easier for a statistical program to work
with than multiple small tables which each require human intervention
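A minimal sketch of how the first bullet could be checked automatically, assuming the data are in a single CSV table and that dates are written as YYYY-MM-DD. The column names, the site "Five Points", and the helper names `infer_type` and `mixed_columns` are made up for illustration.

```python
import csv
import io
from datetime import datetime

def infer_type(value):
    """Classify one cell as 'number', 'date' (YYYY-MM-DD), or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"

def mixed_columns(csv_text):
    """Return {column name: sorted types} for columns that hold more than one type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {}
    for row in reader:
        for name, value in row.items():
            seen.setdefault(name, set()).add(infer_type((value or "").strip()))
    return {name: sorted(types) for name, types in seen.items() if len(types) > 1}

# Hypothetical table: the last row breaks the date format and uses "n/a" for a number.
sample = """site,date,height_cm
Deep Well,2015-06-01,12.5
Deep Well,2015-06-02,13.1
Five Points,June 3,n/a
"""

print(mixed_columns(sample))
# {'date': ['date', 'text'], 'height_cm': ['number', 'text']}
```

Columns reported here mix types and would need cleaning before a statistical program could read the table without human intervention.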
50. 4. Describe:
Develop Comprehensive, Standardized
Metadata
Darwin Core – species and biodiversity collections (http://rs.tdwg.org/dwc/)
EML – Ecological Metadata Language
ISO 19115 – geospatial data
54. Exercise
• Search for repositories that host particular
types of data (e.g., biodiversity, trait)
• Visit one of the repositories and identify the
services that they offer
79. 6. Discover:
Support Discovery of Relevant Data
Dryad DataONE Google
plant trait 2,137 26,300,000
plant trait datadryad 803 1,908 17,400
• Differential content searched
• Automated annotation via ontologies and other
approaches
• Differential filtering
• Different definitions of data sets (e.g., entire
package vs individual data sets)
We use the data life cycle (DLC) concept for two primary activities:
• Identifying and developing needed tools that facilitate different components of the data life cycle; and
• Training, which you will hear more about later.
The basic DLC
DataONE has worked with several other institutions to pioneer the DMPTool and to promote training in its use.
DataONE has developed connectors for R and MATLAB that make it easy to take data from DataONE repositories, analyze the data, and deposit derived products back into DataONE repositories. We also encourage best practices in organizing data using common tools such as ArcGIS and MySQL.
This is an example of a data entry form created in Google Docs. Such forms are free and easy to create. Here, a form field is being created that will let the user select from three locations where data were collected. In practice, Google Docs forms work best for entering survey data or large amounts of text.
The advantage of using a data entry form, as opposed to entering data directly into a spreadsheet, is that the form can enforce data entry rules: for example, you can create a pick-list of items for a user to select from. That way, consistent information is entered: a user will always enter Deep Well, never DW.
Data entered into a Google Docs form are stored in a spreadsheet. These data can be downloaded for further analysis.
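Once the spreadsheet has been downloaded, the same pick-list rule can be re-checked in code. The sketch below is only an illustration: "Deep Well" and "DW" come from the example above, while the other location names and the function name `invalid_entries` are invented.

```python
# Hypothetical pick-list of the three locations; only "Deep Well" is from the slide.
ALLOWED_LOCATIONS = {"Deep Well", "Five Points", "Rio Salado"}

def invalid_entries(values, allowed):
    """Return (row number, value) pairs for entries that are not on the pick-list."""
    return [(i, v) for i, v in enumerate(values, start=1) if v not in allowed]

# Entries as they might look if typed freely instead of picked from a list.
entries = ["Deep Well", "DW", "Five Points", "deep well"]

print(invalid_entries(entries, ALLOWED_LOCATIONS))
# [(2, 'DW'), (4, 'deep well')]
```

The comparison is deliberately exact, so abbreviations and case variants are flagged rather than silently accepted.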
Excel is a very popular data entry tool. It also allows you to enforce data validation rules. Here, a dropdown list has been generated that lets the user select only entries from this list. In this way, only defined species codes get entered, and the data are consistent.
Here is another example of data validation using Excel. Height has been defined to contain values between 11 and 15. When 20 is entered, the user is told that they have entered an illegal value.
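The same kind of range rule can be expressed in a few lines of code, for instance to screen data that were entered without validation. This is a sketch, not the Excel mechanism itself; it assumes the bounds 11 and 15 are inclusive, and the function names are hypothetical.

```python
def in_range(value, low=11, high=15):
    """Return True when low <= value <= high (inclusive bounds are an assumption)."""
    return low <= value <= high

def out_of_range(values, low=11, high=15):
    """Return the values that fall outside the allowed range."""
    return [v for v in values if not in_range(v, low, high)]

# Hypothetical height measurements; 20 is the illegal value from the slide.
heights = [11, 12.5, 15, 20]

print(out_of_range(heights))
# [20]
```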
The connectors with R and MATLAB facilitate QA/QC. We will also soon release a new tool (on the right) that analyzes metadata and provides guidance on meeting community standards (such as LTER best practices).
DataONE’s affiliates have developed user-friendly metadata management tools such as “Morpho” and will soon release a new web-based metadata entry tool that will further ease metadata management (initially through the Arctic Data Center).
DataONE encourages good practices in data preservation via our 30+ affiliated repositories as well as other repositories listed in re3data.org. We also encourage preservation of code, workflows, and other data resources.
DataONE is doing a number of things related to data integration. First, connections with R and MATLAB simplify many data processing steps. Second, manual and automated semantic annotation services make it easy to find and interpret the data one is looking for. Third, the DataONE provenance management system is being released to allow researchers to readily see how data were derived and where they were subsequently used.
DataONE has partnered with development teams from Kepler and VisTrails to develop and support the use of scientific workflows in research.
Lastly, DataONE focuses significant effort on training in various elements of the data life cycle.
We do this through in-person, web-based, and resource-based training activities.