Introduction to data management concepts in oceanography, covering the data life cycle, the data management plan, some tools, and examples of mistakes to avoid.
Recipes for geodata management in oceanography
1. NF-POGO Alumni Network for Oceans
“A global study of coastal production, acidification
and oxygenation at selected study sites”
1st workshop
18–20 April 2018
Lisbon, Portugal
Sebastian Krieger
sebastian@nublia.com
RECIPES FOR GEODATA
MANAGEMENT IN OCEANOGRAPHY
2. AGENDA
PART 1
●
Introduction
●
Planning, preparation
●
Data collection,
sampling
– Discrete
– Time-series
– Satellite images
●
Data management
and quality control
PART 2
●
Documentation,
storage
●
Data curation
●
Tools
●
Please avoid (some
examples)
●
Concluding remarks
5. IMPORTANCE OF OCEAN DATA
●
Understand processes that control the
environment, especially the climate;
●
Necessary for effective decision making:
– Promote sustainable development of economic
activities;
– Ensure maritime safety;
●
Impacted activities:
– Navigation;
– Sea transportation;
– Fisheries;
– Disaster mitigation;
– Environmental monitoring.
6. THE VALUE AND THE COST OF OCEAN
DATA
●
Expensive
– Staff
– Instruments and laboratory infrastructure
– Ship rates
– Data communication and storage infrastructure
●
Unique and unrepeatable
– Changing environment
●
Sparse spatio-temporal coverage
●
Share data to ensure maximum benefit of the
information
– Data reuse
7. IMPORTANCE OF DATA MANAGEMENT
●
Constantly increasing volume of data
– In some cases more rapidly than our ability to analyse.
●
Handling data:
– Point of collection;
– Processing;
– Quality control;
– Archival;
– Dissemination
●
Allows data integration from different sources and
sensors (e.g. in situ, satellite, model)
●
Allows near real-time and high quality operational data
distribution;
●
Information to facilitate data dissemination
https://www.whoi.edu/
8. ELEMENTS
●
Standardized data collection
– Ensures long-term value of datasets
– Allows data integration from different sources
●
Common vocabularies
– Standardized terms
– Ensures consistency and interoperability
– Reduces ambiguity
– Enables automation of data analysis
●
Standard data formats (e.g. netCDF)
– Proper data stewardship
– Helps preserve information over the long term
12. “A goal without a plan is just a wish.”
– Antoine de Saint-Exupéry
13. PLAN
●
Remember: the goal is to produce self-describing, reusable data sets
●
Establish your data management strategy in advance, before the first
piece of data is collected.
●
Define:
– How you will collect, document, organize, manage, and preserve your data
●
Documenting your data ensures that you and others will understand, and
use the data in the future
●
Recommend appropriate ways to cite your data
●
Any scientist should be able to discover, use and interpret the data even
long after data collection (e.g. 20 years)
●
Revisit your data management plan frequently and make changes as
necessary
14. PLAN
●
Based on your scientific
hypotheses and sampling plan,
define what data will be
generated
●
Decide on a data repository
●
Organize your data (e.g. directory
structure, file formats, …)
●
Manage your data:
– Who will be in charge?
– How to handle version control?
– Do you back up your data? How
often?
●
Describe your data (metadata
record)
●
Share your data
●
Preserve your data
●
Consider your budget
●
Explore available institutional
resources
15. CONTENTS OF THE DATA MANAGEMENT PLAN
●
Some funding agencies might request researchers to include a data
management plan within their research proposals
●
Types of data to be authored;
●
Standards that would be applied, for example format and metadata
content;
●
Provisions for archiving and preservation;
●
Access policies and provisions; and
●
Plans for eventual transition or termination of the data collection in the
long-term future.
18. COLLECT
●
Ensure data usability
●
Consider methods and
documentation carefully in
advance
●
Study your instruments’ user
manuals
●
Create templates to use during
data collection
– Contextual data
●
Describe each parameter
(readme.txt):
– Format, units, code, missing
values
●
Use consistent data organization
●
Use same format throughout files
– Include header rows to describe
columns
●
Use plain ASCII characters
●
Use stable, non-proprietary
software and hardware
●
Assign descriptive file names
●
Keep your raw data raw
●
Create a parameter table
●
Create a site table
●
Use ISO dates and UTC time
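The last two points of this list (descriptive file names, ISO dates and UTC time) can be sketched in a few lines of Python. This is only an illustration: the function names, the file-name pattern and the project and station codes are assumptions, not part of the workshop material.

```python
from datetime import datetime, timedelta, timezone


def to_utc(when: datetime) -> datetime:
    """Return the timestamp in UTC; refuse naive times because they are ambiguous."""
    if when.tzinfo is None:
        raise ValueError("timestamp must carry a time zone")
    return when.astimezone(timezone.utc)


def to_iso_utc(when: datetime) -> str:
    """Format a timestamp as an ISO 8601 string in UTC."""
    return to_utc(when).strftime("%Y-%m-%dT%H:%M:%SZ")


def descriptive_file_name(project: str, station: str, parameter: str,
                          when: datetime) -> str:
    """Build a file name from project, station, parameter and UTC time.

    The pattern is a hypothetical convention chosen for this sketch.
    """
    stamp = to_utc(when).strftime("%Y%m%dT%H%M%SZ")
    return f"{project}_{station}_{parameter}_{stamp}.csv"


# A sample logged at 14:30 local time in a UTC+1 time zone.
local_zone = timezone(timedelta(hours=1))
sampled = datetime(2018, 4, 18, 14, 30, tzinfo=local_zone)

print(to_iso_utc(sampled))  # 2018-04-18T13:30:00Z
print(descriptive_file_name("nano", "st01", "ctd", sampled))
# nano_st01_ctd_20180418T133000Z.csv
```

Rejecting timestamps without a time zone is a deliberate choice here: a local time with no offset cannot be converted to UTC reliably after the cruise.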
19. DON’T FORGET YOUR SECCHI DISK
●
Make complementary measurements using the
Secchi disk.
– Even if you already measure Chl-a, turbidity, or
PAR profiles
●
Affordable
●
One of the oldest and simplest marine instruments.
●
But remember and record
– Secchi depth
– Date and time of the measurement
– Position of the sun with respect to the observer
– Amount of cloud cover
●
http://www.secchidisk.org/
20. DISCRETE, TIME-SERIES, SATELLITE
IMAGES
●
For each kind of data sampling, we need
different data management strategies
●
For example,
– Discrete station data may be stored on individual
text files for each station
– Time-series data is ideally stored in one single
text file
– Satellite images may be stored in netCDF files on
a specific folder structure
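As one possible reading of "a specific folder structure" for satellite images, the sketch below builds a date-based path for a netCDF file. The layout (sensor/product/year/month) and all names are assumptions made for illustration; any consistent, documented layout serves the same purpose.

```python
from datetime import date
from pathlib import PurePosixPath


def satellite_image_path(root: str, sensor: str, product: str,
                         day: date) -> PurePosixPath:
    """Return root/sensor/product/YYYY/MM/sensor_product_YYYYMMDD.nc.

    The layout is a hypothetical convention, not a standard.
    """
    file_name = f"{sensor}_{product}_{day:%Y%m%d}.nc"
    return (PurePosixPath(root) / sensor / product
            / f"{day:%Y}" / f"{day:%m}" / file_name)


print(satellite_image_path("data/satellite", "modis", "chl", date(2018, 4, 18)))
# data/satellite/modis/chl/2018/04/modis_chl_20180418.nc
```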
22. QUALITY ASSURANCE AND QUALITY CONTROL
●
Quality assurance:
– Prevents defects
– Focuses on the process of data collection
– Proactive process
●
Quality control:
– Identify and correct defects in data products
– Reactive process
●
Standards for quality assurance and quality control should be well
documented.
23. QUALITY MANAGEMENT SYSTEMS
●
Quality: “degree to which a set of inherent characteristics of an object
fulfils requirements” (ISO 9000:2015)
– If characteristics meet all requirements, high quality is achieved
– Relative concept
– Question of degree
●
Quality management: “management with regard to quality” (ISO
9000:2015)
– Establishing quality policies, quality objectives and processes
– Activities used to direct, control and coordinate quality
●
Quality control: “part of quality management focused on fulfilling quality
requirements” (ISO 9000:2015)
– Activities to ensure that quality requirements are actually being met
24. QUALITY MANAGEMENT SYSTEMS
●
Quality management systems: “part of a management system with regard to
quality” (ISO 9000:2015)
– Framework to comply with applicable requirements, control its processes and
minimize risk, and satisfy needs and expectations
– Usually uses a process approach to manage and control how the quality
policy is implemented and how quality objectives are achieved
– Set of rules (procedures) to follow in order to achieve quality
– Encourage and support continual improvement of the quality of delivered
services and products
– Covers:
●
Management of the organization
●
Technical procedures
●
Quality controls on products or services
●
Actions to be taken if specifications are not met
25. ASSURE
●
Perform basic quality assurance
and quality control during data
collection, entry and analysis
●
Describe any conditions that
might affect data quality
●
Identify estimated values
●
Double-check data entered by
hand
●
Use quality level flags to indicate
potential problems
●
Check data format for
consistency
●
Make statistical and graphical
summaries (e.g. minimum,
maximum, average)
●
Check questionable or
impossible values and identify
outliers
●
Communicate data quality
●
Identify missing values
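Several of the checks above (quality flags, a statistical summary, impossible values, outliers, missing values) can be combined in one short routine. The sketch below is a minimal illustration using only the Python standard library. The flag values, the valid range and the three-standard-deviation outlier rule are assumptions; a real project should follow the flag scheme of its data centre.

```python
import math
import statistics

# Hypothetical flag values chosen for this sketch.
GOOD, SUSPECT, BAD, MISSING = 1, 3, 4, 9


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or math.isnan(value)


def quality_control(values, valid_min, valid_max, n_sigma=3.0):
    """Return (flags, summary) without modifying the raw values."""
    plausible = [v for v in values
                 if not is_missing(v) and valid_min <= v <= valid_max]
    mean = statistics.mean(plausible) if plausible else math.nan
    stdev = statistics.stdev(plausible) if len(plausible) > 1 else 0.0

    flags = []
    for v in values:
        if is_missing(v):
            flags.append(MISSING)
        elif not valid_min <= v <= valid_max:
            flags.append(BAD)        # physically impossible value
        elif stdev > 0 and abs(v - mean) > n_sigma * stdev:
            flags.append(SUSPECT)    # statistical outlier
        else:
            flags.append(GOOD)

    summary = {
        "count": len(values),
        "missing": flags.count(MISSING),
        "minimum": min(plausible, default=math.nan),
        "maximum": max(plausible, default=math.nan),
        "average": mean,
    }
    return flags, summary


# Made-up sea surface temperatures in degrees Celsius.
temperatures = [15.2, 15.4, None, 15.3, 99.9, 15.1, 15.5]
flags, summary = quality_control(temperatures, valid_min=-2.5, valid_max=40.0)
print(flags)  # [1, 1, 9, 1, 4, 1, 1]
print(summary)
```

The routine flags values instead of deleting them, so the raw data stays raw and the decision to exclude a value remains visible to later users.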
27. DESCRIBE
●
Data documentation (metadata) is essential for future understanding of
your data
●
Describe the digital context:
– Name of data set
– Name of data files in data set
– Date the data was last modified
– Example data file records
– Pertinent companion files
– List of related data sets
– Software used to prepare data, including version
– Data processing that was performed
28. DESCRIBE
●
Describe personnel and stakeholders
– Who collected the data?
– Who should we contact for questions?
– Sponsors
●
Describe scientific context
– Why did we collect the data?
– What data were collected?
– What instruments were used (including model and serial number)?
– What were the environmental conditions during collection?
– Where was the data collected and at what spatial resolution?
– When was the data collected and at what temporal resolution?
– What were the standards and calibrations used?
29. DESCRIBE
●
Information about parameters:
– How were data measured or produced?
– What are the units of measurement?
– What was the format used in the data set?
– What are the precision, accuracy and uncertainty?
– Any additional information about data?
– Are there taxonomic details?
– Define codes that were used
– Quality assurance and quality control activities
– Are there known problems that limit data use?
– How should we cite the data?
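The questions on the three DESCRIBE slides can be captured in a machine-readable metadata record stored next to the data files. The sketch below writes such a record as JSON and checks that a few fields are filled in. The field names, the list of required fields and every value are hypothetical; in practice the fields should follow the metadata standard of the chosen repository.

```python
import json

# Hypothetical minimum set of fields for this sketch.
REQUIRED_FIELDS = ("data_set_name", "files", "last_modified", "contact",
                   "instruments", "parameters", "citation")


def missing_fields(record: dict) -> list:
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "data_set_name": "Example coastal station CTD casts",
    "files": ["nano_st01_ctd_20180418T133000Z.csv"],
    "last_modified": "2018-04-20T09:00:00Z",
    "contact": "data-manager@example.org",
    "instruments": [{"model": "example CTD", "serial_number": "0000"}],
    "parameters": [
        {"name": "temperature", "units": "degree_Celsius",
         "missing_value": "NaN", "precision": 0.001},
    ],
    "processing": "none (raw data)",
    "known_problems": "",
    "citation": "Example citation text",
}

print(missing_fields(record))  # []
print(json.dumps(record, indent=2))
```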
31. PRESERVE
●
Use a data centre or archiving service that is familiar with your research
area
●
Identify data with long-term value
– You don’t need to archive all your data products
●
Store data using appropriate precision (significant digits)
●
Use standard terminology (e.g. CF conventions)
●
Consider legal and other policies
– Institutional policies on privacy and confidentiality
– Ensure you have appropriate permissions
– Data licenses
42. ONLINE DATA REPOSITORY
●
Planned development of an online georeferenced
data management system
– Different environmental parameters and their
associated metadata
– Project management
– Cloud-based data distribution
– Online data visualization and analysis
43. MOBILE APP
●
NANO mobile application for data distribution
and visualization
●
Features (brainstorming):
– Tools to assist:
●
Cruise planning
●
Data collection
– Integration with online data repository
– Visualization of data from NANO projects
– Citizen science
– Early warning messages
45. POSSIBLE SOURCES OF MISTAKES
●
Relying too much on your
memory
●
Confusing longitude and latitude:
– Decimal degrees
– Degrees, minutes, seconds
– Universal Transverse Mercator
(UTM) coordinate system
●
Relying too much on your
instrument calibration
– Internal compass
●
Not understanding the
instrument’s manual
●
No preliminary sampling
simulation and instrumentation
tests
●
No backup
– Data
– Batteries
●
Forgetting to remove outliers and
missing values
●
No standardized date format
●
Using unfamiliar tools
●
Not checking data quality on site
or right after data collection
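The latitude and longitude mix-up listed above is easier to catch when positions are converted to decimal degrees and range-checked as soon as they are entered. The sketch below is a minimal illustration with hypothetical function names. The range check only catches a swap when the longitude lies outside the range of -90 to 90 degrees, so it complements careful recording rather than replacing it.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to signed decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be N, S, E or W")
    if degrees < 0:
        raise ValueError("use the hemisphere letter instead of a negative sign")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def check_position(latitude, longitude):
    """Raise if the position is outside the valid ranges."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude out of range; were latitude and longitude swapped?")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude out of range")
    return latitude, longitude


# Approximate position of Lisbon: 38 deg 43 min N, 9 deg 8 min W.
lat = dms_to_decimal(38, 43, 0, "N")
lon = dms_to_decimal(9, 8, 0, "W")
print(check_position(round(lat, 4), round(lon, 4)))  # (38.7167, -9.1333)
```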
51. QUICK RECIPE
●
Plan: describe the data and how it will be managed and made
accessible throughout its lifetime
●
Collect: observe by hand or with sensors or other instruments and
place data into digital form
●
Assure quality of the data through checks and inspections
●
Describe data accurately and thoroughly using the appropriate
metadata standards
●
Preserve: submit to an appropriate long-term archive (e.g. data centre)
●
Discover: locate and obtain useful data, along with its metadata
●
Integrate: combine data from different sources to form one homogeneous set
that can be readily analysed
●
Analyse the data