I gave this presentation at the University of New Hampshire's Center for Coastal and Ocean Mapping on April 18, 2014. It describes the marine geophysical data life cycle, a variety of resources available to help investigators navigate data management, and efforts to optimize the quality of publicly available data.
1. Navigating the Marine Geophysical Data Life Cycle:
From Acquisition and Synthesis to Publication and Open Data Access
Vicki Ferrini
Lamont-Doherty Earth Observatory
Columbia University
2. Research Interests
• Mapping seafloor morphology to understand processes at a variety of scales
– Coastal, deep sea, rivers, lakes
• Techniques for remote seafloor characterization using multibeam sonar
– Morphology
– Backscatter intensity
• Multibeam sonar data quality
• Data preservation, integration and access
3. Increasing Importance of Data Management
• Support science and discovery
• Scientific reproducibility
• Costs of acquisition
• Optimizing operations
• Increasing volumes of data
• Data policies with increasing focus on data sharing
• Data Syntheses
• Data Publication
4. How can we “lessen the burden” of data management for the science community?
5. Integrated Earth Data Applications
A community-based data facility funded by NSF to support, sustain, and advance the geosciences by providing data services for observational solid earth data from the Ocean, Earth, and Polar Sciences.
http://www.iedadata.org/
6. Domain-specific Repositories & Services
• Investigator-focused
• Ensure ‘Fitness for Re-use’ through data stewardship
• Ensure professional data curation services
– Long-term archiving & access
– Persistent, unique identification
– Discoverability (metadata registration)
• Integrate with the ‘scholarly communication ecosystem’
7. Data Curated in IEDA Systems
• Marine Geophysical Data
• Bathymetry, sidescan, subbottom
• Academic Seismic Facility (MCS, SCS)
• Data from AUV, ROV, HOV, Ship
• Complementary datasets
• Navigation, bottom photos
• Sample-based Data
• Sample Registry (SESAR)
• Geochemistry
• Geochronology
• Technical reports
9. Data Life Cycle: Plan
• Data Management Plan Tool
• Facilitate assembly
• Inform Investigators
• Inform down-stream repositories
• Promote dialogue
• Data Acquisition Plan
• Metadata & data templates
• Promote & facilitate contemporaneous documentation
10. Data Life Cycle: Collect & Assure
• Promote Best Practices
• What to document
• How to document
• Tools and workflows to facilitate digital documentation
• Metadata & Data Templates
11. Data Life Cycle: Document & Preserve
• Document & capture data & metadata as soon as they are available
• Simple interfaces & guidelines
• Sample metadata registry
• Link to complementary data & metadata
12. Data Life Cycle: Analyze
• Tools to:
• Support domain specialists
• Make specialist data accessible to non-specialist users
• Integrate & visualize data
• Quantitative access to Data Syntheses
• Access to complementary data & resources
13. Data Life Cycle: Integrate & Share
• Advise on what to preserve & how
• Data supporting pubs
• Data of value
• Facilitate data prep.
• Metadata requirements
• Templates
• Format guidelines
14. Data Life Cycle: Document & Preserve
Develop simple workflows, interfaces & templates to capture sufficient information for:
• Long-term curation & access
• Inclusion in syntheses
• Links to scientific publications
• Data Publication
• Data use, discovery & re-use
• Attribution & collaboration
• Data Download Stats
• Data Compliance Reporting
15. Data Compliance Reporting Tool
• Tool for demonstrating compliance:
• Award-based
• Informed by DMP
• Report includes:
• Data Inventory
• Data Release Status
• Links to data
• Save as PDF
http://www.iedadata.org/compliance/
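As a rough illustration of what an award-based compliance report contains (a data inventory, release status, and links to data), here is a minimal Python sketch. It is not the IEDA tool or its API. The `DataSet` fields, the status wording, the award identifier, and the example.org link are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """One inventory entry. Field names are hypothetical, not IEDA's schema."""
    title: str
    released: bool
    url: str  # link to the data; empty string if no link exists yet


def compliance_report(award_id, datasets):
    """Return a plain-text inventory with release status and links."""
    lines = [f"Data compliance report for award {award_id}", ""]
    for ds in datasets:
        # Status labels are illustrative wording only.
        status = "released" if ds.released else "on proprietary hold"
        link = ds.url or "(no link yet)"
        lines.append(f"- {ds.title}: {status}; {link}")
    released = sum(ds.released for ds in datasets)
    lines.append("")
    lines.append(f"{released} of {len(datasets)} data sets released")
    return "\n".join(lines)


if __name__ == "__main__":
    inventory = [
        DataSet("Multibeam bathymetry", True, "https://example.org/data/1"),
        DataSet("Bottom photos", False, ""),
    ]
    # "EXAMPLE-0000" is a placeholder, not a real award number.
    print(compliance_report("EXAMPLE-0000", inventory))
```

The text output could then be saved or converted to PDF by whatever means is convenient; that step is left out here.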
18. MGDS Search & Data Catalog
• Text & Map-based Search
• Rich metadata
• Download data files
• Proprietary Hold
• Password Access
• Attribution
• Links to Refs
• Data DOIs
• Download Stats
• Web Services
19. MGDS Search & Data Catalog
26. Data files are great - if you know what to do with them…
How can we make data quantitatively accessible to non-specialists?
31. GMRT Synthesis
-Multi-resolution synthesis
-Access provided to images & gridded compilation
-9 resolution levels to 100 m
-Dynamically maintained
-Mask highlights hi-res data
-Attribution to data sources & contributing scientists
-Source Data Includes:
-ASTER, NED, IBCAO, BEDMAP, Smith and Sandwell
-Contributed grids
-Swath data from > 700 cruises (public domain)
32. Global Multi-Resolution Topography
• 1992 – Ridge Multibeam Synthesis Project
• 2003 – Expanded to include US-funded data from Southern Ocean
• 2004 – present: Expanded to include public domain data from throughout global oceans
• Ongoing growth by ~80 cruises/yr
• 2009 – G-cubed paper (Ryan et al., 2009)
GMRTv2.6 ~780 cruises (April 2014)
http://gmrt.marine-geo.org/
33. GMRT Components
• LDEO 100-m compilation* (raw & processed swath files in public domain)
• Contributed Grids (< 500 m res.)
• Global & Regional Grids (>= 500 m res.), e.g. GEBCO_08, IBCAO
*LDEO team performs QC of ping files
MB files metadata
44. GMRT: Next Steps
• GMRT Version 2.6 April 2014
– MB Data from ~50 more cruises
– More Contributed grids (including LOS)
• Revise GMRT MapTool (web interface)
– more download format options
• Enhanced Web Services
– Gridded Content
– Attribution
• Enhanced Accessibility to Source Data
– DOIs on processed source data files
– Search & download multiple processed MB files
• GEBCO High-Res Effort
45. How can we optimize quality of the data being preserved?
(Good data in = good data out)
47. R2R Data Stewardship
• Focus on Raw Underway Data
• Instruments permanently installed on ships
• Fleet-wide solution
~500 cruises/year
• Core Services
• Data documentation & preservation
• Programmatic Quality Assessment
• Navigation Products
• Event Logger
• Real-time MET/TSG
48. MB Raw Data Preservation
• 769 file sets
• 291,673 files
• ~ 7.6 TB
• from 671 cruises
as of Apr 15, 2014
49. R2R: Quality Assessment
• Programmatic post-cruise review of data
• Identify “suspicious” data
• Feedback to Operators
• Distributable Code
• Leverage existing tools where possible (for MB data: MB-System)
• Customizable thresholds
• Generate QA Report
• Document QA procedures
• Provide info for downstream data use
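The idea of programmatic review with customizable thresholds can be sketched in a few lines of Python. This is a minimal illustration, not R2R's distributable code. The metric names and threshold values are invented placeholders, and the per-file metrics are assumed to have been computed upstream by existing tools.

```python
from dataclasses import dataclass


@dataclass
class FileMetrics:
    """Hypothetical per-file summary; field names are illustrative only."""
    name: str
    ping_count: int
    flagged_beam_fraction: float  # fraction of beams flagged as bad, 0 to 1
    max_nav_gap_s: float          # longest gap between navigation fixes, seconds


# Placeholder thresholds, not R2R defaults; callers can override any of them.
DEFAULT_THRESHOLDS = {
    "min_ping_count": 100,
    "max_flagged_beam_fraction": 0.25,
    "max_nav_gap_s": 60.0,
}


def assess(metrics, thresholds=None):
    """Return a list of reasons the file looks suspicious (empty if none)."""
    t = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    reasons = []
    if metrics.ping_count < t["min_ping_count"]:
        reasons.append(f"few pings ({metrics.ping_count})")
    if metrics.flagged_beam_fraction > t["max_flagged_beam_fraction"]:
        reasons.append(
            f"high flagged-beam fraction ({metrics.flagged_beam_fraction:.2f})"
        )
    if metrics.max_nav_gap_s > t["max_nav_gap_s"]:
        reasons.append(f"navigation gap of {metrics.max_nav_gap_s:.0f} s")
    return reasons


def qa_report(files, thresholds=None):
    """Build a plain-text QA report with one line per file."""
    lines = []
    for m in files:
        reasons = assess(m, thresholds)
        status = "SUSPICIOUS: " + "; ".join(reasons) if reasons else "OK"
        lines.append(f"{m.name}: {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        FileMetrics("line001", 5200, 0.04, 3.0),
        FileMetrics("line002", 40, 0.31, 120.0),
    ]
    print(qa_report(sample))
```

Keeping the thresholds in a plain dictionary means an operator can tighten or relax a single check without touching the assessment logic, and the report text records why each file was flagged for downstream users.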
58. Summary:
• How can we “lessen the burden”?
– Simple workflows, interfaces, & guidelines
• How can we engage the science community?
– High-value Content
– Reward (attribution, citation)
• How can we make data accessible to non-specialists?
– Data synthesis
• How can we optimize data quality?
– Best practices at acquisition
59. How to Navigate the Data Life Cycle
• Know what resources are available
• Tools to make process easier
• Access existing Data
• Communicate
• Upstream
• Downstream (Data Managers)
• Plan ahead
• Document contemporaneously
• Treat data as a valuable community resource
• Participate! Input always needed for:
• Metadata & data format standards
• Usability of interfaces
Editor's Notes
Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse, so all multibeam data are of high value.
Through GeoMapApp, the original data source can be identified. Here, clicking on the Italian data set Bouvet-Ligi highlights the location of the survey in an enclosed box on the map view.
Data contributors cannot make copyright claims on the synthesized products.