Navigating the Marine Geophysical
Data Life Cycle:
From Acquisition and Synthesis to
Publication and Open Data Access
Vicki Ferrini
Lamont-Doherty Earth Observatory
Columbia University
Research Interests
• Mapping seafloor morphology to
understand processes at a variety of scales
– Coastal, deep sea, rivers, lakes
• Techniques for remote seafloor
characterization using multibeam sonar
– Morphology
– Backscatter intensity
• Multibeam sonar data quality
• Data preservation, integration and access
Increasing Importance of Data Management
• Support science and discovery
• Scientific reproducibility
• Costs of acquisition
• Optimizing operations
• Increasing volumes of data
• Data policies with increasing focus on data
sharing
• Data Syntheses
• Data Publication
How can we “lessen the burden” of data
management for the science community?
A community-based data facility funded by NSF to
support, sustain, and advance the geosciences by
providing data services for observational solid earth
data from the Ocean, Earth, and Polar Sciences.
http://www.iedadata.org/
Integrated Earth Data Applications
• Investigator-focused
• Ensure ‘Fitness for Re-use’ through data
stewardship
• Ensure professional data curation services
– Long-term archiving & access
– Persistent, unique identification
– Discoverability (metadata registration)
• Integrate with the ‘scholarly communication
ecosystem’
Domain-specific Repositories & Services
• Marine Geophysical Data
• Bathymetry, sidescan, subbottom
• Academic Seismic Facility (MCS, SCS)
• Data from AUV, ROV, HOV, Ship
• Complementary datasets
• Navigation, bottom photos
• Sample-based Data
• Sample Registry (SESAR)
• Geochemistry
• Geochronology
• Technical reports
Data Curated in IEDA Systems
Data Life Cycle: Plan
• Data Management Plan Tool
• Facilitate assembly
• Inform Investigators
• Inform down-stream repositories
• Promote dialogue
• Data Acquisition Plan
• Metadata & data templates
• Promote & facilitate
contemporaneous
documentation
Data Life Cycle: Collect & Assure
• Promote Best Practices
• What to document
• How to document
• Tools and workflows
to facilitate digital
documentation
• Metadata & Data
Templates
Data Life Cycle: Document & Preserve
• Document & capture data &
metadata as soon as it is available
• Simple interfaces & guidelines
• Sample metadata registry
• Link to complementary data
& metadata
Data Life Cycle: Analyze
• Tools to:
• Support domain specialists
• Make specialist data accessible
to non-specialist users
• Integrate & visualize data
• Quantitative access to Data
Syntheses
• Access to complementary
data & resources
Data Life Cycle: Integrate & Share
• Advise on what to preserve & how
• Data supporting pubs
• Data of value
• Facilitate data prep.
• Metadata requirements
• Templates
• Format guidelines
Data Life Cycle: Document & Preserve
Develop simple workflows, interfaces & templates
to capture sufficient information for:
• Long-term curation & access
• Inclusion in syntheses
• Links to scientific publications
• Data Publication
• Data use, discovery & re-use
• Attribution & collaboration
• Data Download Stats
• Data Compliance Reporting
Data Compliance Reporting Tool
• Tool for demonstrating compliance:
• Award-based
• Informed by DMP
• Report includes:
• Data Inventory
• Data release Status
• Links to data
• Save as PDF
http://www.iedadata.org/compliance/
How do we engage the science community?
http://www.marine-geo.org/
MGDS Search & Data Catalog
• Text & Map-based Search
• Rich metadata
• Download data files
• Proprietary Hold
• Password Access
• Attribution
• Links to Refs
• Data DOIs
• Download Stats
• Web Services
Data files are great - if you know what to do
with them…
How can we make data quantitatively
accessible to non-specialists?
GeoMapApp
• Free Java Desktop App
• Basic GIS functionality
• Core functionality:
• GMRT Basemap
• Gridded & Tabular Data
• Linked Views
• Access online datasets (grids, shapefiles, tables)
• Attribution & links to source data
• Custom Portals
• Underway Geophysics, MB Sonar, DSDP, PetDB…
• Import & Export
• Table, Image, Grid, KMZ
http://www.geomapapp.org/
GeoMapApp default basemap: GMRT
GMRT Synthesis
• Multi-resolution synthesis
• Access provided to images &
gridded compilation
• 9 resolution levels to 100 m
• Dynamically maintained
• Mask highlights hi-res data
• Attribution to data sources &
contributing scientists
• Source Data Includes:
– ASTER, NED, IBCAO, BEDMAP,
Smith and Sandwell
– Contributed grids
– Swath data from > 700 cruises (public domain)
• 1992 – Ridge Multibeam
Synthesis Project
• 2003 – Expanded to
include US-funded data
from Southern Ocean
• 2004 – present:
Expanded to include
public domain data from
throughout global oceans
• ongoing growth by ~80
cruises/yr
• 2009 – G-cubed paper
(Ryan et al., 2009)
GMRT v2.6: ~780 cruises (April 2014)
Global Multi-Resolution Topography
http://gmrt.marine-geo.org
GMRT Components
• LDEO 100-m compilation* (raw & processed swath files in public domain)
• Contributed Grids (< 500 m res.)
• Global & Regional Grids (>= 500 m res.), e.g. GEBCO_08, IBCAO
*LDEO team performs QC of ping files
MB files metadata
GMRT: Access
• Images & gridded data
• Desktop Apps
• GeoMapApp
• Virtual Ocean
• Web App
• GMRT MapTool
• iPad/iPhone App
• Earth Observer
• Web Map Services
• Images & Mask
Export as: NetCDF, Arc ASCII, Binary, Fledermaus, KMZ, PNG, GeoTIFF…
GMRT: MB Data Reduction & Synthesis
• Bad navigation
• Noisy outer beams
• Attitude problems
• Bad soundings
• Instrument problems
• Bad weather
• Sound velocity
• Slow speed in turns
• Quality assessment
for grid weighting
and resolution
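The last bullet, quality assessment for grid weighting, can be illustrated with a minimal sketch. The slides do not describe how GMRT actually weights soundings, so everything below is an assumption made for illustration: the function name `grid_soundings`, the input tuple layout, and the use of a plain weighted mean per cell are all hypothetical.

```python
from collections import defaultdict


def grid_soundings(soundings, cell_size_m):
    """Bin soundings into square cells and return a weighted mean depth per cell.

    `soundings` is an iterable of (x_m, y_m, depth_m, weight) tuples.
    The weight stands in for a quality score; soundings with a weight
    of zero or less (e.g. flagged as bad) are ignored.
    This is an illustrative scheme, not GMRT's documented method.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cell -> [sum(w * z), sum(w)]
    for x, y, depth, weight in soundings:
        if weight <= 0:
            continue  # skip flagged soundings
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        sums[cell][0] += weight * depth
        sums[cell][1] += weight
    return {cell: wz / w for cell, (wz, w) in sums.items()}


if __name__ == "__main__":
    example = [
        (10.0, 10.0, -2000.0, 1.0),   # lower-quality sounding
        (20.0, 30.0, -2100.0, 3.0),   # higher-quality sounding, same cell
        (150.0, 10.0, -1800.0, 1.0),  # neighbouring cell
        (160.0, 10.0, -9999.0, 0.0),  # flagged sounding, ignored
    ]
    for cell, depth in sorted(grid_soundings(example, 100.0).items()):
        print(cell, depth)
```

In this sketch a higher-quality sounding pulls the cell value toward itself, and a flagged sounding contributes nothing, which is the general idea behind weighting a grid by data quality.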
Tracking and Managing MB
Content for GMRT
• MGDS Relational DB
• MB data files
• GMRT Access & Web services
GMRT: Attribution & Access to Source Data
Cruise-level Attribution & Provenance
GMRT: Next Steps
• GMRT Version 2.6 April 2014
– MB Data from ~50 more cruises
– More Contributed grids (including LOS)
• Revise GMRT MapTool (web interface)
– more download format options
• Enhanced Web Services
– Gridded Content
– Attribution
• Enhanced Accessibility to Source Data
– DOIs on processed source data files
– Search & download multiple processed MB files
• GEBCO High-Res Effort
How can we optimize quality of the
data being preserved?
(Good data in = good data out)
Complementary Fleet-Wide Efforts
GMRT | R2R | MAC
1992 | 2009 | 2011
GOAL: Well-documented high-quality publicly available data
• Focus on Raw Underway Data
• Instruments permanently installed on ships
• Fleet-wide solution
~500 cruises/year
• Core Services
• Data documentation & preservation
• Programmatic Quality Assessment
• Navigation Products
• Event Logger
• Real-time MET/TSG
R2R Data Stewardship
MB Raw Data Preservation
• 769 file sets
• 291,673 files
• ~ 7.6 TB
• from 671
cruises
as of Apr 15, 2014
R2R: Quality Assessment
• Programmatic post-cruise review of data
• Identify “suspicious” data
• Feedback to Operators
• Distributable Code
• Leverage existing tools where possible
(for MB data: MB-System)
• Customizable thresholds
• Generate QA Report
• Document QA procedures
• Provide info for downstream data use
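A programmatic review with customizable thresholds could look roughly like the sketch below. It is not R2R's distributable code: the metric names, default limits, and the functions `assess_cruise` and `format_report` are hypothetical, and the per-cruise metrics are assumed to have been computed beforehand (for multibeam data, for example, with MB-System).

```python
from dataclasses import dataclass

# Hypothetical defaults: names and limits are illustrative only,
# not R2R's actual tests or values.
DEFAULT_THRESHOLDS = {
    "max_nav_gap_s": 60.0,            # longest allowed gap in navigation
    "max_speed_knots": 20.0,          # fastest plausible ship speed
    "min_valid_beam_fraction": 0.8,   # lowest acceptable share of good beams
}


@dataclass
class QAResult:
    test: str
    value: float
    limit: float
    passed: bool


def assess_cruise(metrics, thresholds=None):
    """Compare per-cruise metrics against limits; overrides replace defaults."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    results = []
    for name, limit in limits.items():
        if name not in metrics:
            continue  # no value supplied for this test
        value = metrics[name]
        # "min_" tests pass at or above the limit, all others at or below it.
        passed = value >= limit if name.startswith("min_") else value <= limit
        results.append(QAResult(name, value, limit, passed))
    return results


def format_report(cruise_id, results):
    """Render a plain-text QA report that flags suspicious values."""
    lines = [f"QA report for {cruise_id}"]
    for r in results:
        flag = "ok" if r.passed else "SUSPICIOUS"
        lines.append(f"  {r.test}: {r.value} (limit {r.limit}) -> {flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    metrics = {"max_nav_gap_s": 120.0, "min_valid_beam_fraction": 0.9}
    print(format_report("EXAMPLE01", assess_cruise(metrics)))
```

Keeping the limits in a dictionary that callers can override is one simple way to make thresholds customizable per ship or instrument while leaving the tests themselves unchanged.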
R2R: QA Dashboard
• By Cruise
• By Ship
• By Instrument
• By Test
R2R: MB Quality Assessment
Lead: S. O’Hara (LDEO)
Multibeam Advisory Committee
• Community of Stakeholders
• Fleet-wide Approach
– Best Practices
– Technical Resources
• Technical Teams
– Shipboard Acceptance
– Acoustic Noise
– Quality Assurance
• Help Desk
http://mac.unols.org/
P. Johnson (UNH) & J. Beaudoin (UNH)
MAC Accomplishments
• Test Reports Gathered and Posted
• Tools
– SVP Editor Tool
– SVP Mission Planning Tool
• Best Practice Cookbooks
• Ship visits
– Acoustic Noise Testing
– Quality Assurance
– Sea Acceptance
• Assistance to Operators & Investigators
Reports from Technical Teams
Technical Resources
GMRT, R2R MBQA & MAC
Summary:
• How can we “lessen the burden”?
– Simple workflows, interfaces, & guidelines
• How can we engage the science community?
– High-value Content
– Reward (attribution, citation)
• How can we make data accessible to non-specialists?
– Data synthesis
• How can we optimize data quality?
– Best practices at acquisition
How to Navigate the
Data Life Cycle
• Know what resources are available
• Tools to make process easier
• Access existing Data
• Communicate
• Upstream
• Downstream (Data Managers)
• Plan ahead
• Document contemporaneously
• Treat data as a valuable community resource
• Participate! Input always needed for:
• Metadata & data format standards
• Usability of interfaces


Editor's Notes

  • #2 Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
  • #32 Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
  • #39 Through GeoMapApp the original data source can be identified. Here, clicking on the Italian data set Bouvet-Ligi highlights the location of the survey with a box on the map view.
  • #44 Data contributors cannot make copyright claims on the synthesized products.