The document discusses different file formats used for ocean observation data, from simple text files to more complex digital formats. It describes the evolution from unformatted text to structured files like CSV, spreadsheets, databases and finally specialized formats like NetCDF and HDF that are better suited for multi-dimensional scientific data. These standard formats help make the data discoverable and accessible via servers according to DMAC rules. The CariCOOS program uses these formats and servers to manage data from its observing assets and numerical models and share it with partners and the public.
2. Evolution of files
Text (books, logs) -> assembly language ->
ASCII -> Unicode -> CSV -> spreadsheet ->
database -> relational database (digital) ->
digital file formats.
Simple text is not formatted; ASCII allows
writing with letters, symbols and punctuation.
Ex: Notepad
CSV is the beginning of formatting regular text
into tables to make sense of data.
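As a minimal sketch of how CSV turns plain text into a table, the Python below parses a few comma-separated lines into rows and computes a mean. The column names and readings are made up for illustration and do not come from any real buoy.

```python
import csv
import io

# Hypothetical buoy readings; column names and values are illustrative only.
raw_text = """time,wind_speed_ms,wave_height_m
2014-01-01T00:00:00Z,5.2,1.1
2014-01-01T01:00:00Z,6.0,1.3
2014-01-01T02:00:00Z,4.8,1.0
"""

def read_table(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_table(raw_text)

# Every CSV field arrives as a string, so numbers must be converted explicitly.
wave_heights = [float(row["wave_height_m"]) for row in rows]
mean_wave_height = sum(wave_heights) / len(wave_heights)

print(len(rows), "rows; mean wave height:", round(mean_wave_height, 2), "m")
```

Note that the header line is the only metadata a CSV file carries; units and data types have to be agreed on separately.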
Spreadsheets allow formatting, computations
and graph creation to understand data
visually; the data then become information
that can be used in decision making.
Complex data may be inserted into databases
or, better yet, relational databases, but this
does not always work.
Some Text Formats:
• ASCII
• Unicode in any of its variants:
UTF-8, UTF-16 and UTF-32 (the number
is the code unit size in bits).
• There are also ISO encodings, but
those are industry based.
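To make the difference between these text formats concrete, the sketch below encodes one short string with each Unicode variant and checks whether it fits in plain ASCII. The sample string is an arbitrary illustration; the little-endian codec names are chosen only so that no byte-order mark is added to the byte counts.

```python
# Illustrative sample; "\u00b0" is the degree sign, which ASCII cannot represent.
sample = "Temp 25\u00b0C"

def encoded_sizes(text):
    """Return the number of bytes the text occupies in each Unicode encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}

def fits_in_ascii(text):
    """True if every character of the text can be written as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

sizes = encoded_sizes(sample)
print(sizes)
print("ASCII only:", fits_in_ascii(sample))
```

UTF-8 uses one byte for ASCII characters and more for others, while UTF-16 and UTF-32 use at least two and exactly four bytes per character respectively.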
3. Where does DATA come from?
Today, the sensors on any piece of equipment translate
electrical impulses into numbers and save them to
files. Scientific data often need a richer structure than
CSV. Since it is difficult for human clients and
computer clients to deal with a complex set of possible
dataset structures, ERDDAP Server uses just two basic
data structures:
• Gridded data structure (for example, for satellite
data and model data). Ex: HDF, NetCDF, Grid2.
• Tabular data structure (for example, for in-situ
buoy, station, and trajectory data). Ex: CSV.
MODIS Satellite
GoMOOS Buoy
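The two data structures can be sketched in plain Python as follows. This is only an illustration of the idea, not how ERDDAP stores data internally; the variable names and temperature values are invented.

```python
# Tabular structure: one row per observation, as from a buoy (values are made up).
table = [
    {"time": "2014-01-01T00:00:00Z", "lat": 18.0, "lon": -67.0, "sst_c": 27.1},
    {"time": "2014-01-01T01:00:00Z", "lat": 18.0, "lon": -67.0, "sst_c": 27.0},
]

# Gridded structure: shared axes plus an array indexed by [lat][lon],
# as from a satellite image or a model output field (values are made up).
lats = [17.5, 18.0]
lons = [-67.5, -67.0, -66.5]
sst_grid = [
    [27.0, 27.1, 27.3],  # row for lat 17.5
    [26.8, 26.9, 27.2],  # row for lat 18.0
]

def grid_value(grid, lat_axis, lon_axis, lat, lon):
    """Look up one grid cell by its coordinate values."""
    return grid[lat_axis.index(lat)][lon_axis.index(lon)]

def grid_to_table(grid, lat_axis, lon_axis):
    """Flatten a grid into tabular rows, repeating the coordinates on every row."""
    return [
        {"lat": lat, "lon": lon, "sst_c": grid[i][j]}
        for i, lat in enumerate(lat_axis)
        for j, lon in enumerate(lon_axis)
    ]

print(grid_value(sst_grid, lats, lons, 18.0, -67.0))
print(len(grid_to_table(sst_grid, lats, lons)), "rows after flattening")
```

The grid stores each coordinate once per axis, while the flattened table repeats coordinates on every row, which is one reason gridded formats scale better for satellite and model data.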
4. File Formats: From CSV to Matrix
Why migrate from a simple CSV file to a
matrix-based or even digitally formatted file?
Because in the long run it is easier and more
sustainable. Makes sense!
CSV = text-based matrix
KML, shapefile = layered geospatial (vector) data
JPG, PNG, TIFF = digital image (pixel matrix)
NetCDF, HDF, Grid2 = multi-dimensional matrix
for gridded data and metadata.
5. CariCOOS maintains a network of coastal buoys, meteorological stations and
HF radar observing assets that provide a constant stream of wind, wave and
marine current information to our stakeholders. These data undergo a process of
quality assurance and control before their release to the public. Graphical
representations are then created from the various data streams to assess
changing conditions in our coastal environment. Our partners also supply
satellite data, which are made available in suitable graphical formats. Finally, the
numerical data are made available in various standard data formats and
services. These data may be used as input to forecast and ocean models.
CariCOOS Assets
6. From sensors to Models
Global models display
global changes and forecasts
but are not sufficient for
accurate regional forecasts;
they are therefore used as input
and tailored to local reality.
NCOM AMSEAS
7. CariCOOS has evolved from a few workstations/servers to
include a blade cluster. We have acquired and maintained
servers to run the following CPU-hungry models:
HYCOM, HYCOM-ROMS, ROMS, WRF, SWAN.
Models generate gigabytes of data on a daily basis, even
terabytes. A data server and NAS have been acquired to save
all these new data to be analyzed or used in future research.
WRF used to take 7 hours; it now takes 4 hours.
Other software includes Matlab, ArcGIS, SMS and Bous2D.
High Performance Computing
9. DMAC
Standard Format: NetCDF, HDF, Grid2, others.
Discoverable: THREDDS Server, ERDDAP Server.
Data Management and Communications (DMAC) is a set of rules
required to make regional and global ocean and coastal observation
data discoverable and accessible in a standard format, thus supporting
improved awareness, understanding and forecasting of coastal events.
CariCOOS, as part of IOOS, shares the responsibility of complying with
particular DMAC requirements. CariCOOS manages two sets of data
streams: those from observing assets, including buoys and the mesonet,
and those derived from forecast-supporting numerical models.
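As a rough sketch of what "accessible in a standard format" looks like in practice, the Python below builds an ERDDAP tabledap request URL of the general form base/tabledap/datasetID.fileType?variables&constraints. The server address, dataset ID and variable names are hypothetical placeholders, not real CariCOOS endpoints.

```python
from urllib.parse import quote

def tabledap_url(base, dataset_id, variables, constraints, file_type="csv"):
    """Build a tabledap request URL; special characters in constraints are percent-encoded."""
    query = quote(",".join(variables), safe=",")
    for constraint in constraints:
        # Keep "=" readable; ">" and ":" become %3E and %3A.
        query += "&" + quote(constraint, safe="=")
    return f"{base}/tabledap/{dataset_id}.{file_type}?{query}"

# Hypothetical server, dataset and variables, for illustration only.
url = tabledap_url(
    base="https://example.org/erddap",
    dataset_id="hypothetical_buoy",
    variables=["time", "wind_speed"],
    constraints=["time>=2014-01-01T00:00:00Z"],
)
print(url)
```

Changing file_type in the same request is how a client would ask the server for the same subset in another of its supported output formats.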
10. NetCDF CF Metadata Conventions
The conventions for CF (Climate and Forecast) metadata are designed
to promote the processing and sharing of files created with the
NetCDF API. The CF conventions are increasingly gaining acceptance
and have been adopted by a number of projects and groups as a
primary standard. The conventions define metadata that provide a
definitive description of what the data in each variable represents, and
the spatial and temporal properties of the data. This enables users of
data from different sources to decide which quantities are comparable,
and facilitates building applications with powerful extraction,
regridding, and display capabilities.
File Structure (Standard Structure with CF)
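The idea that CF metadata lets users decide which quantities are comparable can be sketched with plain dictionaries standing in for NetCDF variables. The attribute names standard_name, units and long_name are CF attributes; the variable names and the comparison helper are hypothetical and only illustrate the principle.

```python
# Stand-ins for variables from three different files (contents are illustrative).
buoy_var = {
    "name": "sst",
    "attrs": {
        "standard_name": "sea_surface_temperature",
        "units": "degree_Celsius",
        "long_name": "Sea surface temperature measured at the buoy",
    },
}
model_var = {
    "name": "temp_surf",
    "attrs": {"standard_name": "sea_surface_temperature", "units": "K"},
}
wind_var = {
    "name": "wspd",
    "attrs": {"standard_name": "wind_speed", "units": "m s-1"},
}

def comparable(a, b):
    """Two variables describe the same quantity if their standard_name matches."""
    name_a = a["attrs"].get("standard_name")
    return name_a is not None and name_a == b["attrs"].get("standard_name")

def needs_unit_conversion(a, b):
    """Comparable variables may still differ in units and need converting first."""
    return comparable(a, b) and a["attrs"].get("units") != b["attrs"].get("units")

print(comparable(buoy_var, model_var))
print(comparable(buoy_var, wind_var))
print(needs_unit_conversion(buoy_var, model_var))
```

The variable names differ between sources (sst versus temp_surf), so it is the shared standard_name, not the file's own naming, that tells a user the two can be compared.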