© 2012 National Ecological Observatory Network. ALL RIGHTS RESERVED.
NEON HDF5
eddy4R-Docker-HDF5 team (IPT-EC): David Durden, Stefan Metzger, Andy Fox, Greg
Holling, Hongyan Luo, Natchaya Pingintha-Durden, Cove Sturtevant, David Weinstein
Date: 7/19/2016
The National Ecological Observatory Network
8/1/2016
Goals
1. Implement a fast and efficient file format for NEON data
  ▪ The HDF5 file format provides high compressibility and fast, efficient reading and writing of large amounts of data
2. Develop a standardized delivery structure for NEON data
  ▪ Files structured around the NEON data product numbering offer an intuitive way to explore larger data files with interdependent data sets
3. Provide metadata with NEON data
  ▪ HDF5 attributes are a concise way to package metadata with our NEON data
TIS example (Large datasets)
[Figure: storage exchange assembly and turbulent exchange assembly]
eddy-covariance in the CI workflow
eddy4R-Docker-HDF5 workflow
[Figure: workflow diagram. Recoverable labels:
• CI workflow steps: ingest L0; precondition L0p EC-TE; precondition L0p EC-SE; generate HDF5 files; generate, deploy, control; lower-level instruction
• Inputs: L0 data; ParaCal, ParaEnv, ParaSite, ParaProc, ParaSens
• Intermediate files: L0p HDF5 “turbulence”; L0p HDF5 “storage”
• Docker image containing eddy4R packages
• Docker container “turbulence” (nodes t1, t2, …, tN), Docker container “storage” (nodes s1, s2, …, sN), Docker container “derived” (nodes d1, d2, …, dN), each shown with its own instructions and L1 – L4 HDF5 files
• Data Portal]
NEON Data Product Naming Convention
NEON.DOM.SITE.DPL.PRNUM.REV.TERMS.HOR.VER.TMI
WHERE:
NEON = NEON
DOM = DOMAIN, e.g. D10
SITE = SITE, e.g. STER
DPL = DATA PRODUCT LEVEL, e.g. DP1
PRNUM = PRODUCT NUMBER, a 5-digit number set in the data products catalog. TIS = 00000-09999.
REV = REVISION, e.g. 001
TERMS = From NEON’s controlled list of terms. Index is unique across products.
HOR = HORIZONTAL INDEX. Semi-controlled; AIS and TIS use different rules. Examples: Tower = 000, Hut = 700, DFIR = 900.
VER = VERTICAL INDEX. Semi-controlled; AIS and TIS use different rules. Examples: Ground level = 000, second tower level = 020.
TMI = TEMPORAL INDEX. Examples: 001 = 1 minute, 030 = 30 minute, 999 = irregular intervals.
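The naming convention above can be read mechanically by splitting an identifier on dots. The following is a minimal Python sketch of that idea, not NEON's own tooling: the class name `NeonProductId`, the helper functions, and the example identifier are all hypothetical, and the product number and TERMS index in the example are made-up values chosen only to fit the stated pattern.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NeonProductId:
    """One field per dot-separated component of the naming convention."""
    neon: str   # literal "NEON"
    dom: str    # domain, e.g. "D10"
    site: str   # site, e.g. "STER"
    dpl: str    # data product level, e.g. "DP1"
    prnum: str  # 5-digit product number
    rev: str    # revision, e.g. "001"
    terms: str  # index from the controlled list of terms
    hor: str    # horizontal index, e.g. "000" for tower
    ver: str    # vertical index, e.g. "020" for second tower level
    tmi: str    # temporal index, e.g. "030" for 30 minute


def parse_neon_id(identifier: str) -> NeonProductId:
    """Split an identifier into named components with basic sanity checks."""
    parts = identifier.split(".")
    expected = len(fields(NeonProductId))
    if len(parts) != expected:
        raise ValueError(f"expected {expected} components, got {len(parts)}")
    if parts[0] != "NEON":
        raise ValueError("identifier must start with 'NEON'")
    pid = NeonProductId(*parts)
    if not (pid.prnum.isdigit() and len(pid.prnum) == 5):
        raise ValueError("PRNUM must be a 5-digit number")
    return pid


def is_tis_product(pid: NeonProductId) -> bool:
    """TIS product numbers fall in the range 00000-09999."""
    return 0 <= int(pid.prnum) <= 9999


# Hypothetical identifier assembled from the examples above;
# the PRNUM (00001) and TERMS (00123) values are invented for illustration.
example = parse_neon_id("NEON.D10.STER.DP1.00001.001.00123.000.020.030")
print(example.site, example.dpl, example.tmi, is_tis_product(example))
# prints: STER DP1 030 True
```

Components are kept as strings so that leading zeros in the indices survive a round trip.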
NEON HDF5 file structure
Collocating NEON’s long-term atmospheric measurements
and field observations
Example File
Metadata in HDF5
Metadata
NEON’s first fluxes from SERC!
Timeframe: 4/22/2016 - 5/03/2016
File size for 1 day (4/22/2016):
• Compressed = 398 MB
• Uncompressed = 1.84 GB
• Data compression ratio ~ 4.5:1
Metadata: Units and variable names
Performance testing
• Test datasets approximated 1 day of L0p IRGA data
  ▪ “compound”: single dataset with each row having many numeric float values and a single string value
  ▪ “simple”: one dataset with each row having many numeric float values, second dataset with each row having a single string value

Results for COMPOUND dataset:
         Compressed    Non-compressed
Read     45 secs       4.25 secs
Write    621 secs      11.25 secs
Size     78 MB         266 MB

Results for SIMPLE dataset:
         Compressed    Non-compressed
Read     1.45 secs     0.75 secs
Write    21.45 secs    4 secs
Size     21 MB         266 MB
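To make the trade-off in the two result tables easier to compare, the short Python sketch below derives the compression ratio and the read and write slowdown for each layout. It only does arithmetic on the reported numbers; the dictionary layout and the `summarize` helper are illustrative names, and it assumes the first table belongs to the compound layout and the second to the simple layout, as labeled.

```python
# Values transcribed from the two result tables above.
# Each tuple is (compressed, non-compressed); times in seconds, sizes in MB.
RESULTS = {
    "compound": {"read": (45.0, 4.25), "write": (621.0, 11.25), "size": (78.0, 266.0)},
    "simple": {"read": (1.45, 0.75), "write": (21.45, 4.0), "size": (21.0, 266.0)},
}


def summarize(layout: str) -> dict:
    """Return size ratio and time penalties of compression for one layout."""
    r = RESULTS[layout]
    compressed_size, raw_size = r["size"]
    return {
        "compression_ratio": raw_size / compressed_size,
        "read_slowdown": r["read"][0] / r["read"][1],
        "write_slowdown": r["write"][0] / r["write"][1],
    }


for layout in RESULTS:
    s = summarize(layout)
    print(
        f"{layout:9s} ratio {s['compression_ratio']:.1f}:1  "
        f"read x{s['read_slowdown']:.1f}  write x{s['write_slowdown']:.1f}"
    )
```

On these numbers, compressing the simple layout gives roughly 12.7:1 at about 5 times the write time, while the compound layout gives roughly 3.4:1 at about 55 times the write time.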
Future work
• Implement R code in the eddy4R package to produce NEON-formatted HDF5 files
  ▪ Development is currently on GitHub; if interested, you can join our development efforts by signing up for one of our working groups
• Is there an easy way to embed EML (Ecological Metadata Language) tags into HDF5?
  ▪ There is an ISO tag solution, but nothing yet for EML
720.746.4844 | neonscience@BattelleEcology.org | www.battelle.org/neon