An Evaluation of Science Data Formats and Their Use at the Community Coordinated Modeling Center

Marlo Maddox, Code 587
Advanced Data Management & Analysis Branch
HDF/HDF-EOS Workshop VII - Silver Spring, MD
September 23 – 25, 2003
The Community Coordinated Modeling Center

What the CCMC provides:
• Scientific validation
• Model coupling
• Metrics implementations
• Advanced visualization
• Model runs on request
Space Weather Models Covering the Entire Domain

[Diagram: patch-panel architecture]
Challenges
• No rules for standard model interfaces
• Each new model has a unique output format
• Developers/users must become familiar with the internal structure of each output file
• Custom read routines are needed to access model data
• Data is not self-describing
• Reduces portability and reuse of
– the data output itself
– tools created to analyze the data
Every Model's Output Is Unique: Environment Without a Standard
• Specialized I/O routines required for every interface
• Unsuitable for use in a flexible model chain
• No commonality between data passing through interfaces

[Diagram: models 1 through n each write output in a different format to storage; n input modules feed m advanced visualization tools, so n x m interfaces are required]
Every Model's Output Is Unique: Standardized Environment
• Original output can be preserved
• Standard format for storage, coupling, & visualization
• Model developers continue to have freedom of choice
• Ensures compatibility between models for coupling
• Groundwork on which standard, reusable interfaces and tools can be developed

[Diagram: models 1 through n pass through a data format converter, so each model's output reaches storage in one standard format; one input module feeds m advanced visualization tools, and only n + m interfaces are required]
Model Selected for Testing
• Block-Adaptive-Tree Solarwind Roe-type Upwind Scheme (BATSRUS) global magnetosphere MHD model
– Developed by CSEM at the University of Michigan
– Uses MPI and the Fortran 90 standard
– Executes on massively parallel computer systems
– Adaptive grid of blocks arranged in varying degrees of spatial refinement
– Solves the 3D MHD equations in finite-volume form using numerical methods related to Roe's approximate Riemann solver
– Attached to an ionospheric potential solver that provides electric potentials and conductances in the ionosphere
Understanding the BATSRUS Model's Output
General scientific output:
• Magnetospheric plasma parameters
– Atomic mass unit density
– Pressure
– Velocity
– Magnetic field
– Electric currents
• Ionospheric parameters
– Electric potential
– Hall and Pedersen conductances
BATSRUS .OUT File

The file contains these sections: units, time step information, dimension sizes, special parameters, data variable names, grid information, and variable values. Each section is written as a record framed by 4-byte length markers, shown here for the units record:

  byte        value                                            units
  1-4         number of bytes n for the next record
  5...n+4     n bytes containing the units for the variables   R amu/cm3 km/s nT nPa J/m3 uA/m2
  n+5...n+8   number of bytes n for the previous record
BATSRUS .OUT File

The units and time step information records hold general information: static, non-variant data.
BATSRUS .OUT File

Grid information record layout: a 4-byte record buffer, all x position values, all y position values, all z position values, and a closing 4-byte record buffer.
BATSRUS .OUT File

Variable values: one record per variable, in order rho, ux, uy, uz, bx, by, bz, b1x, b1y, b1z, p, e, jx, jy, jz. Each record is framed by 4-byte record buffers; for example, all b1x values form a single record.
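The 4-byte record buffers above are Fortran unformatted sequential record markers: each record is bracketed by two 4-byte integers holding the payload length. A minimal parsing sketch (the little-endian byte order and the sample payloads are assumptions for illustration; a real BATSRUS file may use a different byte order):

```python
import struct
from io import BytesIO

def read_fortran_record(f):
    # Each Fortran unformatted sequential record is framed by two
    # 4-byte length markers holding the payload size n.
    raw = f.read(4)
    if len(raw) < 4:
        return None  # end of file
    (n,) = struct.unpack("<i", raw)
    payload = f.read(n)
    (trailer,) = struct.unpack("<i", f.read(4))
    assert trailer == n, "corrupt record: length markers disagree"
    return payload

# Build a tiny two-record stream: a units string, then three x positions.
units = b"R amu/cm3 km/s nT nPa J/m3 uA/m2"
xs = struct.pack("<3f", -251.0, -243.0, -235.0)
stream = BytesIO(
    struct.pack("<i", len(units)) + units + struct.pack("<i", len(units))
    + struct.pack("<i", len(xs)) + xs + struct.pack("<i", len(xs))
)

print(read_fortran_record(stream).decode())               # units record
print(struct.unpack("<3f", read_fortran_record(stream)))  # x positions
```

Reading every section of a .OUT file is then a loop over `read_fortran_record`, interpreting each payload according to its position in the section order above.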
Designing the CDF
• CDF files have two main components
– Attributes – metadata describing the contents of the CDF
• Global – describe the CDF as a whole
• Variable – describe specific characteristics of the variables
– Records – collections of variables
• Scalar
• Vector
• N-dimensional arrays (where n <= 10)
• Identify potential metadata (or any static data) from the original output file
• Include this data in the global attributes portion of the CDF
CDF Variables
• CDFs contain two types of variables
– rVariables – all have the same dimensionality
– zVariables – can each have different dimensionalities
• CDF dimensionality
– a variable with one dimension is like an array
– the number of elements in the array corresponds to the dimension size
CCMC CDF Variables

Variables: x, y, z, rho, ux, uy, uz, bx, by, bz, b1x, b1y, b1z, p, e, jx, jy, jz

• BATSRUS model contains 18 dynamic variables
– 3 position variables
– 15 plot variables
• 18 CDF rVariables
– one record per variable
– one-dimensional variables
– dimension size = number of cells in grid
– 18 records vs. 10.4 million in the previous scheme
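The one-record-per-variable layout can be sketched with plain Python containers (the 8-cell grid is illustrative only; the real grid has over a million cells):

```python
import array

# Illustration of the storage scheme: every BATSRUS variable becomes a
# single one-dimensional record whose length equals the number of grid
# cells (1,293,408 in the test file; 8 here to keep the sketch small).
NUM_CELLS = 8
VARIABLES = ["x", "y", "z", "rho", "ux", "uy", "uz", "bx", "by", "bz",
             "b1x", "b1y", "b1z", "p", "e", "jx", "jy", "jz"]

# One record per variable: 18 records total, versus roughly one record
# per cell (~10.4 million) in the earlier scheme.
records = {name: array.array("f", [0.0] * NUM_CELLS) for name in VARIABLES}

print(len(records), "records of", NUM_CELLS, "cells each")
```

Collapsing millions of per-cell records into 18 per-variable records is what makes the CDF both compact and fast to create.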
BATSRUS .OUT to CDF

In the CDF dump below, the first column indicates the current record number; the second column references the record's element index. Each element of the record stores a value for the current variable:

  1:[1] = -251.0   1:[2] = -243.0   1:[3] = -235.0   1:[4] = -227.0
  1:[5] = -219.0   1:[6] = -211.0   1:[7] = -251.0   1:[8] = -243.0
  1:[9] = -235.0   1:[10] = -227.0  1:[11] = -219.0  1:[12] = -211.0
  1:[13] = -251.0  1:[14] = -243.0  1:[15] = -235.0  1:[16] = -227.0
  1:[17] = -219.0  1:[18] = -211.0  1:[19] = -251.0  1:[20] = -243.0
  1:[21] = -235.0  1:[22] = -227.0  1:[23] = -219.0  1:[24] = -211.0
  ...
  1:[1283401] = -251.0  1:[1283402] = -243.0  1:[1283403] = -235.0  1:[1283404] = -227.0
  1:[1283405] = -219.0  1:[1283406] = -211.0  1:[1283407] = -251.0  1:[1283408] = -243.0
CDF Attributes

! Skeleton table for the "bats_2_cdf_OUTPUT.cdf" CDF.
! Generated: Monday, 22-Sep-2003 17:06:08
! CDF created/modified by CDF V2.7.1
! Skeleton table created by CDF V2.7.1

#header
CDF NAME:       bats_2_cdf_OUTPUT.cdf
DATA ENCODING:  NETWORK
MAJORITY:       ROW
FORMAT:         SINGLE

! Variables  G.Attributes  V.Attributes  Records  Dims  Sizes
! ---------  ------------  ------------  -------  ----  -----
     18/0         22             4          1/z     1    1293408

#GLOBALattributes

! Attribute            Entry    Data
! Name                 Number   Type       Value
! ---------            ------   ----       -----
  "Project"              1:     CDF_CHAR   { "CCMC" } .
  "Disclamer"            1:     CDF_CHAR   { "INSERT TERMS OF USAGE HERE" } .
  "Generated_By"         1:     CDF_CHAR   { "Marlo Maddox" } .
  "Generation_Date"      1:     CDF_CHAR   { "3/27/2003" } .
  "Simulation_Model"     1:     CDF_CHAR   { "BATSRUS" } .
CDF Attributes (continued)

  "Elapsed_Time_In_Seconds"         1:    CDF_FLOAT  { 4200.16 } .
  "Number_Of_Dimensions"            1:    CDF_INT4   { -3 } .
  "Number_Of_Special_Parameters"    1:    CDF_INT4   { 10 } .
  "Special_Parameters"              1:    CDF_FLOAT  { 1.66667 }
                                    2:    CDF_FLOAT  { 2248.43 }
                                    3:    CDF_FLOAT  { -0.368162 }
                                    4:    CDF_FLOAT  { 3.0 }
                                    5:    CDF_FLOAT  { 1.0 }
                                    6:    CDF_FLOAT  { 1.0 }
                                    7:    CDF_FLOAT  { 3.0 }
                                    8:    CDF_FLOAT  { 6.0 }
                                    9:    CDF_FLOAT  { 6.0 }
                                   10:    CDF_FLOAT  { 6.0 } .
  "Number_Of_Plot_Variables"        1:    CDF_INT4   { 15 } .
  "X_Dimension_Size"                1:    CDF_INT4   { 1293408 } .
  "Y_Dimension_Size"                1:    CDF_INT4   { 1 } .
  "Z_Dimension_Size"                1:    CDF_INT4   { 1 } .
  "Current_Iteration_Step"          1:    CDF_INT4   { 22924 } .
CDF Variables

#variables

! Variable   Data        Number    Record    Dimension
! Name       Type        Elements  Variance  Variances
! --------   ----        --------  --------  ---------
  "x"        CDF_FLOAT      1         T          T

! Attribute         Data
! Name              Type        Value
! --------          ----        -----
  "Description"     CDF_CHAR    { "X position for center of cell in grid..." }
  "Dictionary_Key"  CDF_CHAR    { "CCMC/SWMF Data Dictionary Entry" }
  "Valid_Min"       CDF_FLOAT   { -100000.0 }
  "Valid_Max"       CDF_FLOAT   { 100000.0 } .

! RV values were not requested.
Compression Performance Tests

[Bar chart: "BATSRUS .OUT to .CDF Conversion - Compression Results". File size (MB), roughly 60-100 MB, for: original .OUT file, no compression, Run-Length Encoding, Huffman, Adaptive Huffman, GZIP level 7.]
Compression Performance Tests

[Bar chart: "BATSRUS .OUT to .CDF Conversion - Wall Clock Time Results". Time (seconds), 0-300 s, for: no compression, Run-Length Encoding, Huffman, Adaptive Huffman, GZIP level 7.]
Performance Score = (original_file_size / cdf_file_size) * (1 / t)

BATSRUS .OUT to CDF Conversion Performance Scores:

  No Compression        0.653498023
  Run-Length Encoding   0.01177182
  Huffman               0.006078335
  Adaptive Huffman      0.005258168
  GZIP Level 7          0.013757529
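The performance score rewards both a small converted file and a fast conversion. A minimal sketch (the sample inputs are hypothetical, not the measured 2003 results):

```python
def performance_score(original_size, cdf_size, wall_clock_seconds):
    # Score = (original_file_size / cdf_file_size) * (1 / t):
    # higher is better; the first factor rewards compression ratio,
    # the second rewards conversion speed.
    return (original_size / cdf_size) * (1.0 / wall_clock_seconds)

# Hypothetical numbers: an uncompressed conversion that leaves the file
# size roughly unchanged but finishes in 1.5 s scores far higher than a
# slow conversion that compresses the file.
fast = performance_score(95.0, 96.0, 1.5)
slow = performance_score(95.0, 70.0, 250.0)
print(fast > slow)  # True
```

This is why no-compression wins by two orders of magnitude in the chart above: the compression algorithms' modest size savings cannot offset their much longer wall-clock times.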
Performance Results
• Optimal CDF storage format
– Single one-record rVariables
– Dimension size equal to the number of cells in the grid
• Uncompressed CDF creation time of 1.5 seconds
• CDF file size virtually the same as the original BATSRUS output file size
• Method could be applied to additional models in a similar fashion
Conclusion
• BATSRUS .OUT to CDF conversion results are promising
– 1.5-second uncompressed CDF creation time
– Resulting file size virtually unchanged
• OpenDX successfully imported CDF data using its standard input module (only the input file name had to be specified)
– Requires minimal initial development to correctly categorize imported data
• Closer to establishing a data format standard within the CCMC
Future Work
• Research the HDF5 data standard
• Test BATSRUS output conversion performance with HDF5
• Compare CDF vs. HDF5 performance
• Propose use of either or both
• Develop standard naming conventions for variables (similar to the ISTP program)
Conversion Software Architecture

[Diagram: the main conversion routine assembles standard model components from three inputs: global/file attributes (a generic attributes list (.h) plus a model-specific attributes list (.h)), variable attributes (generic/default and model-specific variable attribute lists (.h)), and variable names (a Model Variable List mapped against a Registered Variables List). A main read driver dispatches to read routines for models a, b, ..., n; a main write driver converts the assembled components to CDF or HDF5, producing a standard data file with common attributes and variable names for each registered model.]

Registered Variables List example:

  CCMC_name   native/alias
  x           x_pos, xp
  y           y_pos, yp