Storing High-Energy Physics data in DAOS
Javier López Gómez – CERN fellow
<javier.lopez.gomez@cern.ch>
DUG ’20, 19th November 2020
ROOT project,
EP-SFT (SoFTware Development for Experiments),
CERN
http://root.cern/
Contents
1 Introduction
2 RNTuple 101
3 RNTuple DAOS backend
4 First evaluation
5 Conclusions
Introduction
High-Energy Physics (HEP)
High-Energy Physics studies the laws governing our universe at the smallest
scale: fundamental particles, forces and their carriers, mass, etc. The
“Standard Model” describes these particles and interactions.
CERN experiments observe particle interactions (typically by colliding
particles at high energies).
HEP data = detector observations.
Large Hadron Collider (LHC)
Figure 1: Graphical representation of a CMS event (source: http://opendata.cern.ch/visualise/events/cms).
LHC collides protons that move in opposite directions. Detectors are
similar to a 100 MP camera taking a picture every 25 ns.
10^9 collisions/sec generating ∼10 TB/s.
Processing:
- Online: filtering step. Part of the detector read-out.
- Offline: distributed; disk storage at different LHC compute centers around
the globe.
ROOT project
ROOT: open-source data analysis framework written in C++. Provides C++
interpretation, object serialization (I/O), statistics, graphics, and much
more.
PyROOT provides dynamic C++ ↔ Python bindings.
ROOT I/O: row-wise/column-wise storage of C++ objects.
TTree and RNTuple
HEP data analysis often only requires
access to a subset of the properties of
each event.
Row-wise storage is inefficient for this access pattern. TTree
organizes the dataset in columns that
can contain any type of C++ object.
1+ EB of HEP data is stored in TTree ROOT
files.
TTree has been around for 25 years.
RNTuple is the R&D project to replace
TTree for the next 30 years.
In RNTuple, object stores are first-class storage targets.
[Illustration: a columnar table with columns x, y, z, and mass; one example entry reads 0.423, 1.123, 3.744, 23.1413.]
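The difference between row-wise and column-wise storage can be sketched in a few lines of C++. This is only an illustration, not ROOT code: `EventRow`, `EventColumns`, `ToColumns`, and the two `SumMass...` functions are names made up for this example, with the fields borrowed from the x, y, z, mass columns of the illustration.

```cpp
#include <cassert>
#include <vector>

// Row-wise layout: all properties of one event are stored together.
struct EventRow {
  float x, y, z, mass;
};

// Column-wise layout: one contiguous array per property.
struct EventColumns {
  std::vector<float> x, y, z, mass;
};

// Converts a row-wise dataset into columns.
inline EventColumns ToColumns(const std::vector<EventRow> &rows) {
  EventColumns cols;
  for (const auto &r : rows) {
    cols.x.push_back(r.x);
    cols.y.push_back(r.y);
    cols.z.push_back(r.z);
    cols.mass.push_back(r.mass);
  }
  assert(cols.mass.size() == rows.size());  // every column has one value per event
  return cols;
}

// Reading only "mass" from rows still steps over x, y, z of every event.
inline double SumMassRowWise(const std::vector<EventRow> &rows) {
  double sum = 0.0;
  for (const auto &r : rows) sum += r.mass;
  return sum;
}

// Reading only "mass" from columns touches just that one array.
inline double SumMassColumnWise(const EventColumns &cols) {
  double sum = 0.0;
  for (float m : cols.mass) sum += m;
  return sum;
}
```

Both functions return the same sum; the column-wise version simply never has to load the x, y, and z values, which is the property an analysis reading a subset of event properties benefits from.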
RNTuple 101
RNTuple architecture
- Storage layer / byte ranges: POSIX files, object stores, …
- Primitives layer / simple types: “columns” containing elements of fundamental types (float, int, …) grouped into (compressed) pages and clusters.
- Logical layer / C++ objects: mapping of C++ types onto columns, e.g. std::vector<float> → index column and a value column.
- Event iteration: looping over events for reading/writing.
Storage layer: access to the header (= schema), the pages, and the footer (=
location of pages).
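The mapping of a std::vector<float> field onto an index column and a value column can be sketched as follows. This is a simplified illustration rather than RNTuple's implementation: `CollectionColumns`, `Append`, and `Read` are hypothetical names, and storing the end offset of each entry in the index column is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two columns representing one collection field across many entries.
struct CollectionColumns {
  std::vector<std::uint64_t> index;  // index[i] = end offset of entry i in `values` (assumed)
  std::vector<float> values;         // elements of all entries, stored back to back
};

// Appends one entry (one std::vector<float>) to the two columns.
inline void Append(CollectionColumns &cols, const std::vector<float> &entry) {
  cols.values.insert(cols.values.end(), entry.begin(), entry.end());
  cols.index.push_back(cols.values.size());
}

// Reconstructs entry `i` from the two columns.
inline std::vector<float> Read(const CollectionColumns &cols, std::size_t i) {
  assert(i < cols.index.size());
  const auto begin = static_cast<std::ptrdiff_t>(i == 0 ? 0 : cols.index[i - 1]);
  const auto end = static_cast<std::ptrdiff_t>(cols.index[i]);
  return std::vector<float>(cols.values.begin() + begin, cols.values.begin() + end);
}
```

Appending the entries {1, 2}, {}, and {3} yields the index column 2, 2, 3 and the value column 1, 2, 3; both columns hold only fundamental types, which is what the primitives layer groups into pages.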
File backend: on-disk format
[Diagram: file layout showing the Anchor, the Header, Pages grouped into Clusters, and the Footer.]
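#include <vector>

// Particle is defined first because Event refers to it.
struct Particle {
  float fE;
  std::vector<int> fIds;         // nested collection
};
struct Event {
  int fId;
  std::vector<Particle> fPtcls;  // collection of user-defined objects
};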
To put it simply…
- Anchor: specifies the offset and size of the header and footer sections.
- Header: schema information (may be compressed).
- Footer: location of pages and clusters (may be compressed).
- Pages: little-endian fundamental types (possibly packed, e.g. bit-fields), typically on the order of tens of KiB (may be compressed).
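How an anchor that records offsets and sizes lets a reader locate the header and footer can be sketched as below. The `Anchor` struct, its field names, and the `ReadRange`/`ReadHeader`/`ReadFooter` helpers are hypothetical and do not reflect the actual RNTuple binary layout; a std::string stands in for the file contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical anchor: where the header and footer live inside the file.
struct Anchor {
  std::uint64_t headerOffset, headerSize;
  std::uint64_t footerOffset, footerSize;
};

// Returns the bytes [offset, offset + size) of `file`.
inline std::string ReadRange(const std::string &file, std::uint64_t offset, std::uint64_t size) {
  assert(offset <= file.size() && size <= file.size() - offset);
  return file.substr(static_cast<std::size_t>(offset), static_cast<std::size_t>(size));
}

// The header (schema) and footer (page locations) are found through the anchor.
inline std::string ReadHeader(const std::string &file, const Anchor &a) {
  return ReadRange(file, a.headerOffset, a.headerSize);
}
inline std::string ReadFooter(const std::string &file, const Anchor &a) {
  return ReadRange(file, a.footerOffset, a.footerSize);
}
```

In this sketch a reader needs only the anchor to find both metadata sections; the pages in between are then located through the footer.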
RNTuple DAOS backend
libdaos C++ interface classes
To simplify resource management, we wrote C++ wrappers for part of the
libdaos functionality.
// Connect to a DAOS pool identified by its UUID.
auto pool = std::make_shared<RDaosPool>(
    "e6f8e503-e409-4b08-8eeb-7e4d77cce6bb", "1");
// Open a container in that pool, again by UUID.
RDaosContainer cont(pool, "b4f6d9fc-e081-41d4-91ae-41adf800b537");
// Write a small buffer into one object under a fixed dkey/akey.
std::string s("foo bar baz");
cont.WriteObject(daos_obj_id_t{0xcafe4a11deadbeef, 0}, s.data(), s.size(),
                 /*dkey =*/ 0, /*akey =*/ 0);
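The resource-management benefit of such wrappers comes from RAII: a handle is acquired in a constructor and released in the destructor. The sketch below shows the general pattern only. `FakeHandle`, `fake_open`, `fake_close`, and `ScopedHandle` are invented stand-ins, not libdaos functions and not the RDaosPool/RDaosContainer implementation.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a C handle API (hypothetical; NOT the libdaos functions).
struct FakeHandle { int id; };
inline int &OpenCount() { static int n = 0; return n; }
inline FakeHandle fake_open(int id) { ++OpenCount(); return FakeHandle{id}; }
inline void fake_close(FakeHandle) {
  assert(OpenCount() > 0);  // closing more handles than were opened is a bug
  --OpenCount();
}

// RAII wrapper: opens in the constructor, closes in the destructor.
class ScopedHandle {
public:
  explicit ScopedHandle(int id) : fHandle(fake_open(id)), fOwns(true) {}
  ~ScopedHandle() { if (fOwns) fake_close(fHandle); }

  // Movable but not copyable, so each handle is closed exactly once.
  ScopedHandle(ScopedHandle &&other) noexcept
    : fHandle(other.fHandle), fOwns(std::exchange(other.fOwns, false)) {}
  ScopedHandle(const ScopedHandle &) = delete;
  ScopedHandle &operator=(const ScopedHandle &) = delete;
  ScopedHandle &operator=(ScopedHandle &&) = delete;

  int Id() const { return fHandle.id; }

private:
  FakeHandle fHandle;
  bool fOwns;
};
```

With this pattern the handle is released when the wrapper goes out of scope, including on early returns and exceptions, so callers never pair open and close calls by hand.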
DAOS backend: mapping things to objects
[Diagram: the same Anchor, Header, Page, Cluster, and Footer elements and Event/Particle example as for the file backend, now mapped to DAOS objects.]
Each RNTuple page is stored in a separate object. UUIDs are assigned
sequentially, starting from 00000000-0000-0000-0000-000000000000.
Header, Footer, and Anchor are stored in three different objects with
reserved UUIDs.
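The "one page, one object" mapping can be illustrated with an in-memory stand-in for a container. Everything in this sketch is hypothetical: `FakeObjectStore` is not a DAOS or ROOT class, identifiers are reduced to 64-bit integers for brevity, and the specific reserved values are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// In-memory stand-in for a container: object id -> blob.
class FakeObjectStore {
public:
  // Reserved ids for metadata; these particular values are made up.
  static constexpr std::uint64_t kAnchorId = ~std::uint64_t{0};
  static constexpr std::uint64_t kHeaderId = kAnchorId - 1;
  static constexpr std::uint64_t kFooterId = kAnchorId - 2;

  // Stores one page in its own object and returns the id it was given.
  std::uint64_t CommitPage(const std::string &page) {
    const std::uint64_t id = fNextPageId++;
    assert(id < kFooterId);  // page ids must not collide with the reserved ones
    fObjects[id] = page;
    return id;
  }
  void Write(std::uint64_t id, const std::string &blob) { fObjects[id] = blob; }
  const std::string &Read(std::uint64_t id) const { return fObjects.at(id); }

private:
  std::map<std::uint64_t, std::string> fObjects;
  std::uint64_t fNextPageId = 0;  // sequential, starting from zero
};
```

Because page identifiers are sequential, the footer only has to record which identifier belongs to which page, while the three metadata objects are found at fixed, well-known identifiers.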
Usage: RNTuple/file vs. RNTuple/DAOS
From the user’s perspective…
auto model = RNTupleModel::Create();
auto ntuple = RNTupleReader::Open(std::move(model),
"DecayTree",
"./B2HHH~zstd.ntuple");
auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon");
auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon");
auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon");
The same code with the DAOS backend; only the ntuple name and the location string change:
auto model = RNTupleModel::Create();
auto ntuple = RNTupleReader::Open(std::move(model),
"b4f6d9fc-e081-41d4-91ae-41adf800b537",
"daos://e6f8e503-e409-4b08-8eeb-7e4d77cce6bb/1");
auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon");
auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon");
auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon");
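Since the two snippets differ only in the location string, the backend can be chosen from that string alone. The following sketch shows one way such a dispatch could look; `Location` and `ParseLocation` are hypothetical names, not ROOT's API, and the meaning of the part after the last '/' is left uninterpreted here.

```cpp
#include <cassert>
#include <string>

// Result of splitting a location string like the ones in the two snippets.
struct Location {
  bool isDaos = false;
  std::string pool;    // DAOS only: text between "daos://" and the last '/'
  std::string suffix;  // DAOS only: text after the last '/'
  std::string path;    // file backend only
};

// Hypothetical dispatcher: "daos://<pool>/<suffix>" selects DAOS,
// anything else is treated as a file path.
inline Location ParseLocation(const std::string &uri) {
  assert(!uri.empty());
  static const std::string kPrefix = "daos://";
  Location loc;
  if (uri.compare(0, kPrefix.size(), kPrefix) != 0) {
    loc.path = uri;
    return loc;
  }
  const std::string rest = uri.substr(kPrefix.size());
  const std::string::size_type slash = rest.rfind('/');
  loc.isDaos = true;
  loc.pool = rest.substr(0, slash);
  if (slash != std::string::npos) loc.suffix = rest.substr(slash + 1);
  return loc;
}
```

Keeping the choice of backend inside the location string is what lets the user-facing code stay identical for files and DAOS.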
First evaluation
Test environment
Our evaluation ran on CERN openlab DAOS test machines:
- 3 DAOS servers, 1 DAOS head node,
- interconnected by an Omni-Path Edge Switch 100 Series (24 ports).
Figure 2: Server nodes HW (olcsl-*)
Figure 3: Client node HW (olsky-03)
dfuse simple benchmark (ofi+sockets)
Warning: these results are preliminary and might not be reliable.
Table 1: dfuse read/write benchmark (in MiB/s), by block size

                  4K      8K     16K    512K      1M      4M
Seq. Write      7.62   14.42   27.44  189.21  205.10  225.62
Seq. Read       2.62    5.04    9.21  116.86  147.79  188.90
Random Write    7.30   14.67   27.63  199.68  209.17  211.40
Random Read     2.16    4.20    7.92  120.91  162.70  211.12

Far from the 34.2 Gbit/s (4.275 GB/s) achieved by iperf.
Path lookup is not bad: around 700+ open()/creat() calls/s.
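To put "far from" into numbers, the measured throughput can be compared with the iperf figure once both are expressed in the same unit. The helpers below are a small sketch with made-up names; they assume that the iperf figure uses decimal prefixes (34.2 × 10^9 bit/s), which is an assumption of this example.

```cpp
#include <cassert>

// Converts MiB/s (2^20 bytes per second) to MB/s (10^6 bytes per second).
constexpr double MiBToMB(double mib) { return mib * 1048576.0 / 1e6; }

// Converts Gbit/s (10^9 bits per second, assumed) to MB/s.
constexpr double GbitToMB(double gbit) { return gbit * 1e9 / 8.0 / 1e6; }

// Fraction of the link rate reached by a measured throughput.
inline double LinkUtilization(double measuredMiBps, double linkGbitps) {
  assert(linkGbitps > 0.0);
  return MiBToMB(measuredMiBps) / GbitToMB(linkGbitps);
}
```

Under that assumption, `LinkUtilization(225.62, 34.2)` for the best sequential write in Table 1 comes out at roughly 5.5% of the rate iperf achieved.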
RNTuple (ofi+sockets)
Warning: these results are preliminary and might not be reliable.
Figure 4: RNTuple benchmark on LHCb data (ofi+sockets). Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd). Benchmark code: https://github.com/jblomer/iotools

(a) gen_lhcb (write RNTuple), runtime in s
              No compression     zstd
Local file              10.4     70
dfuse                   54.8    105.7
libdaos                172.2    358.8

(b) lhcb (read RNTuple), runtime in ms
              No compression     zstd
Local file               777    2,689
dfuse                  7,834    6,009
libdaos                1,427    2,515

RNTuple (ofi+psm2)
Warning: these results are preliminary and might not be reliable.
Figure 5: RNTuple benchmark on LHCb data (ofi+PSM2). Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd). Benchmark code: https://github.com/jblomer/iotools

(a) gen_lhcb (write RNTuple), runtime in s
              No compression     zstd
Local file             10.4      70
dfuse                  34.29     70.1
libdaos                14.5      61.6

(b) lhcb (read RNTuple), runtime in ms
              No compression     zstd
Local file               777    2,689
dfuse                  4,989    3,479
libdaos                1,281    2,854

Conclusions
1+ EB of HEP data in ROOT files (TTree). RNTuple is set to replace TTree
columnar storage for the next 30 years.
The RNTuple architecture decouples storage from
serialization/representation. Object stores are first-class.
First prototype implementation of an Intel DAOS backend. Currently
“1 Page == 1 Object” + constant dkey. Still some performance issues.
Next questions:
1. How to maximize throughput (bulk reading/writing of pages)?
2. How to distribute pages appropriately, e.g. put together pages
corresponding to the same data member?
15/15
Storing High-Energy Physics data in DAOS
Javier López Gómez – CERN fellow
<javier.lopez.gomez@cern.ch>
DUG ’20, 19th November 2020
ROOT project,
EP-SFT (SoFTware Development for Experiments),
CERN
http://root.cern/

More Related Content

What's hot

Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
Fujio Turner
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
Steve Loughran
 
March 2012 HUG: JuteRC compiler
March 2012 HUG: JuteRC compilerMarch 2012 HUG: JuteRC compiler
March 2012 HUG: JuteRC compiler
Yahoo Developer Network
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 Literature
Databricks
 
Implementing HDF5 in MATLAB
Implementing HDF5 in MATLABImplementing HDF5 in MATLAB
Implementing HDF5 in MATLAB
The HDF-EOS Tools and Information Center
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
Jeffrey Breen
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
Rinke Hoekstra
 
The MATLAB Low-Level HDF5 Interface
The MATLAB Low-Level HDF5 InterfaceThe MATLAB Low-Level HDF5 Interface
The MATLAB Low-Level HDF5 Interface
The HDF-EOS Tools and Information Center
 
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCLVisualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
The HDF-EOS Tools and Information Center
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In Python
Yahoo Developer Network
 
Hadoop Jute Record Python
Hadoop Jute Record PythonHadoop Jute Record Python
Hadoop Jute Record Python
Paul Tarjan
 
Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)
Alexis Seigneurin
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
Alexander Schätzle
 
Uplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLUplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RML
Christophe Debruyne
 
Big Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC SystemsBig Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC Systems
Fujio Turner
 
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 dataUsage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
The HDF-EOS Tools and Information Center
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter
 
Hadoop
HadoopHadoop
Connecting HDF with ISO Metadata Standards
Connecting HDF with ISO Metadata StandardsConnecting HDF with ISO Metadata Standards
Connecting HDF with ISO Metadata Standards
The HDF-EOS Tools and Information Center
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Vladimir Alexiev, PhD, PMP
 

What's hot (20)

Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
March 2012 HUG: JuteRC compiler
March 2012 HUG: JuteRC compilerMarch 2012 HUG: JuteRC compiler
March 2012 HUG: JuteRC compiler
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 Literature
 
Implementing HDF5 in MATLAB
Implementing HDF5 in MATLABImplementing HDF5 in MATLAB
Implementing HDF5 in MATLAB
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
 
The MATLAB Low-Level HDF5 Interface
The MATLAB Low-Level HDF5 InterfaceThe MATLAB Low-Level HDF5 Interface
The MATLAB Low-Level HDF5 Interface
 
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCLVisualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In Python
 
Hadoop Jute Record Python
Hadoop Jute Record PythonHadoop Jute Record Python
Hadoop Jute Record Python
 
Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
 
Uplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLUplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RML
 
Big Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC SystemsBig Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC Systems
 
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 dataUsage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 
Hadoop
HadoopHadoop
Hadoop
 
Connecting HDF with ISO Metadata Standards
Connecting HDF with ISO Metadata StandardsConnecting HDF with ISO Metadata Standards
Connecting HDF with ISO Metadata Standards
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 

Similar to DUG'20: 07 - Storing High-Energy Physics data in DAOS

Large Scale ETL with Hadoop
Large Scale ETL with HadoopLarge Scale ETL with Hadoop
Large Scale ETL with Hadoop
Eric Sammer
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
OReillyStrata
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
Ian Foster
 
Bridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly DetectionBridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly Detection
DataWorks Summit
 
Massively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonMassively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian Huston
PyData
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Databricks
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Vladimir Alexiev, PhD, PMP
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
Adam Muise
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
PyData
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Databricks
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
Yanchang Zhao
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Mark Wilkinson
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
Aidan Hogan
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
Ferran Galí Reniu
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
Luis Daniel Ibáñez
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
Travis Oliphant
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
Minio
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
shravanthium111
 
Spark mhug2
Spark mhug2Spark mhug2
Spark mhug2
Joseph Niemiec
 

Similar to DUG'20: 07 - Storing High-Energy Physics data in DAOS (20)

Large Scale ETL with Hadoop
Large Scale ETL with HadoopLarge Scale ETL with Hadoop
Large Scale ETL with Hadoop
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
Bridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly DetectionBridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly Detection
 
Massively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonMassively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian Huston
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, Oxford
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
 
Spark mhug2
Spark mhug2Spark mhug2
Spark mhug2
 

More from Andrey Kudryavtsev

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
Andrey Kudryavtsev
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
Andrey Kudryavtsev
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
Andrey Kudryavtsev
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
Andrey Kudryavtsev
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware Update
Andrey Kudryavtsev
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY Mapping
Andrey Kudryavtsev
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN Openlab
Andrey Kudryavtsev
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
Andrey Kudryavtsev
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature Update
Andrey Kudryavtsev
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOS
Andrey Kudryavtsev
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
Andrey Kudryavtsev
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS Update
Andrey Kudryavtsev
 
DAOS Middleware overview
DAOS Middleware overviewDAOS Middleware overview
DAOS Middleware overview
Andrey Kudryavtsev
 

DUG'20: 07 - Storing High-Energy Physics data in DAOS

  • 1. Storing High-Energy Physics data in DAOS Javier López Gómez – CERN fellow <javier.lopez.gomez@cern.ch> DUG ’20, 19th November 2020 ROOT project, EP-SFT (SoFTware Development for Experiments), CERN http://root.cern/
  • 4. High-Energy Physics (HEP)
      High-Energy Physics studies the laws governing our universe at the smallest scale: fundamental particles, forces and their carriers, mass, etc. The “Standard Model” describes these particles and interactions.
      CERN experiments observe particle interactions (typically by colliding particles at high energies).
      HEP data = detector observations.
  • 5. Large Hadron Collider (LHC)
      Figure 1: Graphical representation of a CMS event (source: http://opendata.cern.ch/visualise/events/cms).
      The LHC collides protons that move in opposite directions. Detectors are similar to a 100 MP camera taking a picture every 25 ns.
      10^9 collisions/s, generating ∼10 TB/s.
      Processing:
      - Online: filtering step. Part of the detector read-out.
      - Offline: distributed; disk storage at different LHC compute centers around the globe.
  • 6. ROOT project
      ROOT: open-source data analysis framework written in C++.
      Provides C++ interpretation, object serialization (I/O), statistics, graphics, and much more.
      PyROOT provides dynamic C++ ↔ Python bindings.
      ROOT I/O: row-wise/column-wise storage of C++ objects.
  • 7. TTree and RNTuple
      HEP data analysis often requires access to only a subset of the properties of each event, so row-wise storage is inefficient.
      TTree organizes the dataset in columns that contain any type of C++ object.
      1+ EB of HEP data is stored in TTree ROOT files. TTree has been around for 25 years.
      RNTuple is the R&D project to replace TTree for the next 30 years. Object stores are first-class.
      Example columns:
      x      y      z      mass
      ...    ...    ...    ...
      0.423  1.123  3.744  23.1413
      ...    ...    ...    ...
      ...    ...    ...    ...
  • 10. RNTuple architecture
      - Storage layer / byte ranges: POSIX files, object stores, …
      - Primitives layer / simple types: “columns” containing elements of fundamental types (float, int, …) grouped into (compressed) pages and clusters.
      - Logical layer / C++ objects: mapping of C++ types onto columns, e.g. std::vector<float> → an index column and a value column.
      - Event iteration: looping over events for reading/writing.
      Storage layer: access to the header (= schema), the pages, and the footer (= location of pages).
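The std::vector<float> mapping in the logical layer can be illustrated with a small sketch. It assumes that the index column stores the cumulative end offset of each entry in the value column; that layout, and the names SplitColumns, SplitIntoColumns and ReadEntry, are hypothetical and chosen only for illustration. They are not ROOT classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration type, not a ROOT class.
struct SplitColumns {
    std::vector<std::uint64_t> index;  // index[i] = end offset of entry i in `values` (assumed layout)
    std::vector<float> values;         // elements of all entries, concatenated
};

// Append each entry's elements to the value column and record where the entry ends.
inline SplitColumns SplitIntoColumns(const std::vector<std::vector<float>>& entries) {
    SplitColumns cols;
    for (const auto& entry : entries) {
        cols.values.insert(cols.values.end(), entry.begin(), entry.end());
        cols.index.push_back(cols.values.size());
    }
    return cols;
}

// Rebuild entry i from the two columns.
inline std::vector<float> ReadEntry(const SplitColumns& cols, std::size_t i) {
    assert(i < cols.index.size());
    const auto begin = static_cast<std::ptrdiff_t>(i == 0 ? 0 : cols.index[i - 1]);
    const auto end = static_cast<std::ptrdiff_t>(cols.index[i]);
    return std::vector<float>(cols.values.begin() + begin, cols.values.begin() + end);
}
```

With this layout, a reader that only needs the number of elements per entry can touch the index column alone, which is the kind of partial access that columnar storage is meant to make cheap.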
  • 11. File backend: on-disk format
      [Diagram: Anchor, Header, Page, Cluster, Footer; example types shown alongside:]
      struct Event { int fId; vector<Particle> fPtcls; };
      struct Particle { float fE; vector<int> fIds; };
      Put simply:
      - Anchor: specifies the offset and size of the header and footer sections.
      - Header: schema information.
      - Footer: location of pages and clusters.
      - Pages: little-endian fundamental types (possibly packed, e.g. bit-fields), typically on the order of tens of KiB.
      The header, the footer, and the pages may each be compressed or not.
  • 13. libdaos C++ interface classes
      To simplify resource management, we wrote C++ wrappers for part of the libdaos functionality.
      auto pool = std::make_shared<RDaosPool>(
          "e6f8e503-e409-4b08-8eeb-7e4d77cce6bb", "1");
      RDaosContainer cont(pool, "b4f6d9fc-e081-41d4-91ae-41adf800b537");
      std::string s("foo bar baz");
      cont.WriteObject(daos_obj_id_t{0xcafe4a11deadbeef, 0}, s.data(), s.size(),
                       /*dkey =*/ 0, /*akey =*/ 0);
  • 14. DAOS backend: mapping things to objects
      [Diagram: Anchor, Header, Page, Cluster, Footer; example types shown alongside:]
      struct Event { int fId; vector<Particle> fPtcls; };
      struct Particle { float fE; vector<int> fIds; };
      Each RNTuple page is stored in a separate object. The UUID is sequential, starting from 00000000-0000-0000-0000-000000000000.
      Header, Footer, and Anchor are stored in three different objects with reserved UUIDs.
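The "one page per object" mapping can be sketched with an in-memory stand-in for a container. Everything here is hypothetical: MockObjectStore is not part of libdaos or ROOT, a 64-bit counter stands in for the sequential identifier, and the three reserved values are placeholders because the slides do not say which identifiers are actually reserved.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>

// Placeholder reserved identifiers (assumed values; the slides do not give the real ones).
constexpr std::uint64_t kAnchorId = std::numeric_limits<std::uint64_t>::max();
constexpr std::uint64_t kHeaderId = kAnchorId - 1;
constexpr std::uint64_t kFooterId = kAnchorId - 2;

// Hypothetical in-memory stand-in for a DAOS container.
class MockObjectStore {
public:
    // Store one page in its own object and return the sequential ID assigned to it.
    std::uint64_t WritePage(std::string bytes) {
        const std::uint64_t id = fNextPageId++;
        fObjects[id] = std::move(bytes);
        return id;
    }

    // Store the anchor, header, or footer under one of the reserved IDs.
    void WriteMetadata(std::uint64_t reservedId, std::string bytes) {
        fObjects[reservedId] = std::move(bytes);
    }

    // Look up an object; throws std::out_of_range if the ID is unknown.
    const std::string& Read(std::uint64_t id) const { return fObjects.at(id); }

    std::size_t Size() const { return fObjects.size(); }

private:
    std::uint64_t fNextPageId = 0;  // pages are numbered sequentially from zero
    std::map<std::uint64_t, std::string> fObjects;
};
```

In this sketch a reader would first fetch the objects stored under kAnchorId, kHeaderId and kFooterId, then use the page locations from the footer to fetch individual page objects by ID.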
  • 15. Usage: RNTuple/file vs. RNTuple/DAOS
      From the user’s perspective (local file):
      auto model = RNTupleModel::Create();
      auto ntuple = RNTupleReader::Open(std::move(model), "DecayTree", "./B2HHH~zstd.ntuple");
      auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon");
      auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon");
      auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon");
  • 16. Usage: RNTuple/file vs. RNTuple/DAOS
      From the user’s perspective (DAOS):
      auto model = RNTupleModel::Create();
      auto ntuple = RNTupleReader::Open(std::move(model), "b4f6d9fc-e081-41d4-91ae-41adf800b537",
                                        "daos://e6f8e503-e409-4b08-8eeb-7e4d77cce6bb/1");
      auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon");
      auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon");
      auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon");
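The only user-visible difference between the two slides is the location string. As a rough sketch of how such a string could be split, the helper below assumes the form daos://<pool UUID>/<value> and assumes that the trailing value is the same "1" passed as the second RDaosPool argument on slide 13. DaosLocation and ParseDaosUri are hypothetical names; ROOT's own parsing code is not shown in the slides.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper type, for illustration only.
struct DaosLocation {
    std::string poolUuid;     // part between "daos://" and the next '/'
    std::string svcReplicas;  // part after that '/', assumed to be the second RDaosPool argument
};

// Split "daos://<pool>/<svc>" into its two parts; throws on any other shape.
inline DaosLocation ParseDaosUri(const std::string& uri) {
    const std::string scheme = "daos://";
    if (uri.compare(0, scheme.size(), scheme) != 0)
        throw std::invalid_argument("not a daos:// URI: " + uri);
    const std::string::size_type slash = uri.find('/', scheme.size());
    if (slash == std::string::npos || slash == scheme.size() || slash + 1 == uri.size())
        throw std::invalid_argument("expected daos://<pool>/<svc>: " + uri);
    return DaosLocation{uri.substr(scheme.size(), slash - scheme.size()),
                        uri.substr(slash + 1)};
}
```

A plain path such as "./B2HHH~zstd.ntuple" is rejected by this helper, so a caller could use the exception (or a prefix check) to fall back to the file backend.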
  • 18. Test environment
      Our evaluation ran on CERN OpenLab DAOS test machines: 3 DAOS servers and 1 DAOS head node, interconnected by an Omni-Path Edge Switch 100 Series (24 ports).
      Figure 2: Server nodes HW (olcsl-*)
      Figure 3: Client node HW (olsky-03)
  • 19. dfuse simple benchmark (ofi+sockets)
      Warning: these results are preliminary and might not be reliable.
      Table 1: dfuse read/write benchmark (in MiB/s)
      Block size      4K     8K    16K    512K      1M      4M
      Seq. write    7.62  14.42  27.44  189.21  205.10  225.62
      Seq. read     2.62   5.04   9.21  116.86  147.79  188.90
      Random write  7.30  14.67  27.63  199.68  209.17  211.40
      Random read   2.16   4.20   7.92  120.91  162.70  211.12
      Far from the 34.2 Gbit/s (4.275 GiB/s) achieved by iperf.
      Path lookup is not bad: 700+ open()/creat() calls/s.
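To put "far from" into numbers, the following sketch repeats the slide's conversion (34.2 divided by 8 gives 4.275) and expresses a dfuse measurement from Table 1 as a fraction of the iperf rate. The constant and function names are illustrative, and the conversion treats the result as GiB/s exactly as the slide does.

```cpp
// iperf rate quoted on the slide, in Gbit/s.
constexpr double kIperfGbitPerSec = 34.2;

// Slide's conversion: 8 bits per byte, result quoted as 4.275 GiB/s.
constexpr double kIperfGiBPerSec = kIperfGbitPerSec / 8.0;

// Same rate in MiB/s, the unit used in Table 1.
constexpr double kIperfMiBPerSec = kIperfGiBPerSec * 1024.0;

// Fraction of the iperf rate reached by a dfuse measurement given in MiB/s.
constexpr double FractionOfLink(double mibPerSec) {
    return mibPerSec / kIperfMiBPerSec;
}
```

For the best value in Table 1 (225.62 MiB/s, sequential write at 4M block size), FractionOfLink gives roughly 5% of the rate measured with iperf.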
  • 20. RNTuple (ofi+sockets)
      Warning: these results are preliminary and might not be reliable.
      Figure 4: RNTuple benchmark on LHCb data (ofi+sockets). Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd). Benchmark code: https://github.com/jblomer/iotools
      (a) gen_lhcb (write RNTuple), runtime in s
                    No compression    zstd
      Local file              10.4    70
      dfuse                   54.8   105.7
      libdaos                172.2   358.8
      (b) lhcb (read RNTuple), runtime in ms
                    No compression    zstd
      Local file               777   2,689
      dfuse                  7,834   6,009
      libdaos                1,427   2,515
  • 21. RNTuple (ofi+psm2)
      Warning: these results are preliminary and might not be reliable.
      Figure 5: RNTuple benchmark on LHCb data (ofi+PSM2). Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd). Benchmark code: https://github.com/jblomer/iotools
      (a) gen_lhcb (write RNTuple), runtime in s
                    No compression    zstd
      Local file              10.4    70
      dfuse                  34.29    70.1
      libdaos                 14.5    61.6
      (b) lhcb (read RNTuple), runtime in ms
                    No compression    zstd
      Local file               777   2,689
      dfuse                  4,989   3,479
      libdaos                1,281   2,854
  • 23. Conclusions
      - 1+ EB of HEP data in ROOT files (TTree). RNTuple is set to replace TTree columnar storage for the next 30 years.
      - The RNTuple architecture decouples storage from serialization/representation. Object stores are first-class.
      - First prototype implementation of an Intel DAOS backend. Currently “1 Page == 1 Object” + constant dkey. Still some performance issues.
      Next questions:
      1. How to maximize throughput (bulk reading/writing of pages)?
      2. How to distribute pages appropriately, e.g. put together pages corresponding to the same data member?