SlideShare a Scribd company logo
1 of 24
Download to read offline
Storing High-Energy Physics data in DAOS
Javier López Gómez – CERN fellow
<javier.lopez.gomez@cern.ch>
DUG ’20, 19th November 2020
ROOT project,
EP-SFT (SoFTware Development for Experiments),
CERN
http://root.cern/
ContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContentsContents
1 Introduction
2 RNTuple 101
3 RNTuple DAOS backend
4 First evaluation
5 Conclusions
1/15
Introduction
High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)
High-Energy Physics studies laws governing our universe at the smallest
scale: fundamental particles, forces and its carriers, mass, etc. The
“Standard model” describes these particles/interactions.
CERN experiments observe particle interactions (typically by colliding
particles at high-energies).
HEP data = detector observations.
2/15
Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)
Figure 1: Graphical representation of a CMS event.1
LHC collides protons that move in opposite directions. Detectors are
similar to a 100 MP camera taking a picture every 25 ns.
109
collisions/sec generating ∼ 10 TB/s.
Processing:
- Online: filtering step. Part of the detector read-out.
- Offline: distributed; disk storage at different LHC compute centers around
the globe.
1
http://opendata.cern.ch/visualise/events/cms 3/15
ROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT project
ROOT: open-source data analysis framework written in C++. Provides C++
interpretation, object serialization (I/O), statistics, graphics, and much
more.
PyROOT provides dynamic C++ ↔ Python bindings.
ROOT I/O: row-wise/column-wise storage of C++ objects.
4/15
TTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTuple
HEP data analysis often only requires
access to a subset of the properties of
each event.
Row-wise storage is inefficient. TTree
organizes the dataset in columns that
contain any type of C++ object.
1+ EB of HEP data stored in TTree ROOT
files.
TTree has been there for 25 years.
RNTuple is the R&D project to replace
TTree for the next 30 years.
Object stores are first-class.
x y z mass
...
...
...
...
0.423 1.123 3.744 23.1413
...
...
...
...
...
...
...
...
5/15
TTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTuple
HEP data analysis often only requires
access to a subset of the properties of
each event.
Row-wise storage is inefficient. TTree
organizes the dataset in columns that
contain any type of C++ object.
1+ EB of HEP data stored in TTree ROOT
files.
TTree has been there for 25 years.
RNTuple is the R&D project to replace
TTree for the next 30 years.
Object stores are first-class.
x y z mass
...
...
...
...
0.423 1.123 3.744 23.1413
...
...
...
...
...
...
...
...
5/15
RNTuple 101
RNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architecture
Storage layer / byte ranges
POSIX files, object stores, …
Primitives layer / simple types
“Columns” containing elements of fundamental types (float,
int, …) grouped into (compressed) pages and clusters
Logical layer / C++ objects
Mapping of C++ types onto columns, e.g.
std::vector<float> → index column and a value column
Event iteration
Looping over events for reading/writing
Storage layer: access to the header (= schema), the pages, and the footer (=
location of pages).
6/15
File backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk format
… …
Anchor Header Page
Cluster
Footer
struct Event {
int fId;
vector<Particle> fPtcls;
};
struct Particle {
float fE;
vector<int> fIds;
};
To put it simple…
Anchor: specifies the offset and size of the header and footer sections.
Header: schema information.2
Footer: location of pages and clusters.2
Pages: little-endian fundamental types (possibly packed, e.g. bit-fields)
—typically in the order of tens of KiB.2
2
This element may be compressed or not.
7/15
RNTuple DAOS backend
libdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classes
To simplify resource management, we wrote C++ wrappers for part of
libdaos functionality.
auto pool = std::make_shared<RDaosPool>(
"e6f8e503-e409-4b08-8eeb-7e4d77cce6bb", "1");
RDaosContainer cont(pool, "b4f6d9fc-e081-41d4-91ae-41adf800b537");
std::string s("foo bar baz");
cont.WriteObject(daos_obj_id_t{0xcafe4a11deadbeef, 0}, s.data(), s.size()
, /*dkey =*/ 0, /*akey =*/ 0);
8/15
DAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objects
… …
Anchor Header Page
Cluster
Footer
struct Event {
int fId;
vector<Particle> fPtcls;
};
struct Particle {
float fE;
vector<int> fIds;
};
Each RNTuple page is stored in a separate object. The UUID is
sequential starting from 00000000-0000-0000-0000-000000000000 .
Header, Footer, and Anchor are stored in three different objects with
reserved UUIDs.
9/15
Usage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOS
From the user’s perspective…
auto model = RNTupleModel::Create();
auto ntuple = RNTupleReader::Open(std::move(model),
"DecayTree",
"./B2HHH~zstd.ntuple");
auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon");
auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon");
auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon");
10/15
Usage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOS
From the user’s perspective…
auto model = RNTupleModel::Create();
auto ntuple = RNTupleReader::Open(std::move(model),
"b4f6d9fc-e081-41d4-91ae-41adf800b537",
"daos://e6f8e503-e409-4b08-8eeb-7e4d77cce6bb/1");
auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon");
auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon");
auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon");
10/15
First evaluation
Test environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environment
Our evaluation ran on CERN OpenLab DAOS test machines:
3 DAOS servers, 1 DAOS head node.
interconnected by an Omni-Path Edge Switch 100 Series | 24 ports.
Figure 2: Server nodes HW (olcsl-*)
Figure 3: Client node HW (olsky-03)
11/15
dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)
! These results are preliminary and might not be reliable.
Block size
4K 8K 16K 512K 1M 4M
Seq. Write 7.62 14.42 27.44 189.21 205.10 225.62
Seq. Read 2.62 5.04 9.21 116.86 147.7 9 188.90
Random Write 7.30 14.67 27.63 199.68 209.17 211.40
Random Read 2.16 4.20 7.92 120.91 162.7 0 211.12
Table 1: dfuse read/write benchmark (in MiB/s)
Far from the 34.2 Gbits/sec (4.275 GiB/s) achieved by iperf.
Path lookup not bad; around 700+ open()/creat() calls/s.
12/15
RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)
! These results are preliminary and might not be reliable.
No compression zstd
0
100
200
300
400
10.4
70
54.8
105.7
172.2
358.8
Runtime(s)
(a) gen_lhcb (write RNTuple)
Local file dfuse libdaos
No compression zstd
2,000
4,000
6,000
8,000
777
2,689
7,834
6,009
1,427
2,515
Runtime(ms)
(b) lhcb (read RNTuple)
Local file dfuse libdaos
Figure 4: RNTuple benchmark on LHCb data (ofi+sockets).23
2
Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd).
3
https://github.com/jblomer/iotools
13/15
RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)
! These results are preliminary and might not be reliable.
No compression zstd
20
40
60
80
10.4
70
34.29
70.1
14.5
61.6
Runtime(s)
(a) gen_lhcb (write RNTuple)
Local file dfuse libdaos
No compression zstd
2,000
4,000
777
2,689
4,989
3,479
1,281
2,854
Runtime(ms)
(b) lhcb (read RNTuple)
Local file dfuse libdaos
Figure 5: RNTuple benchmark on LHCb data (ofi+PSM2).45
4
Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd).
5
https://github.com/jblomer/iotools
14/15
Conclusions
ConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusions
1+ EB of HEP data in ROOT files (TTree). RNTuples replaces TTree
columnar storage for the next 30 years.
RNTuple architecture decouples storage from
serialization/representation. Object stores are first-class.
First prototype implementation of an Intel DAOS backend. Currently
“1 Page == 1 Object” + constant dkey. Still some performance issues.
Next Questions:
1. How to maximize throughput (bulk reading/writing of pages)?
2. How to distribute pages appropriately, e.g. put together pages
corresponding to the same data member?
15/15
Storing High-Energy Physics data in DAOS
Javier López Gómez – CERN fellow
<javier.lopez.gomez@cern.ch>
DUG ’20, 19th November 2020
ROOT project,
EP-SFT (SoFTware Development for Experiments),
CERN
http://root.cern/

More Related Content

What's hot

Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsFujio Turner
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is overSteve Loughran
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureDatabricks
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterJeffrey Breen
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of DataRinke Hoekstra
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonYahoo Developer Network
 
Hadoop Jute Record Python
Hadoop Jute Record PythonHadoop Jute Record Python
Hadoop Jute Record PythonPaul Tarjan
 
Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)Alexis Seigneurin
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataAlexander Schätzle
 
Uplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLUplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLChristophe Debruyne
 
Big Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC SystemsBig Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC SystemsFujio Turner
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 

What's hot (20)

Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
March 2012 HUG: JuteRC compiler
March 2012 HUG: JuteRC compilerMarch 2012 HUG: JuteRC compiler
March 2012 HUG: JuteRC compiler
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 Literature
 
Implementing HDF5 in MATLAB
Implementing HDF5 in MATLABImplementing HDF5 in MATLAB
Implementing HDF5 in MATLAB
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
 
The MATLAB Low-Level HDF5 Interface
The MATLAB Low-Level HDF5 InterfaceThe MATLAB Low-Level HDF5 Interface
The MATLAB Low-Level HDF5 Interface
 
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCLVisualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
Visualizing and Analyzing HDF-EOS5 and HDF5 data with NCL
 
Nov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In PythonNov HUG 2009: Hadoop Record Reader In Python
Nov HUG 2009: Hadoop Record Reader In Python
 
Hadoop Jute Record Python
Hadoop Jute Record PythonHadoop Jute Record Python
Hadoop Jute Record Python
 
Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)Spark - Alexis Seigneurin (English)
Spark - Alexis Seigneurin (English)
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
 
Uplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLUplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RML
 
Big Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC SystemsBig Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC Systems
 
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 dataUsage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 
Hadoop
HadoopHadoop
Hadoop
 
Connecting HDF with ISO Metadata Standards
Connecting HDF with ISO Metadata StandardsConnecting HDF with ISO Metadata Standards
Connecting HDF with ISO Metadata Standards
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 

Similar to DUG'20: 07 - Storing High-Energy Physics data in DAOS

Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with HadoopOReillyStrata
 
Large Scale ETL with Hadoop
Large Scale ETL with HadoopLarge Scale ETL with Hadoop
Large Scale ETL with HadoopEric Sammer
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics EnvironmentIan Foster
 
Bridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly DetectionBridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly DetectionDataWorks Summit
 
Massively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonMassively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonPyData
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Databricks
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLAdam Muise
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypetPyData
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Databricks
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordMark Wilkinson
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD VivaAidan Hogan
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Ferran Galí Reniu
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013Luis Daniel Ibáñez
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...shravanthium111
 

Similar to DUG'20: 07 - Storing High-Energy Physics data in DAOS (20)

Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 
Large Scale ETL with Hadoop
Large Scale ETL with HadoopLarge Scale ETL with Hadoop
Large Scale ETL with Hadoop
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
Bridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly DetectionBridging Batch and Real-time Systems for Anomaly Detection
Bridging Batch and Real-time Systems for Anomaly Detection
 
Massively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian HustonMassively Parallel Process with Prodedural Python by Ian Huston
Massively Parallel Process with Prodedural Python by Ian Huston
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, Oxford
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
 
Spark mhug2
Spark mhug2Spark mhug2
Spark mhug2
 

More from Andrey Kudryavtsev

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansAndrey Kudryavtsev
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterAndrey Kudryavtsev
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...Andrey Kudryavtsev
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesAndrey Kudryavtsev
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateAndrey Kudryavtsev
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingAndrey Kudryavtsev
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabAndrey Kudryavtsev
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedAndrey Kudryavtsev
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateAndrey Kudryavtsev
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSAndrey Kudryavtsev
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraAndrey Kudryavtsev
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateAndrey Kudryavtsev
 

More from Andrey Kudryavtsev (13)

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware Update
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY Mapping
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN Openlab
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature Update
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOS
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS Update
 
DAOS Middleware overview
DAOS Middleware overviewDAOS Middleware overview
DAOS Middleware overview
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 

DUG'20: 07 - Storing High-Energy Physics data in DAOS

  • 1. Storing High-Energy Physics data in DAOS Javier López Gómez – CERN fellow <javier.lopez.gomez@cern.ch> DUG ’20, 19th November 2020 ROOT project, EP-SFT (SoFTware Development for Experiments), CERN http://root.cern/
  • 4. High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP)High-Energy Physics (HEP) High-Energy Physics studies laws governing our universe at the smallest scale: fundamental particles, forces and its carriers, mass, etc. The “Standard model” describes these particles/interactions. CERN experiments observe particle interactions (typically by colliding particles at high-energies). HEP data = detector observations. 2/15
  • 5. Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC)Large Hadron Collider (LHC) Figure 1: Graphical representation of a CMS event.1 LHC collides protons that move in opposite directions. Detectors are similar to a 100 MP camera taking a picture every 25 ns. 109 collisions/sec generating ∼ 10 TB/s. Processing: - Online: filtering step. Part of the detector read-out. - Offline: distributed; disk storage at different LHC compute centers around the globe. 1 http://opendata.cern.ch/visualise/events/cms 3/15
  • 6. ROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT projectROOT project ROOT: open-source data analysis framework written in C++. Provides C++ interpretation, object serialization (I/O), statistics, graphics, and much more. PyROOT provides dynamic C++ ↔ Python bindings. ROOT I/O: row-wise/column-wise storage of C++ objects. 4/15
  • 7. TTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTuple HEP data analysis often only requires access to a subset of the properties of each event. Row-wise storage is inefficient. TTree organizes the dataset in columns that contain any type of C++ object. 1+ EB of HEP data stored in TTree ROOT files. TTree has been there for 25 years. RNTuple is the R&D project to replace TTree for the next 30 years. Object stores are first-class. x y z mass ... ... ... ... 0.423 1.123 3.744 23.1413 ... ... ... ... ... ... ... ... 5/15
  • 8. TTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTupleTTree and RNTuple HEP data analysis often only requires access to a subset of the properties of each event. Row-wise storage is inefficient. TTree organizes the dataset in columns that contain any type of C++ object. 1+ EB of HEP data stored in TTree ROOT files. TTree has been there for 25 years. RNTuple is the R&D project to replace TTree for the next 30 years. Object stores are first-class. x y z mass ... ... ... ... 0.423 1.123 3.744 23.1413 ... ... ... ... ... ... ... ... 5/15
  • 10. RNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architectureRNTuple architecture Storage layer / byte ranges POSIX files, object stores, … Primitives layer / simple types “Columns” containing elements of fundamental types (float, int, …) grouped into (compressed) pages and clusters Logical layer / C++ objects Mapping of C++ types onto columns, e.g. std::vector<float> → index column and a value column Event iteration Looping over events for reading/writing Storage layer: access to the header (= schema), the pages, and the footer (= location of pages). 6/15
  • 11. File backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk formatFile backend: on-disk format … … Anchor Header Page Cluster Footer struct Event { int fId; vector<Particle> fPtcls; }; struct Particle { float fE; vector<int> fIds; }; To put it simple… Anchor: specifies the offset and size of the header and footer sections. Header: schema information.2 Footer: location of pages and clusters.2 Pages: little-endian fundamental types (possibly packed, e.g. bit-fields) —typically in the order of tens of KiB.2 2 This element may be compressed or not. 7/15
  • 13. libdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classeslibdaos C++ interface classes To simplify resource management, we wrote C++ wrappers for part of libdaos functionality. auto pool = std::make_shared<RDaosPool>( "e6f8e503-e409-4b08-8eeb-7e4d77cce6bb", "1"); RDaosContainer cont(pool, "b4f6d9fc-e081-41d4-91ae-41adf800b537"); std::string s("foo bar baz"); cont.WriteObject(daos_obj_id_t{0xcafe4a11deadbeef, 0}, s.data(), s.size() , /*dkey =*/ 0, /*akey =*/ 0); 8/15
  • 14. DAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objectsDAOS backend: mapping things to objects … … Anchor Header Page Cluster Footer struct Event { int fId; vector<Particle> fPtcls; }; struct Particle { float fE; vector<int> fIds; }; Each RNTuple page is stored in a separate object. The UUID is sequential starting from 00000000-0000-0000-0000-000000000000 . Header, Footer, and Anchor are stored in three different objects with reserved UUIDs. 9/15
  • 15. Usage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOS From the user’s perspective… auto model = RNTupleModel::Create(); auto ntuple = RNTupleReader::Open(std::move(model), "DecayTree", "./B2HHH~zstd.ntuple"); auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon"); auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon"); auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon"); 10/15
  • 16. Usage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOSUsage: RNTuple/file vs. RNTuple/DAOS From the user’s perspective… auto model = RNTupleModel::Create(); auto ntuple = RNTupleReader::Open(std::move(model), "b4f6d9fc-e081-41d4-91ae-41adf800b537", "daos://e6f8e503-e409-4b08-8eeb-7e4d77cce6bb/1"); auto viewH1IsMuon = ntuple->GetView<int>("H1_isMuon"); auto viewH2IsMuon = ntuple->GetView<int>("H2_isMuon"); auto viewH3IsMuon = ntuple->GetView<int>("H3_isMuon"); 10/15
  • 18. Test environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environmentTest environment Our evaluation ran on CERN OpenLab DAOS test machines: 3 DAOS servers, 1 DAOS head node. interconnected by an Omni-Path Edge Switch 100 Series | 24 ports. Figure 2: Server nodes HW (olcsl-*) Figure 3: Client node HW (olsky-03) 11/15
  • 19. dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets)dfuse simple benchmark (ofi+sockets) ! These results are preliminary and might not be reliable. Block size 4K 8K 16K 512K 1M 4M Seq. Write 7.62 14.42 27.44 189.21 205.10 225.62 Seq. Read 2.62 5.04 9.21 116.86 147.7 9 188.90 Random Write 7.30 14.67 27.63 199.68 209.17 211.40 Random Read 2.16 4.20 7.92 120.91 162.7 0 211.12 Table 1: dfuse read/write benchmark (in MiB/s) Far from the 34.2 Gbits/sec (4.275 GiB/s) achieved by iperf. Path lookup not bad; around 700+ open()/creat() calls/s. 12/15
  • 20. RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets)RNTuple (ofi+sockets) ! These results are preliminary and might not be reliable. No compression zstd 0 100 200 300 400 10.4 70 54.8 105.7 172.2 358.8 Runtime(s) (a) gen_lhcb (write RNTuple) Local file dfuse libdaos No compression zstd 2,000 4,000 6,000 8,000 777 2,689 7,834 6,009 1,427 2,515 Runtime(ms) (b) lhcb (read RNTuple) Local file dfuse libdaos Figure 4: RNTuple benchmark on LHCb data (ofi+sockets).23 2 Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd). 3 https://github.com/jblomer/iotools 13/15
  • 21. RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2)RNTuple (ofi+psm2) ! These results are preliminary and might not be reliable. No compression zstd 20 40 60 80 10.4 70 34.29 70.1 14.5 61.6 Runtime(s) (a) gen_lhcb (write RNTuple) Local file dfuse libdaos No compression zstd 2,000 4,000 777 2,689 4,989 3,479 1,281 2,854 Runtime(ms) (b) lhcb (read RNTuple) Local file dfuse libdaos Figure 5: RNTuple benchmark on LHCb data (ofi+PSM2).45 4 Input data size: 1.5 GiB (uncompressed) / 1007 MiB (zstd). 5 https://github.com/jblomer/iotools 14/15
  • 23. ConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusionsConclusions 1+ EB of HEP data in ROOT files (TTree). RNTuples replaces TTree columnar storage for the next 30 years. RNTuple architecture decouples storage from serialization/representation. Object stores are first-class. First prototype implementation of an Intel DAOS backend. Currently “1 Page == 1 Object” + constant dkey. Still some performance issues. Next Questions: 1. How to maximize throughput (bulk reading/writing of pages)? 2. How to distribute pages appropriately, e.g. put together pages corresponding to the same data member? 15/15
  • 24. Storing High-Energy Physics data in DAOS Javier López Gómez – CERN fellow <javier.lopez.gomez@cern.ch> DUG ’20, 19th November 2020 ROOT project, EP-SFT (SoFTware Development for Experiments), CERN http://root.cern/