Jakub Wozniak, CERN
Next CERN Accelerator Logging Service Architecture
#EUent9
Agenda
• What is (Next) CALS?
• NXCALS Architecture
• Meta-data Service & Ingestion API
• Spark Extraction API
Controls Data Logging
• Provide access to current & historical device state
– Monitoring & controls of the machines
– Improve machine/beam performance
– Various studies (new beam types, experiments, machines)
• Required to deliver quality beam to experiments
• Not physics data from experiments!
CERN Accelerator Logging Service
• Old system (CALS) based on Oracle (2 DBs)
– ~20,000 devices (from ~120,000 devices)
– 1,500,000 signals
– 5,000,000 extractions per day
– 71,000,000,000 records per day
• 2 TB / day (unfiltered data, 2 DBs)
– 1 PB of total data storage (heavily filtered up to 95%)
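Averaged over a day, the rates above come to roughly 820,000 records and about 58 extractions per second. A small sketch of that arithmetic follows; the class name `CalsLoad` is made up for illustration, and treating 1 TB as 10^12 bytes is an assumption about the unit used on the slide.

```java
final class CalsLoad {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Converts a per-day count into a whole-number per-second average. */
    static long perSecond(long perDay) {
        return perDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println("records/s:     " + perSecond(71_000_000_000L));
        System.out.println("extractions/s: " + perSecond(5_000_000L));
        // 2 TB/day, assuming 1 TB = 10^12 bytes and 1 MB = 10^6 bytes
        System.out.println("MB/s:          " + perSecond(2_000_000_000_000L) / 1_000_000);
    }
}
```

These are daily averages only; the slides do not say how bursty the real load is.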
Current Controls Data Storage
[Chart: controls data storage over Run 1, LS 1 and Run 2; labelled 900 GB/day]
Current Issues With CALS
• Performance / scalability problems
– Difficult to scale horizontally
– “… to extract 24h of data takes 12h”
• Other issues
– Problems with big payloads (payloads vary from KB to GB)
– Limited & rigid table structure & limited types (no nested types)
– Limited integration with heterogeneous analytics tools (Python, Matlab, R, Java,…)
• CALS & tools not ready for Big Data!
– Have to extract data to do analysis!
Big Data For Controls?
CALS on Oracle -> ? (Impala, Kudu)
Next CALS (NXCALS)
Next CERN Accelerator Logging Service (Kafka, Hadoop, Spark)
Controls Data
Readings from devices / properties (with fields inside)
Timeseries of records
Device X / Property Y (time & values):
t0: { f1, f2, f3 } (schema 1)
t1: { f1, f2, f3 }
t2: { f1, f2, f3 }
…
t3: { f1, f2, f3, f4 } (schema 2)
t4: { f1, f2, f3, f4 }
…
tN: { f1, f2, f3, f5, …, fN } (schema N)
Devices get updated so …
… schema changes over time!
Generic Storage System
• Different Controls Systems for different domains
• Not only Device/Property model
Let’s generalize and define some abstraction
Call it Entity…
…and just arbitrary Records
Record: Key -> Values (with timestamp & partition)
Not limited to Controls nor CERN!
Some Requirements
• Discover entities from records
– Avoids static / offline registration in advance
• Allow searching for entity meta-data
– What are the known entities?
– How are they partitioned?
– With what schemas?
• Store & extract data
• Data access
– Online monitoring (simple extraction but must have low latency data access)
– Offline analysis (provide visualization tools for more complex analysis)
NXCALS Architecture
[Architecture diagram. Components shown: Datasources, Log. Proc., Kafka, ETL, Hadoop (HBase, HDFS; Avro, Parquet), Spark, NXCALS API, Old API, Jupyter, API, Meta-data service DB. Clients: Scientists, Programmers, Applications]
Design Choices
• Why Hadoop?
– Service at CERN (IT/DB group)
• Why Kafka?
– Redundancy & data safety (if Hadoop not available)
– Low latency streaming API for extraction
• Why HBase?
– Fast, low latency for online monitoring queries
– Gives time for data deduplication & compaction into Parquet files
• Why Parquet as final storage?
– Open, columnar, storage efficient format with good compression
– Good performance for extraction
• predicate push down
• column projection
– Easy to understand, access (even outside of the system), backup, etc
Data Flow
• Ingestion API to send data to Kafka (as Avro)
• ETL extracts it from Kafka towards
– HDFS (as Avro, into staging folders)
– HBase (as Avro, for low latency)
• Avro files are deduplicated & compacted
– Into larger Parquet files (with Spark)
– Hadoop-friendly process, avoids many small files
• Spark Extraction API for data access
• Meta-data service knows location of objects in files
– Avoids scanning many files
– “Replacement” for missing indexes
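The deduplication step in the flow above can be illustrated with a minimal plain-Java sketch. It assumes that two staged records are duplicates when they share the same entity key and timestamp (the deck states that data is identified by entity keys and timestamp); the class `StagingCompactor` and the map-based record layout are hypothetical, and the real system performs this step with Spark on Avro files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StagingCompactor {
    /** Builds a toy record; the field names are illustrative, not the NXCALS layout. */
    static Map<String, Object> record(String entity, long timestamp, double value) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("entity", entity);
        fields.put("timestamp", timestamp);
        fields.put("value", value);
        return fields;
    }

    /** Keeps the first record seen for each (entity, timestamp) pair, preserving order. */
    static List<Map<String, Object>> deduplicate(List<Map<String, Object>> staged) {
        Map<String, Map<String, Object>> unique = new LinkedHashMap<>();
        for (Map<String, Object> rec : staged) {
            String key = rec.get("entity") + "@" + rec.get("timestamp");
            unique.putIfAbsent(key, rec);
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> staged = new ArrayList<>();
        staged.add(record("DEV1/Setting", 1L, 0.5));
        staged.add(record("DEV1/Setting", 1L, 0.5)); // same entity and timestamp, sent twice
        staged.add(record("DEV1/Setting", 2L, 0.7));
        System.out.println(deduplicate(staged).size() + " unique records");
    }
}
```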
Devops?
• Microservice architecture
• Monitoring is crucial, done using
– Prometheus
– Alertmanager
– Grafana
– Logs sent to Elastic (outside)
• Fully automated CI/CD with
– Jenkins pipelines
– Ansible deployment
Meta-data Service
Data Types
• Data (records):
– Kafka -> Hadoop (HBase, HDFS)
• Meta-data (info about data)
– RDBMS (Oracle)
Domain Description
• System stores changes of state of abstract entities in form of records
– Data identified by entity keys and timestamp
– “Extended” timeseries data
• Record = { f1=v1 ,…, fn=vn } (at t1)
– Any fields
– Some fields are special (entity keys, partition keys, timestamp)
– Set of fields => Schema
• Records are split (grouped in different files on disk) by:
– Time, partition (classifier), schema
• Fields can change over time {f1…fm} (at tx)
– History of record structure changes (schema changes)
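To make "set of fields => schema" concrete, here is a minimal sketch that derives a schema from a record and shows that adding a field yields a different schema. The class `RecordSchemas` and the use of Java type names as field types are assumptions for illustration; NXCALS itself stores records as Avro.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class RecordSchemas {
    /** Derives a schema (field name -> Java type name) from a record's fields. */
    static Map<String, String> schemaOf(Map<String, Object> record) {
        Map<String, String> schema = new TreeMap<>(); // sorted, so field order does not matter
        for (Map.Entry<String, Object> field : record.entrySet()) {
            schema.put(field.getKey(), field.getValue().getClass().getSimpleName());
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> atT1 = new LinkedHashMap<>();
        atT1.put("device", "DEV1");
        atT1.put("timestamp", 1L);
        atT1.put("f1", 1.0);

        Map<String, Object> atTx = new LinkedHashMap<>(atT1);
        atTx.put("f2", 7); // a new field appears, so the schema changes

        System.out.println(schemaOf(atT1));
        System.out.println(schemaOf(atTx));
        System.out.println("same schema: " + schemaOf(atT1).equals(schemaOf(atTx)));
    }
}
```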
Meta Data Objects
• ENTITY – abstract object we store data for
– Identified by known record fields (primary key)
• PARTITION – classifier to store data on disk in files
– Identified by known record fields (primary key)
• SCHEMA – the set of all of a record’s fields
Meta Data Objects
• SYSTEM – defines record type (special fields)
– Field names identifying ENTITY
– Field names identifying PARTITION
– Field names identifying TIMESTAMP
• ENTITY-HISTORY – history of SCHEMA & PARTITION changes of ENTITY over time
• VARIABLE – alias for ENTITY
– whole record
– field in record
• VARIABLE-HISTORY – VARIABLE configuration over time
– Pointer (alias) to entity and field with time information
Java Ingestion API Example
import java.time.Instant;
import java.util.concurrent.CompletableFuture;
// NXCALS imports (Publisher, PublisherFactory, ImmutableData) omitted

// Create data publisher
Publisher<ImmutableData> publisher =
    PublisherFactory.newInstance().createPublisher("MOCK-SYSTEM", (d) -> d);
// Create data (ImmutableData == Map<String,Object>)
ImmutableData data = ImmutableData.builder()
    .add("device", "NXCALS_MONITORING_DEV1")  // entity key
    .add("property", "Setting")               // entity key
    .add("class", "MONITORING")               // partition key
    .add("timestamp", Instant.now())          // timestamp key
    .add("byteField1", (byte) 2)
    .add("shortField1", (short) 1).build();
// Publish data
CompletableFuture<Void> future = publisher.publish(data);
// Handle Future completion or error
future.whenComplete((v, e) -> {
    if (e != null) {
        // handle errors
    }
});
Data Partitioning
System [sid], { entity_keys, partition_keys, timestamp, field1…fieldN } = record
hdfs: /// project / nxcals / sid / partition_id / schema_id / date / data.parquet
A simple example for device domain (CMW)
• System CMW which defines:
– Entity keys as device, property
– Partition keys as class, property
– Timestamp keys (acq or cycle stamp)
So one data.parquet file will contain data for devices from the same class/property.
A file always has records of the same schema!
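The directory layout above can be sketched as a small path builder. This is an illustration only: the class `PartitionPaths`, the numeric id types and the ISO date format are assumptions, since the slide gives only the order of the path segments.

```java
import java.time.LocalDate;

final class PartitionPaths {
    /** Builds the file path for one (system, partition, schema, day) combination. */
    static String parquetPath(long systemId, long partitionId, long schemaId, LocalDate date) {
        // Segment order follows the slide; ids and date format are illustrative.
        return String.format("/project/nxcals/%d/%d/%d/%s/data.parquet",
                systemId, partitionId, schemaId, date);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2017, 10, 10);
        // Two devices of the same class/property share partition and schema ids,
        // so their records for that day end up in the same file.
        System.out.println(parquetPath(1, 42, 7, day));
        // A schema change produces a new schema id and therefore a different file.
        System.out.println(parquetPath(1, 42, 8, day));
    }
}
```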
Meta Store Efficiency
• Meta-data is cached
• Ingestion API calls the meta-store only on:
– Entity creation
– Entity change (schema change / rename / …)
– Cache misses
• So rarely compared to the data rate
– Calls to meta store expensive (10-50ms)
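A client-side cache of this kind might look like the sketch below. The class `CachedMetaStore`, its method names and the lookup function are hypothetical; only the idea (call the meta-store on a cache miss, otherwise serve from memory) comes from the slide. Invalidation on schema change or rename is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CachedMetaStore {
    private final Map<String, Long> entityIds = new ConcurrentHashMap<>();
    private final Function<String, Long> remoteLookup; // stands in for the expensive service call
    private final AtomicInteger remoteCalls = new AtomicInteger();

    CachedMetaStore(Function<String, Long> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the entity id, calling the remote meta-store only on a cache miss. */
    long entityId(String entityKey) {
        return entityIds.computeIfAbsent(entityKey, key -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(key);
        });
    }

    int remoteCalls() {
        return remoteCalls.get();
    }

    public static void main(String[] args) {
        CachedMetaStore store = new CachedMetaStore(key -> (long) key.hashCode());
        for (int i = 0; i < 1000; i++) {
            store.entityId("NXCALS_MONITORING_DEV1/Setting");
        }
        System.out.println("remote calls for 1000 records: " + store.remoteCalls());
    }
}
```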
Meta-Store Features
• Entities are created dynamically from records
• Schemas are discovered and saved with history
• Records (entities) can change schemas over time
• Schema changes handled at extraction
– using history from meta-data service
Spark Extraction API
API for Spark Extraction
• Extension to Spark sources package
– Extends BaseRelation, implements PrunedFilteredScan
– sparkSession.read().format("cern.accsoft.nxcals.data.access.api").load()
• Hides data source & implementation details
– HBase for most recent data (<36 hours)
– HDFS for older data (>36 hours due to compaction)
• Merges schemas using schema history
• Greatly simplifies data access
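How a query window could be routed to the two stores can be sketched as follows. The 36-hour boundary comes from the slide; the class `SourceRouter`, the string labels and the exact boundary handling are assumptions, not the NXCALS implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class SourceRouter {
    static final Duration HBASE_WINDOW = Duration.ofHours(36);

    /** Lists the stores a query for [start, end] has to read, given the current time. */
    static List<String> sourcesFor(Instant start, Instant end, Instant now) {
        Instant cutoff = now.minus(HBASE_WINDOW);
        List<String> sources = new ArrayList<>();
        if (start.isBefore(cutoff)) {
            sources.add("HDFS");  // older, already compacted data
        }
        if (!end.isBefore(cutoff)) {
            sources.add("HBASE"); // most recent data
        }
        return sources;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-10-12T00:00:00Z");
        System.out.println(sourcesFor(now.minus(Duration.ofHours(2)), now, now));
        System.out.println(sourcesFor(now.minus(Duration.ofDays(2)), now, now));
    }
}
```

A window that straddles the boundary reads from both stores, which is the case the extraction API hides from the user.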
Spark Extraction Example
SparkSession sparkSession = … // create session
Dataset<Row> dataset = DataAccessQueryBuilder
    .system("MOCK-SYSTEM")
    .keyValue("device", "NXCALS_MONITORING_DEV1")  // entity key
    .keyValue("property", "Setting")               // entity key
    .startTime("2017-10-10 00:00:00.0")            // time window
    .duration(Duration.ofDays(2))                  // time window
    .fields("device", "intField1", "doubleField")
    .buildDataset(sparkSession);
Record Schema, Spark Default
Entity A evolves over time:
Record 1: {acqStamp, field1 (double), field2 (integer)}
…
Record 2: {acqStamp, field1 (float), field21 (long)} // rename, field2 = field21
…
Record 3: {acqStamp, field3 (double)} // only field3
Can you quickly extract & union datasets containing those records?
org.apache.spark.sql.AnalysisException:
Union can only be performed on tables with the same number of columns
Can be done but troublesome for scientists!
Schema Merging
Record 1: {acqStamp, field1 (double), field2 (integer)}
…
Record 2: {acqStamp, field1 (float), field21 (long)} // rename, field2 = field21
…
Record 3: {acqStamp, field3 (double)} // only field3
Merged schema: {acqStamp (long), field1 (double), field2 (integer), field21 (long), field3 (double)}
… With Field Aliases
Record 1: {acqStamp, field1 (double), field2 (integer)}
…
Record 2: {acqStamp, field1 (float), field21 (long)} // rename, field2 = field21
…
Record 3: {acqStamp, field3 (double)} // only field3
… and new_field as alias of field2 and field21
Schema: {acqStamp (long), field1 (double), new_field (long), field3 (double)}
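The two schema-merging slides above can be reproduced with a short plain-Java sketch: union the fields of every historical schema, optionally rename aliased fields, and widen a field's type when it differs between schemas. The class `SchemaMerger`, the type names as strings and the widening order are assumptions for illustration; the actual merge is done inside the NXCALS Spark data source using the schema history.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SchemaMerger {
    // Assumed widening order, narrowest first (mirrors Java's primitive widening).
    private static final List<String> NUMERIC = Arrays.asList("integer", "long", "float", "double");

    /** Returns the wider of two type names; identical types are returned unchanged. */
    static String widen(String a, String b) {
        if (a.equals(b)) {
            return a;
        }
        int ia = NUMERIC.indexOf(a);
        int ib = NUMERIC.indexOf(b);
        if (ia < 0 || ib < 0) {
            throw new IllegalArgumentException("cannot merge " + a + " with " + b);
        }
        return NUMERIC.get(Math.max(ia, ib));
    }

    /** Builds an ordered map from alternating key, value arguments. */
    static Map<String, String> schema(String... pairs) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            fields.put(pairs[i], pairs[i + 1]);
        }
        return fields;
    }

    /** Unions the schemas field by field, renaming aliased fields and widening types. */
    static Map<String, String> merge(Map<String, String> aliases, List<Map<String, String>> schemas) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> s : schemas) {
            for (Map.Entry<String, String> field : s.entrySet()) {
                String name = aliases.getOrDefault(field.getKey(), field.getKey());
                merged.merge(name, field.getValue(), SchemaMerger::widen);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, String>> history = Arrays.asList(
                schema("acqStamp", "long", "field1", "double", "field2", "integer"),
                schema("acqStamp", "long", "field1", "float", "field21", "long"),
                schema("acqStamp", "long", "field3", "double"));

        // Plain merge: every field that ever existed becomes a column.
        System.out.println(merge(Collections.emptyMap(), history));
        // With aliases (field name -> alias), field2 and field21 collapse into new_field.
        System.out.println(merge(schema("field2", "new_field", "field21", "new_field"), history));
    }
}
```

Records that lack a merged column would carry a null there, which is what lets the per-schema datasets be unioned without the `AnalysisException`.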
Variables
• Pointer to field in entity record in time window
• Can point to different entities over time
• No need for real entity
• Useful for abstractions (“LHC_Beam_Intensity”)
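A variable that points to different entities over time can be modelled as a time-ordered lookup. The sketch below is hypothetical (the class `VariableHistory`, the "entity/field" string format and the device names are invented for illustration) and only mirrors the idea of VARIABLE-HISTORY described earlier.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

final class VariableHistory {
    // validFrom -> "entity/field" target that is in effect from that instant onwards
    private final TreeMap<Instant, String> targets = new TreeMap<>();

    void pointTo(Instant validFrom, String entityAndField) {
        targets.put(validFrom, entityAndField);
    }

    /** Returns the target in effect at the given time, or null before the first entry. */
    String resolve(Instant at) {
        Map.Entry<Instant, String> entry = targets.floorEntry(at);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VariableHistory beamIntensity = new VariableHistory();
        beamIntensity.pointTo(Instant.parse("2016-01-01T00:00:00Z"), "DEVICE_A/Acquisition/field1");
        beamIntensity.pointTo(Instant.parse("2017-06-01T00:00:00Z"), "DEVICE_B/Acquisition/field2");

        System.out.println(beamIntensity.resolve(Instant.parse("2016-12-31T00:00:00Z")));
        System.out.println(beamIntensity.resolve(Instant.parse("2017-10-10T00:00:00Z")));
    }
}
```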
Variable Extraction API
SparkSession sparkSession = … // create session
Dataset<Row> dataset = VariableQueryBuilder
    .variable("NXCALS_MONITORING_VARIABLE")
    .startTime("2017-10-10 00:00:00.0")
    .duration(Duration.ofDays(2))
    .buildDataset(sparkSession);
Variables Configuration
Schema: {variable (String), acqStamp (long), value (double)}
Entity 1: {acqStamp, field1 (float), field21 (long)}
Entity 2: {acqStamp, field2 (double)}
Entity 3: {acqStamp, field1 (array2D), field3 (float)}
Variable configuration changes over time
Why Simplified Extraction?
• Data producers ≠ data consumers
• At CERN different groups do
– Equipment & Device / Property design (low level)
– Physics & Beam-oriented analysis (high level)
Summary
• NXCALS is a generic Big Data storage system
• Timeseries-like records of changing structure
– Arbitrary entity & partition keys
• Java Ingestion API
• Spark Extraction API (Java, Python, Scala)
Questions?
• NXCALS code:
– https://gitlab.cern.ch/acc-logging-team/nxcals
• Contact us:
– jakub.wozniak@cern.ch
– acc-logging-team@cern.ch

Goal Based Data Production with Sim Simeonov
Spark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Spark Summit
 

Next CERN Accelerator Logging Service with Jakub Wozniak

  • 1. Jakub Wozniak, CERN Next CERN Accelerator Logging Service Architecture #EUent9
  • 2. Agenda • What is (Next) CALS? • NXCALS Architecture • Meta-data Service & Ingestion API • Spark Extraction API #EUent9
  • 3. Controls Data Logging • Provide access to current & historical device state – Monitoring & controls of the machines – Improve machine/beam performance – Various studies (new beam types, experiments, machines) • Required to deliver quality beam to experiments • Not physics data from experiments! #EUent9
  • 4. CERN Accelerator Logging Service • Old system (CALS) based on Oracle (2 DBs) – ~20,000 devices (from ~120,000 devices) – 1,500,000 signals – 5,000,000 extractions per day – 71,000,000,000 records per day • 2 TB / day (unfiltered data, 2 DBs) – 1 PB of total data storage (heavily filtered up to 95%) #EUent9
  • 5. Current Controls Data Storage [chart: data rate over Run 1, LS 1 and Run 2; 900 GB/day] #EUent9
  • 6. Current Issues With CALS • Performance / scalability problems – Difficult to scale horizontally – “… to extract 24h of data takes 12h” • Other issues – Problems with big payloads (payloads vary from KB to GB) – Limited & rigid table structure & limited types (no nested types) – Limited integration with heterogeneous analytics tools (Python, Matlab, R, Java,…) • CALS & tools not ready for Big Data! – Have to extract data to do analysis! #EUent9
  • 7. Big Data For Controls? [diagram labels: CALS on Oracle; Impala; Kudu; ?; Next CALS (NXCALS)] #EUent9
  • 8. Next CERN Accelerator Logging Service (Kafka, Hadoop, Spark) #EUent9
  • 9. Controls Data: readings from devices / properties (with fields inside), stored as timeseries of records. Device X / Property Y (time & values):
      t0: { f1, f2, f3 } (schema 1)
      t1: { f1, f2, f3 }
      t2: { f1, f2, f3 }
      …
      t3: { f1, f2, f3, f4 } (schema 2)
      t4: { f1, f2, f3, f4 }
      …
      tN: { f1, f2, f3, f5, …, fN } (schema N)
    Devices get updated, so the schema changes over time! #EUent9
  • 10. Generic Storage System • Different Controls Systems for different domains • Not only Device/Property model Let’s generalize and define some abstraction Call it Entity… …and just arbitrary Records Record: Key -> Values (with timestamp & partition) Not limited to Controls nor CERN! #EUent9
  • 11. Some Requirements • Discover entities from records – Avoids static / offline registration in advance • Allow searching for entity meta-data – What are the known entities? – How are they partitioned? – With what schemas? • Store & extract data • Data access – Online monitoring (simple extraction but must have low latency data access) – Offline analysis (provide visualization tools for more complex analysis) #EUent9
  • 13. Design Choices • Why Hadoop? – Service at CERN (IT/DB group) • Why Kafka? – Redundancy & data safety (if Hadoop not available) – Low latency streaming API for extraction • Why HBase? – Fast, low latency for online monitoring queries – Gives time for data deduplication & compaction into Parquet files • Why Parquet as final storage? – Open, columnar, storage efficient format with good compression – Good performance for extraction • predicate push down • column projection – Easy to understand, access (even outside of the system), backup, etc. #EUent9
  • 14. Data Flow • Ingestion API to send data to Kafka (as Avro) • ETL extracts it from Kafka towards – HDFS (as Avro, into staging folders) – HBase (as Avro, for low latency) • Avro files are deduplicated & compacted into larger Parquet files (with Spark) – Hadoop-friendly process, avoids many small files • Spark Extraction API for data access • Meta-data service knows location of objects in files – Avoids scanning many files – “Replacement” for missing indexes #EUent9
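The deduplication step in the data flow above can be sketched in plain Java. This is only an illustration of the idea, not the Spark job the slide refers to: the field names `entityId` and `timestamp`, the class name and the "first record wins" rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: drops repeated records before compaction. */
final class Deduplicator {

    /**
     * Keeps the first record seen for each (entityId, timestamp) pair
     * and preserves the original order of the surviving records.
     * The two field names are assumptions made for this sketch.
     */
    static List<Map<String, Object>> deduplicate(List<Map<String, Object>> records) {
        Map<List<Object>, Map<String, Object>> unique = new LinkedHashMap<>();
        for (Map<String, Object> record : records) {
            // A two-element list gives a composite key with value-based equals/hashCode.
            List<Object> id = Arrays.asList(record.get("entityId"), record.get("timestamp"));
            unique.putIfAbsent(id, record);
        }
        return new ArrayList<>(unique.values());
    }
}
```

Calling `Deduplicator.deduplicate(records)` on a staging batch would return the records to be written into the compacted file; a real job would do the equivalent on a distributed dataset.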
  • 15. Devops? • Microservice architecture • Monitoring is crucial, done using – Prometheus – Alertmanager – Grafana – Logs sent to Elastic (outside) • Fully automated CI/CD with – Jenkins pipelines – Ansible deployment #EUent9
  • 17. Data Types • Data (records): – Kafka -> Hadoop (HBase, HDFS) • Meta-data (info about data) – RDBMS (Oracle) #EUent9
  • 18. Domain Description • System stores changes of state of abstract entities in form of records – Data identified by entity keys and timestamp – “Extended” timeseries data • Record = { f1=v1 ,…, fn=vn } (at t1) – Any fields – Some fields are special (entity keys, partition keys, timestamp) – Set of fields => Schema • Records are split (grouped in different files on disk) by: – Time, partition (classifier), schema • Fields can change over time {f1…fm} (at tx) – History of record structure changes (schema changes) #EUent9
  • 19. Meta Data Objects • ENTITY – abstract object we store data for – Identified by known record fields (primary key) • PARTITION – classifier used to store data on disk in files – Identified by known record fields (primary key) • SCHEMA – the set of all fields of a record #EUent9
  • 20. Meta Data Objects • SYSTEM – defines record type (special fields) – Field names identifying ENTITY – Field names identifying PARTITION – Field names identifying TIMESTAMP • ENTITY-HISTORY – history of SCHEMA & PARTITION changes of ENTITY over time • VARIABLE – alias for ENTITY – whole record – field in record • VARIABLE-HISTORY – VARIABLE configuration over time – Pointer (alias) to entity and field with time information #EUent9
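As a rough illustration of how a SYSTEM definition turns an arbitrary record into an entity key and a partition key, here is a minimal sketch. The class `SystemDefinition` and its methods are hypothetical names invented for this example; they are not taken from the NXCALS code base.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a SYSTEM: it only names the special fields of a record. */
final class SystemDefinition {
    final String name;
    final List<String> entityKeyFields;
    final List<String> partitionKeyFields;
    final String timestampField;

    SystemDefinition(String name, List<String> entityKeyFields,
                     List<String> partitionKeyFields, String timestampField) {
        this.name = name;
        this.entityKeyFields = entityKeyFields;
        this.partitionKeyFields = partitionKeyFields;
        this.timestampField = timestampField;
    }

    /** Copies the named fields out of a record, failing if one of them is missing. */
    private static Map<String, Object> project(Map<String, Object> record, List<String> fields) {
        Map<String, Object> key = new LinkedHashMap<>();
        for (String field : fields) {
            if (!record.containsKey(field)) {
                throw new IllegalArgumentException("Record is missing key field: " + field);
            }
            key.put(field, record.get(field));
        }
        return key;
    }

    /** The values that identify the ENTITY this record belongs to. */
    Map<String, Object> entityKey(Map<String, Object> record) {
        return project(record, entityKeyFields);
    }

    /** The values that identify the PARTITION this record is stored in. */
    Map<String, Object> partitionKey(Map<String, Object> record) {
        return project(record, partitionKeyFields);
    }
}
```

With a definition such as `new SystemDefinition("CMW", Arrays.asList("device", "property"), Arrays.asList("class", "property"), "timestamp")`, any record carrying those fields can be classified without registering the entity in advance.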
  • 21. Java Ingestion API Example #EUent9
      // Create data publisher
      Publisher<ImmutableData> publisher = PublisherFactory.newInstance().createPublisher("MOCK-SYSTEM", (d) -> d);
      // Create data (ImmutableData == Map<String,Object>)
      ImmutableData data = ImmutableData.builder()
          .add("device", "NXCALS_MONITORING_DEV1")
          .add("property", "Setting")
          .add("class", "MONITORING")
          .add("timestamp", Instant.now())
          .add("byteField1", (byte) 2)
          .add("shortField1", (short) 1)
          .build();
      // Publish data
      CompletableFuture<Void> future = publisher.publish(data);
      // Handle Future completion or error
      future.whenComplete((v, e) -> {
          if (e != null) {
              // handle errors
          }
      });
    Slide callouts: Entity Key, Partition Key, Timestamp Key
  • 22. Data Partitioning #EUent9
      System [sid], { entity_keys, partition_keys, timestamp, field1…fieldN } = record
      hdfs:///project/nxcals/sid/partition_id/schema_id/date/data.parquet
      [diagram labels: schema, Meta]
    A simple example for the device domain (CMW) • System CMW defines: • Entity keys as device, property • Partition keys as class, property • Timestamp keys (acq or cycle stamp)
    So one data.parquet file will contain data for devices from the same class/property. A file always contains records of the same schema!
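The directory layout above can be expressed as a small path builder. This is a sketch under assumptions: the numeric id types, the ISO date format of the `date` segment and the class name `PartitionPath` are not stated on the slide.

```java
import java.time.LocalDate;

/** Hypothetical builder for the per-partition file location. */
final class PartitionPath {

    /**
     * Joins system id, partition id, schema id and date into one path.
     * The date is rendered as yyyy-MM-dd, which is an assumption of this sketch.
     */
    static String of(long systemId, long partitionId, long schemaId, LocalDate date) {
        return String.join("/",
                "/project/nxcals",
                Long.toString(systemId),
                Long.toString(partitionId),
                Long.toString(schemaId),
                date.toString(),
                "data.parquet");
    }
}
```

Because the schema id is part of the path, two records with different schemas can never land in the same file, which matches the rule that a file holds records of a single schema.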
  • 23. Meta Store Efficiency • Meta-data is cached • Ingestion API calls the meta-store only on: – Entity creation – Entity change (schema change / rename / …) – Cache misses • So rarely compared to the data rate – Calls to meta store expensive (10-50ms) #EUent9
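A minimal sketch of the caching idea: the expensive remote call happens only on a cache miss, so repeated records for the same entity cost nothing extra. The class name, the string-keyed lookup and the call counter are hypothetical, and invalidation on entity change is left out to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical client-side cache in front of a slow meta-data lookup. */
final class CachingMetaStore {
    private final Function<String, Long> remoteLookup;
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private final AtomicInteger remoteCalls = new AtomicInteger();

    CachingMetaStore(Function<String, Long> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the entity id, calling the remote service only on a cache miss. */
    long entityId(String entityKey) {
        return cache.computeIfAbsent(entityKey, key -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(key);
        });
    }

    /** Number of times the remote lookup was actually invoked. */
    int remoteCalls() {
        return remoteCalls.get();
    }
}
```

With a lookup costing 10-50 ms, publishing thousands of records per second for a known entity would be impractical without such a cache; with it, the remote cost is paid once per entity.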
  • 24. Meta-Store Features • Entities are created dynamically from records • Schemas are discovered and saved with history • Records (entities) can change schemas over time • Schema changes handled at extraction – using history from meta-data service #EUent9
  • 26. API for Spark Extraction • Extension to Spark sources package – Extends BaseRelation, implements PrunedFilteredScan – sparkSession.read().format("cern.accsoft.nxcals.data.access.api").load() • Hides data source & implementation details – HBase for most recent data (<36 hours) – HDFS for older data (>36 hours due to compaction) • Merges schemas using schema history • Greatly simplifies data access #EUent9
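How a query window might be routed between the two stores can be sketched as follows. Only the 36-hour figure comes from the slide; the class `SourceRouter`, its method names and the exact boundary handling are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical routing of a query window between HBase and HDFS. */
final class SourceRouter {
    /** Recent data is served from HBase for this long (figure taken from the slide). */
    static final Duration HBASE_WINDOW = Duration.ofHours(36);

    /** The instant separating older (HDFS) data from recent (HBase) data. */
    static Instant boundary(Instant now) {
        return now.minus(HBASE_WINDOW);
    }

    /** True if any part of a window starting at 'start' is older than the boundary. */
    static boolean needsHdfs(Instant start, Instant now) {
        return start.isBefore(boundary(now));
    }

    /** True if any part of a window ending at 'end' is newer than the boundary. */
    static boolean needsHBase(Instant end, Instant now) {
        return end.isAfter(boundary(now));
    }
}
```

A window that straddles the boundary needs both sources, and the results are then combined into one dataset, which is the detail the extraction API hides from the user.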
  • 27. Spark Extraction Example #EUent9
      SparkSession sparkSession = … // create session
      Dataset<Row> dataset = DataAccessQueryBuilder
          .system("MOCK-SYSTEM")
          .keyValue("device", "NXCALS_MONITORING_DEV1")
          .keyValue("property", "Setting")
          .startTime("2017-10-10 00:00:00.0")
          .duration(Duration.ofDays(2))
          .fields("device", "intField1", "doubleField")
          .buildDataset(sparkSession);
    Slide callouts: Entity Key, Time Window
  • 28. Record Schema, Spark Default #EUent9
    Entity A evolves over time:
      Record 1: {acqStamp, field1 (double), field2 (integer)}
      …
      Record 2: {acqStamp, field1 (float), field21 (long)} // rename, field2 = field21
      …
      Record 3: {acqStamp, field3 (double)} // only field3
    Can you quickly extract & union datasets containing those records?
      org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the same number of columns
    Can be done but troublesome for scientists!
  • 29. Schema Merging #EUent9
      Record 1: {acqStamp, field1 (double), field2 (integer)}
      …
      Record 2: {acqStamp, field1 (float), field21 (long)} // rename, field2 = field21
      …
      Record 3: {acqStamp, field3 (double)} // only field3
    Merged schema covering Record 1, Record 2 and Record 3:
      {acqStamp (long), field1 (double), field2 (integer), field21 (long), field3 (double)}
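The merge shown on this slide amounts to a union of field names with the wider type kept on conflict. The sketch below reproduces that result in plain Java; representing a schema as a name-to-type map and the widening order integer < long < float < double are assumptions of the example, not the documented NXCALS rule.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical schema merge: union of fields, wider numeric type wins. */
final class SchemaMerger {
    // Assumed widening order for this sketch, narrowest first.
    private static final List<String> NUMERIC_WIDENING =
            Arrays.asList("integer", "long", "float", "double");

    /** Returns the wider of two type names, or fails if they cannot be reconciled. */
    static String widen(String a, String b) {
        if (a.equals(b)) {
            return a;
        }
        int ia = NUMERIC_WIDENING.indexOf(a);
        int ib = NUMERIC_WIDENING.indexOf(b);
        if (ia < 0 || ib < 0) {
            throw new IllegalArgumentException("Incompatible types: " + a + " vs " + b);
        }
        return NUMERIC_WIDENING.get(Math.max(ia, ib));
    }

    /** Merges schemas in order; fields keep the position of their first appearance. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... schemas) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> schema : schemas) {
            for (Map.Entry<String, String> field : schema.entrySet()) {
                merged.merge(field.getKey(), field.getValue(), SchemaMerger::widen);
            }
        }
        return merged;
    }
}
```

Once every historical schema is merged this way, each file can be read with the merged schema (missing fields become nulls) and the per-period datasets can be unioned without the column-count error.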
  • 30. … With Field Aliases #EUent9
      Record 1: {acqStamp, field1 (double), field2 (integer)}
      …
      Record 2: {acqStamp, field1 (float), field21 (long)} // rename, field2 = field21
      …
      Record 3: {acqStamp, field3 (double)} // only field3
    … and new_field as alias of field2 and field21
    Schema covering Record 1, Record 2 and Record 3:
      {acqStamp (long), field1 (double), new_field (long), field3 (double)}
  • 31. Variables • Pointer to field in entity record in time window • Can point to different entities over time • No need for real entity • Useful for abstractions (“LHC_Beam_Intensity”) #EUent9
  • 32. Variable Extraction API #EUent9
      SparkSession sparkSession = … // create session
      Dataset<Row> dataset = VariableQueryBuilder
          .variable("NXCALS_MONITORING_VARIABLE")
          .startTime("2017-10-10 00:00:00.0")
          .duration(Duration.ofDays(2))
          .buildDataset(sparkSession);
  • 33. Variables Configuration Schema: {variable (String), acqStamp(long), value (double)} Entity 1: {acqStamp, field1 (float), field21 (long)} Entity 2: {acqStamp, field2 (double)} Entity 3: {acqStamp, field1(array2D), field3 (float)} Variable configuration changes over time #EUent9
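A variable that points to different entities and fields over time can be modelled as a sorted map from "valid from" instants to targets. The sketch below is a hypothetical illustration of that lookup; `VariableHistory`, `Target` and `pointTo` are invented names.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical history of what a variable points to over time. */
final class VariableHistory {

    /** Pointer to one field of one entity. */
    static final class Target {
        final String entity;
        final String field;

        Target(String entity, String field) {
            this.entity = entity;
            this.field = field;
        }

        @Override
        public String toString() {
            return entity + "." + field;
        }
    }

    // Key: instant from which the target becomes valid.
    private final TreeMap<Instant, Target> validFrom = new TreeMap<>();

    /** Registers a new target, valid from the given instant until the next change. */
    void pointTo(Instant from, String entity, String field) {
        validFrom.put(from, new Target(entity, field));
    }

    /** Returns the target valid at the given time, or null if none is configured yet. */
    Target resolve(Instant at) {
        Map.Entry<Instant, Target> entry = validFrom.floorEntry(at);
        return entry == null ? null : entry.getValue();
    }
}
```

An extraction over a long time window would resolve the variable once per configuration period and read each period from the entity and field that were valid at that time.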
  • 34. Why Simplified Extraction? • Data producers ≠ data consumers • At CERN different groups do – Equipment & Device / Property design (low level) – Physics & Beam-oriented analysis (high level) #EUent9
  • 35. Summary • NXCALS is a generic Big Data storage system • Timeseries-like records of changing structure – Arbitrary entity & partition keys • Java Ingestion API • Spark Extraction API (Java, Python, Scala) #EUent9
  • 36. Questions? • NXCALS code: – https://gitlab.cern.ch/acc-logging-team/nxcals • Contact us: – jakub.wozniak@cern.ch – acc-logging-team@cern.ch #EUent9