SlideShare a Scribd company logo
1 of 52
Download to read offline
Time Series Analysis
Using an Event Streaming Platform
Virtual Meetup - September 8-th 2020
Dr. Mirko Kämpf - Solution Architect @ Confluent, Inc. - Team CEMEA
Abstract
Time Series Analysis using an Event Streaming Platform
Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful
and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to
reconstruction of correlation networks.
The first case is relevant for anomaly detection and to protect safety.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like
supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in
the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent
cloud.
This presentation is about linking:
- Time-Series-Analysis (TSA)
- Network- or Graph-Analysis Confluent Platform
- Complex Event Processing (CEP).
Research work ends often with nice charts, scientific papers, and conference talks.
But, many published results can’t be reproduced -
often because the setup it is simply too complicated ...
Question:
How can we integrate data streams,
experiments, and decision making better?
Why not using batch processing?
Study anatomy … ● Batch processing is fine:
○ as long as your data
doesn’t change.
○ in PoCs for method
development in a lab.
○ for research in a fixed scope.
Why using Kafka?
Study the anatomy … Study and influence
the living system ...
● Stream processing is better:
○ for real time business in
changing environments.
○ iterative (research)
projects.
○ repeatable experiments on
replayed data.
Content:
(1) Intro
Typical types of event
How to identify hidden events?
(2) The Challenge
(3) Approach
Time Series Analytics &
Network Analytics in Kafka
Create time series from events
Create graphs from time series pairs
(4) Demo
(5) Architecture & Building Blocks:
(6) Unified Domain Model
Events - 1
Business events
- transaction records
- discrete observation
A Sale
A Trade
An Invoice
A Customer
Experience
JUTS SIMPLE
OBSERVATION &
DATA CAPTURING
How to handle events?
Events
Events - 2
Well defined events
- in known context
Sometimes: SIMPLE
Sometimes: DATA ANALYSIS
How to identify events?
directly observed
“observed” during
analysis of an episode
Univariate TSA: single episodes are processed
- Extreme values
- Patterns
- Distribution of values
- Fluctuation properties
- Long-term correlations
(memory effects)
Events
Events - 3
Extreme Events
- “outliers” in
unknown context
ADVANCED
DATA ANALYSIS (& ML)
How to handle?
Multivariate TSA: pairs / tuples of episodes are processed
- Comparison
Similarity
measures
for link
creation
1212
Reality is Complex:
We should simplify a bit!
Simplification in our method
can lead to isolation:
- DATA SILOS
- OPERATIONAL SILOS
SOLUTION:
GRAPHS capture structure.
TIME SERIES capture
properties over time (history).
relations as graph
or matrix:
objects in groups:
WHAT?
WHY?
Z value is a
property of the
network’s topology
which evolves
over time.
Special procedures
are established.
The events which make
people & the market
happy :-)
Complex Event AnalysisEvent Driven Architecture
Recap:
IT Operations
- Server crash
- Cyber crime
Business Events
- Big deal won
- Technical issue solved
Transactions (in business)
- orders placed
- products shipped
- bills paid
Extreme Events:
- Service slow down due
to emerging bottlenecks
- Increased demand in a
resource
What events are
and how to process
event-data can be
misunderstood
or simply unclear.
It all depends on our view
and our goals!
The Challenge:
How can you combine unbound data assets and scientific methods?
● You bring data to a place where it can be processed easily,
for example to a cloud system or into special purpose systems, such a event
processing platform.
● You integrate important algorithms as early as possible in your processing
pipeline to get results fast, using streaming processing capabilities.
Things can become complicated:
Complex Event Analysis
Integration Across Domains
Extraction of Hidden Event
Complex Event Analysis
- time series analysis and ML reveal hidden events
- multi-stage processing is usually needed
METHODOLOGY
Integration Across Domains
- distributed event processing systems are used
- apps consume and produce events of different flavors
- event-types and data structures my change over time
ORGANIZATION & OPERATIONS
Problems on ORGANIZATION level:
Many legacy systems can’t be integrated without additional expensive servers.
Often, this data is unreachable for externals.
Business data is managed by different teams using different technologies.
Data scientists use data in the cloud, and they all do really L❤VE notebooks.
But often, they don’t use any automation.
Extraction of Hidden Events
- requires Applied Data Analysis & Data Science
- embedding of Complex Algorithms in IT landscape
- integration of GPU/HPC and streaming data pipelines
TECHNOLOGY & SCIENCE
Kafka and its Ecosystem …
- are considered to be middleware, managed by IT people:
- researchers do not plan nor build their experiments around
this technology yet.
- don’t offer ML / AI components:
- many people think, that a model has to be executed on an edge-device or in the cloud.
Yes, Apache Kafka can support
agile (data)experiments!
- Kafka APIs give access to data (flows) in real time.
- allows replay of experiments at a later point in time
- Kafka allows variation of analysis without redoing a simulation or ingestion
by simply reusing persisted event-streams again.
- Kafka Streams and KSQL allow data processing in place:
- this allows faster iterations, for example because plausibility checks can be done in place
- the streaming API gives freedom for extension of core logic
- DSL and KSQL will sav you a lot of implementation time
Why not building something great using the right tools ???
ADVANCED
TIME SERIES ANALYSIS &
NETWORK ANALYSIS
… how does it work?
Network Reconstruction & Topology Analysis:
The Approach
OpenTSx
Network Reconstruction & Topology Analysis:
A Standardized Event Processing Pipeline
ADVANCED
TIME SERIES ANALYSIS &
NETWORK ANALYSIS
… and how does this fit into Kafka?
28
The following slides will show
how TSA concepts
and Kafka concepts
fit together.
Table Stream Duality
Create Time Series from Event Streams:
By Aggregation, Grouping, and Sorting
Events /
Observations
event series
time series
From Table of Events to - Time Series
Table Stream Duality ⇒ Time Series and Graphs
A time series is a table of ordered observations
in a fixed context.
A graph can be seen as a table of node- and link-
properties - stored in two tables.
Create Networks from Event Streams:
By Aggregation, Grouping, and Sorting
Events /
Observations
node properties
link properties
Multi Layer Stream Processing:
TSA to Reveal Hidden System Properties
Events /
Observations
event
series
time
series
Node
properties
Link
properties
Node
properties
Link
properties
Static
aspects of
system
topology
Dynamic
aspects of
system
topology
Multivariate
TSA
Univariate
TSA
Dynamic & Static aspects of system topology
Complex Event Processing: For Complex Systems
System
Events &
Emerging
Patterns
Node
properties
Link
properties
Topology
Analysis
Use the Table-Network Analogy: Kafka Graphs
https://github.com/rayokota/kafka-graphs
large durable graphs:
Persisted in Kafka topic
Sliding Windows: Define the Temporal Graphs
In some use cases, we don’t want to keep the node and link data in topics:
- nodes aren’t always linked
- changes are very fast
- focus on activation patterns,
rather than on underlying structure
It is fine to calculate the correlation links
and the topology metrics on the fly,
just for a given time window.
t
1
t
2
Back to
Streams
39
Demo ...
The project will provide
reusable streaming apps
so that developers
implement just their
specific logic.
Finally, we can compose a complex flow ...
Demo: OpenTSx
Generate some events and some episodes.
Apply some time series processing procedures on:
- a stream of episodes.
- a stream of events
Complex procedures are composed from a set of fundamental building blocks.
Visualize the flows and dependencies.
https://github.com/kamir/OpenTSx
47
Let’s Remember ...
Kafka Connect
UDFs:
Kafka Connect
Data Assets
We use 4 building blocks ...
Source Connectors
integrate sources …
Legacy and Future Systems
Sink Connectors
integrate external targets …
Special Purpose Systems
Domain specific logic is
implemented in small
reusable components:
Domain Driven Design
Data flows are no
longer transient.
The event log acts as
single source of truth.
Paradigm Shift in
Data Management
Confluent Platform
Kafka Consumer API
Kafka Connect
KSQL & Kafka Streams application
KSQL UDFs
Streaming
Applications
Primary Data
Kafka Producer API
Kafka Connect
Derived Data
… for our Time Series Processing Platform
Summary:
Because Kafka is a scalable & extensible platform it fits well for
complex event processing in any industry on premise and in the cloud.
Kafka ecosystem provides extension points for any kind of domain specific
or custom functionality - from advanced analytics to real time data enrichment.
Complex solutions are composed from a few fundamental building blocks:
What to do next?
(A) Identify relevant main flows and processing patterns in your project.
(B) Identify or implement source / sink connectors and establish 1st flow.
(C) Implement custom transformations as Kafka independent components.
(D) Integrate the processing topology as Kafka Streams application:
(a) Do you apply standard transformations and joins (for enrichment)?
(b) Is a special treatment required (advanced analysis)?
(c) Do you need special hardware / external services (AI/ML for classification)?
(E) Share your connectors and UDFs with the growing Kafka community.
(F) Iterate, add more flows and more topologies to your environment.
END
Thank you !
mirko@confluent.io
@semanpix

More Related Content

What's hot

From Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on ScaleFrom Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on ScaleDr. Mirko Kämpf
 
Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...
Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...
Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...HostedbyConfluent
 
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...HostedbyConfluent
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC Federal
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC FederalKafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC Federal
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC FederalHostedbyConfluent
 
Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020Timothy Spann
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafkaconfluent
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHostedbyConfluent
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureGuido Schmutz
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
 
Events Everywhere: Enabling Digital Transformation in the Public Sector
Events Everywhere: Enabling Digital Transformation in the Public SectorEvents Everywhere: Enabling Digital Transformation in the Public Sector
Events Everywhere: Enabling Digital Transformation in the Public Sectorconfluent
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...HostedbyConfluent
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Implyconfluent
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkDatabricks
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Altan Khendup
 
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...HostedbyConfluent
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...HostedbyConfluent
 
GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processingconfluent
 
Power Your Delta Lake with Streaming Transactional Changes
 Power Your Delta Lake with Streaming Transactional Changes Power Your Delta Lake with Streaming Transactional Changes
Power Your Delta Lake with Streaming Transactional ChangesDatabricks
 

What's hot (20)

From Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on ScaleFrom Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on Scale
 
Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...
Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...
Event & Data Mesh as a Service: Industrializing Microservices in the Enterpri...
 
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC Federal
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC FederalKafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC Federal
Kafka Migration for Satellite Event Streaming Data | Eric Velte, ASRC Federal
 
Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
 
Events Everywhere: Enabling Digital Transformation in the Public Sector
Events Everywhere: Enabling Digital Transformation in the Public SectorEvents Everywhere: Enabling Digital Transformation in the Public Sector
Events Everywhere: Enabling Digital Transformation in the Public Sector
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
 
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
What does an event mean? Manage the meaning of your data! | Andreas Wombacher...
 
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
Druid + Kafka: transform your data-in-motion to analytics-in-motion | Gian Me...
 
GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processing
 
Power Your Delta Lake with Streaming Transactional Changes
 Power Your Delta Lake with Streaming Transactional Changes Power Your Delta Lake with Streaming Transactional Changes
Power Your Delta Lake with Streaming Transactional Changes
 

Similar to Time Series Analysis Using an Event Streaming Platform

Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalconfluent
 
Shared time-series-analysis-using-an-event-streaming-platform -_v2
Shared   time-series-analysis-using-an-event-streaming-platform -_v2Shared   time-series-analysis-using-an-event-streaming-platform -_v2
Shared time-series-analysis-using-an-event-streaming-platform -_v2confluent
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Keynote 1 the rise of stream processing for data management & micro serv...
Keynote 1  the rise of stream processing for data management & micro serv...Keynote 1  the rise of stream processing for data management & micro serv...
Keynote 1 the rise of stream processing for data management & micro serv...Sabri Skhiri
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft Private Cloud
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for StreamSplunk
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2Bill Liu
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingPalani Kumar
 
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...MSAdvAnalytics
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Paris Carbone
 

Similar to Time Series Analysis Using an Event Streaming Platform (20)

Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_final
 
Shared time-series-analysis-using-an-event-streaming-platform -_v2
Shared   time-series-analysis-using-an-event-streaming-platform -_v2Shared   time-series-analysis-using-an-event-streaming-platform -_v2
Shared time-series-analysis-using-an-event-streaming-platform -_v2
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
NextGenML
NextGenML NextGenML
NextGenML
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Keynote 1 the rise of stream processing for data management & micro serv...
Keynote 1  the rise of stream processing for data management & micro serv...Keynote 1  the rise of stream processing for data management & micro serv...
Keynote 1 the rise of stream processing for data management & micro serv...
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...
 

More from Dr. Mirko Kämpf

IoT meets AI in the Clouds
IoT meets AI in the CloudsIoT meets AI in the Clouds
IoT meets AI in the CloudsDr. Mirko Kämpf
 
Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale  (Strata Data NYC)Improving computer vision models at scale  (Strata Data NYC)
Improving computer vision models at scale (Strata Data NYC)Dr. Mirko Kämpf
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentationDr. Mirko Kämpf
 
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapEtosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapDr. Mirko Kämpf
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4Dr. Mirko Kämpf
 
Information Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation OptimizationInformation Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation OptimizationDr. Mirko Kämpf
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems ResearchDr. Mirko Kämpf
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"Dr. Mirko Kämpf
 

More from Dr. Mirko Kämpf (9)

IoT meets AI in the Clouds
IoT meets AI in the CloudsIoT meets AI in the Clouds
IoT meets AI in the Clouds
 
Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale  (Strata Data NYC)Improving computer vision models at scale  (Strata Data NYC)
Improving computer vision models at scale (Strata Data NYC)
 
Improving computer vision models at scale presentation
Improving computer vision models at scale presentationImproving computer vision models at scale presentation
Improving computer vision models at scale presentation
 
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapEtosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road map
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4
 
Information Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation OptimizationInformation Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation Optimization
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
 

Recently uploaded

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 

Recently uploaded (20)

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 

Time Series Analysis Using an Event Streaming Platform

  • 1. Time Series Analysis Using an Event Streaming Platform Virtual Meetup - September 8-th 2020 Dr. Mirko Kämpf - Solution Architect @ Confluent, Inc. - Team CEMEA
  • 2. Abstract Time Series Analysis using an Event Streaming Platform Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful and compatible formats. In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks. The first case is relevant for anomaly detection and to protect safety. Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research. With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.
  • 3. This presentation is about linking: - Time-Series-Analysis (TSA) - Network- or Graph-Analysis Confluent Platform - Complex Event Processing (CEP). Research work ends often with nice charts, scientific papers, and conference talks. But, many published results can’t be reproduced - often because the setup it is simply too complicated ... Question: How can we integrate data streams, experiments, and decision making better?
  • 4. Why not using batch processing? Study anatomy … ● Batch processing is fine: ○ as long as your data doesn’t change. ○ in PoCs for method development in a lab. ○ for research in a fixed scope.
  • 5. Why using Kafka? Study the anatomy … Study and influence the living system ... ● Stream processing is better: ○ for real time business in changing environments. ○ iterative (research) projects. ○ repeatable experiments on replayed data.
  • 6. Content: (1) Intro Typical types of event How to identify hidden events? (2) The Challenge (3) Approach Time Series Analytics & Network Analytics in Kafka Create time series from events Create graphs from time series pairs (4) Demo (5) Architecture & Building Blocks: (6) Unified Domain Model
  • 7. Events - 1 Business events - transaction records - discrete observation A Sale A Trade An Invoice A Customer Experience JUTS SIMPLE OBSERVATION & DATA CAPTURING How to handle events?
  • 8. Events Events - 2 Well defined events - in known context Sometimes: SIMPLE Sometimes: DATA ANALYSIS How to identify events? directly observed “observed” during analysis of an episode
  • 9. Univariate TSA: single episodes are processed - Extreme values - Patterns - Distribution of values - Fluctuation properties - Long-term correlations (memory effects)
  • 10. Events Events - 3 Extreme Events - “outliers” in unknown context ADVANCED DATA ANALYSIS (& ML) How to handle?
  • 11. Multivariate TSA: pairs / tuples of episodes are processed - Comparison Similarity measures for link creation
  • 12. 1212 Reality is Complex: We should simplify a bit! Simplification in our method can lead to isolation: - DATA SILOS - OPERATIONAL SILOS SOLUTION: GRAPHS capture structure. TIME SERIES capture properties over time (history). relations as graph or matrix: objects in groups:
  • 13. WHAT? WHY? Z value is a property of the network’s topology which evolves over time.
  • 14. Special procedures are established. The events which make people & the market happy :-) Complex Event AnalysisEvent Driven Architecture Recap: IT Operations - Server crash - Cyber crime Business Events - Big deal won - Technical issue solved Transactions (in business) - orders placed - products shipped - bills paid Extreme Events: - Service slow down due to emerging bottlenecks - Increased demand in a resource What events are and how to process event-data can be misunderstood or simply unclear. It all depends on our view and our goals!
  • 15. The Challenge: How can you combine unbound data assets and scientific methods? ● You bring data to a place where it can be processed easily, for example to a cloud system or into special purpose systems, such a event processing platform. ● You integrate important algorithms as early as possible in your processing pipeline to get results fast, using streaming processing capabilities.
  • 16. Things can become complicated: Complex Event Analysis Integration Across Domains Extraction of Hidden Event
  • 17. Complex Event Analysis - time series analysis and ML reveal hidden events - multi-stage processing is usually needed METHODOLOGY
  • 18. Integration Across Domains - distributed event processing systems are used - apps consume and produce events of different flavors - event-types and data structures my change over time ORGANIZATION & OPERATIONS
  • 19. Problems on ORGANIZATION level: Many legacy systems can’t be integrated without additional expensive servers. Often, this data is unreachable for externals. Business data is managed by different teams using different technologies. Data scientists use data in the cloud, and they all do really L❤VE notebooks. But often, they don’t use any automation.
  • 20. Extraction of Hidden Events - requires Applied Data Analysis & Data Science - embedding of Complex Algorithms in IT landscape - integration of GPU/HPC and streaming data pipelines TECHNOLOGY & SCIENCE
  • 21. Kafka and its Ecosystem … - are considered to be middleware, managed by IT people: - researchers do not plan nor build their experiments around this technology yet. - don’t offer ML / AI components: - many people think, that a model has to be executed on an edge-device or in the cloud.
  • 22. Yes, Apache Kafka can support agile (data)experiments! - Kafka APIs give access to data (flows) in real time. - allows replay of experiments at a later point in time - Kafka allows variation of analysis without redoing a simulation or ingestion by simply reusing persisted event-streams again. - Kafka Streams and KSQL allow data processing in place: - this allows faster iterations, for example because plausibility checks can be done in place - the streaming API gives freedom for extension of core logic - DSL and KSQL will sav you a lot of implementation time
  • 23. Why not building something great using the right tools ???
  • 24. ADVANCED TIME SERIES ANALYSIS & NETWORK ANALYSIS … how does it work?
  • 25. Network Reconstruction & Topology Analysis: The Approach
  • 26. OpenTSx Network Reconstruction & Topology Analysis: A Standardized Event Processing Pipeline
  • 27. ADVANCED TIME SERIES ANALYSIS & NETWORK ANALYSIS … and how does this fit into Kafka?
  • 28. 28 The following slides will show how TSA concepts and Kafka concepts fit together.
  • 30. Create Time Series from Event Streams: By Aggregation, Grouping, and Sorting Events / Observations event series time series
  • 31. From Table of Events to - Time Series
  • 32. Table Stream Duality ⇒ Time Series and Graphs A time series is a table of ordered observations in a fixed context. A graph can be seen as a table of node- and link- properties - stored in two tables.
  • 33. Create Networks from Event Streams: By Aggregation, Grouping, and Sorting Events / Observations node properties link properties
  • 34. Multi Layer Stream Processing: TSA to Reveal Hidden System Properties Events / Observations event series time series Node properties Link properties Node properties Link properties Static aspects of system topology Dynamic aspects of system topology Multivariate TSA Univariate TSA
  • 35. Dynamic & Static aspects of system topology Complex Event Processing: For Complex Systems System Events & Emerging Patterns Node properties Link properties Topology Analysis
  • 36. Use the Table-Network Analogy: Kafka Graphs https://github.com/rayokota/kafka-graphs large durable graphs: Persisted in Kafka topic
  • 37. Sliding Windows: Define the Temporal Graphs In some use cases, we don’t want to keep the node and link data in topics: - nodes aren’t always linked - changes are very fast - focus on activation patterns, rather than on underlying structure It is fine to calculate the correlation links and the topology metrics on the fly, just for a given time window. t 1 t 2
  • 40. The project will provide reusable streaming apps so that developers implement just their specific logic.
  • 41.
  • 42.
  • 43.
  • 44. Finally, we can compose a complex flow ...
  • 45.
  • 46. Demo: OpenTSx Generate some events and some episodes. Apply some time series processing procedures on: - a stream of episodes. - a stream of events Complex procedures are composed from a set of fundamental building blocks. Visualize the flows and dependencies. https://github.com/kamir/OpenTSx
  • 48. Kafka Connect UDFs: Kafka Connect Data Assets We use 4 building blocks ... Source Connectors integrate sources … Legacy and Future Systems Sink Connectors integrate external targets … Special Purpose Systems Domain specific logic is implemented in small reusable components: Domain Driven Design Data flows are no longer transient. The event log acts as single source of truth. Paradigm Shift in Data Management
  • 49. Confluent Platform Kafka Consumer API Kafka Connect KSQL & Kafka Streams application KSQL UDFs Streaming Applications Primary Data Kafka Producer API Kafka Connect Derived Data … for our Time Series Processing Platform
  • 50. Summary: Because Kafka is a scalable & extensible platform it fits well for complex event processing in any industry on premise and in the cloud. Kafka ecosystem provides extension points for any kind of domain specific or custom functionality - from advanced analytics to real time data enrichment. Complex solutions are composed from a few fundamental building blocks:
  • 51. What to do next? (A) Identify relevant main flows and processing patterns in your project. (B) Identify or implement source / sink connectors and establish 1st flow. (C) Implement custom transformations as Kafka independent components. (D) Integrate the processing topology as Kafka Streams application: (a) Do you apply standard transformations and joins (for enrichment)? (b) Is a special treatment required (advanced analysis)? (c) Do you need special hardware / external services (AI/ML for classification)? (E) Share your connectors and UDFs with the growing Kafka community. (F) Iterate, add more flows and more topologies to your environment.