Complex Event Processing platform handling millions of users
Krzysztof Zarzycki - CTO @ GetInData
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
About us
● Founded in 2014 by ex-Spotify engineers
● Focus only on Big Data and Cloud (from day 1)
● Community builders (Big Data Tech Warsaw organizers)
● 50+ Big Data engineers
Stream Processing
What is it?
The application logic, analytics, and queries exist continuously, and data flows through them continuously.
Why is it important for business?
Actionable insights
Why is it important for engineering?
It’s NOT only about real-time
It’s simply natural: data arrives continuously.
User sessions spanning minutes, hours, or days
Batch boundaries are often artificial.
(Figure: a user's events plotted over time, cut by hourly batch boundaries [9:00-10:00) and [10:00-11:00).)
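The following minimal plain-Java sketch makes the point about artificial batch boundaries concrete. It is not Flink's API; Flink offers the same idea as session windows. It groups one user's events into sessions that close after a period of inactivity. The class name `Sessionizer`, timestamps expressed in minutes since midnight, and the 30-minute gap are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups one user's event timestamps into sessions separated by an inactivity gap. */
final class Sessionizer {
    /**
     * @param sortedTimestamps event times in minutes since midnight, ascending
     * @param gapMinutes       inactivity gap that closes a session (assumed parameter)
     * @return sessions as {start, end} pairs
     */
    static List<long[]> sessions(long[] sortedTimestamps, long gapMinutes) {
        List<long[]> result = new ArrayList<>();
        if (sortedTimestamps.length == 0) return result;
        long start = sortedTimestamps[0];
        long last = start;
        for (int i = 1; i < sortedTimestamps.length; i++) {
            long t = sortedTimestamps[i];
            if (t - last > gapMinutes) {          // gap exceeded: close the current session
                result.add(new long[] {start, last});
                start = t;
            }
            last = t;
        }
        result.add(new long[] {start, last});     // close the final session
        return result;
    }
}
```

Consider events at 9:40, 9:55, 10:05, 10:20 and 11:30 (580, 595, 605, 620, 690) with a 30-minute gap. They yield two sessions, [580, 620] and [690, 690]. Hourly batches would instead split the first session at 10:00.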
Complex Event Processing
● Analyze patterns, relations, cause-and-effect
○ If A & B then C
● Infer business-relevant events from the raw technical stream
● ... and cascade the extraction of even higher-level events
● Alerts, triggers, workflow automation
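The "If A & B then C" idea can be sketched in plain Java, with no Flink CEP involved. Per key, the detector remembers when A and B were last seen and emits a derived event C once both have occurred within a time window. The class name, the string event types and the window semantics are assumptions for illustration. The emitted C could be fed into another detector of the same shape, which is the cascading of higher-level events mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Emits a derived event "C" for a key once both "A" and "B" were seen within a window. */
final class AndPattern {
    private final long windowMillis;
    private final Map<String, Long> lastA = new HashMap<>();
    private final Map<String, Long> lastB = new HashMap<>();

    AndPattern(long windowMillis) { this.windowMillis = windowMillis; }

    /** Feeds one event; returns "C" when the pattern completes for this key. */
    Optional<String> onEvent(String key, String type, long timestamp) {
        if (type.equals("A")) lastA.put(key, timestamp);
        else if (type.equals("B")) lastB.put(key, timestamp);
        else return Optional.empty();             // other event types are ignored
        Long a = lastA.get(key);
        Long b = lastB.get(key);
        if (a != null && b != null && Math.abs(a - b) <= windowMillis) {
            lastA.remove(key);                    // consume the match so it fires only once
            lastB.remove(key);
            return Optional.of("C");
        }
        return Optional.empty();
    }
}
```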
Complex Event Processing
● behavioral marketing
● product analytics
● business activity monitoring
● technical monitoring and anomaly detection
● IoT
● fraud detection
ESP vs CEP
● The difference is blurry and diminishing
● Traditional CEP
○ Complex processing, low latency, single machine
○ High-level language like SQL
● Traditional ESP
○ Straightforward, high-throughput, distributed
○ Broader, more generic and low-level
● NOW: Best of both!
○ Often called “Streaming Analytics”
The need
● Streaming model and real-time
● On par with batch
○ Enrichment, Joins
○ Aggregation
○ Reprocessing of historical data
○ Machine Learning scoring, inference
○ Complex Event Processing
● Large scale, high throughput
● Correctness and fault tolerance
The solution
Apache Flink
open-source stateful processor over massive data streams
Who uses Flink
We use Flink!
Banks, Telcos, Automotive, Adtech
Committers to Flink
Late & Out-of-sequence events
They break correctness.
Often handled with very tedious user code.
Or solved in batch by “waiting enough” and “processing twice”.
Late & Out-of-sequence events
Handled by Flink in the framework.
Based on watermarks: heuristics that mark the progress of event time in the stream.
A watermark asserts that all earlier events have probably arrived.
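To show what the watermark heuristic computes, here is a simplified, framework-free Java sketch of a bounded-out-of-orderness watermark. The watermark trails the highest event time seen so far by a fixed allowance, and any event at or behind it counts as late. Flink provides a built-in strategy in this spirit (`WatermarkStrategy.forBoundedOutOfOrderness`), but this class and its exact arithmetic are an illustrative assumption, not Flink's implementation.

```java
/** Simplified bounded-out-of-orderness watermark tracker (framework-free sketch). */
final class BoundedWatermark {
    private final long maxOutOfOrderness;          // assumed allowance for disorder
    private long maxTimestamp = Long.MIN_VALUE;    // highest event time seen so far

    BoundedWatermark(long maxOutOfOrderness) { this.maxOutOfOrderness = maxOutOfOrderness; }

    /** Records an event time; returns true if the event is late (at or behind the watermark). */
    boolean observe(long eventTime) {
        boolean late = eventTime <= currentWatermark();
        maxTimestamp = Math.max(maxTimestamp, eventTime);
        return late;
    }

    /** Heuristic claim: events up to this time have probably all arrived. */
    long currentWatermark() {
        return maxTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestamp - maxOutOfOrderness;
    }
}
```

With an allowance of 10, seeing event time 100 moves the watermark to 90. A following event at 95 is still accepted as merely out of order, while one at 85 is flagged as late.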
Local State
Operational state is essential for analytics
Used for accumulators, windows, source offsets, tracking patterns, ...
(Figure: an operator accumulating incoming values into a running sum held as its local state.)
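The simplest form of such operational state is a keyed accumulator. The plain-Java sketch below keeps one running sum per key in an in-process map. In Flink this would live in keyed state managed by the framework; the class name and the map-based storage are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Keyed accumulator: one running sum per key, held in local (in-process) state. */
final class KeyedSum {
    private final Map<String, Long> state = new HashMap<>();

    /** Adds a value for the key and returns the updated running sum. */
    long add(String key, long value) {
        return state.merge(key, value, Long::sum);
    }
}
```

Feeding 1, 3 and 2 for the same key returns 1, 4 and 6, while other keys keep independent sums.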
Local State
Persistent operational state local to computation
Maximizes performance: millions of updates per second per core.
Enables out-of-core (larger than RAM) processing with RocksDB.
(Figure: tasks 1 to N, each running its own logic next to its own local state.)
Fault tolerance (Checkpointing)
State survives abrupt crashes as well as planned maintenance.
Checkpointed regularly to resilient external storage.
Accurate: keeps stream offsets, accumulators and windows consistently in sync.
Efficient: almost no impact on the processing.
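The toy, single-process Java sketch below shows why source offsets and state must be captured together. It snapshots the read position and the counts as one unit. After a "crash", processing resumes from that snapshot, so every record affects the state exactly once. Flink achieves this with distributed, asynchronous snapshots across all operators; this class and its method names are hypothetical and only illustrate the consistency idea.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy consumer whose source offset and state are snapshotted together. */
final class CheckpointedCounter {
    /** Immutable snapshot: offset and state captured at the same point in the stream. */
    static final class Snapshot {
        final int offset;
        final Map<String, Long> counts;
        Snapshot(int offset, Map<String, Long> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts);   // defensive copy
        }
    }

    private int offset;                            // next position to read in the log
    private Map<String, Long> counts = new HashMap<>();

    /** Processes up to n records from the log, updating state and offset together. */
    void process(List<String> log, int n) {
        for (int i = 0; i < n && offset < log.size(); i++) {
            counts.merge(log.get(offset), 1L, Long::sum);
            offset++;
        }
    }

    Snapshot checkpoint() { return new Snapshot(offset, counts); }

    /** Restores offset and state from the same snapshot, so they stay in sync. */
    static CheckpointedCounter restore(Snapshot s) {
        CheckpointedCounter c = new CheckpointedCounter();
        c.offset = s.offset;
        c.counts = new HashMap<>(s.counts);
        return c;
    }

    long count(String key) { return counts.getOrDefault(key, 0L); }
}
```

Suppose two records of the log `a, b, a, a, b` are processed, a checkpoint is taken, and the process then crashes. Restoring and replaying from the stored offset ends with a = 3 and b = 2, with nothing counted twice.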
Flink Cluster
APIs
For developers, data scientists and analysts:
● Java/Scala
○ High-level Table API
○ Mid-level dataflow API
○ Low-level, advanced API for tricky cases
● SQL
○ Incl. analytical functions, MATCH_RECOGNIZE and UDF extensibility
● Python
○ Based on Table API
Big picture
Assisting Millions of Users in Real-Time
Kcell case
About Kcell
● Largest GSM operator in Kazakhstan
● > 10 000 000 subscribers
● Great network coverage: 4G (40%), 3G (73%), 2G (96%) of the population
● Not only telco: there is an ongoing process of company digital transformation
● We like innovations: Kcell has a strong software development team and lots of experience in building services and products
Business needs
Assisting Millions of Users in Real-Time
Input → Process → Actions
Inputs: SMS events, voice usage events, data usage events, roaming events, location events
Use Cases
Use case scenarios. Just a few of many.
Balance Top Up Case
If a subscriber tops up her balance too often in a short period of time, we can offer her a less expensive tariff or auto-payment services.
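A rule like "tops up too often in a short period" amounts to counting events per subscriber in a sliding time window. The plain-Java sketch below does that. The threshold, the window length and the class name are hypothetical parameters, and it assumes each subscriber's top-ups arrive in time order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Flags subscribers who top up at least `threshold` times within `windowMillis`. */
final class FrequentTopUpDetector {
    private final int threshold;                  // hypothetical, e.g. 3 top-ups
    private final long windowMillis;              // hypothetical, e.g. 24 hours
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    FrequentTopUpDetector(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    /** Records a top-up (assumed time-ordered per subscriber); returns true when the trigger fires. */
    boolean onTopUp(String msisdn, long timestamp) {
        Deque<Long> times = recent.computeIfAbsent(msisdn, k -> new ArrayDeque<>());
        times.addLast(timestamp);
        // Evict top-ups that fell out of the sliding window.
        while (!times.isEmpty() && times.peekFirst() <= timestamp - windowMillis) {
            times.pollFirst();
        }
        if (times.size() >= threshold) {
            times.clear();                        // fire once, then start counting again
            return true;
        }
        return false;
    }
}
```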
(Figure: Trigger UI with Roaming and Fraud rules.)
Roaming case
Send a trigger to the Marketing Platform if the subscriber visited country X and/or registered in visited mobile network Y, and their device type is Z.
Fraud case in roaming
Send an email to the anti-fraud unit if a subscriber registered in roaming but their balance at that moment is equal to 0.
This situation is impossible under normal circumstances.
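The fraud rule correlates two streams for the same subscriber: balance updates and roaming registrations. The plain-Java sketch below keeps the latest known balance per MSISDN as keyed state and checks it when a roaming registration arrives. It assumes both event types are processed in time order per subscriber. The class, method names and alert text are hypothetical. `double` is used only for brevity; real money amounts would better use minor units or `BigDecimal`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Keyed per subscriber: remembers the latest balance and checks it on roaming registration. */
final class RoamingZeroBalanceRule {
    private final Map<String, Double> lastBalance = new HashMap<>();

    /** Updates state from a billing/balance event. */
    void onBalance(String msisdn, double balance) {
        lastBalance.put(msisdn, balance);
    }

    /** Returns an alert for the anti-fraud unit if the subscriber roams with a zero balance. */
    Optional<String> onRoamingRegistration(String msisdn, String visitedNetwork) {
        Double balance = lastBalance.get(msisdn);   // null if no balance event seen yet
        if (balance != null && balance == 0.0) {
            return Optional.of("Possible fraud: " + msisdn + " registered in "
                    + visitedNetwork + " with zero balance");
        }
        return Optional.empty();
    }
}
```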
Personalized Notifications
Business Automation
Regulatory
Future Work
We have already done a lot, but more great things are coming.
Roadmap: 2020 Q3, 2020 Q4, 2021 Q1, Bright Future
● More Data Sources, More Triggers
○ Geolocation data
○ Equipment logs
● Commoditize Machine Learning
○ Extract value from ML company-wide!
○ Enable easy ML training and productization of models in real time
● Real-time BI
○ Intraday view on business and operations
● Data Monetization
○ Monetize valuable insights from our combined rich data sources
● Network Optimization
○ Predictive maintenance
○ To lower operational costs and make better investments
● Customer 360 view
○ Create a behavioral profile of the customers for better personalised serving
● And many more...
Old System
Why did we start looking for a new solution?
External Vendor Solution
1. Blackbox solution: Kcell developers can’t fix, tweak or optimize it
2. Scalability issues: limited to ~2000 events / sec; can’t support all needed data sources
3. Not reliable: multiple incidents which took too much time to resolve
Scale
Required system throughput
● 500K events / second
● 10M subscribers
● 40 TB / month
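As a rough sanity check on the volume figure, assume a 30-day month and decimal units (1 TB = 10^12 bytes). Under those assumptions, 40 TB/month corresponds to an average data rate of about 15 MB/s:

```latex
\frac{40\ \text{TB}}{30 \times 86\,400\ \text{s}}
  = \frac{40 \times 10^{12}\ \text{B}}{2\,592\,000\ \text{s}}
  \approx 15.4 \times 10^{6}\ \text{B/s}
  \approx 15.4\ \text{MB/s}
```

This is a monthly average. The 500K events / second figure is the throughput the system must sustain, so the two numbers should not be divided into a per-event size without knowing the traffic profile.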
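New Solution
Real-time Stream Processing
(Diagram: ingestion from HTTP push/pull, FTP, NFS and MQ into the events hub; processing consumes events from the hub and publishes new events back to it; outgestion delivers them via HTTP push/pull, FTP and MQ.)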
New Solution
Real-time Stream Processing
(Diagram: ingestion (HTTP push/pull, FTP, NFS, MQ), events hub, processing and outgestion (HTTP push/pull, FTP, MQ), with Flink running ingestion, processing and outgestion.)
New Solution (Operations)
Web UI, Monitoring, Security
(Diagram: the Flink pipeline of ingestion, events hub, processing and outgestion, surrounded by operational components.)
● Admin UI (Triggers workbench)
● API (Kafka based)
● Monitoring: Loki for logs, Prometheus/Grafana for metrics
● Security: FreeIPA, Kerberos, LDAP/AD
New Solution (Data Lake)
Data Lake and Sub-second OLAP Analytics
(Diagram: the Flink pipeline of ingestion, events hub, processing and outgestion, feeding the following stores.)
● Data Lake: historical storage (HDFS), batch (Spark), SQL (Hive); keep history, report, explore
● OLAP (Druid): column-oriented data store for interactive BI
Processing Flow
Real-time Stream Processing
(Diagram: ingestion brings in raw call events and data usage events, which are transformed into transformed events; the processing job keeps local state in RocksDB and emits notification events to outgestion; the Admin UI submits/stops triggers via HTTP calls, which reach the job through a control topic.)
Dynamic Rules Design
Some treats for Squirrels
Dynamic Rules Design
Key Points
● We want to run 100s of triggers/business rules
● A typical approach: one job per rule
● Won’t work in our case:
○ Running 100s of topologies/jobs multiplies resource costs
○ Data is pulled from Kafka 100s of times
○ State (user features) is replicated 100s of times
○ Starting a rule requires deploying a job
Dynamic Rules Design
Key Points
● Our approach: one job to run all triggers/rules
○ And to consume all the sources
● Trigger “templates” are still coded in Java
● Rules are added/removed without restarting the application
● 100s of rules run efficiently
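The "one job, many rules" idea can be sketched in plain Java as a rule registry with two inputs. Control messages start or stop rule instances, and every data event is evaluated against all active rules in the same process. A rule instance is a Java "template" plus parameters chosen by the business user. Here the template is a balance threshold, mirroring the balance-drop trigger used in the tests. In Flink this maps roughly onto connecting the event stream with the control stream (e.g. a `CoProcessFunction`, or broadcast state). All class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** One "job" that evaluates every active rule; rules come and go via control messages. */
final class DynamicRules {
    /** Trigger template coded in Java, parameterized by a threshold (hypothetical). */
    static DoublePredicate balanceBelow(double threshold) {
        return balance -> balance < threshold;
    }

    private final Map<String, DoublePredicate> active = new LinkedHashMap<>();

    /** Control-topic message: start a rule instance, no job restart needed. */
    void submit(String ruleId, DoublePredicate rule) { active.put(ruleId, rule); }

    /** Control-topic message: stop a rule instance. */
    void stop(String ruleId) { active.remove(ruleId); }

    /** Data event: evaluate all active rules once; returns "ruleId:msisdn" for each rule that fired. */
    List<String> onBalanceEvent(String msisdn, double balanceAfter) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, DoublePredicate> e : active.entrySet()) {
            if (e.getValue().test(balanceAfter)) fired.add(e.getKey() + ":" + msisdn);
        }
        return fired;
    }
}
```

Data is read once and state lives in one place, which is where the savings over one job per rule come from. The flip side is that all rules share the same process.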
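Dynamic Rules Design
The Overview
(Diagram: billing events and roaming events → Sort by time → Deduplicate → Router, with late events diverted; then Apply Triggers (CoProcess Function), keyed by user, containing a State Updater and Trigger 1, Trigger 2, ...; rules arrive from the control topic; output goes to notification events.)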
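The "Deduplicate" stage can be sketched in plain Java as a set of recently seen event ids with time-bounded retention, so that state does not grow forever. The sketch assumes that each event carries a unique id and that input arrives in time order (after "Sort by time"). The retention period is a hypothetical parameter. In Flink, keyed state with a TTL (`StateTtlConfig`) can play a similar role.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Drops events whose id was already seen within the retention period. */
final class Deduplicator {
    private final long retentionMillis;                                   // hypothetical
    private final LinkedHashMap<String, Long> seen = new LinkedHashMap<>(); // id -> first-seen time, oldest first

    Deduplicator(long retentionMillis) { this.retentionMillis = retentionMillis; }

    /** Returns true if the event should be forwarded, false if it is a duplicate. */
    boolean accept(String eventId, long timestamp) {
        // Expire ids older than the retention period so state stays bounded.
        Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= timestamp - retentionMillis) it.remove();
            else break;                           // remaining entries are newer
        }
        return seen.putIfAbsent(eventId, timestamp) == null;
    }
}
```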
Dynamic Rules Design
Pros and Cons
Pros:
● Shared resources and costs
○ CPU, RAM, state, shuffle
○ Pulling data from Kafka
● No job restart when new rules start
○ Rules started by business, no IT involved
● Sharing of state
○ Build customer features that can be seen by all rules
Cons:
● One bad rule affects the whole system
○ Watermarks are shared
○ Failures are shared
● Still need to code rule templates in the job
○ No way to use SQL, Table API, CEP
● Can be tricky to debug
○ Code is shared
○ Code paths enabled externally
Dynamic Rules Design
Issue: lagging sources slow down all rules
Problem: Source A (highly unordered, late) and Source B (ordered, low latency) feed a single triggers group, so all of its notifications are late.
Solution: split the triggers into Group 1, fed by Source A (late notifications), and Group 2, fed by Source B (low-latency notifications).
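Why one lagging source delays everything: in Flink, an operator with several inputs advances its event-time clock to the minimum of the watermarks it receives. The framework-free Java sketch below models just that rule; the class and input names are hypothetical. While Source A's watermark lags, time-based triggers fed by both sources cannot fire early, even for Source B's timely events. Splitting the triggers into groups gives each group its own minimum.

```java
import java.util.HashMap;
import java.util.Map;

/** Event-time progress of an operator with several inputs: the minimum input watermark. */
final class CombinedWatermark {
    private final Map<String, Long> perInput = new HashMap<>();

    /** Records a watermark from one input; watermarks never move backwards. */
    void update(String input, long watermark) {
        perInput.merge(input, watermark, Long::max);
    }

    /** The slowest input determines the operator's event time. */
    long current() {
        return perInput.values().stream().mapToLong(Long::longValue).min().orElse(Long.MIN_VALUE);
    }
}
```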
Flink Changes Wishlist
What could be even better?
● Dynamic Topologies
○ Attach a new branch to an existing topology that receives the same data
● Share inputs between topologies (cheaper topologies)
○ A graph of topologies that pass data locally in Flink
○ In other words: local proxying/fan-out of Kafka traffic
● Dynamic SQL
Decisions made
Some decisions our team made before or during project implementation
Powerful Real-Time Analytics
● Streaming-first approach
● Apache Kafka for event hub
● Apache Flink
Performance
● Apache Avro
● Keep state local to the process
● Ingest reference data for local joins and enrichment
○ No need to query external systems while processing (not an option at >100K events / sec)
○ Data time correlation correctness
(Diagram: transformed events joined with subscriber profile data (events) held in local state.)
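Local enrichment can be sketched in plain Java as follows. Subscriber profile updates arrive as events and are stored in local state, and each main-stream event is enriched with a local map lookup instead of a remote call. The `tariff` field, the CSV-like output and the class name are hypothetical. This sketch only sees the latest profile in arrival order; correct time correlation relies on both streams being processed in event-time order.

```java
import java.util.HashMap;
import java.util.Map;

/** Local join: profile events fill local state; call events are enriched from it. */
final class ProfileEnricher {
    private final Map<String, String> tariffByMsisdn = new HashMap<>(); // hypothetical profile field

    /** Reference-data stream: a subscriber profile change arrives as an event. */
    void onProfileEvent(String msisdn, String tariff) {
        tariffByMsisdn.put(msisdn, tariff);
    }

    /** Main stream: enrich without any remote lookup; "unknown" if no profile has arrived yet. */
    String enrich(String msisdn, double amountSpent) {
        String tariff = tariffByMsisdn.getOrDefault(msisdn, "unknown");
        return msisdn + "," + amountSpent + "," + tariff;
    }
}
```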
Ease of Use
● NiFi for data ingestion (no coding)
○ but not for CEP
● Web UI for configuring triggers
Reliability and battle-tested techniques
● Flink on YARN, with HDFS
● HA for redundancy and running ~24/7
● Prometheus & Grafana for monitoring & alerting
● Loki for log collection and aggregation
Security
● Kerberos and AD thanks to FreeIPA
● Apache Ranger for authorization
Extensiveness
One platform for the whole enterprise
● Batch (ad hoc) queries too
○ Spark, Hive/Presto
● Online analytics
○ OLAP
Cost-Efficiency
● Open-source technologies
● HDP as a licence-free distribution
● Just start with a bunch of servers
Testing
def "should notify when user's balance drops below threshold"() {
    given:
    BalanceDropTrigger trigger = balanceDropTrigger()
            .threshold(50.0)
            .outgestionSystem('campaignSystem')
            .build()
    admin.createsTrigger(trigger)

    and:
    user.withBalance(60.0)

    when:
    user.makesCall(phoneCall().amountSpent(20.0))

    then:
    wait(allowedEventLateness)

    and:
    List<Notification> actualNotifications =
            campaignSystem.getNotifications(user, trigger)

    and:
    actualNotifications.size() == 1
    assertThat(actualNotifications.first())
            .hasMsisdn(user.msisdn)
            .hasBalanceAfter(40.0)

    cleanup:
    admin.deletesTrigger(trigger)
}
(Diagram: in the preprod environment, test event generators feed the Flink pipeline (ingestion from HTTP push/pull, FTP, NFS, MQ; events hub; processing; outgestion), which delivers notifications to a fake campaign system.)
Our Collaboration
Two heads are better than one
● Joint development team
○ Not a vendor solution
○ Development as one team
● Code quality
○ Code review and automated tools for code quality control
● Agile Practices
○ Distant geographic locations, but daily standups
● Go live quickly!
○ <4 months to the first production case running 24/7!
● Deliver
○ DevOps/Automation
● Knowledge sharing
○ Constant knowledge exchange in areas of expertise
● Testing
○ Separate testing environment
○ Automated Unit/E2E tests
Q&A

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
GetInData
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
GetInData
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
GetInData
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
GetInData
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
GetInData
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
GetInData
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
GetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
GetInData
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
GetInData
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
GetInData
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
GetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
GetInData
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
GetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
GetInData
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
GetInData
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
GetInData
 

More from GetInData (20)

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
 

Recently uploaded

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 

Complex event processing platform handling millions of users - Krzysztof Zarzycki, GetInData

  • 1. Complex Event Processing platform handling millions of users. Krzysztof Zarzycki - CTO @ GetInData
  • 2. About us: founded in 2014 by ex-Spotify engineers. Focus only on Big Data and Cloud (from day 1). Community builders (Big Data Tech Warsaw organizers). 50+ Big Data engineers.
  • 3. Stream Processing. What is it? The application logic, analytics, and queries exist continuously, and data flows through them continuously.
  • 4. Stream Processing. Why is it important for business?
  • 5. Stream Processing. Actionable insights.
  • 6. Stream Processing. Why is it important for engineering?
  • 7. Stream Processing. It’s NOT only about real-time. It’s just natural: data comes continuously.
  • 8. Stream Processing. User sessions span minutes, hours, or days, so batch boundaries are often artificial. (Figure: user events flowing continuously across the boundary between the hourly batches [9:00-10:00) and [10:00-11:00).)
  • 9. Complex Event Processing
    ● Analyze patterns, relations, cause-and-effect
      ○ If A & B then C
    ● Infer business-relevant events from the raw technical stream
    ● ...and cascade the extraction of even higher-level events
    ● Alerts, triggers, workflow automation
  • 10. Complex Event Processing
    ● behavioral marketing
    ● product analytics
    ● business activity monitoring
    ● technical monitoring and anomaly detection
    ● IoT
    ● fraud detection
  • 11. ESP vs CEP
    ● The difference is blurry and diminishing
    ● Traditional CEP
      ○ Complex processing, low latency, single machine
      ○ High-level language like SQL
    ● Traditional ESP
      ○ Straightforward, high-throughput, distributed
      ○ Broader, more generic, and low-level
    ● NOW: the best of both!
      ○ Often called “Streaming Analytics”
  • 12. The need
    ● Streaming model and real-time
    ● On par with batch
      ○ Enrichment, joins
      ○ Aggregation
      ○ Reprocessing of historical data
      ○ Machine Learning scoring, inference
      ○ Complex Event Processing
    ● Large scale, high throughput
    ● Correctness and fault tolerance
  • 13. The solution: Apache Flink, an open-source stateful processor over massive data streams.
  • 14. Who uses Flink
  • 15. We use Flink! Banks, telcos, automotive, adtech. Committers to Flink.
  • 16. Late & out-of-sequence events. They break correctness. Often handled with very tedious user code, or solved in batch by “waiting enough” and “processing twice”.
  • 17. Late & out-of-sequence events. Handled by Flink in the framework, based on watermarks: heuristics that mark the progress of event time in the stream. A watermark asserts that all earlier events have probably arrived.
  • 18. Local State. Operational state is obligatory for analytics. Used for accumulators, windows, source offsets, tracking patterns, ... (Figure: a “sum” operator with state 6, fed by the input stream 1 3 2 4 1 1.)
  • 19. Local State. Persistent operational state, local to the computation. Maximizes performance with millions of updates per second per core. Enables out-of-core (larger than RAM) processing with RocksDB. (Figure: Task 1 ... Task N, each with its own logic and local state.)
  • 20. Fault tolerance (Checkpointing). State survives abrupt crashes or routine maintenance. It is checkpointed regularly to resilient external storage. Accurate: stream offsets, accumulators, and windows are kept in perfect sync, so the state stays consistent. Efficient: almost no impact on the processing.
  • 21. Flink Cluster
  • 22. APIs
    ● Java/Scala
      ○ High-level: Table API
      ○ Mid-level: dataflow
      ○ Low-level: advanced, for tricky cases
    ● SQL, incl. analytical functions, MATCH_RECOGNIZE, and UDF extensibility
    ● Python, based on the Table API
    ● Users: Developer, Data Scientist, Analyst
  • 23. Big picture
  • 24. Assisting Millions of Users in Real-Time: the Kcell case
  • 25. About Kcell. Largest GSM operator in Kazakhstan, with > 10 000 000 subscribers. Great network coverage: 4G (40%), 3G (73%), 2G (96%) of the population. Kcell has a strong software development team and lots of experience in building services and products. We like innovation. There is an ongoing digital transformation of the company: not only telco.
  • 26. Business needs: Assisting Millions of Users in Real-Time. (Diagram: Input (SMS events, voice usage events, data usage events, roaming events, location events) → Process → Actions.)
  • 27. Use Cases. Use case scenarios, just a few of many. Balance Top Up case: if a subscriber tops up her balance too often in a short period of time, we can offer her a less expensive tariff or auto-payment services. (Diagram: Case → Trigger → UI.)
  • 28. Roaming and Fraud. Roaming case: trigger to the Marketing Platform if a subscriber visited country X OR/AND registered in visited mobile network Y, and his device type is Z. Fraud case in roaming: send an email to the anti-fraud unit if a subscriber registered in roaming but his balance at that moment is 0. This situation is impossible in the standard case.
  • 30. Future Work. We have already done a lot, but more great things are coming. (Roadmap: 2020 Q3, 2020 Q4, 2021 Q1, Bright Future.)
    ● More Data Sources, More Triggers: geolocation data, equipment logs
    ● Commoditize Machine Learning: extract value from ML company-wide! Enable easy ML training and productization of models in real-time
    ● Real-time BI: intraday view on business and operations
    ● Data Monetization: monetize valuable insights from our combined rich data sources
    ● Predictive maintenance, Network Optimization: to lower operational costs and make better investments
    ● Customer 360 view: create a behavioral profile of the customers for better personalised serving
    ● And many more...
  • 31. Old System Why did we start to look for the new solution? External Vendor Solution Blackbox Solution Scalability issues Not reliable 1 2 3 Kcell Developers can’t fix, tweak or optimize it Limited to ~2000 events / sec Can’t support all needed data sources Multiple accidents which took too much time to resolve
  • 32. Scale: Required System Throughput
    ● 500K events/second
    ● 10M subscribers
    ● 40 TB/month
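As a back-of-envelope check, the figures can be set against the old system's ~2000 events/sec limit from slide 31, and the monthly volume can be converted to an average rate. The calculation assumes decimal terabytes (10^12 bytes) and a 30-day month. It gives an average only; peak rates can be higher.

```latex
\frac{500\,000\ \text{events/s}}{\approx 2\,000\ \text{events/s}} \approx 250\times
\qquad
\frac{40 \times 10^{12}\ \text{B}}{30 \times 86\,400\ \text{s}}
  = \frac{40 \times 10^{12}}{2\,592\,000}\ \text{B/s} \approx 15.4\ \text{MB/s}
```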
  • 33. New Solution: Real-time Stream Processing
    Sources (HTTP push/pull, FTP, NFS, MQ) → ingestion → events hub → events processing → outgestion → targets (HTTP push/pull, FTP, MQ)
  • 34. New Solution: Real-time Stream Processing
    The same pipeline as slide 33, now with Apache Flink marked in the diagram:
    Sources (HTTP push/pull, FTP, NFS, MQ) → ingestion → events hub → events processing → outgestion → targets (HTTP push/pull, FTP, MQ)
  • 35. New Solution (Operations): Web UI, Monitoring, Security
    Built around the same pipeline (ingestion → events hub → events processing with Flink → outgestion):
    ● Admin UI (Triggers workbench) and API (Kafka based)
    ● Monitoring: Loki for logs, Prometheus/Grafana for metrics
    ● Security: FreeIPA, Kerberos, LDAP/AD
  • 36. New Solution (Data Lake): Data Lake and Sub-second OLAP Analytics
    Added to the same pipeline:
    ● Data Lake: historical storage (HDFS), batch (Spark), SQL (Hive) to keep history, report, and explore
    ● Column-oriented data store: OLAP (Druid) for interactive BI
  • 37. Processing Flow: Real-time Stream Processing
    ● ingestion → raw call events → transform → transformed events
    ● ingestion → data usage events → transform → transformed events
    ● transformed events → events processing with local state (RocksDB) → notification events → outgestion
    ● Admin UI (via HTTP calls) → control topic: submit/stop triggers
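The "transform" steps map differently shaped raw sources onto one event schema, so downstream triggers do not care where an event came from. The sketch below illustrates the idea. Both raw record formats are invented for illustration, as are the `TransformedEvent` schema and its `quantity` field (seconds for voice, bytes for data). None of them is the actual Kcell format.

```java
import java.util.Optional;

class Transformers {
    // Assumed raw call record: "msisdn;epochMillis;durationSeconds"
    static Optional<TransformedEvent> fromRawCall(String line) {
        String[] f = line.split(";");
        if (f.length != 3) {
            return Optional.empty(); // malformed record; a real job might route it to an error topic
        }
        try {
            return Optional.of(new TransformedEvent(f[0], Long.parseLong(f[1]), "VOICE", Long.parseLong(f[2])));
        } catch (NumberFormatException ex) {
            return Optional.empty();
        }
    }

    // Assumed raw data-usage record: "epochMillis,msisdn,bytes"
    static Optional<TransformedEvent> fromRawDataUsage(String line) {
        String[] f = line.split(",");
        if (f.length != 3) {
            return Optional.empty();
        }
        try {
            return Optional.of(new TransformedEvent(f[1], Long.parseLong(f[0]), "DATA", Long.parseLong(f[2])));
        } catch (NumberFormatException ex) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fromRawCall("subscriber-1;1000;60"));
        System.out.println(fromRawDataUsage("2000,subscriber-1,1048576"));
    }
}

// Hypothetical common schema: quantity is seconds for VOICE and bytes for DATA.
record TransformedEvent(String msisdn, long eventTimeMillis, String type, long quantity) {}
```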
  • 39. Dynamic Rules Design: Key Points
    ● We want to run 100s of triggers/business rules
    ● A typical approach: one job per rule
    ● Won't work in our case:
      ○ Running 100s of topologies/jobs multiplies resource costs
      ○ Data is pulled from Kafka 100s of times
      ○ State (user features) is replicated 100s of times
      ○ Starting a rule requires deploying a job
  • 40. Dynamic Rules Design: Key Points
    ● Our approach: one job to run all triggers/rules
      ○ and to consume all the sources
    ● Trigger "templates" are still coded in Java
    ● Rules are added/removed without restarting the application
    ● 100s of rules run efficiently
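Here is a minimal, framework-free sketch of the "one job, many rules" idea. Templates are compiled Java code. Rule instances are created and removed by control messages at runtime. Every event is evaluated against all active rules in a single pass. In Flink this would typically sit inside a function that also consumes the control topic, such as the CoProcess Function on the next slide or the broadcast state pattern; here plain maps stand in for that state. The names `DynamicRuleEngine`, `UsageEvent`, `ControlMessage` and the `balanceBelow` template are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class DynamicRuleEngine {
    // Rule templates are compiled Java code; each turns rule parameters into a predicate.
    private final Map<String, Function<Map<String, String>, Predicate<UsageEvent>>> templates = new LinkedHashMap<>();
    // Active rule instances keyed by rule id; changed at runtime by control messages.
    private final Map<String, Predicate<UsageEvent>> activeRules = new LinkedHashMap<>();

    DynamicRuleEngine() {
        templates.put("balanceBelow", params -> {
            double threshold = Double.parseDouble(params.get("threshold"));
            return event -> event.balanceAfter() < threshold;
        });
    }

    /** Handles a control-topic message: starts or stops a rule without any restart. */
    void onControl(ControlMessage m) {
        switch (m.action()) {
            case "ADD" -> {
                var template = templates.get(m.template());
                if (template == null) {
                    throw new IllegalArgumentException("unknown template: " + m.template());
                }
                activeRules.put(m.ruleId(), template.apply(m.params()));
            }
            case "REMOVE" -> activeRules.remove(m.ruleId());
            default -> throw new IllegalArgumentException("unknown action: " + m.action());
        }
    }

    /** Evaluates every active rule against one event: the input is read once for all rules. */
    List<String> onEvent(UsageEvent event) {
        List<String> fired = new ArrayList<>();
        activeRules.forEach((id, rule) -> {
            if (rule.test(event)) {
                fired.add(id);
            }
        });
        return fired;
    }

    public static void main(String[] args) {
        DynamicRuleEngine engine = new DynamicRuleEngine();
        engine.onControl(new ControlMessage("ADD", "low-balance-50", "balanceBelow", Map.of("threshold", "50")));
        System.out.println(engine.onEvent(new UsageEvent("subscriber-1", 40.0))); // [low-balance-50]
        engine.onControl(new ControlMessage("REMOVE", "low-balance-50", null, Map.of()));
        System.out.println(engine.onEvent(new UsageEvent("subscriber-1", 40.0))); // []
    }
}

// Hypothetical event and control-message shapes.
record UsageEvent(String msisdn, double balanceAfter) {}
record ControlMessage(String action, String ruleId, String template, Map<String, String> params) {}
```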
  • 41. Dynamic Rules Design: The Overview
    ● billing events, roaming events → sort by time → deduplicate → router (late events routed separately)
    ● routed events + control topic → Apply Triggers (CoProcess Function), keyed by user: Trigger 1, Trigger 2, ..., State Updater → notification events
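The "sort by time → deduplicate → late events" front of the pipeline can be sketched without Flink. Events are buffered in a priority queue by event time. They are released in order once the watermark passes them. Repeated ids are dropped, and anything at or behind the current watermark goes to a late-events list. The names `SortAndDeduplicate` and `RawEvent`, and dedup by an `id` field, are assumptions. The unbounded id set is a simplification; a real job would expire it, for example with state TTL.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

class SortAndDeduplicate {
    // Pending events ordered by event time (smallest first).
    private final PriorityQueue<RawEvent> buffer =
            new PriorityQueue<>(Comparator.comparingLong(RawEvent::eventTime));
    // Ids already seen; unbounded here for brevity.
    private final Set<String> seenIds = new HashSet<>();
    private final List<RawEvent> lateEvents = new ArrayList<>();
    private long watermark = Long.MIN_VALUE;

    void add(RawEvent e) {
        if (!seenIds.add(e.id())) {
            return; // duplicate: drop
        }
        if (e.eventTime() <= watermark) {
            lateEvents.add(e); // too late to be sorted in: route to the late-events path
            return;
        }
        buffer.add(e);
    }

    /** Advances the watermark and releases, in event-time order, every buffered event at or before it. */
    List<RawEvent> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        List<RawEvent> released = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().eventTime() <= watermark) {
            released.add(buffer.poll());
        }
        return released;
    }

    List<RawEvent> lateEvents() {
        return lateEvents;
    }

    public static void main(String[] args) {
        SortAndDeduplicate stage = new SortAndDeduplicate();
        stage.add(new RawEvent("b", "subscriber-1", 20));
        stage.add(new RawEvent("a", "subscriber-1", 10));
        stage.add(new RawEvent("a", "subscriber-1", 10)); // duplicate
        System.out.println(stage.advanceWatermark(25).size()); // 2
    }
}

// Hypothetical raw event with a unique id used for deduplication.
record RawEvent(String id, String msisdn, long eventTime) {}
```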
  • 42. Dynamic Rules Design: Pros and Cons
    Pros:
    ● Shared resources and costs: CPU, RAM, state, shuffle; data is pulled from Kafka only once
    ● No job restart when new rules start: rules are started by the business, no IT involved
    ● Sharing of state: build customer features that can be seen by all rules
    Cons:
    ● One bad rule affects the whole system: watermarks are shared, failures are shared
    ● Rule templates still need to be coded in the job: no way to use SQL, Table API, or CEP
    ● Can be tricky to debug: code is shared, and code paths are enabled externally
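  • 43. Dynamic Rules Design Issue: Lagging Sources Slow Down All Rules
    ● Problem: Source A (highly unordered, late) and Source B (ordered, low latency) feed a single triggers group, so all notifications are late. An operator's watermark is the minimum of its inputs' watermarks, so the lagging source holds back event time for every rule.
    ● Solution: split the triggers into groups by source. Source A feeds Triggers Group 1 (late notifications); Source B feeds Triggers Group 2 (low-latency notifications).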
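The slide's problem follows from how event time combines across inputs. An operator with several inputs advances its watermark only to the minimum of the input watermarks. The sketch below reduces that rule to one function, so you can compare one shared triggers group with two separate ones. The watermark values are made up for illustration.

```java
import java.util.Arrays;

// Sketch of the rule that an operator with several inputs advances its
// event-time clock (watermark) to the minimum of its inputs' watermarks.
class WatermarkDemo {
    static long combinedWatermark(long... inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long sourceA = 1_000L;  // highly unordered, late source: watermark far behind
        long sourceB = 60_000L; // ordered, low-latency source
        // Problem: one triggers group reading both sources waits for the slowest one.
        System.out.println(combinedWatermark(sourceA, sourceB)); // 1000
        // Solution: separate groups each follow only their own source.
        System.out.println(combinedWatermark(sourceA)); // 1000
        System.out.println(combinedWatermark(sourceB)); // 60000
    }
}
```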
  • 44. Flink Changes Wishlist: What Could Be Even Better?
    ● Dynamic Topologies: attach a new branch to an existing topology that receives the same data
    ● Cheaper topologies / share inputs between topologies: a graph of topologies that pass data locally in Flink; in other words, local proxying/fan-out of Kafka traffic
    ● Dynamic SQL
  • 45. Decisions Made: some decisions our team made before or during project implementation
    Powerful Real-Time Analytics
    ● Streaming-first approach
    ● Apache Kafka for the event hub
    ● Apache Flink
  • 46. Performance
    ● Apache Avro
    ● Keep state local to the process
    ● Ingest reference data for local joins and enrichment
      ○ No need to query external systems while processing
      ○ Data time correlation correctness
    Diagram: transformed events + subscriber profile data (as events) → local state → enriched transformed events. Querying an external system per event does not work at >100K events/sec.
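One reading of "data time correlation correctness" is this: when reference data arrives as events and is kept locally with its validity time, an event can be enriched with the profile that was valid *at the event's time*. A lookup against an external database at processing time would only return the latest profile. The sketch below assumes that reading. `LocalProfileStore` and the tariff attribute are hypothetical, and a `HashMap` of `TreeMap`s stands in for the job's local (e.g. RocksDB-backed) state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class LocalProfileStore {
    // msisdn -> (validFrom event time -> tariff), kept in the processing job's local state.
    private final Map<String, TreeMap<Long, String>> tariffs = new HashMap<>();

    /** Applies a subscriber-profile change event ingested from the reference-data stream. */
    void onProfileEvent(String msisdn, long validFrom, String tariff) {
        tariffs.computeIfAbsent(msisdn, k -> new TreeMap<>()).put(validFrom, tariff);
    }

    /** Returns the tariff valid at eventTime (or null if unknown), with no external lookup. */
    String tariffAt(String msisdn, long eventTime) {
        TreeMap<Long, String> history = tariffs.get(msisdn);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> entry = history.floorEntry(eventTime);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        LocalProfileStore store = new LocalProfileStore();
        store.onProfileEvent("subscriber-1", 100L, "Basic");
        store.onProfileEvent("subscriber-1", 200L, "Premium");
        System.out.println(store.tariffAt("subscriber-1", 150L)); // Basic
        System.out.println(store.tariffAt("subscriber-1", 250L)); // Premium
    }
}
```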
  • 47. Ease of Use
    ● Apache NiFi for data ingestion (no coding)
      ○ but not for CEP
    ● Web UI for configuring triggers
  • 48. Reliability and battle-tested techniques
    ● Flink on YARN, with HDFS HA, for redundancy and running ~24/7
    ● Prometheus & Grafana for monitoring & alerting
    ● Loki for log collection and aggregation
    Security
    ● Kerberos and AD, thanks to FreeIPA
    ● Apache Ranger for authorization
  • 49. Extensiveness
    ● One platform for the whole enterprise
    ● Batch (ad hoc) queries too: Spark, Hive/Presto
    ● Online analytics: OLAP
    Cost-Efficiency
    ● HDP: open-source technologies
    ● HDP as a licence-free distribution
    ● Just start with a bunch of servers
  • 50. Testing
    def "should notify when user's balance drops below threshold"() {
        given:
        BalanceDropTrigger trigger = balanceDropTrigger()
                .threshold(50.0)
                .outgestionSystem('campaignSystem')
                .build()
        admin.createsTrigger(trigger)

        and:
        user.withBalance(60.0)

        when:
        user.makesCall(phoneCall().amountSpent(20.0))

        then:
        wait(allowedEventLateness)

        and:
        List<Notification> actualNotifications = campaignSystem.getNotifications(user, trigger)

        and:
        actualNotifications.size() == 1
        assertThat(actualNotifications.first())
                .hasMsisdn(user.msisdn)
                .hasBalanceAfter(40.0)

        cleanup:
        admin.deletesTrigger(trigger)
    }
    Preprod environment: test event generators → ingestion (HTTP push/pull, FTP, NFS, MQ) → events hub → events processing (Flink) → outgestion → Fake Campaign System
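The core decision this end-to-end test exercises can be sketched in isolation. The numbers come from the Spock test on this slide: a threshold of 50.0, a starting balance of 60.0, and a call costing 20.0, leaving 40.0. The assumption that the rule fires only when a charge *crosses* the threshold (not on every charge while below it) is ours. `BalanceDropRule` and `BalanceNotification` are hypothetical names, not the project's `BalanceDropTrigger`.

```java
import java.math.BigDecimal;
import java.util.Optional;

class BalanceDropRule {
    private final BigDecimal threshold;

    BalanceDropRule(BigDecimal threshold) {
        this.threshold = threshold;
    }

    /** Fires only when a charge moves the balance from at-or-above the threshold to below it. */
    Optional<BalanceNotification> onCharge(String msisdn, BigDecimal balanceBefore, BigDecimal amountSpent) {
        BigDecimal balanceAfter = balanceBefore.subtract(amountSpent);
        boolean wasAtOrAbove = balanceBefore.compareTo(threshold) >= 0;
        boolean isBelow = balanceAfter.compareTo(threshold) < 0;
        if (wasAtOrAbove && isBelow) {
            return Optional.of(new BalanceNotification(msisdn, balanceAfter));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        BalanceDropRule rule = new BalanceDropRule(new BigDecimal("50.0"));
        // Same numbers as the Spock test: balance 60.0, call costing 20.0.
        System.out.println(rule.onCharge("subscriber-1", new BigDecimal("60.0"), new BigDecimal("20.0")));
    }
}

// Hypothetical notification payload carrying the fields the test checks.
record BalanceNotification(String msisdn, BigDecimal balanceAfter) {}
```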
  • 51. Our Collaboration: two heads are better than one
    ● Joint development team: not a vendor solution, development as one team
    ● Code quality: code review and automated tools for code quality control
    ● Agile practices: distant geographic locations, but daily standups
    ● Go live quickly! Less than 4 months to the first production case running 24/7!
    ● Deliver: DevOps/Automation
    ● Knowledge sharing: constant knowledge exchange in areas of expertise
    ● Testing: separate testing environment, automated unit/E2E tests
  • 52. Q&A