SlideShare a Scribd company logo
1 of 35
Download to read offline
1
User behavior analysis with
Session Windows and
Apache Kafka’s Streams API
Michael G. Noll
Product Manager
2
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control
Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
3
Kafka Streams API: to build real-time apps that power your core business
Key benefits
• Makes your Java apps highly scalable,
elastic, fault-tolerant, stateful, distributed
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated
from Kafka
Example Use Cases
• Microservices
• Reactive applications
• Continuous queries
• Continuous transformations
• Event-triggered processes
Streams
API
App Instance 1
Kafka
Cluster
Streams
API
App Instance N
Your
App ...
4
Use case examples
Industry Use case examples
Travel Build applications with the Kafka Streams API to make real-time decisions to find
best suitable pricing for individual customers, to cross-sell additional services,
and to process bookings and reservations
Finance Build applications to aggregate data sources for real-time views of potential
exposures and for detecting and minimizing fraudulent transactions
Logistics Build applications to track shipments fast, reliably, and in real-time
Retail Build applications to decide in real-time on next best offers, personalized
promotions, pricing, and inventory management
Automotive,
Manufacturing
Build applications to ensure their production lines perform optimally, to gain real-
time insights into supply chains, and to monitor telemetry data from connected
cars to decide if an inspection is needed
And many more …
5
Some public use cases in the wild
• Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming
• http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/
• Applying Kafka’s Streams API for social messaging at LINE Corp.
• http://developers.linecorp.com/blog/?p=3960
• Production pipeline at LINE, a social platform based in Japan with 220+ million users
• Microservices and Reactive Applications at Capital One
• https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams
• Containerized Kafka Streams applications in Scala, by Hive Streaming
• https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
• Geo-spatial data analysis
• http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
• Language classification with machine learning
• https://dzone.com/articles/machine-learning-with-kafka-streams
6
Kafka Summit NYC, May 09
Here, the community will share
latest Kafka Streams use cases.
http://kafka-summit.org/
7
Agenda
• Why are session windows so important?
• Recap: What is windowing?
• Session windows – example use case
• Session windows – how they work
• Session windows – API
8
Why are session windows so important?
• We want to analyze user behavior, which is a very common use case area
• To analyze user behavior on newspapers, social platforms, video sharing sites, booking sites, etc.
• AND tailor the analysis to the individual user
• Specifically, analyses of the type “how many X in one go?” – how many movies watched in one go?
• Achieved through a per-user sessionization step on the input data.
• AND this tailoring must be convenient and scalable
• Achieved through automating the sessionization step, i.e. auto-discovery of sessions
• Session-based analyses can range from simple metrics (e.g. count of user visits on a news website
or social platform) to more complex metrics (e.g. customer conversion funnel and event flows).
9
What is windowing?
• Aggregations such as “counting things” are key-based operations
• Before you can aggregate your input data, it must first be grouped by key
event-time8 AM7 AM6 AM event-time
Alice
Bob
Dave
8 AM7 AM6 AM
10
What is windowing?
• Aggregations such as “counting things” are key-based operations
Alice: 10 movies
Bob: 11 movies
Dave: 8 movies
“Let me COUNT how many movies each user has watched (IN TOTAL)”
event-time
Alice
Bob
Dave
Feb 7Feb 6Feb 5
11
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
event-time
Alice
Bob
Dave
“Let me COUNT how many movies each user has watched PER DAY”
Alice: 4 movies
Bob: 3 movies
Dave: 2 movies
Feb 5
Feb 7Feb 6Feb 5
12
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
event-time
Alice
Bob
Dave
Alice: 1 movie
Bob: 2 movies
Dave: 4 movies
Feb 6
Feb 7Feb 6Feb 5
“Let me COUNT how many movies each user has watched PER DAY”
13
What is windowing?
• Windowing allows you to further “sub-group” the input data for each user
event-time
Alice
Bob
Dave
Alice: 4 movies
Bob: 4 movies
Dave: 1 movie
Feb 7
Feb 7Feb 6Feb 5
“Let me COUNT how many movies each user has watched PER DAY”
14
Session windows: use case
• Session windows allow for “how many X in one go?” analyses, tailored to each key
• Sessions are auto-discovered from the input data (we see how later)
event-time
Alice
Bob
Dave
Alice: 1, 4, 1, 4 movies
(4 sessions)
Bob: 4, 6 movies
(2 sessions)
Dave: 3, 5 movies
(2 sessions)
Feb 7Feb 6Feb 5
“Let me COUNT how many movies each user has watched PER SESSION”
15
Comparing results
• Let’s compare how results differ
Alice
Bob
Dave
IN TOTAL
10
11
8
PER DAY
3.0 (avg)
3.0 (avg)
2.3 (avg)
time windows
PER SESSION
2.5 (avg)
5.0 (avg)
4.0 (avg)
session windowsno windows
16
Comparing results
• Let’s compare how results differ if we our task was to rank the top users
Alice
Bob
Dave
IN TOTAL
#2
#1
#3
PER DAY
#1
#1
#3
time windows
PER SESSION
#3
#1
#2
session windowsno windows
17Confidential
Session windows: how they work
18
Session windows: how they work
• Definition of a session in Kafka Streams API is based on a configurable period of inactivity
• Example: “If Alice hasn’t watched another movie in the past 3 hours, then next movie = new session!”
Inactivity period
19
Auto-discovering sessions, per user
event-time
Alice
Bob
Dave
… …
… …
… …
20
Auto-discovering sessions, per user
event-time
Alice
Bob
Dave
… …
… …
… …
Example: How many movies does Alice watch on average per session?”
Inactivity period (e.g. 3 hours)
21
Auto-discovering sessions, per user
event-time
Alice
Bob
Dave
… …
… …
… …
Example: How many movies does Alice watch on average per session?”
22
Late-arriving data is handled transparently
• Handling of late-arriving data is important because, in practice, a lot of data arrives late
23
Late-arriving data: example
Users with mobile phones enter
airplane, lose Internet connectivity
Emails are being written
during the 8h flight
Internet connectivity is restored,
phones will send queued emails now,
though with an 8h delay
Bob writes Alice an
email at 2 P.M.
Bob’s email is finally
being sent at 10 P.M.
24
Late-arriving data is handled transparently
• Handling of late-arriving data is important because, in practice, a lot of data arrives late
• Good news: late-arriving data is handled transparently and efficiently for you
• Also, in your applications, you can define a grace period after which late-arriving data will be
discarded (default: 1 day), and you can define this granularly per windowed operation
• Example: “I want to sessionize the input data based on 15-min inactivity periods, and late-arriving
data should be discarded if it is more than 12 hours late”
25
Late-arriving data is handled transparently
event-time
Alice
Bob
Dave
… …
… …
… …
• Late-arriving data may (1) create new sessions or (2) merge existing sessions
26
Sessions potentially merge as new events arrive
Session Window
27
Sessions potentially merge as new events arrive
Session Window
28
Late-arriving data is handled transparently
event-time
Alice
Bob
Dave
… …
… …
… …
29
Late-arriving data is handled transparently
event-time
Alice
Bob
Dave
… …
… …
… …
30Confidential
Session windows: API
31Confidential
Session windows: API in Confluent 3.2 / Apache Kafka 0.10.2
// A session window with an inactivity gap of 3h; discard data that is 12h late
SessionWindows.with(TimeUnit.HOURS.toMillis(3)).until(TimeUnit.HOURS.toMillis(12));
Defining a session window
// Key (String) is user, value (Avro record) is the movie view event for that user.
KStream<String, GenericRecord> movieViews = ...;
// Count movie views per session, per user
KTable<Windowed<String>, Long> sessionizedMovieCounts =
movieViews
.groupByKey(Serdes.String(), genericAvroSerde)
.count(SessionWindows.with(TimeUnit.HOURS.toMillis(3)), "views-per-session");
Full example: aggregating with session windows
More details with documentation and examples at:
http://docs.confluent.io/current/streams/developer-guide.html#session-windows
https://github.com/confluentinc/examples
32Confidential
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control
Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
UP
NEXT
33
Why Confluent? More than just enterprise software
Confluent Platform
The only enterprise open
source streaming platform
based entirely on Apache
Kafka
Professional Services
Best practice consultation for
future Kafka deployments and
optimize for performance and
scalability of existing ones
Enterprise Support
24x7 support for the entire
Apache Kafka project, not just
a portion of it
Complete support across the entire adoption lifecycle
Kafka Training
Comprehensive hands-on
courses for developers and
operators from the Apache
Kafka experts
34
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and quality
assured
More extensible developer
experience
Easy upgrade path to
Confluent Enterprise
35
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
Presented by

More Related Content

What's hot

What's hot (20)

Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
 
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashKeeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
GraalVM Native Images by Oleg Selajev @shelajev
GraalVM Native Images by Oleg Selajev @shelajevGraalVM Native Images by Oleg Selajev @shelajev
GraalVM Native Images by Oleg Selajev @shelajev
 
Apache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel QuarkusApache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel Quarkus
 
Quarkus tips, tricks, and techniques
Quarkus tips, tricks, and techniquesQuarkus tips, tricks, and techniques
Quarkus tips, tricks, and techniques
 
Introduction to Spring WebFlux #jsug #sf_a1
Introduction to Spring WebFlux #jsug #sf_a1Introduction to Spring WebFlux #jsug #sf_a1
Introduction to Spring WebFlux #jsug #sf_a1
 
Introduction to Grafana Loki
Introduction to Grafana LokiIntroduction to Grafana Loki
Introduction to Grafana Loki
 
Service Mesh - Observability
Service Mesh - ObservabilityService Mesh - Observability
Service Mesh - Observability
 
Anypoint monitoring capabilities
Anypoint monitoring capabilitiesAnypoint monitoring capabilities
Anypoint monitoring capabilities
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
MuleSoft Online meetup - An expert's guide to Runtime fabric - August 2020
MuleSoft Online meetup -  An expert's guide to Runtime fabric - August 2020MuleSoft Online meetup -  An expert's guide to Runtime fabric - August 2020
MuleSoft Online meetup - An expert's guide to Runtime fabric - August 2020
 
6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...
6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...
6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 

Similar to Using Apache Kafka to Analyze Session Windows

Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
DataWorks Summit
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 

Similar to Using Apache Kafka to Analyze Session Windows (20)

Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
Asynchronous micro-services and the unified log
Asynchronous micro-services and the unified logAsynchronous micro-services and the unified log
Asynchronous micro-services and the unified log
 
Achieving a Serverless Development Experience
Achieving a Serverless Development ExperienceAchieving a Serverless Development Experience
Achieving a Serverless Development Experience
 
Data Streaming with Apache Kafka & MongoDB - EMEA
Data Streaming with Apache Kafka & MongoDB - EMEAData Streaming with Apache Kafka & MongoDB - EMEA
Data Streaming with Apache Kafka & MongoDB - EMEA
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDB
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS Deployments
 
Apache Spark Streaming -Real time web server log analytics
Apache Spark Streaming -Real time web server log analyticsApache Spark Streaming -Real time web server log analytics
Apache Spark Streaming -Real time web server log analytics
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNow
 
Azure saturday pn 2018
Azure saturday pn 2018Azure saturday pn 2018
Azure saturday pn 2018
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysis
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 

More from confluent

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 

Recently uploaded (20)

Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 

Using Apache Kafka to Analyze Session Windows

  • 1. 1 User behavior analysis with Session Windows and Apache Kafka’s Streams API Michael G. Noll Product Manager
  • 2. 2 Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing
  • 3. 3 Kafka Streams API: to build real-time apps that power your core business Key benefits • Makes your Java apps highly scalable, elastic, fault-tolerant, stateful, distributed • No additional cluster • Easy to run as a service • Supports large aggregations and joins • Security and permissions fully integrated from Kafka Example Use Cases • Microservices • Reactive applications • Continuous queries • Continuous transformations • Event-triggered processes Streams API App Instance 1 Kafka Cluster Streams API App Instance N Your App ...
  • 4. 4 Use case examples Industry Use case examples Travel Build applications with the Kafka Streams API to make real-time decisions to find best suitable pricing for individual customers, to cross-sell additional services, and to process bookings and reservations Finance Build applications to aggregate data sources for real-time views of potential exposures and for detecting and minimizing fraudulent transactions Logistics Build applications to track shipments fast, reliably, and in real-time Retail Build applications to decide in real-time on next best offers, personalized promotions, pricing, and inventory management Automotive, Manufacturing Build applications to ensure their production lines perform optimally, to gain real- time insights into supply chains, and to monitor telemetry data from connected cars to decide if an inspection is needed And many more …
  • 5. 5 Some public use cases in the wild • Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming • http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/ • Applying Kafka’s Streams API for social messaging at LINE Corp. • http://developers.linecorp.com/blog/?p=3960 • Production pipeline at LINE, a social platform based in Japan with 220+ million users • Microservices and Reactive Applications at Capital One • https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams • Containerized Kafka Streams applications in Scala, by Hive Streaming • https://www.madewithtea.com/processing-tweets-with-kafka-streams.html • Geo-spatial data analysis • http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/ • Language classification with machine learning • https://dzone.com/articles/machine-learning-with-kafka-streams
  • 6. 6 Kafka Summit NYC, May 09 Here, the community will share latest Kafka Streams use cases. http://kafka-summit.org/
  • 7. 7 Agenda • Why are session windows so important? • Recap: What is windowing? • Session windows – example use case • Session windows – how they work • Session windows – API
  • 8. 8 Why are session windows so important? • We want to analyze user behavior, which is a very common use case area • To analyze user behavior on newspapers, social platforms, video sharing sites, booking sites, etc. • AND tailor the analysis to the individual user • Specifically, analyses of the type “how many X in one go?” – how many movies watched in one go? • Achieved through a per-user sessionization step on the input data. • AND this tailoring must be convenient and scalable • Achieved through automating the sessionization step, i.e. auto-discovery of sessions • Session-based analyses can range from simple metrics (e.g. count of user visits on a news website or social platform) to more complex metrics (e.g. customer conversion funnel and event flows).
  • 9. 9 What is windowing? • Aggregations such as “counting things” are key-based operations • Before you can aggregate your input data, it must first be grouped by key event-time8 AM7 AM6 AM event-time Alice Bob Dave 8 AM7 AM6 AM
  • 10. 10 What is windowing? • Aggregations such as “counting things” are key-based operations Alice: 10 movies Bob: 11 movies Dave: 8 movies “Let me COUNT how many movies each user has watched (IN TOTAL)” event-time Alice Bob Dave Feb 7Feb 6Feb 5
  • 11. 11 What is windowing? • Windowing allows you to further “sub-group” the input data for each user event-time Alice Bob Dave “Let me COUNT how many movies each user has watched PER DAY” Alice: 4 movies Bob: 3 movies Dave: 2 movies Feb 5 Feb 7Feb 6Feb 5
  • 12. 12 What is windowing? • Windowing allows you to further “sub-group” the input data for each user event-time Alice Bob Dave Alice: 1 movie Bob: 2 movies Dave: 4 movies Feb 6 Feb 7Feb 6Feb 5 “Let me COUNT how many movies each user has watched PER DAY”
  • 13. 13 What is windowing? • Windowing allows you to further “sub-group” the input data for each user event-time Alice Bob Dave Alice: 4 movies Bob: 4 movies Dave: 1 movie Feb 7 Feb 7Feb 6Feb 5 “Let me COUNT how many movies each user has watched PER DAY”
  • 14. 14 Session windows: use case • Session windows allow for “how many X in one go?” analyses, tailored to each key • Sessions are auto-discovered from the input data (we see how later) event-time Alice Bob Dave Alice: 1, 4, 1, 4 movies (4 sessions) Bob: 4, 6 movies (2 sessions) Dave: 3, 5 movies (2 sessions) Feb 7Feb 6Feb 5 “Let me COUNT how many movies each user has watched PER SESSION”
  • 15. 15 Comparing results • Let’s compare how results differ Alice Bob Dave IN TOTAL 10 11 8 PER DAY 3.0 (avg) 3.0 (avg) 2.3 (avg) time windows PER SESSION 2.5 (avg) 5.0 (avg) 4.0 (avg) session windowsno windows
  • 16. 16 Comparing results • Let’s compare how results differ if we our task was to rank the top users Alice Bob Dave IN TOTAL #2 #1 #3 PER DAY #1 #1 #3 time windows PER SESSION #3 #1 #2 session windowsno windows
  • 18. 18 Session windows: how they work • Definition of a session in Kafka Streams API is based on a configurable period of inactivity • Example: “If Alice hasn’t watched another movie in the past 3 hours, then next movie = new session!” Inactivity period
  • 19. 19 Auto-discovering sessions, per user event-time Alice Bob Dave … … … … … …
  • 20. 20 Auto-discovering sessions, per user event-time Alice Bob Dave … … … … … … Example: How many movies does Alice watch on average per session?” Inactivity period (e.g. 3 hours)
  • 21. 21 Auto-discovering sessions, per user event-time Alice Bob Dave … … … … … … Example: How many movies does Alice watch on average per session?”
  • 22. 22 Late-arriving data is handled transparently • Handling of late-arriving data is important because, in practice, a lot of data arrives late
  • 23. 23 Late-arriving data: example Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 8h flight Internet connectivity is restored, phones will send queued emails now, though with an 8h delay Bob writes Alice an email at 2 P.M. Bob’s email is finally being sent at 10 P.M.
  • 24. 24 Late-arriving data is handled transparently • Handling of late-arriving data is important because, in practice, a lot of data arrives late • Good news: late-arriving data is handled transparently and efficiently for you • Also, in your applications, you can define a grace period after which late-arriving data will be discarded (default: 1 day), and you can define this granularly per windowed operation • Example: “I want to sessionize the input data based on 15-min inactivity periods, and late-arriving data should be discarded if it is more than 12 hours late”
  • 25. 25 Late-arriving data is handled transparently event-time Alice Bob Dave … … … … … … • Late-arriving data may (1) create new sessions or (2) merge existing sessions
  • 26. 26 Sessions potentially merge as new events arrive Session Window
  • 27. 27 Sessions potentially merge as new events arrive Session Window
  • 28. 28 Late-arriving data is handled transparently event-time Alice Bob Dave … … … … … …
  • 29. 29 Late-arriving data is handled transparently event-time Alice Bob Dave … … … … … …
  • 31. 31Confidential Session windows: API in Confluent 3.2 / Apache Kafka 0.10.2 // A session window with an inactivity gap of 3h; discard data that is 12h late SessionWindows.with(TimeUnit.HOURS.toMillis(3)).until(TimeUnit.HOURS.toMillis(12)); Defining a session window // Key (String) is user, value (Avro record) is the movie view event for that user. KStream<String, GenericRecord> movieViews = ...; // Count movie views per session, per user KTable<Windowed<String>, Long> sessionizedMovieCounts = movieViews .groupByKey(Serdes.String(), genericAvroSerde) .count(SessionWindows.with(TimeUnit.HOURS.toMillis(3)), "views-per-session"); Full example: aggregating with session windows More details with documentation and examples at: http://docs.confluent.io/current/streams/developer-guide.html#session-windows https://github.com/confluentinc/examples
  • 32. 32Confidential Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing UP NEXT
  • 33. 33 Why Confluent? More than just enterprise software Confluent Platform The only enterprise open source streaming platform based entirely on Apache Kafka Professional Services Best practice consultation for future Kafka deployments and optimize for performance and scalability of existing ones Enterprise Support 24x7 support for the entire Apache Kafka project, not just a portion of it Complete support across the entire adoption lifecycle Kafka Training Comprehensive hands-on courses for developers and operators from the Apache Kafka experts
  • 34. 34 Get Started with Apache Kafka Today! https://www.confluent.io/downloads/ THE place to start with Apache Kafka! Thoroughly tested and quality assured More extensible developer experience Easy upgrade path to Confluent Enterprise
  • 35. 35 Discount code: kafcom17 Use the Apache Kafka community discount code to get $50 off www.kafka-summit.org Kafka Summit New York: May 8 Kafka Summit San Francisco: August 28 Presented by