Scalable, real-time Machine Learning using Apache Kafka
Agenda
● Traditional model deployment process
● 90 seconds to WoW
● Let’s process the incoming stream
● Demo
● What’s more?
$ whoami
● Personalisation lead at Hotstar
● Led Data Infrastructure team at Grofers and TinyOwl
● Kafka fanboy
● Usually rant on Twitter @jayeshsidhwani
Machine Learning @ Hotstar
● ~150 million users
● 4.8 million peak concurrency
● 120K peak recommendation requests per second
● Diverse content in diverse languages
Traditional model deployment process
[Diagram: Data Lake, Model Training, Serialized Model, Batch Predictions, Recommendation APIs; stages labelled Offline and Online]
● One-day / few-hours batch pre-compute
● Slow time to react
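A minimal sketch of why the batch pattern is slow to react. Everything here is an illustrative assumption, not Hotstar's code: `score` stands in for the serialized model, and a plain dictionary stands in for the store that holds batch predictions.

```python
from typing import Dict, List

def score(user_id: str) -> List[str]:
    """Hypothetical stand-in for a trained, serialized model."""
    return [f"video-for-{user_id}"]

def batch_precompute(user_ids: List[str]) -> Dict[str, List[str]]:
    """Offline step: score every known user in one batch run."""
    return {uid: score(uid) for uid in user_ids}

def recommend(store: Dict[str, List[str]], user_id: str) -> List[str]:
    """Online step: the API only looks up what the last batch produced."""
    return store.get(user_id, [])

store = batch_precompute(["u1", "u2"])
known = recommend(store, "u1")     # served from the batch output
new_user = recommend(store, "u3")  # arrived after the batch ran: nothing to serve
```

A user who shows up between two batch runs gets an empty result until the next run finishes, which can be hours away.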
Sense of urgency?
● 90 seconds to convert a new user
● To power their experience, we need to know the user’s gender, interests and more
● Need an always-thinking machine
Thinking streams
Data at rest
● Slow
● Batch-y
Data in motion
● Fast
● Sub-second
Enter Apache Kafka
● Kafka is a scalable, fault-tolerant, distributed message queue
● Producers and Consumers
● Uses
○ Real-time applications such as intelligent notifications, anomaly detection, etc.
○ Asynchronous communication in event-driven architectures
Diagram credits: http://kafka.apache.org
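The producer and consumer model can be illustrated with a toy in-memory log. This is a sketch under stated assumptions, not the Kafka client API: `ToyLog` and the group names are hypothetical. It mimics one property of Kafka, namely that records are retained and each consumer group tracks its own read position.

```python
from collections import defaultdict
from typing import Any, Dict, List

class ToyLog:
    """In-memory stand-in for one topic partition (not the Kafka API)."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def produce(self, record: Any) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def consume(self, group: str, max_records: int = 10) -> List[Any]:
        """Return unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

log = ToyLog()
log.produce({"event": "play", "user": "u1"})
log.produce({"event": "click", "user": "u2"})

notifications = log.consume("notifications")
anomalies = log.consume("anomaly-detection")  # independent reader, same data
nothing_new = log.consume("notifications")    # already caught up
```

Because reading does not remove records, several applications can consume the same stream without coordinating with each other.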
Real-time infrastructure at Hotstar
● All clickstream data pushed into Apache Kafka
● Apache Kafka Streams to process events as they happen
● Incoming data available for everyone
[Diagram: Apple TV, iOS, Android, Roku; Stream Processing Framework: Filter, Window, Join, Anomaly, Machine Learning; Intelligence]
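Two of the operations named on this slide, filter and join, can be sketched with plain Python generators. This is not Kafka Streams code; the event fields (`type`, `user`, `video`) and the profile table are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]

def only_plays(events: Iterable[Event]) -> Iterator[Event]:
    """Filter: keep video-play events, drop everything else."""
    return (e for e in events if e["type"] == "play")

def join_profiles(events: Iterable[Event],
                  profiles: Dict[str, Event]) -> Iterator[Event]:
    """Join: enrich each event with the matching user profile, if any."""
    for e in events:
        yield {**e, **profiles.get(e["user"], {})}

events = [
    {"type": "play", "user": "u1", "video": "v9"},
    {"type": "click", "user": "u1", "video": "v3"},
    {"type": "play", "user": "u2", "video": "v9"},
]
profiles = {"u1": {"language": "hi"}}  # hypothetical lookup table

enriched = list(join_profiles(only_plays(events), profiles))
```

Each stage consumes one event at a time, so the same pipeline shape works on an unbounded stream as well as on this short list.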
Demo
Predict whether a flight is delayed in real-time
How to process a stream?
ML
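One way to read this slide together with the flight-delay demo: apply the model to each event as it arrives, instead of scoring in batches. The sketch below is an assumption-heavy illustration. The demo's actual model and features are not shown in the slides, so `predict_delay` is a placeholder rule and `departure_delay_min` is a made-up field.

```python
from typing import Any, Dict, Iterable, Iterator

Flight = Dict[str, Any]

def predict_delay(flight: Flight) -> bool:
    """Placeholder rule standing in for a trained model (hypothetical)."""
    return flight["departure_delay_min"] > 15

def score_stream(flights: Iterable[Flight]) -> Iterator[Flight]:
    """Attach a prediction to every event as soon as it is seen."""
    for flight in flights:
        yield {**flight, "predicted_delayed": predict_delay(flight)}

incoming = [
    {"flight": "AI-101", "departure_delay_min": 40},
    {"flight": "6E-202", "departure_delay_min": 0},
]
scored = list(score_stream(incoming))
```

In a real deployment the loop would read from an input topic and write each scored record to an output topic; here a list stands in for both.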
Advanced use-cases
[Diagram: Hotstar Streaming Platform. Legend: Processor nodes, Source / Sink nodes. Nodes: page-clicks, video-plays, predict-gender, predict-interest, 5-min trending videos, Recommended for You]
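The "5-min trending videos" node suggests a windowed count. A minimal sketch, assuming tumbling (non-overlapping) 5-minute windows and play events given as `(timestamp_seconds, video_id)` pairs; the slides do not say which window type or event shape is used.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 300  # 5 minutes

def trending(plays: Iterable[Tuple[int, str]],
             top_n: int = 2) -> Dict[int, List[str]]:
    """Count plays per video in each tumbling window, keep the top_n."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for ts, video in plays:
        window_start = ts - ts % WINDOW_SECONDS  # align to window boundary
        counts[window_start][video] += 1
    return {w: [v for v, _ in c.most_common(top_n)]
            for w, c in counts.items()}

plays = [(0, "a"), (10, "b"), (20, "a"), (310, "c"), (320, "c"), (330, "a")]
result = trending(plays)
```

This version processes a finite list in one pass. A streaming version would emit each window's result once the window closes rather than at the end.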
Questions?
tech.hotstar.com

Build intelligent, real-time applications using Machine Learning
