This talk shows how to build machine learning models at extreme scale and how to productionize them in mission-critical real-time applications by leveraging open source components such as TensorFlow and the Apache Kafka ecosystem in the public cloud, and why this combination is a good fit for machine learning at extreme scale. A live demo shows sensor analytics for predictive alerting in real time.
Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka and TensorFlow - Codemotion Berlin 2018
1. Apache Kafka and Machine Learning – Kai Waehner
Unleashing Apache Kafka and TensorFlow in the Cloud
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
2.
Disclaimer: This is a fictional story (but not far from reality)…
3.
Global automotive company builds connected car infrastructure
Digital Transformation
• Improve customer experience
• Increase revenue
• Reduce risk
Timeline:
• 3 years ago: Project begins
• Today: Connected car infrastructure in production for first simple use cases
• 2 years later: Improved processes leveraging machine learning
4.
Analyze and act on critical business moments
Windows of Opportunity (seconds, minutes, hours):
• Real Time Tracking
• Predictive Maintenance
• Fraud Detection
• Cross Selling
• Transportation Rerouting
• Customer Service
• Inventory Management
5.
Machine Learning (ML)
...allows computers to find hidden insights without being explicitly
programmed where to look.
Machine Learning
• Decision Trees
• Naïve Bayes
• Clustering
• Neural Networks
• Etc.
Deep Learning
• CNN
• RNN
• Autoencoder
• Etc.
6.
The First Analytic Models
How to deploy the models
in production?
…real-time processing?
…at scale?
…24/7 zero downtime?
7.
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
8.
Scalable, Technology-Agnostic ML Infrastructures
https://www.infoq.com/presentations/netflix-ml-meson
https://eng.uber.com/michelangelo
https://www.infoq.com/presentations/paypal-data-service-fraud
What is this
thing used everywhere?
9.
Apache Kafka: The Rise of a Streaming Platform
• The Log
• Connectors
• Producer
• Consumer
• Streaming Engine
10.
Apache Kafka at Scale
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63921 (2018)
https://qconlondon.com/london2018/presentation/cloud-native-and-scalable-kafka-architecture (2018)
11.
Apache Kafka’s Open Source Ecosystem as Infrastructure for ML
12.
Apache Kafka’s Open Source Ecosystem as Infrastructure for ML
• Kafka Streams
• Kafka Connect
• REST Proxy
• Schema Registry
• Go / .NET / Python Kafka Producer
• KSQL
13.
Getting Started
Okay, let’s build our own
ML infrastructure step by
step. Where do we start?
14.
Connected Car Infrastructure in Production on AWS
[Diagram: Devices and Gateways → MQTT → MQTT Proxy → Kafka Brokers]
Real time tracking of the cars
to enable new, innovative digital services
The big data team
has the data already.
15.
Replication of IoT Data from AWS to GCP
[Diagram: Devices → Confluent Replicator (via Control Center UI) → Analytics]
We should also use
Kafka, but—oh no…GCP
is the strategic cloud
for the analytics team!
16.
Data Preprocessing
Preprocessing
Filter, transform, anonymize, feature extraction
Data needs to be preprocessed at scale, and the preprocessing must be reusable!
Streams
• Use KSQL to preprocess data at scale without coding
• Use SQL statements for interactive analysis
+ deployment to production at scale
• Leverage e.g. Python with KSQL REST interface
Data Ready for Model Training
17.
Preprocessing with KSQL
SELECT c.car_id, c.event_id, c.car_model_id, c.sensor_input
FROM car_sensor c
LEFT JOIN car_models m ON c.car_model_id = m.car_model_id
WHERE m.car_model_type = 'Audi_A8';
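The preprocessing slide mentions driving KSQL from Python through its REST interface. A minimal sketch of how a query like the one above might be submitted, assuming a KSQL server at the hypothetical address http://localhost:8088 and its `/query` endpoint; the helper name `build_ksql_request` is illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a KSQL server is reachable at this (hypothetical) address.
KSQL_SERVER = "http://localhost:8088"

QUERY = (
    "SELECT c.car_id, c.event_id, c.car_model_id, c.sensor_input "
    "FROM car_sensor c "
    "LEFT JOIN car_models m ON c.car_model_id = m.car_model_id "
    "WHERE m.car_model_type = 'Audi_A8';"
)


def build_ksql_request(statement, server=KSQL_SERVER):
    """Build (but do not send) an HTTP POST for a streaming KSQL query."""
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=server + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request(QUERY)

# To run it against a live server, stream the response line by line:
#   with urllib.request.urlopen(request) as response:
#       for line in response:
#           print(line.decode("utf-8").strip())
```

Keeping the statement as a plain string means the same SQL can be tried interactively first and then submitted from a notebook or script.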
18.
Data Ingestion
Connect
• “Kafka Benefits Under the Hood”
• Out-of-the-box connectivity
• Data format conversion
• Single message transformation (including error-handling)
Preprocessed Data
There isn’t just
one ML solution.
We need to be
flexible!
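A sketch of what a Kafka Connect sink definition with a single message transformation and error handling could look like, written as a Python dict that would be sent as JSON to the Connect REST API (POST /connectors). The connector name, topic names, file path, and masked field are hypothetical; the file sink is only a simple stand-in for whatever ML system consumes the data.

```python
import json


def build_connector_definition(name, topic, output_file, pii_field):
    """Return a Kafka Connect connector definition as a plain dict."""
    config = {
        # Stand-in sink; a real pipeline would use the sink for its ML platform.
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "topics": topic,
        "file": output_file,
        # Data format conversion: read values as schemaless JSON.
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        # Single message transformation: mask one field in every record value.
        "transforms": "maskPii",
        "transforms.maskPii.type": "org.apache.kafka.connect.transforms.MaskField$Value",
        "transforms.maskPii.fields": pii_field,
        # Error handling: keep running and route bad records to a dead letter queue.
        "errors.tolerance": "all",
        "errors.log.enable": "true",
        "errors.deadletterqueue.topic.name": topic + "-dlq",
    }
    return {"name": name, "config": config}


# All argument values below are hypothetical.
definition = build_connector_definition(
    name="preprocessed-car-sensor-sink",
    topic="car-sensor-preprocessed",
    output_file="/tmp/car-sensor-preprocessed.txt",
    pii_field="driver_name",
)
body = json.dumps(definition, indent=2)
print(body)
```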
19.
Model Training
Let’s build some models
at extreme scale using
TensorFlow and TPUs!
Analytic Model
20.
Replayability — a log never forgets!
Time
Model A, Model B, Model X
Producer
Distributed Commit Log
Different models with same data
Different ML frameworks
AutoML compatible
A/B testing
Google Cloud Storage, HDFS
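The replayability idea can be sketched with a tiny in-memory log, assuming nothing about Kafka's actual client API: records are never removed on read, and each consumer tracks its own offset, so different models can train on the same data and rewind at will. The class and variable names are illustrative.

```python
class CommitLog:
    """Append-only in-memory log; reading never removes records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        return self._records[offset:]


class Consumer:
    """Reads from a shared log while keeping its own position."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        records = self._log.read_from(self.offset)
        self.offset += len(records)
        return records

    def seek_to_beginning(self):
        self.offset = 0


log = CommitLog()
for reading in [0.9, 1.1, 1.0, 5.0]:
    log.append({"sensor": reading})

model_a = Consumer(log)  # e.g. feeds training of Model A
model_b = Consumer(log)  # e.g. feeds a different framework for Model B

batch_a = model_a.poll()
batch_b = model_b.poll()      # same records, read independently

model_a.seek_to_beginning()   # retrain or A/B test on the full history
replayed = model_a.poll()
```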
21.
The Need for Local Data Processing
Confluent
Replicator
PII data Local Processing
We are ready to use our
models for predictions,
BUT all the PII data needs to
be processed in our local data
center!
CLOUD
ON PREMISE
Analytic Model
22.
Self-managed on-premise deployment for model deployment and monitoring
Oh no…self-managed
Kubernetes + Kafka ecosystem
= operations nightmare
What about scaling brokers,
external clients, persistent
volumes, failover, rolling
upgrades, and so on?
Confluent Operator takes over the challenge of operating Kafka on Kubernetes!
(Automated provisioning, scaling, fail-over, partition rebalancing, rolling updates, monitoring, …)
23.
Stream Processing vs. Request-Response for Model Serving
Okay, we deploy locally.
But how to do the model inference?
Can Kafka and Kubernetes
help here?
24.
Option 1: gRPC communication to do model inference
[Diagram: a Streams app receives an Input Event, sends a Request via gRPC to Model Serving (TensorFlow Serving), and uses the Response to emit a Prediction]
25.
Option 2: Model inference natively integrated into the App
[Diagram: a Streams app turns an Input Event into a Prediction with the model embedded in the app]
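The two serving options can be contrasted in a few lines of plain Python. This is only a sketch: `toy_model` is a hypothetical stand-in for a trained model, and `RemoteModelClient` stands in for an RPC client such as a gRPC stub, with no network involved.

```python
def toy_model(features):
    """Hypothetical stand-in for a trained model: flags large mean readings."""
    return sum(features) / len(features) > 1.0


class RemoteModelClient:
    """Stand-in for an RPC client to a model server; counts round trips."""

    def __init__(self, model):
        self._model = model
        self.calls = 0

    def predict(self, features):
        self.calls += 1  # in a real setup each call is a network round trip
        return self._model(features)


def process_with_remote_model(events, client):
    # Option 1: the stream processor asks a separate model server per event.
    return [(e["id"], client.predict(e["features"])) for e in events]


def process_with_embedded_model(events, model):
    # Option 2: the model is loaded into the stream processor itself.
    return [(e["id"], model(e["features"])) for e in events]


events = [
    {"id": "e1", "features": [0.5, 0.7]},
    {"id": "e2", "features": [2.0, 1.5]},
]

client = RemoteModelClient(toy_model)
remote_predictions = process_with_remote_model(events, client)
embedded_predictions = process_with_embedded_model(events, toy_model)
```

Both paths yield the same predictions; the difference lies in the per-event remote call (and the separately managed model server) in Option 1 versus redeploying the app to update the model in Option 2.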
26.
Monitoring the infrastructure for ML
• Kafka Streams
• Kafka Connect
• REST Proxy
• Schema Registry
• Go / .NET / Python Kafka Producer
• KSQL
• Control Center
Build vs. Buy
Hosted vs. Managed
Basic vs. Advanced
27.
Example: Anomaly Detection System to Predict Defects in Car Engine
[Diagram: anomaly detection pipeline]
• Deployment: At the edge; on-premise DC (Kubernetes + Confluent Operator)
• Source: Car Sensors
• Kafka Ecosystem: MQTT Proxy, Kafka Cluster, Kafka Connect, KSQL
• Other Components: Elasticsearch, Grafana, Real Time Emergency System
• Processing: All Data → Apply Analytic Model → Filter Anomalies → PotentialDefect
28.
KSQL and Deep Learning (Autoencoder) for Fraud Detection
CREATE STREAM FraudDetection AS
SELECT payment_id, applyFraudModel(payment_input)
FROM payment_engine;
User Defined Function (UDF)
The model performs well
and scales as needed,
because we use the same
integration and processing
pipeline for training AND
deployment of the model!
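What a UDF like applyFraudModel does per record can be sketched as follows, assuming the common autoencoder approach of flagging inputs whose reconstruction error exceeds a threshold. The `reconstruct` function is a hypothetical stand-in, not a trained network, and the threshold value is arbitrary.

```python
def reconstruct(values, low=0.0, high=1.0):
    """Hypothetical stand-in for a trained autoencoder.

    It reproduces values inside the range seen in normal data and
    clamps everything else, so unusual inputs reconstruct poorly.
    """
    return [min(max(v, low), high) for v in values]


def reconstruction_error(values):
    """Mean squared error between the input and its reconstruction."""
    rebuilt = reconstruct(values)
    return sum((v - r) ** 2 for v, r in zip(values, rebuilt)) / len(values)


def apply_fraud_model(payment_input, threshold=0.1):
    """Return True when the input looks anomalous (threshold is an assumption)."""
    return reconstruction_error(payment_input) > threshold


# Hypothetical payment events, mirroring the columns in the statement above.
payments = [
    {"payment_id": 1, "payment_input": [0.2, 0.4, 0.9]},
    {"payment_id": 2, "payment_input": [0.3, 4.0, 0.5]},
]

flagged = [(p["payment_id"], apply_fraud_model(p["payment_input"])) for p in payments]
```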
29.
Live Demo: Machine Learning at Scale in Hybrid Deployments
30.
Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data
https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot
31.
Comparing our current project status to others
Well, we are not there yet,
but getting closer every
month!
32.
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
www.confluent.io
LinkedIn
Questions? Feedback?
Please contact me!
33.
Confluent Cloud
Apache Kafka
Connect / Pub-Sub / Streams
Development & Connectivity
Clients / Connectors / REST Proxy / KSQL
Fully-Managed Service
Monitoring / Replication / Data Balancing / Rolling Upgrades
34.
Confluent Cloud
(Q1 2019)
Why Confluent Cloud?
• No operations burden
• 99.95% enterprise uptime SLA, guaranteed high throughput, 10 ms end-to-end latency
• Confluent ecosystem, multi-cloud + on-premise deployments
• End-to-End monitoring with Confluent Control Center
• Confluent IP + Support
35.
Hybrid and Multi-Cloud Data Pipelines
[Diagram: multi-cloud data pipeline connecting a datacenter with three Confluent Cloud deployments]
36.
From Dev to Production
Cloud Professional
● Scale: Max 5 MB/s read and write; max 30 day retention
● Availability: Best effort uptime; single zone only
● Support: Community
● VPC Peering: Not available
Cloud Enterprise
● Scale: Configurable throughput (no limit); unlimited retention
● Availability: 99.95% uptime SLA; up to 3 availability zones
● Support: Gold Support, 24x7 access with 1 hour response time
● VPC Peering: Optional