SlideShare a Scribd company logo
1 of 17
Download to read offline
Machine Learning for
Real Time Streaming Data with
Kafka and TensorFlow
Yong Tang
Maintainer & SIG IO Lead,
TensorFlow
GitHub: yongtang
Stream Data Processing & Machine Learning
Apache Kafka & Event Driven Microservices
Service
Service
Service
Service
Service
TensorFlow & 2.0
• Most popular machine learning framework
• 1.x
• 2.0
• Keras high level API
• Eager execution
• Data processing: tf.data
• Distribution Strategy
Training Workflow (TensorFlow 2.0)
Read and Preprocess
Data
Model Building Training Saving
tf.data
tf.keras
tf.estimator
eager execution
autograph
distribution strategy
saved model
# <= read data into tf.data
dataset = …. , test_dataset = ….
# <= build keras model
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax) ])
model.compile(optimizer='adam’, loss='sparse_categorical_crossentropy’,
metrics=['accuracy’])
# <= train the model
model.fit(dataset, epochs=5)
# <= inference
predictions = model.predict(test_dataset, callbacks=[callback])
Steam Data + Machine Learning: Challenges
Streaming & big data
• Hadoop/Zookeeper
• Kafka, Spark, Flink
• Java/Scala
• Parquet/Avro/Arrow
Machine leaning
• CPU/GPU/TPU
• Python (C++, CUDA)
• Numpy
• TFRecord/CSV/Text/JPEG/PNG
TensorFlow TFRecord Format
• Sequence of binary strings
• Protocol buffer for serialization
• Flexible and simple
• ONLY used by TensorFlow
• NO ecosystem support like parquet, etc.
Infrastructure Solution
Service
Service
TensorFlow
Convert To
TFRecord File
Write Back to
Kafka
Persistent
Storage
/ File System
tf.data in TensorFlow
• Part of the graph in TensorFlow
• Support Eager and Graph mode
• Focus on data transformation
tf.data.TFRecordDataset
tf.data.TextLineDataset
tf.data.FixedLengthRecordDataset
KafkaDataset for TensorFlow
• Subclass of tf.data.Dataset
• Written in C++ (linked with librdkafka)
• Part of the graph in TensorFlow
• Support Eager and Graph mode
KafkaDataset for TensorFlow
• Part of tensorflow package in TF 1.x
• tf.contrib.kafka
• Part of tensorflow-io package in TF 2.0
• KafkaDataset (Read)
• KafkaOutputSequence (Write)
• Python and R (other language support possible)
$ pip install tensorflow-io
import tensorflow_io.kafka as kafka_io
dataset = kafka_io.KafkaDataset( 'topic', server='localhost', group= '')
# <= pre-process data (if needed), batch/repeat/etc.
dataset = dataset.map(
lambda x: ……)
# <= build keras model
model = tf.keras.models…
model.compile(…)
# <= train the model
model.fit(dataset, epochs=5)
# continue with inference…
# build keras callback
class OutputCallback(tf.keras.callbacks.Callback):
def __init__(self, batch_size, topic, servers):
self._sequence = kafka_io.KafkaOutputSequence(
topic=topic, servers=servers)
self._batch_size = batch_size
def on_predict_batch_end(self, batch, logs=None):
...
self._sequence.setitem(index, class_names[np.argmax(output)])
# <= inference with callback for streaming input and output
model.predict(
test_dataset, callbacks=[OutputCallback(32, 'topic', 'localhost')])
KafkaDataset for TensorFlow
Service
Service
TensorFlow
Service
TensorFlow SIG I/O
• https://github.com/tensorflow/io
• Special Interest Group under TensorFlow org
• Focus on I/O, streaming, and data formats
• Streaming:
• Apache Kafka, AWS Kinesis, Google Cloud PubSub
• Memory, caching and serialization
• Apache Ignite, Apache Arrow, Apache Parquet
• File formats:
• MNIST, WebP/TIFF/etc, Video, Audio
• Cloud Storage
• GCS, S3, Azure
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Kafka Summit NYC 2019

More Related Content

What's hot

Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Kai Wähner
 

What's hot (20)

Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with TeleportRole Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
SQL Tuning 101
SQL Tuning 101SQL Tuning 101
SQL Tuning 101
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slides
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into Cassandra
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Linux command ppt
Linux command pptLinux command ppt
Linux command ppt
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
JupyterHub - A "Thing Explainer" Overview
JupyterHub - A "Thing Explainer" OverviewJupyterHub - A "Thing Explainer" Overview
JupyterHub - A "Thing Explainer" Overview
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
 

Similar to Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Kafka Summit NYC 2019

Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Chris Fregly
 
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Raffi Khatchadourian
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 

Similar to Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Kafka Summit NYC 2019 (20)

Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약
 
Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016
 
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersTensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
 
Keras and TensorFlow
Keras and TensorFlowKeras and TensorFlow
Keras and TensorFlow
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 
slide-keras-tf.pptx
slide-keras-tf.pptxslide-keras-tf.pptx
slide-keras-tf.pptx
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0
 
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
 
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
 
Introduction to TensorFlow 2
Introduction to TensorFlow 2Introduction to TensorFlow 2
Introduction to TensorFlow 2
 
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceAnirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
 
Introduction to TensorFlow 2 and Keras
Introduction to TensorFlow 2 and KerasIntroduction to TensorFlow 2 and Keras
Introduction to TensorFlow 2 and Keras
 
A Tour of Tensorflow's APIs
A Tour of Tensorflow's APIsA Tour of Tensorflow's APIs
A Tour of Tensorflow's APIs
 
Boosting machine learning workflow with TensorFlow 2.0
Boosting machine learning workflow with TensorFlow 2.0Boosting machine learning workflow with TensorFlow 2.0
Boosting machine learning workflow with TensorFlow 2.0
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming Jobs
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
 

More from confluent

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 

Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Kafka Summit NYC 2019

  • 1. Machine Learning for Real Time Streaming Data with Kafka and TensorFlow Yong Tang Maintainer & SIG IO Lead, TensorFlow GitHub: yongtang
  • 2. Stream Data Processing & Machine Learning
  • 3. Apache Kafka & Event Driven Microservices Service Service Service Service Service
  • 4. TensorFlow & 2.0 • Most popular machine learning framework • 1.x • 2.0 • Keras high level API • Eager execution • Data processing: tf.data • Distribution Strategy
  • 5. Training Workflow (TensorFlow 2.0) Read and Preprocess Data Model Building Training Saving tf.data tf.keras tf.estimator eager execution autograph distribution strategy saved model
  • 6. # <= read data into tf.data dataset = …. , test_dataset = …. # <= build keras model model = tf.keras.models.Sequential([ tf.keras.layers.Flatten(input_shape=(28, 28)), tf.keras.layers.Dense(512, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(10, activation=tf.nn.softmax) ]) model.compile(optimizer='adam’, loss='sparse_categorical_crossentropy’, metrics=['accuracy’]) # <= train the model model.fit(dataset, epochs=5) # <= inference predictions = model.predict(test_dataset, callbacks=[callback])
  • 7. Steam Data + Machine Learning: Challenges Streaming & big data • Hadoop/Zookeeper • Kafka, Spark, Flink • Java/Scala • Parquet/Avro/Arrow Machine leaning • CPU/GPU/TPU • Python (C++, CUDA) • Numpy • TFRecord/CSV/Text/JPEG/PNG
  • 8. TensorFlow TFRecord Format • Sequence of binary strings • Protocol buffer for serialization • Flexible and simple • ONLY used by TensorFlow • NO ecosystem support like parquet, etc.
  • 9. Infrastructure Solution Service Service TensorFlow Convert To TFRecord File Write Back to Kafka Persistent Storage / File System
  • 10. tf.data in TensorFlow • Part of the graph in TensorFlow • Support Eager and Graph mode • Focus on data transformation tf.data.TFRecordDataset tf.data.TextLineDataset tf.data.FixedLengthRecordDataset
  • 11. KafkaDataset for TensorFlow • Subclass of tf.data.Dataset • Written in C++ (linked with librdkafka) • Part of the graph in TensorFlow • Support Eager and Graph mode
  • 12. KafkaDataset for TensorFlow • Part of tensorflow package in TF 1.x • tf.contrib.kafka • Part of tensorflow-io package in TF 2.0 • KafkaDataset (Read) • KafkaOutputSequence (Write) • Python and R (other language support possible)
  • 13. $ pip install tensorflow-io import tensorflow_io.kafka as kafka_io dataset = kafka_io.KafkaDataset( 'topic', server='localhost', group= '') # <= pre-process data (if needed), batch/repeat/etc. dataset = dataset.map( lambda x: ……) # <= build keras model model = tf.keras.models… model.compile(…) # <= train the model model.fit(dataset, epochs=5)
  • 14. # continue with inference… # build keras callback class OutputCallback(tf.keras.callbacks.Callback): def __init__(self, batch_size, topic, servers): self._sequence = kafka_io.KafkaOutputSequence( topic=topic, servers=servers) self._batch_size = batch_size def on_predict_batch_end(self, batch, logs=None): ... self._sequence.setitem(index, class_names[np.argmax(output)]) # <= inference with callback for streaming input and output model.predict( test_dataset, callbacks=[OutputCallback(32, 'topic', 'localhost')])
  • 16. TensorFlow SIG I/O • https://github.com/tensorflow/io • Special Interest Group under TensorFlow org • Focus on I/O, streaming, and data formats • Streaming: • Apache Kafka, AWS Kinesis, Google Cloud PubSub • Memory, caching and serialization • Apache Ignite, Apache Arrow, Apache Parquet • File formats: • MNIST, WebP/TIFF/etc, Video, Audio • Cloud Storage • GCS, S3, Azure