MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf

Timothy Spann
Timothy SpannDeveloper Advocate
Event-Driven Machine Learning at Scale
Timothy Spann
Developer Advocate
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
Timothy
Spann
StreamNative
Developer Advocate
@PaaSDev
/in/timothyspann
ABSTRACT SUMMARY
Event-Driven Machine Learning at Scale
Utilizing the FLiP stack we can drive NLP and other machine learning classifications in the cloud, on-premise, hybrid and at the edge
utilizing the open source Apache Pulsar project. We often utilize the full FLiPN stack which includes Apache Flink, Apache NiFi and
Apache Pulsar.
Apache Pulsar is a unified messaging platform that can host and execute classifications as events arrive asynchronously to topics.
We deploy our functionality in lightweight functions that can run as processes, threads or in K8. We often use the serverless
FunctionMesh to run these. This allows us to run machine learning classes built in Go, Python and Java. We will show you utilizing
Vader Sentiment, Pytorch, TensorFlow, DJL.AI, MXNet and other machine learning options with ease.
AGENDA:
▪ Welcome
▪ Introduction to Apache Pulsar
▪ Basics of Pulsar
▪ Pulsar Functions
▪ Let’s Build an ML App!
▪ Demo
▪ Resources
▪ Q&A
streamnative.io
Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using Pulsar’s
unified messaging and streaming
platform.
Unified
Messaging
Platform
Guaranteed
Message
Delivery
Resiliency Infinite
Scalability
WHY APACHE PULSAR?
● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for both
Pulsar and BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Metadata Store
(ZK, RocksDB, etcd, …)
Pulsar Cluster
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Instance
Pulsar Cluster
Topics
Producer Consumer
Publisher sends data and
doesn't know about the
subscribers or their status.
All interactions go through
Pulsar and it handles all
communication.
Subscriber receives data
from publisher and never
directly interacts with it
Topic
Topic
Producer / Consumer
Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data
can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like
topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer name, the
default name is used.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the
message is its order in that sequence.
Messages - the Basic Unit of Pulsar
Connectivity
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
Pulsar Functions
● Lightweight computation
similar to AWS Lambda.
● Specifically designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar
Broker.
A serverless event streaming
framework
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions
Function Mesh
Pulsar Functions, along with Pulsar
IO/Connectors, provide a powerful API for
ingesting, transforming, and outputting
data.
Function Mesh, another StreamNative
project, makes it easier for developers to
create entire applications built from
sources, functions, and sinks all through a
declarative API.
ML
• Visual Question and Answer
• NLP (Natural Language Processing)
• Sentiment Analysis
• Text Classification
• Named Entity Recognition
• Content-based Recommendations
• Predictive Maintenance
• Fault Detection
• Fraud Detection
• Time-Series Predictions
• Naive Bayes
Using Pulsar For ML Models
High performance
High security
Multiple data consumers
Large data volumes,
high scalability
Multi-tenancy and
geo-replication
Deploying AI With an
Event-Driven
Platform
https://dzone.com/trendreports/enterprise-ai-1
Streaming FLiP-ML Apps
StreamNative Hub
StreamNative Cloud
Unified Batch and Stream COMPUTING
Batch
(Batch + Stream)
Unified Batch and Stream STORAGE
Offload
(Queuing + Streaming)
Tiered Storage
Pulsar
---
KoP
---
MoP
---
Websocket
Pulsar
Sink
Streaming
Edge Gateway
Protocols
CDC
Apps
Web UI
Chat with
Sentiment
21
Connect - Resources
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Apache Pulsar Training
• Instructor-led courses
– Pulsar Fundamentals
– Pulsar Developers
– Pulsar Operations
• On-demand learning with labs
• 300+ engineers, admins and architects trained!
StreamNative Academy
Now Available
On-Demand
Pulsar Training
QUESTIONS?
THANK YOU!
@PaaSDev
/in/timothyspann
1 of 26

Recommended

Open Source Bristol 30 March 2022 by
Open Source Bristol 30 March 2022Open Source Bristol 30 March 2022
Open Source Bristol 30 March 2022Timothy Spann
95 views55 slides
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Pulsar Summit Asia ... by
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Pulsar Summit Asia ...Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Pulsar Summit Asia ...
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Pulsar Summit Asia ...StreamNative
258 views22 slides
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa... by
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
1.4K views27 slides
Automation + dev ops summit hail hydrate! from stream to lake by
Automation + dev ops summit   hail hydrate! from stream to lakeAutomation + dev ops summit   hail hydrate! from stream to lake
Automation + dev ops summit hail hydrate! from stream to lakeTimothy Spann
457 views33 slides
[March sn meetup] apache pulsar + apache nifi for cloud data lake by
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
903 views55 slides
Python web conference 2022 apache pulsar development 101 with python (f li-... by
Python web conference 2022   apache pulsar development 101 with python (f li-...Python web conference 2022   apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
282 views49 slides

More Related Content

What's hot

Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022 by
Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022Timothy Spann
571 views17 slides
Real time cloud native open source streaming of any data to apache solr by
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrTimothy Spann
759 views31 slides
Big mountain data and dev conference apache pulsar with mqtt for edge compu... by
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
440 views47 slides
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing by
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computingTimothy Spann
366 views30 slides
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and... by
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...Timothy Spann
3.1K views46 slides
DBCC 2021 - FLiP Stack for Cloud Data Lakes by
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
717 views36 slides

What's hot(20)

Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022 by Timothy Spann
Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Timothy Spann571 views
Real time cloud native open source streaming of any data to apache solr by Timothy Spann
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
Timothy Spann759 views
Big mountain data and dev conference apache pulsar with mqtt for edge compu... by Timothy Spann
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann440 views
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing by Timothy Spann
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Timothy Spann366 views
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and... by Timothy Spann
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
Timothy Spann3.1K views
DBCC 2021 - FLiP Stack for Cloud Data Lakes by Timothy Spann
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
Timothy Spann717 views
fluentd -- the missing log collector by Muga Nishizawa
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
Muga Nishizawa2.2K views
Hail hydrate! from stream to lake using open source by Timothy Spann
Hail hydrate! from stream to lake using open sourceHail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open source
Timothy Spann569 views
ApacheCon 2021 Apache Deep Learning 302 by Timothy Spann
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann632 views
Music city data Hail Hydrate! from stream to lake by Timothy Spann
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann708 views
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka... by Timothy Spann
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Timothy Spann519 views
StreamNative FLiP into scylladb - scylla summit 2022 by Timothy Spann
StreamNative   FLiP into scylladb - scylla summit 2022StreamNative   FLiP into scylladb - scylla summit 2022
StreamNative FLiP into scylladb - scylla summit 2022
Timothy Spann528 views
Big data conference europe real-time streaming in any and all clouds, hybri... by Timothy Spann
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
Timothy Spann811 views
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021 by StreamNative
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
StreamNative175 views
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid... by Timothy Spann
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Timothy Spann757 views
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ... by StreamNative
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
StreamNative273 views
Apache Deep Learning 201 - Philly Open Source by Timothy Spann
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
Timothy Spann642 views
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl... by confluent
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl...
confluent13K views
Ai dev world utilizing apache pulsar, apache ni fi and minifi for edgeai io... by Timothy Spann
Ai dev world   utilizing apache pulsar, apache ni fi and minifi for edgeai io...Ai dev world   utilizing apache pulsar, apache ni fi and minifi for edgeai io...
Ai dev world utilizing apache pulsar, apache ni fi and minifi for edgeai io...
Timothy Spann436 views

Similar to MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf

Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) by
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Timothy Spann
305 views71 slides
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team by
NYC Dec 2022 Meetup_ Building Real-Time Requires a TeamNYC Dec 2022 Meetup_ Building Real-Time Requires a Team
NYC Dec 2022 Meetup_ Building Real-Time Requires a TeamTimothy Spann
485 views35 slides
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming by
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingTimothy Spann
214 views139 slides
[AI Dev World 2022] Build ML Enhanced Event Streaming by
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event StreamingTimothy Spann
201 views22 slides
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi... by
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Timothy Spann
197 views69 slides
bigdata 2022_ FLiP Into Pulsar Apps by
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar AppsTimothy Spann
460 views60 slides

Similar to MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf(20)

Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) by Timothy Spann
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Timothy Spann305 views
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team by Timothy Spann
NYC Dec 2022 Meetup_ Building Real-Time Requires a TeamNYC Dec 2022 Meetup_ Building Real-Time Requires a Team
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
Timothy Spann485 views
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming by Timothy Spann
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Timothy Spann214 views
[AI Dev World 2022] Build ML Enhanced Event Streaming by Timothy Spann
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming
Timothy Spann201 views
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi... by Timothy Spann
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Timothy Spann197 views
bigdata 2022_ FLiP Into Pulsar Apps by Timothy Spann
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
Timothy Spann460 views
OSA Con 2022: Streaming Data Made Easy by Timothy Spann
OSA Con 2022:  Streaming Data Made EasyOSA Con 2022:  Streaming Data Made Easy
OSA Con 2022: Streaming Data Made Easy
Timothy Spann456 views
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St... by Altinity Ltd
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Altinity Ltd12 views
Serverless Event Streaming Applications as Functionson K8 by Timothy Spann
Serverless Event Streaming Applications as Functionson K8Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8
Timothy Spann361 views
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum... by HostedbyConfluent
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
HostedbyConfluent336 views
(Current22) Let's Monitor The Conditions at the Conference by Timothy Spann
(Current22) Let's Monitor The Conditions at the Conference(Current22) Let's Monitor The Conditions at the Conference
(Current22) Let's Monitor The Conditions at the Conference
Timothy Spann150 views
Timothy Spann: Apache Pulsar for ML by Edunomica
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
Edunomica37 views
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8 by Timothy Spann
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
Timothy Spann241 views
Deploying Machine Learning Models with Pulsar Functions - Pulsar Summit Asia... by StreamNative
Deploying Machine Learning Models with Pulsar Functions  - Pulsar Summit Asia...Deploying Machine Learning Models with Pulsar Functions  - Pulsar Summit Asia...
Deploying Machine Learning Models with Pulsar Functions - Pulsar Summit Asia...
StreamNative128 views
Cloud lunch and learn real-time streaming in azure by Timothy Spann
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
Timothy Spann663 views
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar by Timothy Spann
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
Timothy Spann175 views
Apache Pulsar Development 101 with Python by Timothy Spann
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
Timothy Spann1.2K views
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar by Timothy Spann
OSS EU:  Deep Dive into Building Streaming Applications with Apache PulsarOSS EU:  Deep Dive into Building Streaming Applications with Apache Pulsar
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
Timothy Spann822 views
JConf.dev 2022 - Apache Pulsar Development 101 with Java by Timothy Spann
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with Java
Timothy Spann216 views
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp... by Timothy Spann
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Timothy Spann470 views

More from Timothy Spann

Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
165 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
156 views36 slides
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
150 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
162 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides

More from Timothy Spann(20)

Building Real-Time Travel Alerts by Timothy Spann
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
Timothy Spann165 views
JConWorld_ Continuous SQL with Kafka and Flink by Timothy Spann
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann156 views
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann150 views
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann162 views
CoC23_ Looking at the New Features of Apache NiFi by Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 views
CoC23_ Let’s Monitor The Conditions at the Conference by Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 views
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf by Timothy Spann
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
Timothy Spann23 views
CoC23_Utilizing Real-Time Transit Data for Travel Optimization by Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann31 views
The Never Landing Stream with HTAP and Streaming by Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 views
Meetup - Brasil - Data In Motion - 2023 September 19 by Timothy Spann
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Timothy Spann319 views
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 views
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by Timothy Spann
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann193 views
big data fest building modern data streaming apps by Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 views
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 views
OSSNA Building Modern Data Streaming Apps by Timothy Spann
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann155 views
GSJUG: Mastering Data Streaming Pipelines 09May2023 by Timothy Spann
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Timothy Spann255 views
BestInFlowCompetitionTutorials03May2023 by Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 views
Cloudera Sandbox Event Guidelines For Workflow by Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 views
Meet the Committers Webinar_ Lab Preparation by Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 views

Recently uploaded

CryptoBotsAI by
CryptoBotsAICryptoBotsAI
CryptoBotsAIchandureddyvadala199
42 views5 slides
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... by
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...ShapeBlue
164 views13 slides
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue by
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlueShapeBlue
152 views23 slides
Optimizing Communication to Optimize Human Behavior - LCBM by
Optimizing Communication to Optimize Human Behavior - LCBMOptimizing Communication to Optimize Human Behavior - LCBM
Optimizing Communication to Optimize Human Behavior - LCBMYaman Kumar
38 views49 slides
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...Jasper Oosterveld
35 views49 slides
"Running students' code in isolation. The hard way", Yurii Holiuk by
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk Fwdays
36 views34 slides

Recently uploaded(20)

Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda... by ShapeBlue
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
Hypervisor Agnostic DRS in CloudStack - Brief overview & demo - Vishesh Jinda...
ShapeBlue164 views
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue by ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
ShapeBlue152 views
Optimizing Communication to Optimize Human Behavior - LCBM by Yaman Kumar
Optimizing Communication to Optimize Human Behavior - LCBMOptimizing Communication to Optimize Human Behavior - LCBM
Optimizing Communication to Optimize Human Behavior - LCBM
Yaman Kumar38 views
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by Jasper Oosterveld
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays36 views
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha... by ShapeBlue
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue183 views
Business Analyst Series 2023 - Week 4 Session 7 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10146 views
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li91 views
The Power of Generative AI in Accelerating No Code Adoption.pdf by Saeed Al Dhaheri
The Power of Generative AI in Accelerating No Code Adoption.pdfThe Power of Generative AI in Accelerating No Code Adoption.pdf
The Power of Generative AI in Accelerating No Code Adoption.pdf
Saeed Al Dhaheri39 views
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue171 views
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by ShapeBlue
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue162 views
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by Moses Kemibaro
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Moses Kemibaro35 views
Initiating and Advancing Your Strategic GIS Governance Strategy by Safe Software
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software184 views
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... by The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays58 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash162 views
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... by ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue129 views

MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf

  • 1. Event-Driven Machine Learning at Scale Timothy Spann Developer Advocate
  • 4. ABSTRACT SUMMARY Event-Driven Machine Learning at Scale Utilizing the FLiP stack we can drive NLP and other machine learning classifications in the cloud, on-premise, hybrid and at the edge utilizing the open source Apache Pulsar project. We often utilize the full FLiPN stack which includes Apache Flink, Apache NiFi and Apache Pulsar. Apache Pulsar is a unified messaging platform that can host and execute classifications as events arrive asynchronously to topics. We deploy our functionality in lightweight functions that can run as processes, threads or in K8. We often use the serverless FunctionMesh to run these. This allows us to run machine learning classes built in Go, Python and Java. We will show you utilizing Vader Sentiment, Pytorch, TensorFlow, DJL.AI, MXNet and other machine learning options with ease. AGENDA: ▪ Welcome ▪ Introduction to Apache Pulsar ▪ Basics of Pulsar ▪ Pulsar Functions ▪ Let’s Build an ML App! ▪ Demo ▪ Resources ▪ Q&A
  • 5. streamnative.io Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  • 7. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Metadata Store (ZK, RocksDB, etcd, …) Pulsar Cluster
  • 8. Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Instance Pulsar Cluster Topics
  • 9. Producer Consumer Publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar and it handles all communication. Subscriber receives data from publisher and never directly interacts with it Topic Topic Producer / Consumer
  • 10. Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Messages - the Basic Unit of Pulsar
  • 11. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 12. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  • 13. Pulsar Functions ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. A serverless event streaming framework
  • 14. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. Pulsar Functions
  • 15. Function Mesh Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data. Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
  • 16. ML • Visual Question and Answer • NLP (Natural Language Processing) • Sentiment Analysis • Text Classification • Named Entity Recognition • Content-based Recommendations • Predictive Maintenance • Fault Detection • Fraud Detection • Time-Series Predictions • Naive Bayes
  • 17. Using Pulsar For ML Models High performance High security Multiple data consumers Large data volumes, high scalability Multi-tenancy and geo-replication
  • 18. Deploying AI With an Event-Driven Platform https://dzone.com/trendreports/enterprise-ai-1
  • 19. Streaming FLiP-ML Apps StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols CDC Apps
  • 23. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 24. Apache Pulsar Training • Instructor-led courses – Pulsar Fundamentals – Pulsar Developers – Pulsar Operations • On-demand learning with labs • 300+ engineers, admins and architects trained! StreamNative Academy Now Available On-Demand Pulsar Training