SlideShare a Scribd company logo
1 of 19
Real-Time Streaming with
Python ML Inference
Marko Topolnik
1
About Us
Hazelcast started as an In-Memory Data Grid project
(distributed caching with data-local computation)
Java codebase, polyglot clients
we focus on Simplicity
in 2017 we added Hazelcast Jet: In-Memory Distributed Stream
Processing
My role: core engineer at Hazelcast Jet
2
Example: Salary Prediction
Python, SciKit Learn
3
Sample Input and Output
Input: {
"age": 25,
"workclass": "Self-emp",
"fnlwgt": 176756,
"education": "HS-grad",
"education-num": 9,
"marital-status": "Never-married",
"occupation": "Farming-fishing",
"relationship": "Own-child",
"capital-gain": 0,
"capital-loss": 0,
"hours-per-week": 35,
"native-country": "United-States"
}
Output: {
"probability": 0.85
"income": "<=50K"
}
4
(Showing Project Directory)
5
Data Scientist's Perspective
Do Data Science Profit!
6
Data Engineer's Perspective
Do Data Science
Monitor
Profit
(fingers crossed)
7
Package
Load-Balance
Scale Up
Scale Out
Deploy
Re-Train
Productionizing the REST service
8
Request #1
Request #2
Request #3
R
E
S
T
Batching?
Productionizing the REST service
9
Request #1
Request #2
Request #3
R
E
S
T
Request
Queue
Batch
Response
Queue
Parallelism?
Productionizing the REST service
10
Request #1
Request #2
Request #3
R
E
S
T
R
E
S
T
Load
Balancer
Request
Queue
Response
Queue
Batch
Batch
Replace REST with Distributed Streaming
11
Request #1
Request #2
Request #3
Kafka
Hazelcast
Jet Cluster
$ jet submit ML
Hazelcast Jet Code
Pipeline p = Pipeline.create();
p.readFrom(Kafka.source())
.apply(mapUsingPython(new PythonServiceConfig()
.setBaseDir("/Users/mtopol/dev/python/sklearn")
.setHandlerModule("example_1_inference_jet")))
.writeTo(Kafka.sink());
jet.newJob(p);
$ mvn package
$ jet submit target/my-job.jar
12
Jet Node 1
Jet Job's Execution Plan
Source
mapUsing
Python
mapUsing
Python
Sink
13
Kafka Topic A
Python
process
Kafka Topic B
Batch
Python
process
Batch
Cluster Elasticity and Resilience
● Jet jobs are fault-tolerant
● nodes can join and leave the cluster, jobs go on
● automatically rescale to available hardware
14
Let's Start a Jet Cluster!
15
Cluster Self-Formation
16
Hazelcast Jet natively supports:
● Amazon AWS
● Google GCP
● Kubernetes
With simple configuration, the nodes self-discover in these
environments
Source and Sink Connectors
● Kafka
● Hadoop HDFS
● S3 bucket
● JDBC
● JMS queue and topic
● custom connector
● test connectors: convenient to test the code before submitting
17
Stream Operators
● windowed aggregation using Event Time
○ sliding, session window
○ count, sum, average, linear regression, ...
○ custom aggregate function
● rolling aggregation
● streaming join (co-grouping)
● hash join (enrichment)
● contact arbitrary external services
○ mapUsingPython uses this
18
Thanks for attending!
Q&A
marko@hazelcast.com
@mtopolnik
19

More Related Content

Similar to Real-Time Streaming with Python ML Inference

Webinar elastic stack {on telecom} english webinar part (1)
Webinar elastic stack {on telecom} english webinar part (1)Webinar elastic stack {on telecom} english webinar part (1)
Webinar elastic stack {on telecom} english webinar part (1)
Yassine, LASRI
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabe
Dataiku
 
Learning analytics inside government
Learning analytics inside governmentLearning analytics inside government
Learning analytics inside government
mwebbjisc
 

Similar to Real-Time Streaming with Python ML Inference (20)

Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 
Webinar elastic stack {on telecom} english webinar part (1)
Webinar elastic stack {on telecom} english webinar part (1)Webinar elastic stack {on telecom} english webinar part (1)
Webinar elastic stack {on telecom} english webinar part (1)
 
Transforming your application with Elasticsearch
Transforming your application with ElasticsearchTransforming your application with Elasticsearch
Transforming your application with Elasticsearch
 
Test Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely testsTest Trend Analysis : Towards robust, reliable and timely tests
Test Trend Analysis : Towards robust, reliable and timely tests
 
Eat whatever you can with PyBabe
Eat whatever you can with PyBabeEat whatever you can with PyBabe
Eat whatever you can with PyBabe
 
Contract-driven development with OpenAPI 3 and Vert.x | DevNation Tech Talk
Contract-driven development with OpenAPI 3 and Vert.x | DevNation Tech TalkContract-driven development with OpenAPI 3 and Vert.x | DevNation Tech Talk
Contract-driven development with OpenAPI 3 and Vert.x | DevNation Tech Talk
 
Mongo at Sailthru (MongoNYC 2011)
Mongo at Sailthru (MongoNYC 2011)Mongo at Sailthru (MongoNYC 2011)
Mongo at Sailthru (MongoNYC 2011)
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
Open Source World June '21 -- JSON Within a Relational Database
Open Source World June '21 -- JSON Within a Relational DatabaseOpen Source World June '21 -- JSON Within a Relational Database
Open Source World June '21 -- JSON Within a Relational Database
 
Learning analytics inside government
Learning analytics inside governmentLearning analytics inside government
Learning analytics inside government
 
Montreal Elasticsearch Meetup
Montreal Elasticsearch MeetupMontreal Elasticsearch Meetup
Montreal Elasticsearch Meetup
 
Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization
 
DevSecCon London 2018: Open DevSecOps
DevSecCon London 2018: Open DevSecOpsDevSecCon London 2018: Open DevSecOps
DevSecCon London 2018: Open DevSecOps
 
Xain.io exhibiting at Berlin Tech Job Fair Spring 2020
Xain.io exhibiting at Berlin Tech Job Fair Spring 2020Xain.io exhibiting at Berlin Tech Job Fair Spring 2020
Xain.io exhibiting at Berlin Tech Job Fair Spring 2020
 
Tapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and FlinkTapping into Scientific Data with Hadoop and Flink
Tapping into Scientific Data with Hadoop and Flink
 
Telling the LivePerson Technology Story at Couchbase [SF] 2013
Telling the LivePerson Technology Story at Couchbase [SF] 2013Telling the LivePerson Technology Story at Couchbase [SF] 2013
Telling the LivePerson Technology Story at Couchbase [SF] 2013
 
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
 
ICONUK 2016: REST Assured, Freeing Your Domino Data Has Never Been That Easy!
ICONUK 2016: REST Assured, Freeing Your Domino Data Has Never Been That Easy!ICONUK 2016: REST Assured, Freeing Your Domino Data Has Never Been That Easy!
ICONUK 2016: REST Assured, Freeing Your Domino Data Has Never Been That Easy!
 

Recently uploaded

%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 

Recently uploaded (20)

OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 

Real-Time Streaming with Python ML Inference

  • 1. Real-Time Streaming with Python ML Inference Marko Topolnik 1
  • 2. About Us Hazelcast started as an In-Memory Data Grid project (distributed caching with data-local computation) Java codebase, polyglot clients we focus on Simplicity in 2017 we added Hazelcast Jet: In-Memory Distributed Stream Processing My role: core engineer at Hazelcast Jet 2
  • 4. Sample Input and Output Input: { "age": 25, "workclass": "Self-emp", "fnlwgt": 176756, "education": "HS-grad", "education-num": 9, "marital-status": "Never-married", "occupation": "Farming-fishing", "relationship": "Own-child", "capital-gain": 0, "capital-loss": 0, "hours-per-week": 35, "native-country": "United-States" } Output: { "probability": 0.85 "income": "<=50K" } 4
  • 6. Data Scientist's Perspective Do Data Science Profit! 6
  • 7. Data Engineer's Perspective Do Data Science Monitor Profit (fingers crossed) 7 Package Load-Balance Scale Up Scale Out Deploy Re-Train
  • 8. Productionizing the REST service 8 Request #1 Request #2 Request #3 R E S T Batching?
  • 9. Productionizing the REST service 9 Request #1 Request #2 Request #3 R E S T Request Queue Batch Response Queue Parallelism?
  • 10. Productionizing the REST service 10 Request #1 Request #2 Request #3 R E S T R E S T Load Balancer Request Queue Response Queue Batch Batch
  • 11. Replace REST with Distributed Streaming 11 Request #1 Request #2 Request #3 Kafka Hazelcast Jet Cluster $ jet submit ML
  • 12. Hazelcast Jet Code Pipeline p = Pipeline.create(); p.readFrom(Kafka.source()) .apply(mapUsingPython(new PythonServiceConfig() .setBaseDir("/Users/mtopol/dev/python/sklearn") .setHandlerModule("example_1_inference_jet"))) .writeTo(Kafka.sink()); jet.newJob(p); $ mvn package $ jet submit target/my-job.jar 12
  • 13. Jet Node 1 Jet Job's Execution Plan Source mapUsing Python mapUsing Python Sink 13 Kafka Topic A Python process Kafka Topic B Batch Python process Batch
  • 14. Cluster Elasticity and Resilience ● Jet jobs are fault-tolerant ● nodes can join and leave the cluster, jobs go on ● automatically rescale to available hardware 14
  • 15. Let's Start a Jet Cluster! 15
  • 16. Cluster Self-Formation 16 Hazelcast Jet natively supports: ● Amazon AWS ● Google GCP ● Kubernetes With simple configuration, the nodes self-discover in these environments
  • 17. Source and Sink Connectors ● Kafka ● Hadoop HDFS ● S3 bucket ● JDBC ● JMS queue and topic ● custom connector ● test connectors: convenient to test the code before submitting 17
  • 18. Stream Operators ● windowed aggregation using Event Time ○ sliding, session window ○ count, sum, average, linear regression, ... ○ custom aggregate function ● rolling aggregation ● streaming join (co-grouping) ● hash join (enrichment) ● contact arbitrary external services ○ mapUsingPython uses this 18

Editor's Notes

  1. Ask "Are you using Python in your data science?" What languages are you using?
  2. Step one: Do Data Science Step two: Profit!
  3. Then along comes a Data Engineer and asks "Wait, aren't we forgetting something?"