2. CONFIDENTIAL INTERNAL USE2019 RED HAT TECH EXCHANGE
What can you expect during this session:
➔ Introductions
➔ Context
➔ Basics of Kafka
➔ Use Cases
➔ ML elements
➔ ML applied
➔ Demo
➔ Key takeaways
➔ Where next
Agile Integration foundations
DISTRIBUTED INTEGRATION
- LIGHTWEIGHT
- PATTERN BASED
- EVENT-ORIENTED
- COMMUNITY-SOURCED
CONTAINERS
- CLOUD-NATIVE SOLUTIONS
- LEAN ARTIFACTS, INDIVIDUALLY DEPLOYABLE
- CONTAINER-BASED SCALING & HIGH AVAILABILITY
APIs
- WELL-DEFINED, REUSABLE, & WELL-MANAGED ENDPOINTS
- ECOSYSTEM LEVERAGE
API SERVICES
SECURITY, AUTHENTICATION, AUDIT (RH-SSO)
RED HAT FUSE
RED HAT AMQ
AMQ offering
Flexible, standards-based messaging for the enterprise, cloud and Internet of Things
Self-service Messaging
- Scalable, easy-to-manage messaging utility for OpenShift Container Platform (Beta)
- Red Hat-managed deployment (Tech Preview)
Broker (AMQ)
- Store & Forward
- Volatile & Durable
- Full JMS 2.0 Support
- Best-in-class performance
Interconnect (AMQ)
- High-performance direct messaging
- Distributed messaging backbone
Streams (AMQ Streams)
- Streaming platform
- Durable pub/sub
- Replayable streams
- Based on Apache Kafka and Strimzi
Standard Protocols
Polyglot Clients
Common Management
AMQ offering
Broker
High-performance messaging implementation based on ActiveMQ Artemis
Interconnect
Message router to build large-scale messaging networks using the AMQP protocol to create a redundant application-level messaging network
Streams
Streams simplifies the deployment, configuration, management and use of Apache Kafka on OpenShift using the Operator concept
JBOSS AMQ
● You care about individual messages
● You want clients to use standard APIs (e.g., JMS) or wire protocols (e.g., AMQP)
● You need transactional sends and receives
● You’re doing request-reply messaging
● You need heterogeneous client/protocol messaging (e.g., AMQP, MQTT, STOMP)
● You send metadata/headers/properties with your messages
● You don’t want to implement broker functionality in your clients (i.e., partitioning, dispatching, coordination)
KAFKA
● You care about messages in volume
● You care about raw throughput, high performance
● You need sliding-window replay abilities
● You have large numbers of subscribers for published events
● You need to finely control the parallelism/scalability of consumers
● You want to leverage application-level replication vs HA storage
● You need total order guarantees at the partition level
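The partition-level ordering guarantee can be illustrated without a broker. The sketch below is a toy model, not Kafka's implementation: `partition_for` and `append_all` are hypothetical helpers, CRC32 is a stand-in for the key hashing done by Kafka's default partitioner (which uses murmur2), and the partitions are plain Python lists. It shows that messages sharing a key land in the same partition and keep their relative order, while no order is defined across partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for the illustration


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition (CRC32 here; Kafka's default partitioner uses murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(messages):
    """Append (key, value) pairs to per-partition logs, in arrival order."""
    logs = defaultdict(list)
    for key, value in messages:
        logs[partition_for(key)].append((key, value))
    return logs


# Made-up events, keyed by airplane so each plane's events share a partition.
messages = [
    ("plane-1", "takeoff"),
    ("plane-2", "takeoff"),
    ("plane-1", "cruise"),
    ("plane-2", "cruise"),
    ("plane-1", "landing"),
]
logs = append_all(messages)

# Reading one partition replays each key's events in the order they were sent.
plane_1_events = [v for k, v in logs[partition_for("plane-1")] if k == "plane-1"]
print(plane_1_events)
```

Choosing the message key is therefore the main design decision: everything that must be processed in order should share a key.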
Industries
➔ Travel companies
➔ Finance and fintech companies
➔ Retailers and online shopping
➔ Automotive and manufacturing companies
➔ Video Streaming companies
➔ Social networks
➔ Transportation
➔ ...
[ https://kafka.apache.org/powered-by ]
SINGAPORE AIRLINES
[ https://speakerdeck.com/devacto/predictive-maintenance-pipeline-using-kafka-connect-streams-and-ksql ]
Problem:
- Many airplanes
- Many components per airplane
- Each airplane has a different flight plan
Predictive maintenance is hard!
SINGAPORE AIRLINES
Solution:
- Kafka Connector
- Kafka Streams App
- ML model
- Web App
Text is everywhere
Most of it unstructured
> 40 million articles in Wikipedia
> 4.5 billion web pages
> 500 million tweets a day
> 1.5 trillion queries on Google a day
Problem Taxonomy
Machine Learning
- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
  - Clustering
  - ...
NLP
Natural Language Processing
Natural language:
- English, Deutsch, Italiano
- #rhte2019 for a great week in #vienna
- c u l8r
Example:
- Tokenization
- Normalization & Stemming
- Frequency count
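The three steps in the example (tokenization, normalization and stemming, frequency count) can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the code used in the session: the function names are hypothetical, and `naive_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into word-like tokens, keeping a leading '#' on hashtags."""
    return re.findall(r"#?\w+", text)


def normalize(token):
    """Lowercase the token and drop the hashtag marker."""
    return token.lower().lstrip("#")


def naive_stem(token):
    """Strip a few common English suffixes (toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def term_frequencies(text):
    """Tokenize, normalize and stem the text, then count each resulting term."""
    return Counter(naive_stem(normalize(t)) for t in tokenize(text))


tweet_tokens = tokenize("#rhte2019 for a great week in #vienna")
counts = term_frequencies(
    "Streaming tweets, streamed tweets: #Kafka streams and kafka topics"
)
print(counts.most_common(3))
```

Informal text such as "c u l8r" survives tokenization but would need a slang dictionary or a learned model to normalize, which is outside the scope of this sketch.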
Common Text Problems
How do I represent the text in a compact and computer-friendly way?
➔ Bag of Words model
How do I find a text similar to the one I have?
➔ Similarity measure
How do we extract the core meaning of the text? How do we get the important words?
➔ Term frequency - inverse document frequency (tf-idf)
How do I classify an article based on certain categories?
➔ Naive Bayes model
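The first three techniques can be sketched with the standard library alone. The code below is a minimal illustration under stated assumptions: the function names and sample documents are made up, cosine similarity is used as one possible similarity measure (the slides do not name a specific one), and `tf_idf` implements the plain unsmoothed variant (term frequency times log(N / document frequency)); libraries often use smoothed variants that give different numbers.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts, ignoring word order."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (1.0 = same direction)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tf_idf(docs):
    """Weight each term by in-document frequency times log(N / document frequency)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Iterating a Counter yields each distinct word once, so this counts documents.
    doc_freq = Counter(word for bag in bags for word in bag)
    return [
        {
            w: (c / sum(bag.values())) * math.log(n_docs / doc_freq[w])
            for w, c in bag.items()
        }
        for bag in bags
    ]


# Made-up documents for illustration.
docs = [
    "kafka streams process events",
    "kafka brokers store events",
    "naive bayes classifies text",
]
bags = [bag_of_words(d) for d in docs]
similar = cosine_similarity(bags[0], bags[1])
unrelated = cosine_similarity(bags[0], bags[2])
weights = tf_idf(docs)
```

In the first document, "streams" receives a higher tf-idf weight than "kafka" because "kafka" also appears in the second document and is therefore less distinctive.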
Overall ML model
20 Newsgroups training data
Continuous Bag of Words
TF-IDF
Multinomial Bayes classifier
Trained model
Classification engine
Twitter stream
Example: "Boris Johnson on the phone. He’s apparently not able to return to the UK to answer questions in parliament as he was booked on a Thomas Cook flight." => Politics
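The classifier stage of this pipeline can be sketched as a small multinomial Naive Bayes model in standard-library Python. This is an illustration, not the model from the demo: `TinyMultinomialNB` is a hypothetical class, the four training sentences are made up and stand in for the 20 Newsgroups data, raw word counts are used in place of the TF-IDF step, and in practice a library implementation would normally be preferred.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)        # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training examples standing in for the 20 Newsgroups data.
train_texts = [
    "parliament votes on the new law",
    "the prime minister answers questions in parliament",
    "the team wins the final match",
    "the striker scores in the match",
]
train_labels = ["politics", "politics", "sport", "sport"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
prediction = model.predict(
    "minister booked on a flight cannot answer questions in parliament"
)
print(prediction)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that is absent from one class from zeroing out that class entirely.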
FIRST STEP
Start a local Kafka instance:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
Create a topic:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter-stream
SECOND STEP
Start the producer:
python twitter_kafka_producer.py
Start the consumer:
python doc_classifier.py
Issues fixed in the original code:
● Added the Kafka api_version to the consumer and producer
● Added timeouts and retries, since tweepy doesn’t implement them and Twitter enforces strict rate limits on its APIs
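The two fixes can be sketched as follows. This is a hedged illustration, not the code from twitter_kafka_producer.py: it assumes the kafka-python client (whose `KafkaProducer` accepts an `api_version` tuple), the helper names `with_retries`, `make_producer` and `publish_tweet` are hypothetical, and the version tuple, broker address and retry counts are placeholder values.

```python
import json
import time


def with_retries(func, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


def make_producer(bootstrap_servers="localhost:9092"):
    """Build a producer with an explicit api_version (assumes kafka-python)."""
    # Imported lazily so the retry helper above has no third-party dependency.
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        api_version=(0, 10, 1),  # assumed value; match it to your broker
        retries=5,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )


def publish_tweet(producer, tweet, topic="twitter-stream"):
    """Send one tweet (a dict) to the topic, retrying on transient failures."""
    return with_retries(lambda: producer.send(topic, tweet))
```

The same `with_retries` wrapper could be placed around the Twitter API calls. In real code the `except` clause should be narrowed to the rate-limit and connection errors that are actually worth retrying.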
NEXT STEPS
● Test locally by exposing the Kafka endpoint (https://strimzi.io/2019/04/30/accessing-kafka-part-3.html)
● Deploy the Python app to OpenShift and test everything on it
● Build a chart visualization
● Cache tweets locally for optimization and then leverage Debezium
● Use Faust, a stream processing library that ports the ideas from Kafka Streams to Python
Key takeaways
➢ Learn possible use cases for Kafka and understand where it fits best
➢ Learn the basic elements of an ML text classification model
➢ Learn the building blocks of a streaming application
Resources
You can find more material on a similar topic here
Related project: OpenDataHub
AMQ Streams use cases
As a reference, you can use this book:
https://www.amazon.es/Building-Streaming-Applications-Apache-Kafka/dp/1787283984/