Successfully reported this slideshow.
Your SlideShare is downloading. ×

Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad

Check these out next

1 of 69 Ad

Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices

Download to read offline

Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices


https://github.com/tspannhw/SpeakerProfile
https://github.com/tspannhw/pulsar-ml-recipes
https://github.com/tspannhw/FLiPStackWeekly
https://github.com/tspannhw/pulsar-thermal-pinot
https://medium.com/@tspann/transit-watch-real-time-feeds-d98ff62b3bbb
https://github.com/tspannhw/pulsar-pychat-function
https://github.com/tspannhw/pulsar-transit-function
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/

http://tinyurl.com/bdha5p4r


https://dzone.com/trendreports/enterprise-ai-1

Introduction
What is Apache Pulsar?
Why Apache Pulsar?
Remember Hadoop and Kafka?
Functions
Apache NiFi
Apache Flink
Demos
Q&A
Topic:
Build ML Enhanced Event Streaming Applications with Java and Python Microservices

Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices


https://github.com/tspannhw/SpeakerProfile
https://github.com/tspannhw/pulsar-ml-recipes
https://github.com/tspannhw/FLiPStackWeekly
https://github.com/tspannhw/pulsar-thermal-pinot
https://medium.com/@tspann/transit-watch-real-time-feeds-d98ff62b3bbb
https://github.com/tspannhw/pulsar-pychat-function
https://github.com/tspannhw/pulsar-transit-function
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/

http://tinyurl.com/bdha5p4r


https://dzone.com/trendreports/enterprise-ai-1

Introduction
What is Apache Pulsar?
Why Apache Pulsar?
Remember Hadoop and Kafka?
Functions
Apache NiFi
Apache Flink
Demos
Q&A
Topic:
Build ML Enhanced Event Streaming Applications with Java and Python Microservices

Advertisement
Advertisement

More Related Content

More from Timothy Spann (20)

Recently uploaded (20)

Advertisement

Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices

  1. 1. BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH MICROSERVICES Tim Spann | Developer Advocate
  2. 2. ● Introduction ● What is Apache Pulsar? ● Why Apache Pulsar? ● Remember Hadoop and Kafka? ● Functions ● Apache NiFi ● Apache Flink ● Demos ● Q&A
  3. 3. Tim Spann Developer Advocate Tim Spann Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  4. 4. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  5. 5. Apache Pulsar has a vibrant community 560+ Contributors 10,000+ Commits 7,000+ Slack Members 1,000+ Organizations Using Pulsar
  6. 6. It is often assumed that Pulsar and Kafka have equal capabilities. In reality, Pulsar offers a superset of Kafka. ● Pulsar is streaming and queuing together ● Pulsar is cloud-native with stateless brokers ● Natively includes geo-replication, multi-tenancy, and end-to-end security out of the box ● Pulsar provides automated rebalancing ● Pulsar offers 100X lower latency w/ 2.5 greater throughput than Kafka Advantages of Apache Pulsar
  7. 7. CREATED Originally developed inside Yahoo! as Cloud Messaging Service GROWTH 10x Contributors 10MM+ Downloads Ecosystem Expands Kafka on Pulsar AMQ on Pulsar Functions . . . 2012 2016 2018 TODAY APACHE TLP Pulsar becomes Apache top level project. OPEN SOURCE Pulsar committed to open source. Apache Pulsar Timeline
  8. 8. Evolution of Pulsar Growth
  9. 9. Pulsar Has a Built-in Super Set of OSS Features Durability Scalability Geo-Replication Multi-Tenancy Unified Messaging Model Reduced Vendor Dependency Functions Open-Source Features
  10. 10. Apache Pulsar is built to support legacy applications, handle the needs of modern apps, and supports NextGen applications Support legacy workloads. Compatible with popular messaging and streaming tools. Legacy Built for today's real-time event driven applications. Modern Scalable, adaptive architecture ready for the future of real-time streaming. NextGen
  11. 11. Apache Pulsar features Cloud native with decoupled storage and compute layers. Built-in compatibility with your existing code and messaging infrastructure. Geographic redundancy and high availability included. Centralized cluster management and oversight. Elastic horizontal and vertical scalability. Seamless and instant partitioning rebalancing with no downtime. Flexible subscription model supports a wide array of use cases. Compatible with the tools you use to store, analyze, and process data.
  12. 12. Component Description Value / Data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Messages - the Basic Unit of Apache Pulsar
  13. 13. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Pulsar Cluster Metadata Storage Pulsar Cluster
  14. 14. Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2 ,V 2 1> < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover Apache Pulsar Subscription Modes
  15. 15. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging
  16. 16. Messaging Use Cases Streaming Use Cases Service x commands service y to make some change. Example: order service removing item from inventory service Moving large amounts of data to another service (real-time ETL). Example: logs to elasticsearch Distributing messages that represent work among n workers. Example: order processing not in main “thread” Periodic jobs moving large amounts of data and aggregating to more traditional stores. Example: logs to s3 Sending “scheduled” messages. Example: notification service for marketing emails or push notifications Computing a near real-time aggregate of a message stream, split among n workers, with order being important. Example: real-time analytics over page views Messaging vs Streaming
  17. 17. Messaging Use Case Streaming Use Case Retention The amount of data retained is relatively small - typically only a day or two of data at most. Large amounts of data are retained, with higher ingest volumes and longer retention periods. Throughput Messaging systems are not designed to manage big “catch-up” reads. Streaming systems are designed to scale and can handle use cases such as catch-up reads. Differences in Consumption
  18. 18. byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); Reader<byte[]> reader = pulsarClient.newReader() .topic(topic) .startMessageId(id) .create(); Create a reader that will read from some message between earliest and latest. Reader Apache Pulsar Reader Interface
  19. 19. ● New Consumer type added in Pulsar 2.10 that provides a continuously updated key-value map view of compacted topic data. ● An abstraction of a changelog stream from a primary-keyed table, where each record in the changelog stream is an update on the primary-keyed table with the record key as the primary key. ● READ ONLY DATA STRUCTURE! Apache Pulsar TableView
  20. 20. Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers Schema Registry
  21. 21. ● Utilizing JSON Data with a JSON Schema ● Consistency, Contracts, Clean Data ● This enables easy SQL: ○ Pulsar SQL (Presto SQL) ○ Flink SQL ○ Spark Structured Streaming Use Schemas
  22. 22. • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) Sources, Sinks and Processing
  23. 23. Kafka on Pulsar (KoP)
  24. 24. MQTT on Pulsar (MoP)
  25. 25. AMQP on Pulsar (AoP)
  26. 26. The FLiPN Kitten crosses the stream, 4 ways with Pulsar MoP AoP KoP WebSockets
  27. 27. Use Apache Pulsar For Ingest
  28. 28. Use Apache Pulsar To Stream to Lakehouses
  29. 29. ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. ● Java Functions A serverless event streaming framework Pulsar Functions
  30. 30. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. Pulsar Functions
  31. 31. ● Route ● Enrich ● Convert ● Lookups ● Run Machine Learning ● Logging ● Auditing ● Parse ● Split ● Convert Pulsar Functions
  32. 32. ML Java Coding (Deep Java Library)
  33. 33. Java Pulsar Functions
  34. 34. from pulsar import Function import json class Chat(Function): def __init__(self): pass def process(self, input, context): logger = context.get_logger() logger.info("Message Content: {0}".format(input)) msg_id = context.get_message_id() row = { } row['id'] = str(msg_id) json_string = json.dumps(row) return json_string Python Pulsar Functions
  35. 35. from pulsar import Function from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer import json class Chat(Function): def __init__(self): pass def process(self, input, context): fields = json.loads(input) sid = SentimentIntensityAnalyzer() ss = sid.polarity_scores(fields["comment"]) row = { } row['id'] = str(msg_id) if ss['compound'] < 0.00: row['sentiment'] = 'Negative' else: row['sentiment'] = 'Positive' row['comment'] = str(fields["comment"]) json_string = json.dumps(row) return json_string Entire Function Pulsar Python NLP Function
  36. 36. Why Pulsar Functions for Microservices? Desired Characteristic Pulsar Functions… Highly maintainable and testable Are small pieces of code written in popular languages such as Java, Python, or Go. They can be easily maintained in source control repositories and tested with existing frameworks automatically. Loosely coupled with other services Are not directly linked to one another and communicate via messages. Independently deployable Are designed to be deployed independently Can be developed by a small team Are often developed by a single developer. Inter-service Communication Support all message patterns using Pulsar as the underlying message bus. Deployment & Composition Can run as individual threads, processes, or K8s pods. The Function Mesh allows you to deploy multiple Pulsar Functions as a single unit.
  37. 37. Function Mesh Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data. Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
  38. 38. Edge AI
  39. 39. ● Apache Pulsar’s two-tier architecture separates the compute and storage layers, and interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices. ● Pulsar’s serverless computing framework, know as Pulsar Functions, can run inside the Broker as threads. Effectively “stretching” the data processing layer. Edge Computing with Pulsar
  40. 40. ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models. ● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic. Benefits of Running Pulsar Broker on the Edge
  41. 41. ● You can leverage 3rd party libraries within Pulsar Functions ● DeepLearning4J ● JPMML ● DJL.AI ● Keras ● Pulsar Functions are able to support: ● A variety of ML model types. ● Models developed with different languages and toolkits Pulsar Function – Third Party Library Support
  42. 42. Building Real-Time Apps Requires a Team
  43. 43. https://www.influxdata.com/integration/mqtt-monitoring/ https://www.influxdata.com/integration/mqtt-monitoring/ • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over a 300 components • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control Apache NiFi Basics
  44. 44. Apache NiFi - Apache Pulsar Connector
  45. 45. https://github.com/streamnative/pulsar-nifi-bundle Apache NiFi - Apache Pulsar Connector
  46. 46. Apache NiFi - Apache Pulsar Connector
  47. 47. Apache NiFi - Apache Pulsar Connector
  48. 48. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite Apache Flink
  49. 49. Apache Flink Job Dashboard
  50. 50. NLP Streaming Architecture
  51. 51. IoT Streaming Architecture
  52. 52. Streaming FLiPN Java App
  53. 53. Learn More
  54. 54. Apache Pulsar Training ● Instructor-led courses ○ Pulsar Fundamentals ○ Pulsar Developers ○ Pulsar Operations ● On-demand learning with labs ● 300+ engineers, admins and architects trained! Now Available On-Demand Pulsar Training Academy.StreamNative.io StreamNative Academy
  55. 55. ● https://github.com/tspannhw/pulsar-pychat-function ● https://streamnative.io/apache-nifi-connector/ ● https://nightlies.apache.org/flink/flink-docs-master/docs/conne ctors/datastream/pulsar/ ● https://streamnative.io/en/blog/release/2021-04-20-flink-sql-o n-streamnative-cloud ● https://github.com/streamnative/flink-example ● https://pulsar.apache.org/docs/en/adaptors-spark/ Apache Pulsar Links
  56. 56. ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden ● https://github.com/tspannhw/FLiP-Pi-Thermal ● https://github.com/tspannhw/FLiP-Pi-Weather ● https://github.com/tspannhw/FLiP-RP400 ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus ● https://github.com/tspannhw/PythonPulsarExamples ● https://github.com/tspannhw/pulsar-pychat-function ● https://github.com/tspannhw/FLiP-PulsarDevPython101 Apache Pulsar Examples
  57. 57. Deploying AI With an Event-Driven Platform https://dzone.com/trendreports/enterprise-ai-1
  58. 58. Apache Pulsar in Action http://tinyurl.com/bdha5p4r Please enjoy David’s complete book which is the ultimate guide to Pulsar.
  59. 59. https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
  60. 60. Scan the QR code to learn more about Apache Pulsar and StreamNative.
  61. 61. Scan the QR code to build your own apps today.
  62. 62. Tim Spann Developer Advocate @PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw Let’s Keep in Touch

×