OSA Con_ Streaming Data Made Easy
https://altinity.com/osa-con/
OSA Con 2022 Talk:
Streaming Data Made Easy
https://events.zoom.us/ev/AnOw0Lh3tcaRFG5d-A434hLZU6uq6vss4nOhTNYUVeaSALooyD-G~AggLXsr32QYFjq8BlYLZ5I06Dg
Click into new streaming applications the easy way with Apache Pulsar, Clickhouse and Open Source. A quick introduction on how to build modern data streaming applications.
Tim Spann
Developer Advocate @StreamNative
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
David Kjerrumgaard
Developer Advocate @StreamNative & Author of "Pulsar in Action"
David is a committer on the Apache Pulsar project, and also the author of "Pulsar in Action" and co-author of "Practical Hive". He currently serves as a Developer Advocate for StreamNative where he focuses on strengthening the Apache Pulsar community through education and evangelization. Prior to that he was a principal software engineer on the messaging team at Splunk, and Director of Solutions for two Big Data startups; Streamlio and Hortonworks.
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
Python web conference 2022 apache pulsar development 101 with python (FLiP-Py)
What is Apache Pulsar?
Python 3 Coding
Python Consumers
Python Producers
Python via MQTT, Web Sockets, Kafka
Python for Pulsar Functions
Schemas
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingTimothy Spann
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
https://www.meetup.com/new-york-city-apache-pulsar-meetup/
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289674210/
|WHAT THE SESSION WILL COVER|
Apache NiFi
Apache Pulsar
Apache Flink
Flink SQL
We will show you how to build apps, so download beforehand to Docker, K8, your Laptop, or the cloud.
Cloudera CSP Setup
Getting Started with Cloudera Stream Processing Community Edition
You may download CSP-CE here:
Cloudera Stream Processing Community Edition
The Cloudera CDP User's page:
CDP Resources Page
https://youtu.be/s80sz3NWwHo
https://docs.cloudera.com/csp-ce/latest/index.html
https://www.cloudera.com/downloads/cdf/csp-community-edition.html
Apache Pulsar
https://pulsar.apache.org/docs/getting-started-standalone/
or
https://streamnative.io/free-cloud/
Cloudera + Pulsar
https://community.cloudera.com/t5/Cloudera-Stream-Processing-Forum/Using-Apache-Pulsar-with-SQL-Stream-Builder/m-p/349917
https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-with-Apache-Pulsar-for-Streaming/ta-p/337891
|AGENDA|
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
6:30 - 7:15 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:15 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Round Table on Real-Time Streaming, Q&A
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
See:
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
https://github.com/tspannhw/SpeakerProfile
https://www.meetup.com/futureofdata-newyork/
https://github.com/tspannhw/pulsar-transit-function
https://www.meetup.com/futureofdata-princeton/
https://github.com/tspannhw/create-nifi-pulsar-flink-apps
https://medium.com/@tspann/using-apache-pulsar-with-cloudera-sql-builder-apache-flink-b518aa9eadff
https://github.com/tspannhw/meetups/blob/main/15December2022.md All resources for the meetup
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Timothy Spann
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
https://sixfeetup.com/company/news/90-talks-and-tutorials-from-2022-python-web-conference-released
https://www.youtube.com/playlist?list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl
https://www.youtube.com/watch?v=H88re4p-DoU&list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl&index=62
We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Python libraries including Python client, websockets, MQTT and even Kafka.
After this session you will be building real-time streaming and messaging applications with Python.
#PWC2022 attracted nearly 375 attendees from 36 countries and 21 time zones making it the biggest and best year yet. The highly engaging format featured 90 speakers, 6 tracks (including 80 talks and 4 tutorials) and took place virtually on March 21-25, 2022 on LoudSwarm by Six Feet Up.
Apache Pulsar
Apache Flink
Apache Kafka
MQTT
AMQP/RabbitMQ
WebSockets
Python3
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdfTimothy Spann
MLconf 2022 NYC Event-Driven Machine Learning at Scale
https://mlconf.com/agenda/mlconf-2022-NYC/
https://mlconf.com/speakers/timothy-spann/
Bio
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
Upcoming Abstract Summary
Event-Driven Machine Learning at Scale
Utilizing the FLiP stack we can drive NLP and other machine learning classifications in the cloud, on-premise, hybrid and at the edge utilizing the open source Apache Pulsar project. We often utilize the full FLiPN stack which includes Apache Flink, Apache NiFi and Apache Pulsar.
Apache Pulsar is a unified messaging platform that can host and execute classifications as events arrive asynchronously to topics. We deploy our functionality in lightweight functions that can run as processes, threads or in K8. We often use the serverless FunctionMesh to run these. This allows us to run machine learning classes built in Go, Python and Java. We will show you utilizing Vader Sentiment, Pytorch, TensorFlow, DJL.AI, MXNet and other machine learning options with ease.
Living the Stream Dream with Pulsar and Spring BootTimothy Spann
Living the Stream Dream with Pulsar and Spring Boot
https://events.geekle.us/java23
feb 21, 2023
https://geekle.us/schedule/java23
Living the Stream Dream with Pulsar and Spring Boot
For building Java applications, Spring is the universal answer as it supplies all the connectors and integrations one could want. The same is true for Apache Pulsar as it provides connectors, integration and flexibility to any use case. Apache Pulsar has a robust native Java library to use with Spring as well as other protocol options.
Apache Pulsar provides a cloud native, geo-replicated unified messaging platform that allows for many messaging paradigms. This lends itself well to upgrading existing applications as Pulsar supports using libraries for WebSockets, MQTT, Kafka, JMS, AMQP and RocketMQ. In this talk I will build some example applications utilizing several different protocols for building a variety of applications from IoT to Microservices to Log Analytics.
We will build Spring Boot microservices that utilize Apache Pulsar as a central data hub for communications and enrichment. This utilizes the new Spring for Pulsar framework.
https://github.com/spring-projects-experimental/spring-pulsar
We talked about it on Josh Long's podcast https://spring.io/blog/2022/09/15/a-bootiful-podcast-big-data-legend-former-pivot-and-friend-to-the-spring-community-tim-spann
programming real-time applications with Java 17, Spring Boot and Apache Pulsar 2.11
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear a speaker cannot make it. So just in case, since I was the MC for the ecosystem track I put together a talk just in case.
Here it is. Never seen or presented.
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
Python web conference 2022 apache pulsar development 101 with python (FLiP-Py)
What is Apache Pulsar?
Python 3 Coding
Python Consumers
Python Producers
Python via MQTT, Web Sockets, Kafka
Python for Pulsar Functions
Schemas
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingTimothy Spann
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
https://www.meetup.com/new-york-city-apache-pulsar-meetup/
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289674210/
|WHAT THE SESSION WILL COVER|
Apache NiFi
Apache Pulsar
Apache Flink
Flink SQL
We will show you how to build apps, so download beforehand to Docker, K8, your Laptop, or the cloud.
Cloudera CSP Setup
Getting Started with Cloudera Stream Processing Community Edition
You may download CSP-CE here:
Cloudera Stream Processing Community Edition
The Cloudera CDP User's page:
CDP Resources Page
https://youtu.be/s80sz3NWwHo
https://docs.cloudera.com/csp-ce/latest/index.html
https://www.cloudera.com/downloads/cdf/csp-community-edition.html
Apache Pulsar
https://pulsar.apache.org/docs/getting-started-standalone/
or
https://streamnative.io/free-cloud/
Cloudera + Pulsar
https://community.cloudera.com/t5/Cloudera-Stream-Processing-Forum/Using-Apache-Pulsar-with-SQL-Stream-Builder/m-p/349917
https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-with-Apache-Pulsar-for-Streaming/ta-p/337891
|AGENDA|
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
6:30 - 7:15 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:15 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Round Table on Real-Time Streaming, Q&A
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
See:
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
https://github.com/tspannhw/SpeakerProfile
https://www.meetup.com/futureofdata-newyork/
https://github.com/tspannhw/pulsar-transit-function
https://www.meetup.com/futureofdata-princeton/
https://github.com/tspannhw/create-nifi-pulsar-flink-apps
https://medium.com/@tspann/using-apache-pulsar-with-cloudera-sql-builder-apache-flink-b518aa9eadff
https://github.com/tspannhw/meetups/blob/main/15December2022.md All resources for the meetup
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Timothy Spann
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
https://sixfeetup.com/company/news/90-talks-and-tutorials-from-2022-python-web-conference-released
https://www.youtube.com/playlist?list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl
https://www.youtube.com/watch?v=H88re4p-DoU&list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl&index=62
We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Python libraries including Python client, websockets, MQTT and even Kafka.
After this session you will be building real-time streaming and messaging applications with Python.
#PWC2022 attracted nearly 375 attendees from 36 countries and 21 time zones making it the biggest and best year yet. The highly engaging format featured 90 speakers, 6 tracks (including 80 talks and 4 tutorials) and took place virtually on March 21-25, 2022 on LoudSwarm by Six Feet Up.
Apache Pulsar
Apache Flink
Apache Kafka
MQTT
AMQP/RabbitMQ
WebSockets
Python3
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdfTimothy Spann
MLconf 2022 NYC Event-Driven Machine Learning at Scale
https://mlconf.com/agenda/mlconf-2022-NYC/
https://mlconf.com/speakers/timothy-spann/
Bio
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
Upcoming Abstract Summary
Event-Driven Machine Learning at Scale
Utilizing the FLiP stack we can drive NLP and other machine learning classifications in the cloud, on-premise, hybrid and at the edge utilizing the open source Apache Pulsar project. We often utilize the full FLiPN stack which includes Apache Flink, Apache NiFi and Apache Pulsar.
Apache Pulsar is a unified messaging platform that can host and execute classifications as events arrive asynchronously to topics. We deploy our functionality in lightweight functions that can run as processes, threads or in K8. We often use the serverless FunctionMesh to run these. This allows us to run machine learning classes built in Go, Python and Java. We will show you utilizing Vader Sentiment, Pytorch, TensorFlow, DJL.AI, MXNet and other machine learning options with ease.
Living the Stream Dream with Pulsar and Spring BootTimothy Spann
Living the Stream Dream with Pulsar and Spring Boot
https://events.geekle.us/java23
feb 21, 2023
https://geekle.us/schedule/java23
Living the Stream Dream with Pulsar and Spring Boot
For building Java applications, Spring is the universal answer as it supplies all the connectors and integrations one could want. The same is true for Apache Pulsar as it provides connectors, integration and flexibility to any use case. Apache Pulsar has a robust native Java library to use with Spring as well as other protocol options.
Apache Pulsar provides a cloud native, geo-replicated unified messaging platform that allows for many messaging paradigms. This lends itself well to upgrading existing applications as Pulsar supports using libraries for WebSockets, MQTT, Kafka, JMS, AMQP and RocketMQ. In this talk I will build some example applications utilizing several different protocols for building a variety of applications from IoT to Microservices to Log Analytics.
We will build Spring Boot microservices that utilize Apache Pulsar as a central data hub for communications and enrichment. This utilizes the new Spring for Pulsar framework.
https://github.com/spring-projects-experimental/spring-pulsar
We talked about it on Josh Long's podcast https://spring.io/blog/2022/09/15/a-bootiful-podcast-big-data-legend-former-pivot-and-friend-to-the-spring-community-tim-spann
programming real-time applications with Java 17, Spring Boot and Apache Pulsar 2.11
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear a speaker cannot make it. So just in case, since I was the MC for the ecosystem track I put together a talk just in case.
Here it is. Never seen or presented.
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Timothy Spann
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal/
Tim Spann
Streamnative
https://github.com/tspannhw/pulsar-pychat-function
https://events.dataidols.com/talks/using-the-flipn-stack-for-edge-ai-flink-nifi-pulsar/
Hosted by Karen Jean-Francois.
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Data Idols - Summer School - Data Science
bigdata 2022_ FLiP Into Pulsar Apps
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
FLiP Into Pulsar Apps
08:30 – 09:15
•
23 Nov, 2022
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
Unified Messaging and Data Streaming 101Timothy Spann
https://budapestdata.hu/2022/en/speakers/timothy-spann/
Unified Messaging and Data Streaming 101.pdf
Unified Messaging and Data Streaming 101
Democratizing the ability to build streaming data pipelines will help turn everyone who needs streaming data to be able to do it themselves. By utilizing the open source streaming platform Apache Pulsar enhanced with Apache Flink, we can build messages and data streaming applications utilizing the one unified data platform.
Apache Pulsar supports multiple protocols including Pulsar, AMQP, Kafka, MQTT and RocketMQ. It also supports a variety of message subscription models to support all your workloads from batch, work queues, exactly once streaming and more. With the ability to run cloud native on all your K8, VM, container and cloud platforms. Pulsar has built-in geo-replication, scalability, separate of compute and storage, function support and high reliability. Let’s get streaming.
I will also touch of utilizing Apache Spark and Apache NiFi with Apache Pulsar.
Timothy Spann
Developer Advocate, StreamNative
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Flink + Pulsar + Spark + NiFi
FLiPNs
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...HostedbyConfluent
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk.
Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!?!??
My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
https://www.datainmotion.dev/2022/04/flip-py-pi-enviroplus-using-apache.html
https://dzone.com/articles/pulsar-in-python-on-pi
(Current22) Let's Monitor The Conditions at the ConferenceTimothy Spann
(Current22) Let's Monitor The Conditions at the Conference
Let's Monitor The Conditions at the Conference
Session Time11:15 am - 12:00 pm Session DateWednesday, 5 October 2022 Session Type:In-Person Location:Ballroom G
Session Description:
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk. Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!? My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
Timothy Spann, StreamNative
Developer Advocate
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC.
CODEONTHEBEACH_Streaming Applications with Apache PulsarTimothy Spann
CODEONTHEBEACH_Streaming Applications with Apache Pulsar
https://www.codeonthebeach.com/schedule
Deep Dive into Building Streaming Applications with Apache Pulsar - Timothy Spann
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics.
We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript.
After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
Learn how to use Apache Pulsar and Apache NiFi to Stream to your Data Lake
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Best Practices for using Pulsar and NiFi
A deep dive on Apache NiFi's Pulsar connector and demos
Building an End-to-End Application in the Hybrid Cloud
Attend for a chance to win a We <3 Pulsar t-shirt! The first 50 registrants who register through here [https://hubs.ly/Q013LTpn0] will be entered in a drawing!
—------------------------
|AGENDA|
6:00 - 7:00 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:00 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Q&A + Networking
—------------------------
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
JConf.dev 2022 - Apache Pulsar Development 101 with JavaTimothy Spann
JConf.dev 2022 - Apache Pulsar Development 101 with Java
https://2022.jconf.dev/
In this session I will get you started with real-time cloud native streaming programming with Java. We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Java libraries including native Java client, AMQP/RabbitMQ, MQTT and even Kafka. After this session you will building real-time streaming and messaging applications with Java. We will also touch on Apache Spark and Apache Flink.
Timothy Spann
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
tim spann developer advocate
streamnative
datainmotion.dev
OSS EU: Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8Timothy Spann
https://github.com/tspannhw/FLiPN-Conf42-KubeNative-2022
https://www.conf42.com/Kube_Native_2022_Tim_Spann_realtime_pulsar_apps_k8
I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi.
Apache Pulsar
StreamNative
FLiPN Stack
[DoKDayNA2022] - Architecting Your First Event Driven Serverless Streaming Ap...Timothy Spann
[DoKDayNA2022] - Architecting Your First Event Driven Serverless Streaming Applications on K8
https://github.com/tspannhw/FLiPN-DoKDayNA2022
Once you have built a topic in Apache Pulsar, you will quickly see the need to build event-driven applications. This can require a lot of decisions on what framework to use, where to run it, how to deploy it, and how to manage these applications on Kubernetes cloud natively.
I will walk you through step-by-step in building Pulsar Functions which is the easy way to design, test, develop, integrate, deploy, monitor, and manage serverless streaming applications in Java and Python.
Together we will build a full application as an Apache Pulsar function and enjoy the power of running it in the cloud for IoT events and add any routing, transformation, or machine learning that we need to accomplish our business requirements.
Through FunctionMesh we run on Kubernetes natively.
In this talk, you will deploy ML functions to transform real-time data on Kubernetes.
Fast Streaming into Clickhouse with Apache PulsarTimothy Spann
https://github.com/tspannhw/SpeakerProfile/tree/main/2022/talks
Fast Streaming into Clickhouse with Apache Pulsar
https://github.com/tspannhw/FLiPC-FastStreamingIntoClickhouseWithApachePulsar
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/285271332/
Fast Streaming into Clickhouse with Apache Pulsar - Meetup 2022
StreamNative - Apache Pulsar - Stream to Altinity Cloud - Clickhouse
May the 4th Be With You!
04-May-2022 Clickhosue Meetup
CREATE TABLE iotjetsonjson_local
(
uuid String,
camera String,
ipaddress String,
networktime String,
top1pct String,
top1 String,
cputemp String,
gputemp String,
gputempf String,
cputempf String,
runtime String,
host String,
filename String,
host_name String,
macaddress String,
te String,
systemtime String,
cpu String,
diskusage String,
memory String,
imageinput String
)
ENGINE = MergeTree()
PARTITION BY uuid
ORDER BY (uuid);
CREATE TABLE iotjetsonjson ON CLUSTER '{cluster}' AS iotjetsonjson_local
ENGINE = Distributed('{cluster}', default, iotjetsonjson_local, rand());
select uuid, top1pct, top1, gputempf, cputempf
from iotjetsonjson
where toFloat32OrZero(top1pct) > 40
order by toFloat32OrZero(top1pct) desc, systemtime desc
select uuid, systemtime, networktime, te, top1pct, top1, cputempf, gputempf, cpu, diskusage, memory,filename
from iotjetsonjson
order by systemtime desc
select top1, max(toFloat32OrZero(top1pct)), max(gputempf), max(cputempf)
from iotjetsonjson
group by top1
select top1, max(toFloat32OrZero(top1pct)) as maxTop1, max(gputempf), max(cputempf)
from iotjetsonjson
group by top1
order by maxTop1
Tim Spann
Developer Advocate
StreamNative
In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. In order to provide a prediction in real-time, the model usually receives a single data point from the caller, and is expected to provide an accurate prediction within a few milliseconds.
Throughout this talk, I will demonstrate the steps required to deploy a fully-trained ML that predicts the delivery time for a food delivery service based upon real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.
Building Modern Data Streaming Apps with PythonTimothy Spann
Building Modern Data Streaming Apps with Python
https://www.conf42.com/DevOps_2023_Tim_Spann_modern_data_streaming_apps
Share
In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into various data stores. We use Python for microservices as Functions and stand-alone apps. We use the best streaming tools for the current applications with FLiPN. https://www.flipn.app/
Timothy Spann
Developer Advocate
https://datainmotion.dev/
Princeton Dec 2022 Meetup_ NiFi + Flink + PulsarTimothy Spann
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Streaming Data Platform for cloud-native event-driven applications
https://github.com/tspannhw/pulsar-csp-ce/blob/main/weather.md
https://github.com/tspannhw/create-nifi-pulsar-flink-apps
https://medium.com/@tspann/using-apache-pulsar-with-cloudera-sql-builder-apache-flink-b518aa9eadff
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289674210/
For non-locals, we will Broadcast Live via Youtube. Sign up and we will send out the link.
Location:
TigerLabs in Princeton on the 2nd floor, walk up and the door will be open. Same that we were using for the old Future of Data - Princeton events 2016-2019.
Parking at the school is free. street parking nearby is free. there are meters on some streets, and a few blocks away is a paid parking garage.
We are joining forces with our friends Cloudera again on a FLiPN amazing journey into Real-Time Streaming Applications with Apache Flink, Apache NiFi, and Apache Pulsar.
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Apache NiFi
Apache Pulsar
Apache Flink
Flink SQL
We will show you how to build apps, so download beforehand to Docker, K8, your Laptop, or the cloud.
Cloudera CSP Setup
Getting Started with Cloudera Stream Processing Community Edition
You may download CSP-CE here:
Cloudera Stream Processing Community Edition
The Cloudera CDP User's page:
CDP Resources Page
https://youtu.be/s80sz3NWwHo
https://docs.cloudera.com/csp-ce/latest/index.html
https://www.cloudera.com/downloads/cdf/csp-community-edition.html
Apache Pulsar
https://pulsar.apache.org/docs/getting-started-standalone/
or
https://streamnative.io/free-cloud/
Cloudera + Pulsar
https://community.cloudera.com/t5/Cloudera-Stream-Processing-Forum/Using-Apache-Pulsar-with-SQL-Stream-Builder/m-p/349917
https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-with-Apache-Pulsar-for-Streaming/ta-p/337891
|AGENDA|
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
6:30 - 7:15 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:15 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Round Table on Real-Time Streaming, Q&A
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, dist
DevNexus: Apache Pulsar Development 101 with JavaTimothy Spann
DevNexus: Apache Pulsar Development 101 with Java
06-april-2023 georgia conference center atlanta, ga
https://devnexus.com/presentations/apache-pulsar-development-101-with-java
Abstract
In this session I will get you started with real-time cloud native streaming programming with Java. We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Java libraries including native Java client, websockets, MQTT and even Kafka. After this session you will building real-time streaming and messaging applications with Java. We will also touch on Apache Spark and Apache Flink.
Timothy Spann
Tim Spann is a Developer Advocate @ Cloudera where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Timothy Spann
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal/
Tim Spann
Streamnative
https://github.com/tspannhw/pulsar-pychat-function
https://events.dataidols.com/talks/using-the-flipn-stack-for-edge-ai-flink-nifi-pulsar/
Hosted by Karen Jean-Francois.
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Data Idols - Summer School - Data Science
bigdata 2022_ FLiP Into Pulsar Apps
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
FLiP Into Pulsar Apps
08:30 – 09:15
•
23 Nov, 2022
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
Unified Messaging and Data Streaming 101Timothy Spann
https://budapestdata.hu/2022/en/speakers/timothy-spann/
Unified Messaging and Data Streaming 101.pdf
Unified Messaging and Data Streaming 101
Democratizing the ability to build streaming data pipelines will help turn everyone who needs streaming data to be able to do it themselves. By utilizing the open source streaming platform Apache Pulsar enhanced with Apache Flink, we can build messages and data streaming applications utilizing the one unified data platform.
Apache Pulsar supports multiple protocols including Pulsar, AMQP, Kafka, MQTT and RocketMQ. It also supports a variety of message subscription models to support all your workloads from batch, work queues, exactly once streaming and more. With the ability to run cloud native on all your K8, VM, container and cloud platforms. Pulsar has built-in geo-replication, scalability, separate of compute and storage, function support and high reliability. Let’s get streaming.
I will also touch of utilizing Apache Spark and Apache NiFi with Apache Pulsar.
Timothy Spann
Developer Advocate, StreamNative
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Flink + Pulsar + Spark + NiFi
FLiPNs
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...HostedbyConfluent
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk.
Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!?!??
My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
https://www.datainmotion.dev/2022/04/flip-py-pi-enviroplus-using-apache.html
https://dzone.com/articles/pulsar-in-python-on-pi
(Current22) Let's Monitor The Conditions at the ConferenceTimothy Spann
(Current22) Let's Monitor The Conditions at the Conference
Let's Monitor The Conditions at the Conference
Session Time11:15 am - 12:00 pm Session DateWednesday, 5 October 2022 Session Type:In-Person Location:Ballroom G
Session Description:
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk. Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!? My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
Timothy Spann, StreamNative
Developer Advocate
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC.
CODEONTHEBEACH_Streaming Applications with Apache PulsarTimothy Spann
CODEONTHEBEACH_Streaming Applications with Apache Pulsar
https://www.codeonthebeach.com/schedule
Deep Dive into Building Streaming Applications with Apache Pulsar - Timothy Spann
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics.
We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript.
After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
Learn how to use Apache Pulsar and Apache NiFi to Stream to your Data Lake
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Best Practices for using Pulsar and NiFi
A deep dive on Apache NiFi's Pulsar connector and demos
Building an End-to-End Application in the Hybrid Cloud
Attend for a chance to win a We <3 Pulsar t-shirt! The first 50 registrants who register through here [https://hubs.ly/Q013LTpn0] will be entered in a drawing!
—------------------------
|AGENDA|
6:00 - 7:00 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:00 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Q&A + Networking
—------------------------
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
JConf.dev 2022 - Apache Pulsar Development 101 with JavaTimothy Spann
JConf.dev 2022 - Apache Pulsar Development 101 with Java
https://2022.jconf.dev/
In this session I will get you started with real-time cloud native streaming programming with Java. We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Java libraries including native Java client, AMQP/RabbitMQ, MQTT and even Kafka. After this session you will building real-time streaming and messaging applications with Java. We will also touch on Apache Spark and Apache Flink.
Timothy Spann
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
tim spann developer advocate
streamnative
datainmotion.dev
OSS EU: Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8Timothy Spann
https://github.com/tspannhw/FLiPN-Conf42-KubeNative-2022
https://www.conf42.com/Kube_Native_2022_Tim_Spann_realtime_pulsar_apps_k8
I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi.
Apache Pulsar
StreamNative
FLiPN Stack
[DoKDayNA2022] - Architecting Your First Event Driven Serverless Streaming Ap...Timothy Spann
[DoKDayNA2022] - Architecting Your First Event Driven Serverless Streaming Applications on K8
https://github.com/tspannhw/FLiPN-DoKDayNA2022
Once you have built a topic in Apache Pulsar, you will quickly see the need to build event-driven applications. This can require a lot of decisions on what framework to use, where to run it, how to deploy it, and how to manage these applications on Kubernetes cloud natively.
I will walk you through step-by-step in building Pulsar Functions which is the easy way to design, test, develop, integrate, deploy, monitor, and manage serverless streaming applications in Java and Python.
Together we will build a full application as an Apache Pulsar function and enjoy the power of running it in the cloud for IoT events and add any routing, transformation, or machine learning that we need to accomplish our business requirements.
Through FunctionMesh we run on Kubernetes natively.
In this talk, you will deploy ML functions to transform real-time data on Kubernetes.
Fast Streaming into Clickhouse with Apache PulsarTimothy Spann
https://github.com/tspannhw/SpeakerProfile/tree/main/2022/talks
Fast Streaming into Clickhouse with Apache Pulsar
https://github.com/tspannhw/FLiPC-FastStreamingIntoClickhouseWithApachePulsar
https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/285271332/
Fast Streaming into Clickhouse with Apache Pulsar - Meetup 2022
StreamNative - Apache Pulsar - Stream to Altinity Cloud - Clickhouse
May the 4th Be With You!
04-May-2022 Clickhosue Meetup
CREATE TABLE iotjetsonjson_local
(
uuid String,
camera String,
ipaddress String,
networktime String,
top1pct String,
top1 String,
cputemp String,
gputemp String,
gputempf String,
cputempf String,
runtime String,
host String,
filename String,
host_name String,
macaddress String,
te String,
systemtime String,
cpu String,
diskusage String,
memory String,
imageinput String
)
ENGINE = MergeTree()
PARTITION BY uuid
ORDER BY (uuid);
CREATE TABLE iotjetsonjson ON CLUSTER '{cluster}' AS iotjetsonjson_local
ENGINE = Distributed('{cluster}', default, iotjetsonjson_local, rand());
select uuid, top1pct, top1, gputempf, cputempf
from iotjetsonjson
where toFloat32OrZero(top1pct) > 40
order by toFloat32OrZero(top1pct) desc, systemtime desc
select uuid, systemtime, networktime, te, top1pct, top1, cputempf, gputempf, cpu, diskusage, memory,filename
from iotjetsonjson
order by systemtime desc
select top1, max(toFloat32OrZero(top1pct)), max(gputempf), max(cputempf)
from iotjetsonjson
group by top1
select top1, max(toFloat32OrZero(top1pct)) as maxTop1, max(gputempf), max(cputempf)
from iotjetsonjson
group by top1
order by maxTop1
Tim Spann
Developer Advocate
StreamNative
In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. In order to provide a prediction in real-time, the model usually receives a single data point from the caller, and is expected to provide an accurate prediction within a few milliseconds.
Throughout this talk, I will demonstrate the steps required to deploy a fully-trained ML that predicts the delivery time for a food delivery service based upon real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.
Building Modern Data Streaming Apps with PythonTimothy Spann
Building Modern Data Streaming Apps with Python
https://www.conf42.com/DevOps_2023_Tim_Spann_modern_data_streaming_apps
Share
In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into various data stores. We use Python for microservices as Functions and stand-alone apps. We use the best streaming tools for the current applications with FLiPN. https://www.flipn.app/
Timothy Spann
Developer Advocate
https://datainmotion.dev/
Princeton Dec 2022 Meetup_ NiFi + Flink + PulsarTimothy Spann
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Streaming Data Platform for cloud-native event-driven applications
https://github.com/tspannhw/pulsar-csp-ce/blob/main/weather.md
https://github.com/tspannhw/create-nifi-pulsar-flink-apps
https://medium.com/@tspann/using-apache-pulsar-with-cloudera-sql-builder-apache-flink-b518aa9eadff
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289674210/
For non-locals, we will Broadcast Live via Youtube. Sign up and we will send out the link.
Location:
TigerLabs in Princeton on the 2nd floor, walk up and the door will be open. Same that we were using for the old Future of Data - Princeton events 2016-2019.
Parking at the school is free. street parking nearby is free. there are meters on some streets, and a few blocks away is a paid parking garage.
We are joining forces with our friends Cloudera again on a FLiPN amazing journey into Real-Time Streaming Applications with Apache Flink, Apache NiFi, and Apache Pulsar.
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Apache NiFi
Apache Pulsar
Apache Flink
Flink SQL
We will show you how to build apps, so download beforehand to Docker, K8, your Laptop, or the cloud.
Cloudera CSP Setup
Getting Started with Cloudera Stream Processing Community Edition
You may download CSP-CE here:
Cloudera Stream Processing Community Edition
The Cloudera CDP User's page:
CDP Resources Page
https://youtu.be/s80sz3NWwHo
https://docs.cloudera.com/csp-ce/latest/index.html
https://www.cloudera.com/downloads/cdf/csp-community-edition.html
Apache Pulsar
https://pulsar.apache.org/docs/getting-started-standalone/
or
https://streamnative.io/free-cloud/
Cloudera + Pulsar
https://community.cloudera.com/t5/Cloudera-Stream-Processing-Forum/Using-Apache-Pulsar-with-SQL-Stream-Builder/m-p/349917
https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-with-Apache-Pulsar-for-Streaming/ta-p/337891
|AGENDA|
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
6:30 - 7:15 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:15 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Round Table on Real-Time Streaming, Q&A
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, dist
DevNexus: Apache Pulsar Development 101 with JavaTimothy Spann
DevNexus: Apache Pulsar Development 101 with Java
06-april-2023 georgia conference center atlanta, ga
https://devnexus.com/presentations/apache-pulsar-development-101-with-java
Abstract
In this session I will get you started with real-time cloud native streaming programming with Java. We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Java libraries including native Java client, websockets, MQTT and even Kafka. After this session you will building real-time streaming and messaging applications with Java. We will also touch on Apache Spark and Apache Flink.
Timothy Spann
Tim Spann is a Developer Advocate @ Cloudera where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963
Similar to OSA Con 2022: Streaming Data Made Easy (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
Building Real-Time Pipelines With FLaNK
Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Apache NiFi
Apache Kafka
Apache Flink
Apache Iceberg
LLM
Generative AI
Slack
Postgresql
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
Gen AI on Enterprise Cloud
Apache NiFi
Milvus
Apache Kafka
Apache Flink
Cloudera Machine Learning
Cloudera DataFlow
https://medium.com/@tspann/building-a-milvus-connector-for-nifi-34372cb3c7fa
https://www.meetup.com/futureofdata-princeton/events/300737266/
https://lu.ma/q7pcfyjn?source=post_page-----34372cb3c7fa--------------------------------&tk=TTyakY
If you're interested in working with Generative AI on the cloud, this virtual workshop is for you.
Tim Spann from Cloudera and Yujian Tang from Zilliz will cover how you can implement your own GenAI workflows on the cloud at enterprise scale.
9:00 - 9:05: Intro
9:05 - 9:15: What is Milvus
9:15 - 9:25: Cloudera Development Platform
9:25 - 10:00: Demo
Location
https://www.youtube.com/watch?v=IfWIzKsoHnA
https://github.com/tspannhw/SpeakerProfile
https://www.linkedin.com/in/yujiantang/
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
https://www.youtube.com/watch?v=Yeua8NlzQ3Y
https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming
Adding Generative AI to Real-Time Streaming Pipelines
Abstract
Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results.
Summary
Tim Spann: My talk is adding generative AI to real time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, Nifi, Flink, Python, Iceberg. All the slides, all the code and GitHub are out there.
Llm, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The amount of models and projects and software that are available is massive.
Nifi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into.
So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from slack. And then I'm going to clean it up, add a couple fields and push that out to slack. Every model is a little bit of different tweaking.
Nifi acts as a whole website. And as you see here, it can be get, post, put, whatever you want. We send that response back to flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Tim Spann here. My talk is adding generative AI to real time streaming pipelines, and we're here for the large language model conference at Comp 42, which is always a nice one, great place to be. I'm going to discuss a couple of different open source technologies that work together to enable you to build real time pipelines using large language models. So we'll touch on Kafka, Nifi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data machine learning, streaming IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter, cover streaming ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff, as I mentioned, do a bunch of different meetups. They are not just in the east coast of the US, they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides, all the code and GitHub. Everything you need is out there. Let's get into the talk. Llm, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
https://xtremej.dev/2023/schedule/
Building Real-time Pipelines with FLaNK: A Case Study with Transit Data
Overview of the problem, the application (code walkthru and running), overview of FLaNK, introduction to NiFi, introduction to Kafka, and introduction to Flink.
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
******Note*****
The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30 QNA
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
https://princetonacm.acm.org/tcfpro/
18th Annual IEEE IT Professional Conference (ITPC)
Armstrong Hall at The College of New Jersey
Friday, March 15th, 2024 | 10:00 AM to 5:00 PM
IT Professional Conference at Trenton Computer Festival
IEEE Information Technology Professional Conference on Friday, March 15th, 2024
TCFPro24 Building Real-Time Generative AI Pipelines
Building Real-Time Generative AI Pipelines
In this talk, Tim will delve into the exciting realm of building real-time generative AI pipelines with streaming capabilities. The discussion will revolve around the integration of cutting-edge technologies to create dynamic and responsive systems that harness the power of generative algorithms.
From leveraging streaming data sources to implementing advanced machine learning models, the presentation will explore the key components necessary for constructing a robust real-time generative AI pipeline. Practical insights, use cases, and best practices will be shared, offering a comprehensive guide for developers and data scientists aspiring to design and implement dynamic AI systems in a streaming environment.
Tim will show a live demo showing we can use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated with Apache NiFi, Apache Kafka and Python. We will use RAG against Chroma and Pinecone vector data stores, Hugging Face and WatsonX.AI LLM, and add additional context with NiFi lookups of stocks, weather and other data streams in real-time.
Timothy Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipelines
https://www.meetup.com/futureofdata-newyork/events/298660453/
Unlocking Financial Data with Real-Time Pipelines
(Flink Analytics on Stocks with SQL )
By Timothy Spann
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. I will be utilizing NiFi 2.0 with Python and Vector Databases.
Timothy Spann
Principal Developer Advocate, Cloudera
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
https://twitter.com/PaaSDev
https://www.linkedin.com/in/timothyspann/
https://medium.com/@tspann
https://github.com/tspannhw/FLiPStackWeekly/
Conf42-Python-Building Apache NiFi 2.0 Python Processors
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
Building Apache NiFi 2.0 Python Processors
Abstract
Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM.
Summary
Tim Spann: I'm going to be talking today, be building Apache 9520 Python processors. One of the main purposes of supporting Python in the streaming tool Apache Nifi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera.
You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 310 and again JDK 21 on your machine. You got to be smart about how you use these models.
There are a ton of python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own.
When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested on writing your own python code for Apache Nifi, definitely reach out and thank.
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg with Stock Data and LLM
Abstract
In this talk, we’ll discuss how to use Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg to process and analyze stock data. We demonstrated the ingestion, processing, and analysis of stock data. Additionally, we illustrated how to use an LLM to generate predictions from the analyzed data.
Karin Wolok
Developer Relations, Dev Marketing, and Community Programming @ Project Elevate
Karin Wolok's LinkedIn account Karin Wolok's twitter account
Tim Spann
Principal Developer Advocate @ Cloudera
Tim Spann's LinkedIn account Tim Spann's twitter account
https://www.conf42.com/Python_2024_Karin_Wolok_Tim_Spann_nifi__kafka_risingwave_iceberg_llm
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
https://www.aicamp.ai/event/eventdetails/W2024022214
apache nifi
llm
generative ai
gen ai
ml
dl
machine learning
apache kafka
apache flink
postgresql
python
AI Meetup (NYC): GenAI, LLMs, ML and Data
Feb 22, 05:30 PM EST
Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers
Agenda:
* 5:30pm~6:00pm: Checkin, Food/drink and networking
* 6:00pm~6:10pm: Welcome/community update
* 6:10pm~8:30pm: Tech talks
* 8:30pm: Q&A, Open discussion
Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs
Speaker: Zain Hasan (Weaviate LinkedIn)
Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can that can see, hear, read, and feel data(!), to perform cross-modal search(searching audio with images, videos with text etc.) and multimodal retrieval augmented generation (MM-RAG) at the billion-object scale with the help of open source vector databases. I will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real-time can enables users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production.
Tech Talk: Codeless Generative AI Pipelines
Speaker: Timothy Spann (Cloudera LinkedIn)
Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task.
Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Sponsors:
We are actively seeking sponsors to support our community. Whether it is by offering venue spaces, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ developers worldwide.
Venue:
Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036
Room Name: Central Park West 6501
Community on Slack/Discord
- Event chat: chat and connect with speakers and attendees
- Sharing blogs, events, job openings, projects collaborations
Join Slack (search and join the #newyork channel) | Join Discord
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
20-Feb-2024
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data.
We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this.
Tim Spann
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Unlocking Financial Data with Real-Time Pipelines
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.
Key Points to be Covered:
Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing.
Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.
Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. Integration of Kafka with financial data sources and consumers.
Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources.
Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions.
Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies.
Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability an
Building Real-time Travel Alerts
In this session, we will walk through how to build a complete streaming application to send alerts based on travel advisories from public data. We will also join in other data sources of relevance and push out alerts.
We will show you how to build this streaming application with Apache NiFi, Apache Kafka, and Apache Flink and show you when/why/how, and what to build to maximize performance, productivity, and ease of development.
Let's get streaming.
Apache Flink
Apache Kafka
Apache NiFi
FLaNK Stack
Tim Spann
Big Data Conference Europe 2023
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
JConWorld: Continuous SQL with Kafka and Flink
In this talk, I will walk through how someone can setup and run continous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data.
We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic over view will show hands-on techniques, tips and examples of how to do this.
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html
https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
2. ● Apache Pulsar Committer | Author of Pulsar In Action
● Former Principal Software Engineer on Splunk’s messaging
team that is responsible for Splunk’s internal
Pulsar-as-a-Service platform.
● Former Director of Solution Architecture at Streamlio.
David
Kjerrumgaard
Sales Engineer
3. ● Author of Pulsar In Action.
● Co-Author of Practical Hive
https://streamnative.io/download/manning-ebook-apache-pulsar-in-action
David Kjerrumgaard
Developer Advocate
4. Tim Spann
Developer Advocate
Tim Spann
Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi,
Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and
experience at both global conferences and through individual conversations.
5. FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
6. Apache Pulsar has a vibrant community
560+
Contributors
10,000+
Commits
7,000+
Slack Members
1,000+
Organizations
Using Pulsar
9. Cloud native with decoupled storage and
compute layers.
Built-in compatibility with your existing
code and messaging infrastructure.
Geographic redundancy and high
availability included.
Centralized cluster management and
oversight.
Elastic horizontal and vertical scalability.
Seamless and instant partitioning
rebalancing with no downtime.
Flexible subscription model supports a wide
array of use cases.
Compatible with the tools you use to store,
analyze, and process data.
Enterprise Features
10. ● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for both
Pulsar and BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Pulsar Cluster
Metadata
Storage
Pulsar Cluster
12. If you want to... Do this... And use these subscription
modes...
achieve fanout messaging
among consumers
specify a unique subscription name
for each consumer
exclusive (or failover)
achieve message queuing
among consumers
share the same subscription name
among multiple consumers
shared
allow for ordered
consumption (streaming)
distribute work among any number
of consumers
exclusive, failover or key
shared
Fanout, Queueing, or Streaming
13. Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data
can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like
topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer name, the
default name is used.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the
message is its order in that sequence.
Messages
14. Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2
(value=Avro/Protobuf/JSON)
schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Integrated Schema Registry
15. Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("work-q-1")
.subscriptionType(SubType.Shared)
.subscribe();
Flexible Pub/Sub API for Pulsar - Shared
16. Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("stream-1")
.subscriptionType(SubType.Failover)
.subscribe();
Flexible Pub/Sub API for Pulsar - Failover
17. byte[] msgIdBytes = // Some byte
array
MessageId id =
MessageId.fromByteArray(msgIdBytes);
Reader<byte[]> reader =
pulsarClient.newReader()
.topic(topic)
.startMessageId(id)
.create();
Create a reader that will read from some
message between earliest and latest.
Reader
Reader Interface
18. Producer<String> producer = client.newProducer(Schema.STRING)
.topic("hellotopic")
.blockIfQueueFull(true) // enable blocking
// when queue is full
.maxPendingMessages(10) // max queue size
.create();
// During Back Pressure: the sendAsync call blocks
// with no room in queues
producer.newMessage()
.key("mykey")
.value("myvalue")
.sendAsync(); // can be a blocking call
Built-in Back Pressure
19. ● In many use cases, applications want to consume data from a
Pulsar Topic as if it were a database table, where each new
message is an “update” to the table.
● Up until now, applications used Pulsar consumers or readers to
fetch all the updates from a topic and construct a map with the
latest value of each key for received messages.
● The Table Abstraction provides a standard implementation of this
message consumption pattern for any given keyed topic.
Table Abstraction
20. ● When you create a TableView, and
additional compacted topic is
created.
● In a compacted topic, only the most
recent value associated with each
key is retained.
● A background reader consumes
from the compacted topic and
updates the map when new
messages arrive.
How Does it Work?
26. ● Lightweight computation similar
to AWS Lambda.
● Specifically designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
● Java Functions
A serverless event
streaming framework
Pulsar Functions
27. ● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions
30. from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json
class Chat(Function):
def __init__(self):
pass
def process(self, input, context):
fields = json.loads(input)
sid = SentimentIntensityAnalyzer()
ss = sid.polarity_scores(fields["comment"])
row = { }
row['id'] = str(msg_id)
if ss['compound'] < 0.00:
row['sentiment'] = 'Negative'
else:
row['sentiment'] = 'Positive'
row['comment'] = str(fields["comment"])
json_string = json.dumps(row)
return json_string
Entire Function
Pulsar Python NLP Function
31. Pulsar Functions, along with Pulsar
IO/Connectors, provide a powerful API for
ingesting, transforming, and outputting
data.
Function Mesh, another StreamNative
project, makes it easier for developers to
create entire applications built from
sources, functions, and sinks all through a
declarative API.
Function Mesh
36. ● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries, inserts against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
Apache Flink
37. Apache Flink SQL
select aqi, parameterName, dateObserved, hourObserved, latitude,
longitude, localTimeZone, stateCode, reportingArea from
airquality;
select max(aqi) as MaxAQI, parameterName, reportingArea from
airquality group by parameterName, reportingArea;
select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as
AvgAQI, count(aqi) as RowCount, parameterName, reportingArea
from airquality group by parameterName, reportingArea;
44. Founded by the original creators of Apache Pulsar.
StreamNative has more experience designing,
deploying, and running large-scale Apache Pulsar
instances than any team in the world.
StreamNative employs more than 50% of the
active core committers to Apache Pulsar.
45. Scan the QR code to
learn more about
Apache Pulsar and
StreamNative.
46. Scan the QR code
to build your own
apps today.