OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
tim spann developer advocate
streamnative
datainmotion.dev
CODEONTHEBEACH_Streaming Applications with Apache PulsarTimothy Spann
CODEONTHEBEACH_Streaming Applications with Apache Pulsar
https://www.codeonthebeach.com/schedule
Deep Dive into Building Streaming Applications with Apache Pulsar - Timothy Spann
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics.
We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript.
After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
Building Modern Data Streaming Apps with PythonTimothy Spann
Building Modern Data Streaming Apps with Python
https://www.conf42.com/DevOps_2023_Tim_Spann_modern_data_streaming_apps
Share
In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into various data stores. We use Python for microservices as Functions and stand-alone apps. We use the best streaming tools for the current applications with FLiPN. https://www.flipn.app/
Timothy Spann
Developer Advocate
https://datainmotion.dev/
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022Timothy Spann
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
Using FLiP with InfluxDB for EdgeAI IoT at Scale
Date: Thursday, February 10th at 11am PT / 2pm ET
Join this webcast as Timothy from StreamNative takes you on a hands-on deep-dive using Pulsar, Apache NiFi + Edge Flow Manager + MiniFi Agents with Apache MXNet, OpenVino, TensorFlow Lite, and other Deep Learning Libraries on the actual edge devices including Raspberry Pi with Movidius 2, Google Coral TPU and NVidia Jetson Nano.
The team runs deep learning models on the edge devices, sends images, and captures real-time GPS and sensor data. Their low-coding IoT applications provide easy edge routing, transformation, data acquisition and alerting before they decide what data to stream in real-time to their data space. These edge applications classify images and sensor readings in real-time at the edge and then send Deep Learning results to Flink SQL and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging data to InfluxDB.
In this session you will learn how to:
Build an end-to-end streaming edge app
Pull messages from Pulsar topics and persists the messages to InfluxDB
Build a data stream for IoT with NiFi and InfluxDB
Use Apache Flink + Apache Pulsar
Timothy Spann, Developer Advocate, StreamNative
Tim Spann is a Developer Advocate at StreamNative where he works with Apache NiFi, MiniFi, Kafka, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Using FLiP with influxdb for edgeai iot at scale 2022Timothy Spann
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
FLiP Stack (Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark) with Influx DB for Edge AI and IoT workloads at scale
Tim Spann
Developer Advocate
StreamNative
datainmotion.dev
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Timothy Spann
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
https://sixfeetup.com/company/news/90-talks-and-tutorials-from-2022-python-web-conference-released
https://www.youtube.com/playlist?list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl
https://www.youtube.com/watch?v=H88re4p-DoU&list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl&index=62
We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Python libraries including Python client, websockets, MQTT and even Kafka.
After this session you will be building real-time streaming and messaging applications with Python.
#PWC2022 attracted nearly 375 attendees from 36 countries and 21 time zones making it the biggest and best year yet. The highly engaging format featured 90 speakers, 6 tracks (including 80 talks and 4 tutorials) and took place virtually on March 21-25, 2022 on LoudSwarm by Six Feet Up.
Apache Pulsar
Apache Flink
Apache Kafka
MQTT
AMQP/RabbitMQ
WebSockets
Python3
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
Python web conference 2022 apache pulsar development 101 with python (FLiP-Py)
What is Apache Pulsar?
Python 3 Coding
Python Consumers
Python Producers
Python via MQTT, Web Sockets, Kafka
Python for Pulsar Functions
Schemas
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
apache pulsar
tim spann developer advocate
streamnative
datainmotion.dev
CODEONTHEBEACH_Streaming Applications with Apache PulsarTimothy Spann
CODEONTHEBEACH_Streaming Applications with Apache Pulsar
https://www.codeonthebeach.com/schedule
Deep Dive into Building Streaming Applications with Apache Pulsar - Timothy Spann
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics. As well as using some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics.
We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript.
After this session you will be able to build simple real-time streaming and messaging applications with your chosen language or tool of your choice.
Building Modern Data Streaming Apps with PythonTimothy Spann
Building Modern Data Streaming Apps with Python
https://www.conf42.com/DevOps_2023_Tim_Spann_modern_data_streaming_apps
Share
In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into various data stores. We use Python for microservices as Functions and stand-alone apps. We use the best streaming tools for the current applications with FLiPN. https://www.flipn.app/
Timothy Spann
Developer Advocate
https://datainmotion.dev/
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022Timothy Spann
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
Using FLiP with InfluxDB for EdgeAI IoT at Scale
Date: Thursday, February 10th at 11am PT / 2pm ET
Join this webcast as Timothy from StreamNative takes you on a hands-on deep-dive using Pulsar, Apache NiFi + Edge Flow Manager + MiniFi Agents with Apache MXNet, OpenVino, TensorFlow Lite, and other Deep Learning Libraries on the actual edge devices including Raspberry Pi with Movidius 2, Google Coral TPU and NVidia Jetson Nano.
The team runs deep learning models on the edge devices, sends images, and captures real-time GPS and sensor data. Their low-coding IoT applications provide easy edge routing, transformation, data acquisition and alerting before they decide what data to stream in real-time to their data space. These edge applications classify images and sensor readings in real-time at the edge and then send Deep Learning results to Flink SQL and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging data to InfluxDB.
In this session you will learn how to:
Build an end-to-end streaming edge app
Pull messages from Pulsar topics and persists the messages to InfluxDB
Build a data stream for IoT with NiFi and InfluxDB
Use Apache Flink + Apache Pulsar
Timothy Spann, Developer Advocate, StreamNative
Tim Spann is a Developer Advocate at StreamNative where he works with Apache NiFi, MiniFi, Kafka, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Using FLiP with influxdb for edgeai iot at scale 2022Timothy Spann
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
FLiP Stack (Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark) with Influx DB for Edge AI and IoT workloads at scale
Tim Spann
Developer Advocate
StreamNative
datainmotion.dev
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Timothy Spann
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
https://sixfeetup.com/company/news/90-talks-and-tutorials-from-2022-python-web-conference-released
https://www.youtube.com/playlist?list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl
https://www.youtube.com/watch?v=H88re4p-DoU&list=PLt4L3V8wVnF7PJ3wfq1rdJWX4ziasHMHl&index=62
We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Python libraries including Python client, websockets, MQTT and even Kafka.
After this session you will be building real-time streaming and messaging applications with Python.
#PWC2022 attracted nearly 375 attendees from 36 countries and 21 time zones making it the biggest and best year yet. The highly engaging format featured 90 speakers, 6 tracks (including 80 talks and 4 tutorials) and took place virtually on March 21-25, 2022 on LoudSwarm by Six Feet Up.
Apache Pulsar
Apache Flink
Apache Kafka
MQTT
AMQP/RabbitMQ
WebSockets
Python3
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
Python web conference 2022 apache pulsar development 101 with python (FLiP-Py)
What is Apache Pulsar?
Python 3 Coding
Python Consumers
Python Producers
Python via MQTT, Web Sockets, Kafka
Python for Pulsar Functions
Schemas
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...Timothy Spann
Data science online camp using the flipn stack for edge ai (flink, nifi, pulsar)
Dec 3, 2021
Apache NiFi
Apache Flink
Apache Pulsar
Edge AI
Cloud Native Made Easy
StreamNative
All Day DevOps - FLiP Stack for Cloud Data LakesTimothy Spann
https://www.alldaydevops.com/addo-speakers/timothy-spann
Timothy Spann
StreamNative
MODERN INFRASTRUCTURE
SHARE THIS SESSION
Session Name: FLiP Stack for Cloud Data Lakes
Utilizing an all Apache stack for Rapid Data Lake Population and querying utilizing Apache Flink, Apache Pulsar, and Apache NiFi. We can quickly stream data to and from any datalake, data lake house, lakehouse, database or any datamart regardless of cloud or size. FLiP allows for Java and Python developers to build scalable solutions that span messaging and streaming in cloud native fashion with full monitoring.
Speaker Bio:
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, Pulsar, NiFi, the blockchain, and Spark.
Sink Your Teeth into Streaming at Any ScaleTimothy Spann
Sink Your Teeth into Streaming at Any Scale
https://www.scylladb.com/2023/01/26/scylladb-summit-for-the-scylladb-curious-serious-sea-monsters/
Sink Your Teeth into Streaming at Any Scale
Timothy Spann & David Kjerrumgaard, StreamNative
How to build a low-latency scalable platform for today’s massively data-intensive real-time streaming applications using ScyllaDB, Pulsar, and Flink.
Sink Your Teeth into Streaming at Any ScaleScyllaDB
Using the low-latency Apache Pulsar we can build up millions of streams of concurrent data and join them in real time with Apache Flink. We need an ultra-low latency database that can support these workloads to build next-generation IoT, financial and instant analytical transit applications
By sinking data into ScyllaDB we enable amazingly fast applications that can grow to any size and join with existing data sources.
The next generation of apps is being built now, you must choose the right low-latency scalable platform for these massively data-intensive applications. ScyllaDB + Pulsar + Flink is that platform. Choose Open, Choose Fast, and Make the right choice.
Conf42 Python_ ML Enhanced Event Streaming Apps with Python MicroservicesTimothy Spann
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
https://www.conf42.com/Python_2023_Tim_Spann_David_Kjerrumgaard_ml_enhanced_event_streaming_apps_microservices
Build ML Enhanced Event Streaming Apps with Python Microservices
TIM SPANN
PRINCIPAL DEVELOPER ADVOCATE @ STREAMNATIVE
Tim Spann's LinkedIn account Tim Spann's twitter account
DAVID KJERRUMGAARD
DEVELOPER ADVOCATE @ STREAMNATIVE
David Kjerrumgaard's LinkedIn account
Share
The easy way to build and scale machine learning apps.
The Next Generation of Streaming
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/291048765/
https://www.meetup.com/risingwave-community/events/291294056/
The Next Generation of Streaming
David Kjerrumgaard, Developer Advocate
Tim Spann
Building a cloud-native streaming application requires a team, from the start of events to the final analytics. I'll walk you through using Apache Pulsar as your central messaging hub to distribute events and real-time messages from and to everywhere it needs to be. All instantly. The next step is to stream into continuous SQL.
Agenda
9:00 -- 9:20 The Next Generation of Streaming by Tim Spann,
Developer Advocate at StreamNative
9:20 -- 9:40 RisingWave: Cloud-Native Streaming SQL over Pulsar by Rayees Pasha, Head of Product at RisingWave Labs
23-feb-2023 12:15pm EST
https://github.com/tspannhw/Flow-SGP30-MLX90640
Apache NiFi, Apache Pulsar, Apache Kafka, MQTT, RabbitMQ, AMQP, Source, Sink, Stream Processing
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Timothy Spann
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal/
Tim Spann
Streamnative
https://github.com/tspannhw/pulsar-pychat-function
https://events.dataidols.com/talks/using-the-flipn-stack-for-edge-ai-flink-nifi-pulsar/
Hosted by Karen Jean-Francois.
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Data Idols - Summer School - Data Science
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
Learn how to use Apache Pulsar and Apache NiFi to Stream to your Data Lake
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Best Practices for using Pulsar and NiFi
A deep dive on Apache NiFi's Pulsar connector and demos
Building an End-to-End Application in the Hybrid Cloud
Attend for a chance to win a We <3 Pulsar t-shirt! The first 50 registrants who register through here [https://hubs.ly/Q013LTpn0] will be entered in a drawing!
—------------------------
|AGENDA|
6:00 - 7:00 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:00 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Q&A + Networking
—------------------------
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear a speaker cannot make it. So just in case, since I was the MC for the ecosystem track I put together a talk just in case.
Here it is. Never seen or presented.
Living the Stream Dream with Pulsar and Spring BootTimothy Spann
Living the Stream Dream with Pulsar and Spring Boot
https://events.geekle.us/java23
feb 21, 2023
https://geekle.us/schedule/java23
Living the Stream Dream with Pulsar and Spring Boot
For building Java applications, Spring is the universal answer as it supplies all the connectors and integrations one could want. The same is true for Apache Pulsar as it provides connectors, integration and flexibility to any use case. Apache Pulsar has a robust native Java library to use with Spring as well as other protocol options.
Apache Pulsar provides a cloud native, geo-replicated unified messaging platform that allows for many messaging paradigms. This lends itself well to upgrading existing applications as Pulsar supports using libraries for WebSockets, MQTT, Kafka, JMS, AMQP and RocketMQ. In this talk I will build some example applications utilizing several different protocols for building a variety of applications from IoT to Microservices to Log Analytics.
We will build Spring Boot microservices that utilize Apache Pulsar as a central data hub for communications and enrichment. This utilizes the new Spring for Pulsar framework.
https://github.com/spring-projects-experimental/spring-pulsar
We talked about it on Josh Long's podcast https://spring.io/blog/2022/09/15/a-bootiful-podcast-big-data-legend-former-pivot-and-friend-to-the-spring-community-tim-spann
programming real-time applications with Java 17, Spring Boot and Apache Pulsar 2.11
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
DBCC 2021 - FLiP Stack for Cloud Data Lakes
With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud.
DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
Building an Event Streaming Architecture with Apache PulsarScyllaDB
What is Apache Pulsar? How does it differ from other event streaming technologies available? StreamNative Developer Advocate Tim Spann will walk you through the features and architecture of this increasingly popular event streaming system, along with best practices for streaming and storing your data.
bigdata 2022_ FLiP Into Pulsar Apps
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
FLiP Into Pulsar Apps
08:30 – 09:15
•
23 Nov, 2022
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
Real time cloud native open source streaming of any data to apache solrTimothy Spann
Real time cloud native open source streaming of any data to apache solr
Utilizing Apache Pulsar and Apache NiFi we can parse any document in real-time at scale. We receive a lot of documents via cloud storage, email, social channels and internal document stores. We want to make all the content and metadata to Apache Solr for categorization, full text search, optimization and combination with other datastores. We will not only stream documents, but all REST feeds, logs and IoT data. Once data is produced to Pulsar topics it can instantly be ingested to Solr through Pulsar Solr Sink.
Utilizing a number of open source tools, we have created a real-time scalable any document parsing data flow. We use Apache Tika for Document Processing with real-time language detection, natural language processing with Apache OpenNLP, Sentiment Analysis with Stanford CoreNLP, Spacy and TextBlob. We will walk everyone through creating an open source flow of documents utilizing Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization to generate additional metadata about our documents. We also will extract and parse images that if they contain text we can extract with TensorFlow and Tesseract.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
More Related Content
Similar to OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...Timothy Spann
Data science online camp using the flipn stack for edge ai (flink, nifi, pulsar)
Dec 3, 2021
Apache NiFi
Apache Flink
Apache Pulsar
Edge AI
Cloud Native Made Easy
StreamNative
All Day DevOps - FLiP Stack for Cloud Data LakesTimothy Spann
https://www.alldaydevops.com/addo-speakers/timothy-spann
Timothy Spann
StreamNative
MODERN INFRASTRUCTURE
SHARE THIS SESSION
Session Name: FLiP Stack for Cloud Data Lakes
Utilizing an all Apache stack for Rapid Data Lake Population and querying utilizing Apache Flink, Apache Pulsar, and Apache NiFi. We can quickly stream data to and from any datalake, data lake house, lakehouse, database or any datamart regardless of cloud or size. FLiP allows for Java and Python developers to build scalable solutions that span messaging and streaming in cloud native fashion with full monitoring.
Speaker Bio:
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, Pulsar, NiFi, the blockchain, and Spark.
Sink Your Teeth into Streaming at Any ScaleTimothy Spann
Sink Your Teeth into Streaming at Any Scale
https://www.scylladb.com/2023/01/26/scylladb-summit-for-the-scylladb-curious-serious-sea-monsters/
Sink Your Teeth into Streaming at Any Scale
Timothy Spann & David Kjerrumgaard, StreamNative
How to build a low-latency scalable platform for today’s massively data-intensive real-time streaming applications using ScyllaDB, Pulsar, and Flink.
Sink Your Teeth into Streaming at Any ScaleScyllaDB
Using the low-latency Apache Pulsar we can build up millions of streams of concurrent data and join them in real time with Apache Flink. We need an ultra-low latency database that can support these workloads to build next-generation IoT, financial and instant analytical transit applications
By sinking data into ScyllaDB we enable amazingly fast applications that can grow to any size and join with existing data sources.
The next generation of apps is being built now, you must choose the right low-latency scalable platform for these massively data-intensive applications. ScyllaDB + Pulsar + Flink is that platform. Choose Open, Choose Fast, and Make the right choice.
Conf42 Python_ ML Enhanced Event Streaming Apps with Python MicroservicesTimothy Spann
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
https://www.conf42.com/Python_2023_Tim_Spann_David_Kjerrumgaard_ml_enhanced_event_streaming_apps_microservices
Build ML Enhanced Event Streaming Apps with Python Microservices
TIM SPANN
PRINCIPAL DEVELOPER ADVOCATE @ STREAMNATIVE
Tim Spann's LinkedIn account Tim Spann's twitter account
DAVID KJERRUMGAARD
DEVELOPER ADVOCATE @ STREAMNATIVE
David Kjerrumgaard's LinkedIn account
Share
The easy way to build and scale machine learning apps.
The Next Generation of Streaming
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/291048765/
https://www.meetup.com/risingwave-community/events/291294056/
The Next Generation of Streaming
David Kjerrumgaard, Developer Advocate
Tim Spann
Building a cloud-native streaming application requires a team, from the start of events to the final analytics. I'll walk you through using Apache Pulsar as your central messaging hub to distribute events and real-time messages from and to everywhere it needs to be. All instantly. The next step is to stream into continuous SQL.
Agenda
9:00 -- 9:20 The Next Generation of Streaming by Tim Spann,
Developer Advocate at StreamNative
9:20 -- 9:40 RisingWave: Cloud-Native Streaming SQL over Pulsar by Rayees Pasha, Head of Product at RisingWave Labs
23-feb-2023 12:15pm EST
https://github.com/tspannhw/Flow-SGP30-MLX90640
Apache NiFi, Apache Pulsar, Apache Kafka, MQTT, RabbitMQ, AMQP, Source, Sink, Stream Processing
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Timothy Spann
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal/
Tim Spann
Streamnative
https://github.com/tspannhw/pulsar-pychat-function
https://events.dataidols.com/talks/using-the-flipn-stack-for-edge-ai-flink-nifi-pulsar/
Hosted by Karen Jean-Francois.
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Data Idols - Summer School - Data Science
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
Learn how to use Apache Pulsar and Apache NiFi to Stream to your Data Lake
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Best Practices for using Pulsar and NiFi
A deep dive on Apache NiFi's Pulsar connector and demos
Building an End-to-End Application in the Hybrid Cloud
Attend for a chance to win a We <3 Pulsar t-shirt! The first 50 registrants who register through here [https://hubs.ly/Q013LTpn0] will be entered in a drawing!
—------------------------
|AGENDA|
6:00 - 7:00 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:00 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Q&A + Networking
—------------------------
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear a speaker cannot make it. So just in case, since I was the MC for the ecosystem track I put together a talk just in case.
Here it is. Never seen or presented.
Living the Stream Dream with Pulsar and Spring BootTimothy Spann
Living the Stream Dream with Pulsar and Spring Boot
https://events.geekle.us/java23
feb 21, 2023
https://geekle.us/schedule/java23
Living the Stream Dream with Pulsar and Spring Boot
For building Java applications, Spring is the universal answer as it supplies all the connectors and integrations one could want. The same is true for Apache Pulsar as it provides connectors, integration and flexibility to any use case. Apache Pulsar has a robust native Java library to use with Spring as well as other protocol options.
Apache Pulsar provides a cloud native, geo-replicated unified messaging platform that allows for many messaging paradigms. This lends itself well to upgrading existing applications as Pulsar supports using libraries for WebSockets, MQTT, Kafka, JMS, AMQP and RocketMQ. In this talk I will build some example applications utilizing several different protocols for building a variety of applications from IoT to Microservices to Log Analytics.
We will build Spring Boot microservices that utilize Apache Pulsar as a central data hub for communications and enrichment. This utilizes the new Spring for Pulsar framework.
https://github.com/spring-projects-experimental/spring-pulsar
We talked about it on Josh Long's podcast https://spring.io/blog/2022/09/15/a-bootiful-podcast-big-data-legend-former-pivot-and-friend-to-the-spring-community-tim-spann
programming real-time applications with Java 17, Spring Boot and Apache Pulsar 2.11
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
DBCC 2021 - FLiP Stack for Cloud Data Lakes
With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud.
DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
Building an Event Streaming Architecture with Apache PulsarScyllaDB
What is Apache Pulsar? How does it differ from other event streaming technologies available? StreamNative Developer Advocate Tim Spann will walk you through the features and architecture of this increasingly popular event streaming system, along with best practices for streaming and storing your data.
bigdata 2022_ FLiP Into Pulsar Apps
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
FLiP Into Pulsar Apps
08:30 – 09:15
•
23 Nov, 2022
In this session, Timothy will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools.
Real time cloud native open source streaming of any data to apache solrTimothy Spann
Real time cloud native open source streaming of any data to apache solr
Utilizing Apache Pulsar and Apache NiFi we can parse any document in real-time at scale. We receive a lot of documents via cloud storage, email, social channels and internal document stores. We want to make all the content and metadata to Apache Solr for categorization, full text search, optimization and combination with other datastores. We will not only stream documents, but all REST feeds, logs and IoT data. Once data is produced to Pulsar topics it can instantly be ingested to Solr through Pulsar Solr Sink.
Utilizing a number of open source tools, we have created a real-time scalable any document parsing data flow. We use Apache Tika for Document Processing with real-time language detection, natural language processing with Apache OpenNLP, Sentiment Analysis with Stanford CoreNLP, Spacy and TextBlob. We will walk everyone through creating an open source flow of documents utilizing Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization to generate additional metadata about our documents. We also will extract and parse images that if they contain text we can extract with TensorFlow and Tesseract.
Similar to OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
Building Real-Time Pipelines With FLaNK
Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Apache NiFi
Apache Kafka
Apache Flink
Apache Iceberg
LLM
Generative AI
Slack
Postgresql
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
Gen AI on Enterprise Cloud
Apache NiFi
Milvus
Apache Kafka
Apache Flink
Cloudera Machine Learning
Cloudera DataFlow
https://medium.com/@tspann/building-a-milvus-connector-for-nifi-34372cb3c7fa
https://www.meetup.com/futureofdata-princeton/events/300737266/
https://lu.ma/q7pcfyjn?source=post_page-----34372cb3c7fa--------------------------------&tk=TTyakY
If you're interested in working with Generative AI on the cloud, this virtual workshop is for you.
Tim Spann from Cloudera and Yujian Tang from Zilliz will cover how you can implement your own GenAI workflows on the cloud at enterprise scale.
9:00 - 9:05: Intro
9:05 - 9:15: What is Milvus
9:15 - 9:25: Cloudera Development Platform
9:25 - 10:00: Demo
Location
https://www.youtube.com/watch?v=IfWIzKsoHnA
https://github.com/tspannhw/SpeakerProfile
https://www.linkedin.com/in/yujiantang/
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
https://www.youtube.com/watch?v=Yeua8NlzQ3Y
https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming
Adding Generative AI to Real-Time Streaming Pipelines
Abstract
Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results.
Summary
Tim Spann: My talk is adding generative AI to real time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, Nifi, Flink, Python, Iceberg. All the slides, all the code and GitHub are out there.
Llm, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The amount of models and projects and software that are available is massive.
Nifi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into.
So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from slack. And then I'm going to clean it up, add a couple fields and push that out to slack. Every model is a little bit of different tweaking.
Nifi acts as a whole website. And as you see here, it can be get, post, put, whatever you want. We send that response back to flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Tim Spann here. My talk is adding generative AI to real time streaming pipelines, and we're here for the large language model conference at Comp 42, which is always a nice one, great place to be. I'm going to discuss a couple of different open source technologies that work together to enable you to build real time pipelines using large language models. So we'll touch on Kafka, Nifi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data machine learning, streaming IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter, cover streaming ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff, as I mentioned, do a bunch of different meetups. They are not just in the east coast of the US, they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides, all the code and GitHub. Everything you need is out there. Let's get into the talk. Llm, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
https://xtremej.dev/2023/schedule/
Building Real-time Pipelines with FLaNK: A Case Study with Transit Data
Overview of the problem, the application (code walkthru and running), overview of FLaNK, introduction to NiFi, introduction to Kafka, and introduction to Flink.
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
******Note*****
The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30 QNA
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
https://princetonacm.acm.org/tcfpro/
18th Annual IEEE IT Professional Conference (ITPC)
Armstrong Hall at The College of New Jersey
Friday, March 15th, 2024 | 10:00 AM to 5:00 PM
IT Professional Conference at Trenton Computer Festival
IEEE Information Technology Professional Conference on Friday, March 15th, 2024
TCFPro24 Building Real-Time Generative AI Pipelines
Building Real-Time Generative AI Pipelines
In this talk, Tim will delve into the exciting realm of building real-time generative AI pipelines with streaming capabilities. The discussion will revolve around the integration of cutting-edge technologies to create dynamic and responsive systems that harness the power of generative algorithms.
From leveraging streaming data sources to implementing advanced machine learning models, the presentation will explore the key components necessary for constructing a robust real-time generative AI pipeline. Practical insights, use cases, and best practices will be shared, offering a comprehensive guide for developers and data scientists aspiring to design and implement dynamic AI systems in a streaming environment.
Tim will show a live demo showing we can use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated with Apache NiFi, Apache Kafka and Python. We will use RAG against Chroma and Pinecone vector data stores, Hugging Face and WatsonX.AI LLM, and add additional context with NiFi lookups of stocks, weather and other data streams in real-time.
Timothy Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipelines
https://www.meetup.com/futureofdata-newyork/events/298660453/
Unlocking Financial Data with Real-Time Pipelines
(Flink Analytics on Stocks with SQL )
By Timothy Spann
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. I will be utilizing NiFi 2.0 with Python and Vector Databases.
Timothy Spann
Principal Developer Advocate, Cloudera
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
https://twitter.com/PaaSDev
https://www.linkedin.com/in/timothyspann/
https://medium.com/@tspann
https://github.com/tspannhw/FLiPStackWeekly/
Conf42-Python-Building Apache NiFi 2.0 Python Processors
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
Building Apache NiFi 2.0 Python Processors
Abstract
Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM.
Summary
Tim Spann: I'm going to be talking today, be building Apache 9520 Python processors. One of the main purposes of supporting Python in the streaming tool Apache Nifi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera.
You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 310 and again JDK 21 on your machine. You got to be smart about how you use these models.
There are a ton of python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own.
When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested on writing your own python code for Apache Nifi, definitely reach out and thank.
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg with Stock Data and LLM
Abstract
In this talk, we’ll discuss how to use Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg to process and analyze stock data. We demonstrated the ingestion, processing, and analysis of stock data. Additionally, we illustrated how to use an LLM to generate predictions from the analyzed data.
Karin Wolok
Developer Relations, Dev Marketing, and Community Programming @ Project Elevate
Karin Wolok's LinkedIn account Karin Wolok's twitter account
Tim Spann
Principal Developer Advocate @ Cloudera
Tim Spann's LinkedIn account Tim Spann's twitter account
https://www.conf42.com/Python_2024_Karin_Wolok_Tim_Spann_nifi__kafka_risingwave_iceberg_llm
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
https://www.aicamp.ai/event/eventdetails/W2024022214
apache nifi
llm
generative ai
gen ai
ml
dl
machine learning
apache kafka
apache flink
postgresql
python
AI Meetup (NYC): GenAI, LLMs, ML and Data
Feb 22, 05:30 PM EST
Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers
Agenda:
* 5:30pm~6:00pm: Checkin, Food/drink and networking
* 6:00pm~6:10pm: Welcome/community update
* 6:10pm~8:30pm: Tech talks
* 8:30pm: Q&A, Open discussion
Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs
Speaker: Zain Hasan (Weaviate LinkedIn)
Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can that can see, hear, read, and feel data(!), to perform cross-modal search(searching audio with images, videos with text etc.) and multimodal retrieval augmented generation (MM-RAG) at the billion-object scale with the help of open source vector databases. I will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real-time can enables users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production.
Tech Talk: Codeless Generative AI Pipelines
Speaker: Timothy Spann (Cloudera LinkedIn)
Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task.
Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Sponsors:
We are actively seeking sponsors to support our community. Whether it is by offering venue spaces, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ developers worldwide.
Venue:
Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036
Room Name: Central Park West 6501
Community on Slack/Discord
- Event chat: chat and connect with speakers and attendees
- Sharing blogs, events, job openings, projects collaborations
Join Slack (search and join the #newyork channel) | Join Discord
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
20-Feb-2024
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data.
We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this.
Tim Spann
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Unlocking Financial Data with Real-Time Pipelines
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.
Key Points to be Covered:
Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing.
Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.
Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. Integration of Kafka with financial data sources and consumers.
Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources.
Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions.
Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies.
Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability an
Building Real-time Travel Alerts
In this session, we will walk through how to build a complete streaming application to send alerts based on travel advisories from public data. We will also join in other data sources of relevance and push out alerts.
We will show you how to build this streaming application with Apache NiFi, Apache Kafka, and Apache Flink and show you when/why/how, and what to build to maximize performance, productivity, and ease of development.
Let's get streaming.
Apache Flink
Apache Kafka
Apache NiFi
FLaNK Stack
Tim Spann
Big Data Conference Europe 2023
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
JConWorld: Continuous SQL with Kafka and Flink
In this talk, I will walk through how someone can setup and run continous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data.
We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic over view will show hands-on techniques, tips and examples of how to do this.
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html
https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteGoogle
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
1. Deep Dive into Building Streaming
Applications with Apache Pulsar
Tim Spann / Developer Advocate
#ossummit @PaaSDev
2. Deep Dive into Building
Streaming Applications with
Apache Pulsar
3. Tim Spann
Developer Advocate
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems/ Data Architect
● Experience:
○ 15+ years of experience with batch and streaming technologies
including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud,
MXNet, Hadoop, Datalakes, IoT and more.
4. FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
8. ● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for both
Pulsar and BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Key Pulsar Concepts: Architecture
MetaData
Storage
9. Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although
message data can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for
things like topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer
name, the default name is used. Message De-Duplication.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of
the message is its order in that sequence. Message De-Duplication.
Messages - the basic unit of Pulsar
10. Key Pulsar Concepts: Messaging vs Streaming
Message Queueing - Queueing
systems are ideal for work
queues that do not require
tasks to be performed in a
particular order.
Streaming - Streaming works
best in situations where the
order of messages is
important.
12. #ossummit
Schema Registry
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
20. #ossummit
Pulsar Functions
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a
message bus.
● Function runtime can be located within Pulsar Broker.
A serverless event streaming framework
21. #ossummit
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions
22. #ossummit
Run a Local Standalone Bare Metal
wget
https://archive.apache.org/dist/pulsar/pulsar-2.10.1/apache-pulsar-2.10.1-
bin.tar.gz
tar xvfz apache-pulsar-2.10.1-bin.tar.gz
cd apache-pulsar-2.10.1
bin/pulsar standalone
(For Pulsar SQL Support)
bin/pulsar sql-worker start
https://pulsar.apache.org/docs/en/standalone/
23. #ossummit
<or> Run in Docker
docker run -it
-p 6650:6650
-p 8080:8080
--mount source=pulsardata,target=/pulsar/data
--mount source=pulsarconf,target=/pulsar/conf
apachepulsar/pulsar:2.10.1
bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone-docker/
24. #ossummit
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create conf
bin/pulsar-admin namespaces create conf/europe
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list conf
bin/pulsar-admin topics create persistent://conf/europe/first
bin/pulsar-admin topics list conf/europe
25. #ossummit
Install Python 3 Pulsar Client
pip3 install pulsar-client=='2.10.1[all]'
Includes AARCH64, ARM, M2, INTEL, …
For Python on Pulsar on Pi https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
https://pypi.org/project/pulsar-client/2.10.0/#files
26. #ossummit
Building a Python 3 Producer
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
producer
client.create_producer('persistent://conf/ete/first')
producer.send(('Simple Text Message').encode('utf-8'))
client.close()
46. #ossummit
Subscribing to a Topic and Setting Subscription Name
Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
.topic(topic)
.subscriptionName(“subscriptionName")
.subscribe();
49. #ossummit
Metrics: Broker
Broker metrics are exposed under "/metrics" at port 8080.
You can change the port by updating webServicePort to a different port
in the broker.conf configuration file.
All the metrics exposed by a broker are labeled with
cluster=${pulsar_cluster}.
The name of Pulsar cluster is the value of ${pulsar_cluster},
configured in the broker.conf file.
For more information: https://pulsar.apache.org/docs/en/reference-metrics/#broker