Automation + dev ops summit hail hydrate! from stream to lake
2021
Apache Pulsar, APache NiFi, Apache Flink
StreamNative
https://sessionize.com/app/speaker/session/265189
Tim Spann, Developer Advocate
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
This document provides an overview and summary of Apache Pulsar with MQTT for edge computing. It discusses how Pulsar is an open-source, cloud-native distributed messaging and streaming platform that supports MQTT and other protocols. It also summarizes Pulsar's key capabilities like data durability, scalability, geo-replication, and unified messaging model. The document includes diagrams showcasing Pulsar's publish-subscribe model and different subscription modes. It demonstrates how Pulsar can be used with edge devices via protocols like MQTT and how streams of data from edge can be processed using connectors, functions and SQL.
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
DBCC 2021 - FLiP Stack for Cloud Data Lakes
With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud.
DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
Real time cloud native open source streaming of any data to apache solrTimothy Spann
Real time cloud native open source streaming of any data to apache solr
Utilizing Apache Pulsar and Apache NiFi we can parse any document in real-time at scale. We receive a lot of documents via cloud storage, email, social channels and internal document stores. We want to make all the content and metadata to Apache Solr for categorization, full text search, optimization and combination with other datastores. We will not only stream documents, but all REST feeds, logs and IoT data. Once data is produced to Pulsar topics it can instantly be ingested to Solr through Pulsar Solr Sink.
Utilizing a number of open source tools, we have created a real-time scalable any document parsing data flow. We use Apache Tika for Document Processing with real-time language detection, natural language processing with Apache OpenNLP, Sentiment Analysis with Stanford CoreNLP, Spacy and TextBlob. We will walk everyone through creating an open source flow of documents utilizing Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization to generate additional metadata about our documents. We also will extract and parse images that if they contain text we can extract with TensorFlow and Tesseract.
Hail hydrate! from stream to lake using open sourceTimothy Spann
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative
https://osselc21.sched.com/event/lAPi?iframe=no
A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed. Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, Logs, REST, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...Timothy Spann
This document provides an overview and introduction to Apache Pulsar and StreamNative. Some key points:
- Apache Pulsar is an open-source distributed messaging and streaming platform built for cloud-native applications. It provides features like data durability, scalability, geo-replication, and multi-tenancy.
- StreamNative helps companies adopt Pulsar for use cases like building microservices, capturing real-time data, and cloud migrations. They provide commercial support for Pulsar through products like StreamNative Cloud.
- The document discusses how Pulsar works, its key capabilities and milestones, and reference architectures for using it with tools like Apache Flink and ClickHouse for unified messaging, streaming
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
ApacheCon 2021 Apache Deep Learning 302
Tuesday 18:00 UTC
Apache Deep Learning 302
Timothy Spann
This talk will discuss and show examples of using Apache Hadoop, Apache Kudu, Apache Flink, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow up to previous talks on Apache Deep Learning 101 and 201 and 301 at ApacheCon, Dataworks Summit, Strata and other events. As part of this talk, the presenter will walk through using Apache MXNet Pre-Built Models, integrating new open source Deep Learning libraries with Python and Java, as well as running real-time AI streams from edge devices to servers utilizing Apache NiFi and Apache NiFi - MiNiFi. This talk is geared towards Data Engineers interested in the basics of architecting Deep Learning pipelines with open source Apache tools in a Big Data environment. The presenter will also walk through source code examples available in github and run the code live on Apache NiFi and Apache Flink clusters.
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
* https://github.com/tspannhw/ApacheDeepLearning302/
* https://github.com/tspannhw/nifi-djl-processor
* https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
* https://github.com/tspannhw/nifi-djlqa-processor
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsTimothy Spann
This document provides an overview and summary of Apache Pulsar, a distributed streaming and messaging platform. It discusses Pulsar's benefits like data durability, scalability, geo-replication and multi-tenancy. It outlines key use cases like message queuing and data streaming. The document also summarizes Pulsar's architecture, subscriptions modes, connectors, and integration with other technologies like Apache Flink, Apache NiFi and MQTT. It highlights real-world customer implementations and provides demos of ingesting IoT data via Pulsar.
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...Timothy Spann
Data science online camp using the flipn stack for edge ai (flink, nifi, pulsar)
Dec 3, 2021
Apache NiFi
Apache Flink
Apache Pulsar
Edge AI
Cloud Native Made Easy
StreamNative
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
This document provides an overview and summary of Apache Pulsar with MQTT for edge computing. It discusses how Pulsar is an open-source, cloud-native distributed messaging and streaming platform that supports MQTT and other protocols. It also summarizes Pulsar's key capabilities like data durability, scalability, geo-replication, and unified messaging model. The document includes diagrams showcasing Pulsar's publish-subscribe model and different subscription modes. It demonstrates how Pulsar can be used with edge devices via protocols like MQTT and how streams of data from edge can be processed using connectors, functions and SQL.
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
DBCC 2021 - FLiP Stack for Cloud Data Lakes
With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud.
DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
Real time cloud native open source streaming of any data to apache solrTimothy Spann
Real time cloud native open source streaming of any data to apache solr
Utilizing Apache Pulsar and Apache NiFi we can parse any document in real-time at scale. We receive a lot of documents via cloud storage, email, social channels and internal document stores. We want to make all the content and metadata to Apache Solr for categorization, full text search, optimization and combination with other datastores. We will not only stream documents, but all REST feeds, logs and IoT data. Once data is produced to Pulsar topics it can instantly be ingested to Solr through Pulsar Solr Sink.
Utilizing a number of open source tools, we have created a real-time scalable any document parsing data flow. We use Apache Tika for Document Processing with real-time language detection, natural language processing with Apache OpenNLP, Sentiment Analysis with Stanford CoreNLP, Spacy and TextBlob. We will walk everyone through creating an open source flow of documents utilizing Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization to generate additional metadata about our documents. We also will extract and parse images that if they contain text we can extract with TensorFlow and Tesseract.
Hail hydrate! from stream to lake using open sourceTimothy Spann
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative
https://osselc21.sched.com/event/lAPi?iframe=no
A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed. Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, Logs, REST, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...Timothy Spann
This document provides an overview and introduction to Apache Pulsar and StreamNative. Some key points:
- Apache Pulsar is an open-source distributed messaging and streaming platform built for cloud-native applications. It provides features like data durability, scalability, geo-replication, and multi-tenancy.
- StreamNative helps companies adopt Pulsar for use cases like building microservices, capturing real-time data, and cloud migrations. They provide commercial support for Pulsar through products like StreamNative Cloud.
- The document discusses how Pulsar works, its key capabilities and milestones, and reference architectures for using it with tools like Apache Flink and ClickHouse for unified messaging, streaming
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
ApacheCon 2021 Apache Deep Learning 302
Tuesday 18:00 UTC
Apache Deep Learning 302
Timothy Spann
This talk will discuss and show examples of using Apache Hadoop, Apache Kudu, Apache Flink, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow up to previous talks on Apache Deep Learning 101 and 201 and 301 at ApacheCon, Dataworks Summit, Strata and other events. As part of this talk, the presenter will walk through using Apache MXNet Pre-Built Models, integrating new open source Deep Learning libraries with Python and Java, as well as running real-time AI streams from edge devices to servers utilizing Apache NiFi and Apache NiFi - MiNiFi. This talk is geared towards Data Engineers interested in the basics of architecting Deep Learning pipelines with open source Apache tools in a Big Data environment. The presenter will also walk through source code examples available in github and run the code live on Apache NiFi and Apache Flink clusters.
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
* https://github.com/tspannhw/ApacheDeepLearning302/
* https://github.com/tspannhw/nifi-djl-processor
* https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
* https://github.com/tspannhw/nifi-djlqa-processor
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsTimothy Spann
This document provides an overview and summary of Apache Pulsar, a distributed streaming and messaging platform. It discusses Pulsar's benefits like data durability, scalability, geo-replication and multi-tenancy. It outlines key use cases like message queuing and data streaming. The document also summarizes Pulsar's architecture, subscriptions modes, connectors, and integration with other technologies like Apache Flink, Apache NiFi and MQTT. It highlights real-world customer implementations and provides demos of ingesting IoT data via Pulsar.
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...Timothy Spann
Data science online camp using the flipn stack for edge ai (flink, nifi, pulsar)
Dec 3, 2021
Apache NiFi
Apache Flink
Apache Pulsar
Edge AI
Cloud Native Made Easy
StreamNative
FLiP Into Trino
FLiP into Trino. Flink Pulsar Trino
Pulsar SQL (Trino/Presto)
Remember the days when you could wait until your batch data load was done and then you could run some simple queries or build stale dashboards? Those days are over, today you need instant analytics as the data is streaming in real-time. You need universal analytics where that data is. I will show you how to do this utilizing the latest cloud native open source tools. In this talk we will utilize Trino, Apache Pulsar, Pulsar SQL and Apache Flink to analyze instantly data from IoT, sensors, transportation systems, Logs, REST endpoints, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach how to use Pulsar SQL to run analytics on live data.
Tim Spann
Developer Advocate
StreamNative
David Kjerrumgaard
Developer Advocate
StreamNative
https://www.starburst.io/info/trinosummit/
https://github.com/tspannhw/FLiP-Into-Trino/blob/main/README.md
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/tree/main/src/main/java
select * from pulsar."public/default"."weather";
Apache Pulsar plus Trio = fast analytics at scale
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...Timothy Spann
Scenic city summit real-time streaming in any and all clouds, hybrid and beyond
24-September-2021. Scenic City Summit. Virtual. Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Apache Pulsar, Apache NiFi, Apache Flink
StreamNative
Tim Spann
https://sceniccitysummit.com/
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
Biography
Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiniFi, Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Talk
Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the scale and as events arrive.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai Apache MXNet.
References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Source Code: https://github.com/tspannhw/MmFLaNK
FLiP Stack
StreamNative
Codeless pipelines with pulsar and flinkTimothy Spann
This document summarizes Tim Spann's presentation on codeless pipelines with Apache Pulsar and Apache Flink. The presentation discusses how StreamNative's platform uses Pulsar and Flink to enable end-to-end streaming data pipelines without code. It provides an overview of Pulsar's capabilities for messaging, stream processing, and integration with other Apache projects like Kafka, NiFi and Flink. Examples are given of ingesting IoT data into Pulsar and running real-time analytics on the data using Flink SQL.
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Timothy Spann
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Kafka, and Flink
Timothy Spann
Twitter - @PaasDev // Blog: www.datainmotion.dev
Frequent speaker at major conferences and events.
Principal DataFlow Field Engineer for streaming around Apache NiFi, NiFi Registry, MiNiFi, Kafka, Kafka Connect, Kafka Streams, Flink, Flink SQL, SMM, SRM, SR and EFM.
Previously at E&Y, HPE, Pivotal & Hortonworks
Question #1
What is the most difficult part of an Edge Flow?
Gateway Agent
Edge Data Collection
Processing Data
https://github.com/tspannhw/DemoJam2021
https://github.com/tspannhw/CloudDemo2021
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
This document provides an overview of using Apache Pulsar for Python development. It discusses Python producers, consumers, and schemas. It also covers connecting Pulsar to other technologies like MQTT, web sockets, and Kafka via Python. Pulsar Functions in Python are demonstrated. Examples of using Python with Pulsar on Raspberry Pi are provided. The document is presented by Tim Spann, a developer advocate at StreamNative, and includes information on his background and StreamNative's training resources.
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...Timothy Spann
PASS Data Community Summit
2021
Apache NiFi, Apache Flink, Apache Pulsar
FLiP Stack
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache Pulsar
https://passdatacommunitysummit.com/
Axway presented on its API Management Plus solution. The presentation covered Axway's vision of digital transformation and customer experience networks. It then demonstrated API Management Plus's full lifecycle API management capabilities. This includes API creation, governance, consumption, and measurement. The solution aims to streamline digital innovation and increase ecosystem engagement.
Using the FLiPN stack for edge ai (flink, nifi, pulsar)Timothy Spann
This document announces the Pulsar Virtual Summit Europe 2021 and provides information about StreamNative, Apache Pulsar, Apache Flink, Apache NiFi, and the FLiP(N) stack. It promotes the unified batch and stream processing capabilities of Apache Flink powered by Apache Pulsar. Additionally, it highlights features of Apache NiFi and advertises an upcoming demo of using NVIDIA Jetson devices with Pulsar. Contact information and links to relevant GitHub repositories and blogs are provided for further resources.
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
Learn how to use Apache Pulsar and Apache NiFi to Stream to your Data Lake
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Best Practices for using Pulsar and NiFi
A deep dive on Apache NiFi's Pulsar connector and demos
Building an End-to-End Application in the Hybrid Cloud
Attend for a chance to win a We <3 Pulsar t-shirt! The first 50 registrants who register through here [https://hubs.ly/Q013LTpn0] will be entered in a drawing!
—------------------------
|AGENDA|
6:00 - 7:00 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:00 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Q&A + Networking
—------------------------
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
27-April-2021. Developer Week Europe. OPEN Stage A. 11:00
Tspann cracking the nut, solving edge ai with apache tools and frameworks
Using Apache Flink, Apache Airflow, Apache Arrow, Apache NiFi, Apache Kafka, Apache MXNet, DJL.AI, Apache Tika, Apache OpenNLP, Apache Kudu, Apache Impala, Apache HBase and more open source tools for edge AI.
ApacheCon 2021: Cracking the nut with Apache Pulsar (FLiP)Timothy Spann
ApacheCon 2021: Cracking the nut with Apache Pulsar (FLiP)
by Timothy Spann
Wednesday 17:10 UTC - Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Wednesday 17:10 UTC
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately using the all Apache FLiP Stack we can do this with ease! Streaming AI Powered Analytics From the Edge to the Data Center is now a simple use case. With MiNiFi we can ingest the data, do data checks, cleansing, run machine learning and deep learning models and route our data in real-time to Apache NiFi and Apache Pulsar for further transformations and processing. Apache Flink will provide our advanced streaming capabilities fed real-time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in various Apache datastores. Event-Driven Microservices in Apache Pulsar Functions.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet
Matt Franklin - Apache Software (Geekfest)W2O Group
The document discusses the potential benefits of container technologies like Docker. It notes that containers offer significantly higher density than virtual machines by avoiding hypervisor overhead. This density improvement can lead to major cost reductions by reducing infrastructure needs. Containers also improve developer efficiency by making development environments portable and disposable. This allows more rapid experimentation and innovation, potentially translating to increased revenue. Technologies like Amazon Lambda take the on-demand aspects of containers even further by abstracting compute resources. The document promotes StackEngine as a solution for managing containers at scale in production environments.
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Pulsar Summit Asia ...StreamNative
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References
https://www.datainmotion.dev/2019/08/...
https://www.datainmotion.dev/2019/09/...
https://www.datainmotion.dev/2019/05/...
https://www.datainmotion.dev/2019/03/...
Get the presentation slides: https://www.slideshare.net/streamnati...
Subscribe to the StreamNative Newsletter for Apache Pulsar for more Pulsar content: https://share.hsforms.com/1IS56E-RvSV...
Get started with the on-demand Pulsar training by StreamNative Academy: https://www.academy.streamnative.io/
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...StreamNative
Data insights and data-driven strategies create the competitive differentiators companies thrive off today. The need for unified messaging and streaming has never been more apparent.
Pulsar started with the goal of building a global, geo-replicated infrastructure to serve Yahoo!’s messaging needs. With the increased need to process both business events (such as payment request, billing request) and operational events (such as log data, click events, etc), the team at Yahoo! set out to build a true unified infrastructure platform to handle all in-motion data. That technology became Apache Pulsar.
In this talk, Matteo Merli and Sijie Guo will dive into the landscape of unified messaging and streaming, how Pulsar helps companies achieve this vision, and what the future of Pulsar will look like.
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...Timothy Spann
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-ramp 2022
As the Pulsar communities grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache Streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed.
I will walk through how to get started, some use cases and demos and answer questions.
https://www.devfest-uki.com/schedule
https://linktr.ee/tspannhw
FLiP Into Trino
FLiP into Trino. Flink Pulsar Trino
Pulsar SQL (Trino/Presto)
Remember the days when you could wait until your batch data load was done and then you could run some simple queries or build stale dashboards? Those days are over, today you need instant analytics as the data is streaming in real-time. You need universal analytics where that data is. I will show you how to do this utilizing the latest cloud native open source tools. In this talk we will utilize Trino, Apache Pulsar, Pulsar SQL and Apache Flink to analyze instantly data from IoT, sensors, transportation systems, Logs, REST endpoints, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach how to use Pulsar SQL to run analytics on live data.
Tim Spann
Developer Advocate
StreamNative
David Kjerrumgaard
Developer Advocate
StreamNative
https://www.starburst.io/info/trinosummit/
https://github.com/tspannhw/FLiP-Into-Trino/blob/main/README.md
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/tree/main/src/main/java
select * from pulsar."public/default"."weather";
Apache Pulsar plus Trio = fast analytics at scale
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...Timothy Spann
Scenic city summit real-time streaming in any and all clouds, hybrid and beyond
24-September-2021. Scenic City Summit. Virtual. Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Apache Pulsar, Apache NiFi, Apache Flink
StreamNative
Tim Spann
https://sceniccitysummit.com/
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
Biography
Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiniFi, Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Talk
Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the scale and as events arrive.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai Apache MXNet.
References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Source Code: https://github.com/tspannhw/MmFLaNK
FLiP Stack
StreamNative
Codeless pipelines with pulsar and flinkTimothy Spann
This document summarizes Tim Spann's presentation on codeless pipelines with Apache Pulsar and Apache Flink. The presentation discusses how StreamNative's platform uses Pulsar and Flink to enable end-to-end streaming data pipelines without code. It provides an overview of Pulsar's capabilities for messaging, stream processing, and integration with other Apache projects like Kafka, NiFi and Flink. Examples are given of ingesting IoT data into Pulsar and running real-time analytics on the data using Flink SQL.
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Timothy Spann
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Kafka, and Flink
Timothy Spann
Twitter - @PaasDev // Blog: www.datainmotion.dev
Frequent speaker at major conferences and events.
Principal DataFlow Field Engineer for streaming around Apache NiFi, NiFi Registry, MiNiFi, Kafka, Kafka Connect, Kafka Streams, Flink, Flink SQL, SMM, SRM, SR and EFM.
Previously at E&Y, HPE, Pivotal & Hortonworks
Question #1
What is the most difficult part of an Edge Flow?
Gateway Agent
Edge Data Collection
Processing Data
https://github.com/tspannhw/DemoJam2021
https://github.com/tspannhw/CloudDemo2021
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
This document provides an overview of using Apache Pulsar for Python development. It discusses Python producers, consumers, and schemas. It also covers connecting Pulsar to other technologies like MQTT, web sockets, and Kafka via Python. Pulsar Functions in Python are demonstrated. Examples of using Python with Pulsar on Raspberry Pi are provided. The document is presented by Tim Spann, a developer advocate at StreamNative, and includes information on his background and StreamNative's training resources.
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...Timothy Spann
PASS Data Community Summit
2021
Apache NiFi, Apache Flink, Apache Pulsar
FLiP Stack
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache Pulsar
https://passdatacommunitysummit.com/
Axway presented on its API Management Plus solution. The presentation covered Axway's vision of digital transformation and customer experience networks. It then demonstrated API Management Plus's full lifecycle API management capabilities. This includes API creation, governance, consumption, and measurement. The solution aims to streamline digital innovation and increase ecosystem engagement.
Using the FLiPN stack for edge ai (flink, nifi, pulsar)Timothy Spann
This document announces the Pulsar Virtual Summit Europe 2021 and provides information about StreamNative, Apache Pulsar, Apache Flink, Apache NiFi, and the FLiP(N) stack. It promotes the unified batch and stream processing capabilities of Apache Flink powered by Apache Pulsar. Additionally, it highlights features of Apache NiFi and advertises an upcoming demo of using NVIDIA Jetson devices with Pulsar. Contact information and links to relevant GitHub repositories and blogs are provided for further resources.
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/283837865/
Learn how to use Apache Pulsar and Apache NiFi to Stream to your Data Lake
Discover how to stream data to and from your data lake or data mart using Apache Pulsar™ and Apache NiFi®. Learn how these cloud-native, scalable open-source projects built for streaming data pipelines work together to enable you to quickly build applications with minimal coding.
|WHAT THE SESSION WILL COVER|
Best Practices for using Pulsar and NiFi
A deep dive on Apache NiFi's Pulsar connector and demos
Building an End-to-End Application in the Hybrid Cloud
Attend for a chance to win a We <3 Pulsar t-shirt! The first 50 registrants who register through here [https://hubs.ly/Q013LTpn0] will be entered in a drawing!
—------------------------
|AGENDA|
6:00 - 7:00 PM EST: Presentation - Tim Spann, StreamNative Developer Advocate
7:00 - 8:00 PM EST: Presentation - John Kuchmek, Cloudera Principal Solutions Engineer
8:00 - 8:30 PM EST: Q&A + Networking
—------------------------
|ABOUT THE SPEAKERS|
John Kuchmek is a Principal Solutions Engineer for Cloudera. Before joining Cloudera, John transitioned to the Autonomous Intelligence team where he was in charge of integrating the platforms to allow data scientists to work with various types of data.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. He is currently working on a book about the FLiP Stack.
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
27-April-2021. Developer Week Europe. OPEN Stage A. 11:00
Tspann cracking the nut, solving edge ai with apache tools and frameworks
Using Apache Flink, Apache Airflow, Apache Arrow, Apache NiFi, Apache Kafka, Apache MXNet, DJL.AI, Apache Tika, Apache OpenNLP, Apache Kudu, Apache Impala, Apache HBase and more open source tools for edge AI.
ApacheCon 2021: Cracking the nut with Apache Pulsar (FLiP)Timothy Spann
ApacheCon 2021: Cracking the nut with Apache Pulsar (FLiP)
by Timothy Spann
Wednesday 17:10 UTC - Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Wednesday 17:10 UTC
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately using the all Apache FLiP Stack we can do this with ease! Streaming AI Powered Analytics From the Edge to the Data Center is now a simple use case. With MiNiFi we can ingest the data, do data checks, cleansing, run machine learning and deep learning models and route our data in real-time to Apache NiFi and Apache Pulsar for further transformations and processing. Apache Flink will provide our advanced streaming capabilities fed real-time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in various Apache datastores. Event-Driven Microservices in Apache Pulsar Functions.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet
Matt Franklin - Apache Software (Geekfest)W2O Group
The document discusses the potential benefits of container technologies like Docker. It notes that containers offer significantly higher density than virtual machines by avoiding hypervisor overhead. This density improvement can lead to major cost reductions by reducing infrastructure needs. Containers also improve developer efficiency by making development environments portable and disposable. This allows more rapid experimentation and innovation, potentially translating to increased revenue. Technologies like Amazon Lambda take the on-demand aspects of containers even further by abstracting compute resources. The document promotes StackEngine as a solution for managing containers at scale in production environments.
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Pulsar Summit Asia ...StreamNative
Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest.
FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
Tools
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References
https://www.datainmotion.dev/2019/08/...
https://www.datainmotion.dev/2019/09/...
https://www.datainmotion.dev/2019/05/...
https://www.datainmotion.dev/2019/03/...
Get the presentation slides: https://www.slideshare.net/streamnati...
Subscribe to the StreamNative Newsletter for Apache Pulsar for more Pulsar content: https://share.hsforms.com/1IS56E-RvSV...
Get started with the on-demand Pulsar training by StreamNative Academy: https://www.academy.streamnative.io/
Apache Pulsar: Why Unified Messaging and Streaming Is the Future - Pulsar Sum...StreamNative
Data insights and data-driven strategies create the competitive differentiators companies thrive off today. The need for unified messaging and streaming has never been more apparent.
Pulsar started with the goal of building a global, geo-replicated infrastructure to serve Yahoo!’s messaging needs. With the increased need to process both business events (such as payment request, billing request) and operational events (such as log data, click events, etc), the team at Yahoo! set out to build a true unified infrastructure platform to handle all in-motion data. That technology became Apache Pulsar.
In this talk, Matteo Merli and Sijie Guo will dive into the landscape of unified messaging and streaming, how Pulsar helps companies achieve this vision, and what the future of Pulsar will look like.
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...Timothy Spann
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-ramp 2022
As the Pulsar communities grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache Streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed.
I will walk through how to get started, some use cases and demos and answer questions.
https://www.devfest-uki.com/schedule
https://linktr.ee/tspannhw
Building an Event Streaming Architecture with Apache PulsarScyllaDB
What is Apache Pulsar? How does it differ from other event streaming technologies available? StreamNative Developer Advocate Tim Spann will walk you through the features and architecture of this increasingly popular event streaming system, along with best practices for streaming and storing your data.
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021StreamNative
This document discusses using Apache Pulsar with MQTT for edge computing. It provides an overview of Apache Pulsar and how it enables message queuing and data streaming with features like pub-sub, geo-replication, and multi-protocol support including MQTT. It also discusses edge computing characteristics and challenges, and how running Apache Pulsar on edge devices can address these by extending data processing to the edge and integrating with sensors using the MQTT protocol. Examples are provided of ingesting IoT data into Pulsar from Python and using NVIDIA Jetson devices with Pulsar.
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022Timothy Spann
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
Using FLiP with InfluxDB for EdgeAI IoT at Scale
Date: Thursday, February 10th at 11am PT / 2pm ET
Join this webcast as Timothy from StreamNative takes you on a hands-on deep-dive using Pulsar, Apache NiFi + Edge Flow Manager + MiniFi Agents with Apache MXNet, OpenVino, TensorFlow Lite, and other Deep Learning Libraries on the actual edge devices including Raspberry Pi with Movidius 2, Google Coral TPU and NVidia Jetson Nano.
The team runs deep learning models on the edge devices, sends images, and captures real-time GPS and sensor data. Their low-coding IoT applications provide easy edge routing, transformation, data acquisition and alerting before they decide what data to stream in real-time to their data space. These edge applications classify images and sensor readings in real-time at the edge and then send Deep Learning results to Flink SQL and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging data to InfluxDB.
In this session you will learn how to:
Build an end-to-end streaming edge app
Pull messages from Pulsar topics and persists the messages to InfluxDB
Build a data stream for IoT with NiFi and InfluxDB
Use Apache Flink + Apache Pulsar
Timothy Spann, Developer Advocate, StreamNative
Tim Spann is a Developer Advocate at StreamNative where he works with Apache NiFi, MiniFi, Kafka, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Using FLiP with influxdb for edgeai iot at scale 2022Timothy Spann
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
FLiP Stack (Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark) with Influx DB for Edge AI and IoT workloads at scale
Tim Spann
Developer Advocate
StreamNative
datainmotion.dev
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...HostedbyConfluent
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk.
Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!?!??
My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
https://www.datainmotion.dev/2022/04/flip-py-pi-enviroplus-using-apache.html
https://dzone.com/articles/pulsar-in-python-on-pi
(Current22) Let's Monitor The Conditions at the ConferenceTimothy Spann
(Current22) Let's Monitor The Conditions at the Conference
Let's Monitor The Conditions at the Conference
Session Time11:15 am - 12:00 pm Session DateWednesday, 5 October 2022 Session Type:In-Person Location:Ballroom G
Session Description:
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk. Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!? My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
Timothy Spann, StreamNative
Developer Advocate
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC.
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
This document contains the agenda for a Pulsar Summit keynote on event-driven applications. The keynote will feature talks from Sijie Guo, Co-Founder and CEO of StreamNative, and Matteo Merli, CTO of StreamNative. Guo will discuss the growth of the Pulsar community and platform. Merli will cover the evolution of event-driven applications and the five fundamentals of modern event-driven architecture: data abstraction, API, primitives, processing semantics, and tools. The keynote aims to explain how Pulsar solves challenges in building complex event-driven applications.
[AI Dev World 2022] Build ML Enhanced Event StreamingTimothy Spann
[AI Dev World 2022] Build ML Enhanced Event Streaming
3:00 PM - 3:25 PM PDT
[#VIRTUAL] OPEN TALK (AI): BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH JAVA MICROSERVICES
VIRTUAL ONLY API WORLD & AI DEVWORLD STAGE
VIRTUAL ONLY VIRTUAL APPLIED MACHINE LEARNING
Timothy Spann
Timothy Spann
StreamNative, Developer Advocate
In this talk we will walk through how to build event streaming applications as functions running in with cloud native messaging via Apache Pulsar that run on near infinite scale in any cloud, docker or K8. We will show you have to deploy ML functions to transform real-time data for IoT, Streaming Analytics and many other use cases. After this talk you will be able to build Java microservices with ease and deploy them anywhere utilizing the open source unified streaming and messaging platform, Apache Pulsar. Finally, we will show you have to add dashboards with Web Sockets, no code data sinks, integrate with Apache NiFi data pipelines, SQL Reports with Apache Spark and finally continuous ETL with Apache Flink. I have built many of these applications for many organizations as part of the FLiPN Stack. Let's build next generation applications today regardless if your data is REST APIs, Sensors, Logs, NoSQL Sources, Events or Database tables.
https://aidevworld.com/conference/conference-tracks/applied-machine-learning/
https://github.com/tspannhw/pulsar-transit-function
https://github.com/tspannhw/FLiPStackWeekly/blob/main/README.md
This document provides an introduction to Apache Flink and its streaming capabilities. It discusses stream processing as an abstraction and how Flink uses streams as a core abstraction. It compares Flink streaming to Spark streaming and outlines some of Flink's streaming features like windows, triggers, and how it handles event time.
ITPC Building Modern Data Streaming AppsTimothy Spann
ITPC Building Modern Data Streaming Apps
https://princetonacm.acm.org/tcfpro/
17th Annual IEEE IT Professional Conference (ITPC)
Armstrong Hall at The College of New Jersey
Friday, March 17th, 2023 at 8:30 AM to 5:00 PM
TCF Photo
In continuous operation since 1976, the Trenton Computer Festival (TCF) is the nation's longest running personal computer. For the seventeenth year, the TCF is extending its program to provide Information Technology and computer professionals with an additional day of conference. It is intended, in an economical way, to provide attendees with insight and information pertinent to their jobs, and to keep them informed of emerging technologies that could impact their work.
The IT Professional Conference is co-sponsored by the Institute of Electrical and Electronics Engineers (IEEE) Computer Society Chapter of Princeton / Central Jersey.
11:00am Building Modern Data Streaming Apps
presented by
Timothy Spann
Building Modern Data Streaming Apps
In this session, I will show you some best practices I have discovered over the last seven years in building data streaming applications including IoT, CDC, Logs, and more.
In my modern approach, we utilize several Apache frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there we build streaming ETL with Spark, enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL.
Timothy Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022Timothy Spann
This document discusses using Apache Pulsar with MQTT for edge computing. It provides an overview of Pulsar's capabilities as a unified messaging platform, including guaranteed message delivery, resiliency, and scalability. It then describes how Pulsar supports the MQTT protocol (MoP) for ingesting IoT data from devices. Examples are given of using Python and Java to publish sensor readings to Pulsar topics from the edge via MQTT. Finally, it mentions ways to use NVIDIA Jetson devices with Pulsar for edge AI workloads.
Scylla Summit 2022: FLiP Into Apache Pulsar Apps with ScyllaDBScyllaDB
In this session, Timothy Spann, Developer Advocate at StreamNative will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks, and tools against ScyllaDB. He will show you all the options from MQTT, Web Sockets, Java, Golang, Python, NodeJS, Apache NiFi, Kafka on Pulsar, Pulsar protocol, and more. You will FLiP your lid on how much you learn in a short time. He will share instructions on the few steps you need to get an environment ready to start building awesome apps. You will also learn how to quickly deploy an app to a production cloud cluster with StreamNative.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
StreamNative FLiP into scylladb - scylla summit 2022Timothy Spann
StreamNative FLiP into scylladb - scylla summit 2022
Utilizing Apache Pulsar with Apache NiFi, Apache Flink, Apache Spark and Scylla for fast IoT application with MQTT and beyond.
What We Learned From Building a Modern Messaging and Streaming System for CloudStreamNative
Sijie Guo discusses lessons learned from building Apache Pulsar, a modern messaging and streaming system. Key points include:
- Pulsar was designed for the cloud-native era, separating compute and storage for scalability unlike earlier systems designed for on-premise use.
- It supports unified messaging and streaming with a single API and multi-protocol support like Kafka, AMQP, and MQTT.
- Features like multi-tenancy, geo-replication, and infinite stream storage help support use cases like multi-cloud, hybrid cloud, and managing entire data lifecycles.
- The talk reflects on how Pulsar's architecture was influenced by trends in cloud computing,
MRA AMA Part 10: Kubernetes and the Microservices Reference ArchitectureNGINX, Inc.
On Demand Link - https://www.nginx.com/resources/webinars/mra-ama-part-10-kubernetes-and-the-microservices-reference-architecture/
The NGINX Microservices Reference Architecture (MRA) has been a major contributor to the discussion of microservices architectures. Kubernetes has now emerged as the leading container orchestration platform, and NGINX has developed the NGINX Kubernetes Ingress controller.
In this webinar, we describe and demonstrate how to use NGINX Open Source and NGINX Plus with Kubernetes and the NGINX Kubernetes Ingress controller. We relate the use of NGINX tools and Kubernetes to the MRA’s Proxy Model, Router Mesh Model, and the Fabric Model. We also briefly compare these to full-service mesh implementations such as Istio.
NYC Dec 2022 Meetup_ Building Real-Time Requires a TeamTimothy Spann
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289817171/
We are excited to invite you to an in-person meetup, that is all about streaming data!
If you are interested in Apache Pinot, Apache Pulsar, Apache Flink, and Apache NiFi, this event is for you!
AGENDA
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
7:15 - 8:00 PM EST: Introduction to Real-Time Analytics with Apache Pinot: David G. Simmons @ StarTree
7:15 - 8:00 PM EST: Building Real-Time Requires a Team: Tim Spann, Developer Advocate @ StreamNative
8:00 - 8:30 PM EST: Round Table
8:30 - 9:00 PM EST: Q&A + Networking
----
“Building Real-Time Requires a Team”- Tim Spann, Developer Advocate @ StreamNative
This talk will discuss building real-time streaming applications utilizing the best open-source systems. This FLiPPN Stack consisting of Apache Flink, Apache Pulsar, Apache Pinot, and Apache NiFi supercharges any real-time app or pipeline building. I will walk you through the what, why, how, and where to use these amazing tools. We will walk through some demos and dive into the source. At the end, we will have driven real-time events into Apache Pinot, setting the stage for your real-time user-facing insights.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more.
Pronouns: He, They
https://github.com/tspannhw/SpeakerProfile
“Introduction to Real-Time Analytics with Apache Pinot" - David G. Simmons, Head of Developer Advocacy @ StarTree
https://notist.davidgs.com/ https://notist.davidgs.com/
We will simulcast: https://streamnative.zoom.us/meeting/register/tZcpf-iurTojHtTJjwhi87e_iKJJYhrONpAG
Apache Kafka is a distributed streaming platform. It provides a high-throughput distributed messaging system that can handle trillions of events daily. Many large companies use Kafka for application logging, metrics collection, and powering real-time analytics. The current version is 0.8.2 and upcoming versions will include a new consumer, security features, and support for transactions.
Similar to Automation + dev ops summit hail hydrate! from stream to lake (20)
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
Building Real-Time Pipelines With FLaNK
Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Apache NiFi
Apache Kafka
Apache Flink
Apache Iceberg
LLM
Generative AI
Slack
Postgresql
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
Gen AI on Enterprise Cloud
Apache NiFi
Milvus
Apache Kafka
Apache Flink
Cloudera Machine Learning
Cloudera DataFlow
https://medium.com/@tspann/building-a-milvus-connector-for-nifi-34372cb3c7fa
https://www.meetup.com/futureofdata-princeton/events/300737266/
https://lu.ma/q7pcfyjn?source=post_page-----34372cb3c7fa--------------------------------&tk=TTyakY
If you're interested in working with Generative AI on the cloud, this virtual workshop is for you.
Tim Spann from Cloudera and Yujian Tang from Zilliz will cover how you can implement your own GenAI workflows on the cloud at enterprise scale.
9:00 - 9:05: Intro
9:05 - 9:15: What is Milvus
9:15 - 9:25: Cloudera Development Platform
9:25 - 10:00: Demo
Location
https://www.youtube.com/watch?v=IfWIzKsoHnA
https://github.com/tspannhw/SpeakerProfile
https://www.linkedin.com/in/yujiantang/
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
https://www.youtube.com/watch?v=Yeua8NlzQ3Y
https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming
Adding Generative AI to Real-Time Streaming Pipelines
Abstract
Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results.
Summary
Tim Spann: My talk is adding generative AI to real time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, Nifi, Flink, Python, Iceberg. All the slides, all the code and GitHub are out there.
Llm, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The amount of models and projects and software that are available is massive.
Nifi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into.
So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from slack. And then I'm going to clean it up, add a couple fields and push that out to slack. Every model is a little bit of different tweaking.
Nifi acts as a whole website. And as you see here, it can be get, post, put, whatever you want. We send that response back to flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Tim Spann here. My talk is adding generative AI to real time streaming pipelines, and we're here for the large language model conference at Comp 42, which is always a nice one, great place to be. I'm going to discuss a couple of different open source technologies that work together to enable you to build real time pipelines using large language models. So we'll touch on Kafka, Nifi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data machine learning, streaming IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter, cover streaming ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff, as I mentioned, do a bunch of different meetups. They are not just in the east coast of the US, they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides, all the code and GitHub. Everything you need is out there. Let's get into the talk. Llm, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
https://xtremej.dev/2023/schedule/
Building Real-time Pipelines with FLaNK: A Case Study with Transit Data
Overview of the problem, the application (code walkthru and running), overview of FLaNK, introduction to NiFi, introduction to Kafka, and introduction to Flink.
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
******Note*****
The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30 QNA
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
https://princetonacm.acm.org/tcfpro/
18th Annual IEEE IT Professional Conference (ITPC)
Armstrong Hall at The College of New Jersey
Friday, March 15th, 2024 | 10:00 AM to 5:00 PM
IT Professional Conference at Trenton Computer Festival
IEEE Information Technology Professional Conference on Friday, March 15th, 2024
TCFPro24 Building Real-Time Generative AI Pipelines
Building Real-Time Generative AI Pipelines
In this talk, Tim will delve into the exciting realm of building real-time generative AI pipelines with streaming capabilities. The discussion will revolve around the integration of cutting-edge technologies to create dynamic and responsive systems that harness the power of generative algorithms.
From leveraging streaming data sources to implementing advanced machine learning models, the presentation will explore the key components necessary for constructing a robust real-time generative AI pipeline. Practical insights, use cases, and best practices will be shared, offering a comprehensive guide for developers and data scientists aspiring to design and implement dynamic AI systems in a streaming environment.
Tim will show a live demo showing we can use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated with Apache NiFi, Apache Kafka and Python. We will use RAG against Chroma and Pinecone vector data stores, Hugging Face and WatsonX.AI LLM, and add additional context with NiFi lookups of stocks, weather and other data streams in real-time.
Timothy Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipelines
https://www.meetup.com/futureofdata-newyork/events/298660453/
Unlocking Financial Data with Real-Time Pipelines
(Flink Analytics on Stocks with SQL )
By Timothy Spann
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. I will be utilizing NiFi 2.0 with Python and Vector Databases.
Timothy Spann
Principal Developer Advocate, Cloudera
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
https://twitter.com/PaaSDev
https://www.linkedin.com/in/timothyspann/
https://medium.com/@tspann
https://github.com/tspannhw/FLiPStackWeekly/
Conf42-Python-Building Apache NiFi 2.0 Python Processors
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
Building Apache NiFi 2.0 Python Processors
Abstract
Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM.
Summary
Tim Spann: I'm going to be talking today, be building Apache 9520 Python processors. One of the main purposes of supporting Python in the streaming tool Apache Nifi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera.
You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 310 and again JDK 21 on your machine. You got to be smart about how you use these models.
There are a ton of python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own.
When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested on writing your own python code for Apache Nifi, definitely reach out and thank.
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg with Stock Data and LLM
Abstract
In this talk, we’ll discuss how to use Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg to process and analyze stock data. We demonstrated the ingestion, processing, and analysis of stock data. Additionally, we illustrated how to use an LLM to generate predictions from the analyzed data.
Karin Wolok
Developer Relations, Dev Marketing, and Community Programming @ Project Elevate
Karin Wolok's LinkedIn account Karin Wolok's twitter account
Tim Spann
Principal Developer Advocate @ Cloudera
Tim Spann's LinkedIn account Tim Spann's twitter account
https://www.conf42.com/Python_2024_Karin_Wolok_Tim_Spann_nifi__kafka_risingwave_iceberg_llm
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
https://www.aicamp.ai/event/eventdetails/W2024022214
apache nifi
llm
generative ai
gen ai
ml
dl
machine learning
apache kafka
apache flink
postgresql
python
AI Meetup (NYC): GenAI, LLMs, ML and Data
Feb 22, 05:30 PM EST
Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers
Agenda:
* 5:30pm~6:00pm: Checkin, Food/drink and networking
* 6:00pm~6:10pm: Welcome/community update
* 6:10pm~8:30pm: Tech talks
* 8:30pm: Q&A, Open discussion
Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs
Speaker: Zain Hasan (Weaviate LinkedIn)
Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can that can see, hear, read, and feel data(!), to perform cross-modal search(searching audio with images, videos with text etc.) and multimodal retrieval augmented generation (MM-RAG) at the billion-object scale with the help of open source vector databases. I will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real-time can enables users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production.
Tech Talk: Codeless Generative AI Pipelines
Speaker: Timothy Spann (Cloudera LinkedIn)
Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task.
Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Sponsors:
We are actively seeking sponsors to support our community. Whether it is by offering venue spaces, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ developers worldwide.
Venue:
Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036
Room Name: Central Park West 6501
Community on Slack/Discord
- Event chat: chat and connect with speakers and attendees
- Sharing blogs, events, job openings, projects collaborations
Join Slack (search and join the #newyork channel) | Join Discord
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
20-Feb-2024
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data.
We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this.
Tim Spann
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Unlocking Financial Data with Real-Time Pipelines
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.
Key Points to be Covered:
Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing.
Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.
Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. Integration of Kafka with financial data sources and consumers.
Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources.
Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions.
Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies.
Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability an
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in a quick time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...kalichargn70th171
In today's business landscape, digital integration is ubiquitous, demanding swift innovation as a necessity rather than a luxury. In a fiercely competitive market with heightened customer expectations, the timely launch of flawless digital products is crucial for both acquisition and retention—any delay risks ceding market share to competitors.
Preparing Non - Technical Founders for Engaging a Tech AgencyISH Technologies
Preparing non-technical founders before engaging a tech agency is crucial for the success of their projects. It starts with clearly defining their vision and goals, conducting thorough market research, and gaining a basic understanding of relevant technologies. Setting realistic expectations and preparing a detailed project brief are essential steps. Founders should select a tech agency with a proven track record and establish clear communication channels. Additionally, addressing legal and contractual considerations and planning for post-launch support are vital to ensure a smooth and successful collaboration. This preparation empowers non-technical founders to effectively communicate their needs and work seamlessly with their chosen tech agency.Visit our site to get more details about this. Contact us today www.ishtechnologies.com.au
Unveiling the Advantages of Agile Software Development.pdfbrainerhub1
Learn about Agile Software Development's advantages. Simplify your workflow to spur quicker innovation. Jump right in! We have also discussed the advantages.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Liberarsi dai framework con i Web Component.pptxMassimo Artizzu
In Italian
Presentazione sulle feature e l'utilizzo dei Web Component nell sviluppo di pagine e applicazioni web. Racconto delle ragioni storiche dell'avvento dei Web Component. Evidenziazione dei vantaggi e delle sfide poste, indicazione delle best practices, con particolare accento sulla possibilità di usare web component per facilitare la migrazione delle proprie applicazioni verso nuovi stack tecnologici.
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
2. #AutomationSummit
Tim Spann
Developer Advocate
● https://www.datainmotion.dev/
● https://github.com/tspannhw/SpeakerProfile
● https://dev.to/tspannhw
● https://sessionize.com/tspann/
DZone Zone Leader and Big Data
MVB Data DJay
3. #AutomationSummit
Founded by the original developers of
Apache Pulsar and Apache BookKeeper,
StreamNative builds a cloud-native event
streaming platform that enables
enterprises to easily access data as
real-time event streams.
10. #AutomationSummit
Key Milestones
2012 2016 2017 2018 2019 2020
Originally developed
inside Yahoo! as “Cloud
Messaging Service”
Pulsar is
committed to
Open Source
Pulsar is accepted into
the Apache Software
Foundation
Pulsar
becomes a
Top-Level
Project
● StreamNative is founded and
seed round raised.
● Tencent adopts Pulsar for
payment processing platform.
● BestPay adopts Pulsar for
payment processing.
● Pulsar hits 200 contributors.
● 2 global Pulsar conferences, 80+ speakers, 1,500+ attendees
● Pulsar hits 340 contributors
● StreamNative and OVHCloud launch Kafka on Pulsar (KoP)
● StreamNative + China Mobile launch AMQP on Pulsar (AoP)
● Pulsar Ecosystem expands - StreamNative Hub launches
● StreamNative Cloud launches on GCP and Alibaba Cloud
● StreamNative customer adoption continues - new
customers include Flipkart and Applied Materials
● Pulsar 2.7 + Transactions
● Pulsar Flink Connector 2.7
Major increase in adoption following
TLP designation in 2018
2021
● 3 global Pulsar conferences
● StreamNative hits 400
contributors (June).
● Pulsar surpasses Kafka in
monthly active contributors.
● Pulsar 2.8 + Exactly-Once
semantics
● StreamNative Platform launches
11. #AutomationSummit
Apache Pulsar Overview
Enable Geo-Replicated Messaging
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
12. #AutomationSummit
Pulsar’s Publish-Subscribe model
Broker
Subscription
Consumer 1
Consumer 2
Consumer 3
Topic
Producer 1
Producer 2
● Producers send messages.
● Topics are an ordered, named channel that
producers use to transmit messages to
subscribed consumers.
● Messages belong to a topic and contain an
arbitrary payload.
● Brokers handle connections and routes
messages between producers / consumers.
● Subscriptions are named configuration rules
that determine how messages are delivered to
consumers.
● Consumers receive messages.
13. #AutomationSummit
What is the Pulsar Ecosystem?
● Functions and Connectors
○ Functions: Lightweight stream processing
○ Connectors: Part of “Pulsar IO”, includes “Source” and “Sink” APIs
■ Files, Databases, Data tools, Cloud Services, etc
● Protocol Handlers
○ Allows Pulsar to handle additional protocols by an extendable API running in
the broker
■ AoP (AMQP), KoP (Kafka), MoP (MQTT)
15. #AutomationSummit
Pulsar subscription modes
Different subscription modes have
different semantics:
Exclusive/Failover - guaranteed order,
single active consumer
Shared - multiple active consumers, no
order
Key_Shared - multiple active
consumers, order for given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer
D-2
Key-Shared
<
K
1
,V
1
0
>
<
K
1
,V
1
1
>
<
K
1
,V
1
2
>
<
K
2
,V
2
0
>
<
K
2
,V
2
1
>
<
K
2
,V
2
2
>
Subscription C
Consumer C-1
Consumer C-2
Shared
<
K
1
,V
1
0
>
<
K
2
,V
2
1
>
<
K
1
,V
1
2
>
<
K
2
,V
2
0
>
<
K
1
,V
1
1
>
<
K
2
,V
2
2
>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
18. #AutomationSummit
Moving Data In and Out of Pulsar
IO/Connectors are a simple way to integrate with external systems and move data
in and out of Pulsar. https://pulsar.apache.org/docs/en/io-jdbc-sink/
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
Source Sink
19. #AutomationSummit
● Customer 360 analytics
● Payment reconciliation
● Security analytics
● Adoptions are growing
across different business
units, such as Gaming,
Wechat, Music, etc.
● ETL
● Recommendation
● Data Analysis
● Real-time data
warehouse
https://pulsar-summit.org/
21. #AutomationSummit
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time
messaging and streaming platform to support multi-cloud and hybrid cloud
strategies.
Built for Containers
Cloud Native
StreamNative Cloud
Flink SQL
24. #AutomationSummit
Example: E-Commerce with Pulsar
● Unified storage with access
to underlying data
● Native tiered storage
● Single system to exchange
data
● Teams share toolset
28. #AutomationSummit
Connect with the Community & Stay Up-To-Date
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Subscribe to Monthly Pulsar Newsletter for major news, events, project updates, and resources in the
Pulsar community
30. #AutomationSummit
Interested In Learning More?
Flink SQL Cookbook
The Github Source for Flink
SQL Demo
The GitHub Source for Demo
Manning's Apache Pulsar in
Action
O’Reilly Book
[11/18] Developer Week Austin
[11/19] Porto Tech Hub Con
[12/3] Data Science Camp
Resources Free eBooks Upcoming Events