SlideShare a Scribd company logo
1 of 15
Download to read offline
© 2023 Cloudera, Inc. All rights reserved.
Data-in-Motion to
Supercharge AI
Tim Spann
Principal Developer Advocate
23-August-2023
© 2023 Cloudera, Inc. All rights reserved. 2
Tim Spann
@PaasDev www.datainmotion.dev
github.com/tspannhw medium.com/@tspann
Principal Developer Advocate
Princeton Future of Data Meetup
ex-Pivotal, ex-Hortonworks, ex-StreamNative,
ex-PwC, ex-EY, ex-HPE.
Apache NiFi x Apache Kafka x Apache Flink x AI
© 2023 Cloudera, Inc. All rights reserved. 3
REAL-TIME REQUIRES A PLATFORM
SQL
Stream
Builder
© 2023 Cloudera, Inc. All rights reserved.
© 2019 Cloudera, Inc. All rights reserved. 4
Cloudera + LLMs
Knowledge Repository
Data Storage / Management
Data Preparation
Data Engineering
LLM Fine Tuning Process
Training Framework
LLM Serving
Serving Framework
Key:
CPU Task
GPU Task
CML
CDE
CDP
Vector DB
CDF
Streaming Classification
Real-Time Model Deployment
© 2023 Cloudera, Inc. All rights reserved.
Run collection and streaming on any cloud, server, container, bare metal, device or VM
Data Sources Cloudera Data
Flow
Cloudera
Streaming
Analytics
Cloudera
Streams
Processing
Kafka
Lake House
INGEST
© 2023 Cloudera, Inc. All rights reserved.
ENRICH
© 2023 Cloudera, Inc. All rights reserved.
FUNNEL
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
© 2023 Cloudera, Inc. All rights reserved.
DISTRIBUTE
DEPLOY
https://github.com/tspannhw/FLaNK-Edge-Models
© 2023 Cloudera, Inc. All rights reserved.
STORE
© 2019 Cloudera, Inc. All rights reserved. 10
APACHE NIFI WITH PYTHON CUSTOM PROCESSORS
Python as a First Class Citizen
https://github.com/apache/nifi/blob/614947e4ac6798ad80817e82514c39349d5faacb/nifi-docs/src/main/
asciidoc/python-developer-guide.adoc
11
© 2023 Cloudera, Inc. All rights reserved.
Future of Data - Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python, Java,
AI, ML, LLM and Open Source friends.
https://bit.ly/32dAJft
© 2023 Cloudera, Inc. All rights reserved. 13
CSP
Community
Edition
• Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker compose file of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features
quickly
• Develop applications
locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e
SINGLE_USER_CREDENTIALS_USERNAME=admin -e
SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghv
vgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
© 2023 Cloudera, Inc. All rights reserved. 15
Resources
threads.net/@tspannhw

More Related Content

What's hot

Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep diveWinton Winton
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container PlatformDLT Solutions
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Azure Security Fundamentals
Azure Security FundamentalsAzure Security Fundamentals
Azure Security FundamentalsLorenzo Barbieri
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Google Anthos - Azure Stack - AWS Outposts :Comparison
Google Anthos - Azure Stack - AWS Outposts :ComparisonGoogle Anthos - Azure Stack - AWS Outposts :Comparison
Google Anthos - Azure Stack - AWS Outposts :ComparisonKrishna-Kumar
 
Container Network Interface: Network Plugins for Kubernetes and beyond
Container Network Interface: Network Plugins for Kubernetes and beyondContainer Network Interface: Network Plugins for Kubernetes and beyond
Container Network Interface: Network Plugins for Kubernetes and beyondKubeAcademy
 
Azure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdfAzure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdfpbonillo1
 
Prometheus Training
Prometheus TrainingPrometheus Training
Prometheus TrainingTim Tyler
 
CoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeGuido Schmutz
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack ArchitectureMirantis
 

What's hot (20)

Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container Platform
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Azure Security Fundamentals
Azure Security FundamentalsAzure Security Fundamentals
Azure Security Fundamentals
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Azure Reference Architectures
Azure Reference ArchitecturesAzure Reference Architectures
Azure Reference Architectures
 
Azure Backup Simplifies
Azure Backup SimplifiesAzure Backup Simplifies
Azure Backup Simplifies
 
Google Anthos - Azure Stack - AWS Outposts :Comparison
Google Anthos - Azure Stack - AWS Outposts :ComparisonGoogle Anthos - Azure Stack - AWS Outposts :Comparison
Google Anthos - Azure Stack - AWS Outposts :Comparison
 
Container Network Interface: Network Plugins for Kubernetes and beyond
Container Network Interface: Network Plugins for Kubernetes and beyondContainer Network Interface: Network Plugins for Kubernetes and beyond
Container Network Interface: Network Plugins for Kubernetes and beyond
 
Nutanix
NutanixNutanix
Nutanix
 
Openstack 101
Openstack 101Openstack 101
Openstack 101
 
Ceph on arm64 upload
Ceph on arm64   uploadCeph on arm64   upload
Ceph on arm64 upload
 
Azure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdfAzure BI Cloud Architectural Guidelines.pdf
Azure BI Cloud Architectural Guidelines.pdf
 
Prometheus Training
Prometheus TrainingPrometheus Training
Prometheus Training
 
CoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
 
OpenStack Architecture
OpenStack ArchitectureOpenStack Architecture
OpenStack Architecture
 

Similar to AIDEVDAY_ Data-in-Motion to Supercharge AI

26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19ssuser73434e
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Timothy Spann
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsTimothy Spann
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-PipelinesTimothy Spann
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsTimothy Spann
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfTimothy Spann
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Timothy Spann
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
 

Similar to AIDEVDAY_ Data-in-Motion to Supercharge AI (20)

26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
 

More from Timothy Spann

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...Timothy Spann
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101Timothy Spann
 
CoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
 
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationTimothy Spann
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentTimothy Spann
 

More from Timothy Spann (16)

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
CoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
 
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
 

Recently uploaded

Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 

Recently uploaded (20)

Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 

AIDEVDAY_ Data-in-Motion to Supercharge AI

  • 1. © 2023 Cloudera, Inc. All rights reserved. Data-in-Motion to Supercharge AI Tim Spann Principal Developer Advocate 23-August-2023
  • 2. © 2023 Cloudera, Inc. All rights reserved. 2 Tim Spann @PaasDev www.datainmotion.dev github.com/tspannhw medium.com/@tspann Principal Developer Advocate Princeton Future of Data Meetup ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC, ex-EY, ex-HPE. Apache NiFi x Apache Kafka x Apache Flink x AI
  • 3. © 2023 Cloudera, Inc. All rights reserved. 3 REAL-TIME REQUIRES A PLATFORM SQL Stream Builder
  • 4. © 2023 Cloudera, Inc. All rights reserved. © 2019 Cloudera, Inc. All rights reserved. 4 Cloudera + LLMs Knowledge Repository Data Storage / Management Data Preparation Data Engineering LLM Fine Tuning Process Training Framework LLM Serving Serving Framework Key: CPU Task GPU Task CML CDE CDP Vector DB CDF Streaming Classification Real-Time Model Deployment
  • 5. © 2023 Cloudera, Inc. All rights reserved. Run collection and streaming on any cloud, server, container, bare metal, device or VM Data Sources Cloudera Data Flow Cloudera Streaming Analytics Cloudera Streams Processing Kafka Lake House INGEST
  • 6. © 2023 Cloudera, Inc. All rights reserved. ENRICH
  • 7. © 2023 Cloudera, Inc. All rights reserved. FUNNEL https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis https://github.com/tspannhw/FLaNK-LLM
  • 8. © 2023 Cloudera, Inc. All rights reserved. DISTRIBUTE DEPLOY https://github.com/tspannhw/FLaNK-Edge-Models
  • 9. © 2023 Cloudera, Inc. All rights reserved. STORE
  • 10. © 2019 Cloudera, Inc. All rights reserved. 10 APACHE NIFI WITH PYTHON CUSTOM PROCESSORS Python as a First Class Citizen https://github.com/apache/nifi/blob/614947e4ac6798ad80817e82514c39349d5faacb/nifi-docs/src/main/ asciidoc/python-developer-guide.adoc
  • 11. 11 © 2023 Cloudera, Inc. All rights reserved. Future of Data - Princeton + Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  • 12. FLaNK Stack Weekly This week in Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Python, Java, AI, ML, LLM and Open Source friends. https://bit.ly/32dAJft
  • 13. © 2023 Cloudera, Inc. All rights reserved. 13 CSP Community Edition • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker compose file of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ○ $> docker compose up ● Licensed under the Cloudera Community License ● Unsupported ● Community Group Hub for CSP ● Find it on docs.cloudera.com under Applications
  • 14. Open Source Edition • Apache NiFi in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker NiFi ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghv vgEvjnaLjFEB apache/nifi:latest ● Licensed under the ASF License ● Unsupported https://hub.docker.com/r/apache/nifi
  • 15. © 2023 Cloudera, Inc. All rights reserved. 15 Resources threads.net/@tspannhw