Meetup - Brasil - Data In Motion - 2023 September 19

Timothy Spann
Timothy SpannDeveloper Advocate
Data in Motion:
Overview e Novidades do
NiFi, Kafka e Flink
Tim Spann - Principal Developer Advocate
Data In Motion
Meetup - Brasil - Data In Motion - 2023 September 19
3
© 2023 Cloudera, Inc. All rights reserved.
TODAY’S LEAD
Who am I?
@PaasDev
DZone Zone Leader and Big Data MVB
Princeton and NYC Future of Data Meetups
ex-Pivotal Field Engineer ex-StreamNative ex-PwC
https://github.com/tspannhw https://twitter.com/PaaSDev
https://www.datainmotion.dev/
https://medium.com/@tspann
Principal Data-in-Motion Developer Advocate
4
© 2023 Cloudera, Inc. All rights reserved.
Data in Motion: Overview e Novidades do NiFi, Kafka e Flink
Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
Intro to NiFi
Intro to Kafka
Intro to Flink
Together as FLaNK
Demos
Q&A
© 2023 Cloudera, Inc. All rights reserved. 5
REAL-TIME REQUIRES A PLATFORM
SQL
Stream
Builder
© 2023 Cloudera, Inc. All rights reserved. 6
REST API ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time
INGEST PREPARE PUBLISH
DATA SOURCES
Internal Users
(After Sales)
External
Systems
ENTERPRISE
LAKEHOUSE
CAPABILITY VIEW
INGESTION
MESSAGE HUB
STORAGE
BATCH
MANAGEMENT
STREAM
CONSUMPTION
Closed Loop
Systems
SQL Stream Builder
Machine Learning
Data Visualization
Workload Manager
watsonx.data
Cloudera DataFlow - Apache NiFi
© 2019 Cloudera, Inc. All rights reserved. 8
CLOUDERA DATAFLOW - POWERED BY APACHE NiFi
Ingest and manage data from edge-to-cloud using a no-code interface
● #1 data ingestion/movement engine
● Strong community
● Product maturity over 11 years
● Deploy on-premises or in the cloud
● Over 400+ pre-built processors
● Built-in data provenance
● Guaranteed delivery
● Throttling and Back pressure
© 2023 Cloudera, Inc. All rights reserved. 9
PROVENANCE
10
© 2023 Cloudera, Inc. All rights reserved.
RECORD-ORIENTED DATA WITH NIFI
• Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet,
Scripted, Syslog5424, Syslog, WindowsEvent, XML
• Record Writers - Avro, CSV, FreeFromText, Json, Parquet,
Scripted, XML
• Record Reader and Writer support referencing a schema registry
for retrieving schemas when necessary.
• Enable processors that accept any data format without having to
worry about the parsing and serialization logic.
• Allows us to keep FlowFiles larger, each consisting of multiple
records, which results in far better performance.
11
© 2023 Cloudera, Inc. All rights reserved.
RUNNING SQL ON FLOWFILES
• Evaluates one or more SQL queries against the contents of a
FlowFile.
• This can be used, for example, for field-specific filtering,
transformation, and row-level filtering.
• Columns can be renamed, simple calculations and aggregations
performed.
• The SQL statement must be valid ANSI SQL and is powered by
Apache Calcite.
12
© 2023 Cloudera, Inc. All rights reserved.
READYFLOW
GALLERY
• Cloudera provided flow
definitions
• Cover most common data flow
use cases
• Optimized to work with CDP
sources/destinations
• Can be deployed and adjusted
as needed
Cloudera Streams Messaging
Manager - Apache Kafka
14
© 2023 Cloudera, Inc. All rights reserved.
STREAMS MESSAGING WITH KAFKA
• Highly reliable distributed messaging system.
• Decouple applications, enables many-to-many
patterns.
• Publish-Subscribe semantics.
• Horizontal scalability.
• Efficient implementation to operate at speed with
big data volumes.
• Organized by topic to support several use cases.
Cloudera SQL Stream Builder - Flink
SQL
16
© 2023 Cloudera, Inc. All rights reserved.
DELIVERING STREAMING ANALYTICS
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. (second)
SQL
Parsing and
Blending Data
Streaming
Analytics
Both offline and
streaming data
Data Analysts Can
Write Queries
Across the Lines of Business
Capture Events
that Matter
Low-latency analytics use
cases
Events
Processing
17
© 2022 Cloudera, Inc. All rights reserved.
SQL STREAM BUILDER (SSB)
SQL STREAM BUILDER allows
developers, analysts, and data
scientists to write streaming
applications with industry
standard SQL.
No Java or Scala code
development required.
Simplifies access to data in Kafka
& Flink. Connectors to batch data in
HDFS, Kudu, Hive, S3, JDBC, CDC
and more
Enrich streaming data with batch
data in a single tool
Democratize access to real-time data with just SQL
18
© 2023 Cloudera, Inc. All rights reserved.
SSB MATERIALIZED VIEWS
Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
Demo
20
© 2023 Cloudera, Inc. All rights reserved.
Data in Motion: Overview e Novidades do NiFi, Kafka e Flink
Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
21
© 2023 Cloudera, Inc. All rights reserved.
FREE LEARNING ENVIRONMENT
23
© 2023 Cloudera, Inc. All rights reserved.
Cloudera Streams
Processing -
Community Edition
• Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker compose file of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features
quickly
• Develop applications
locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e
SINGLE_USER_CREDENTIALS_USERNAME=admin -e
SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh
vvgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
RESOURCES, WRAP-UP, Q&A
© 2023 Cloudera, Inc. All rights reserved. 26
Future of Data - NYC / Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
From Big Data to AI to Streaming to LLM to Cloud to
Analytics to NLP to Fast Data to Machine Learning to
Microservices to ...
https://medium.com/cloudera-inc/streaming-llm-with-apache-nifi-huggin
gface-ad2f0d367468
28
© 2023 Cloudera, Inc. All rights reserved.
Streaming Resources
• https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an
d-streamnative
• https://flipstackweekly.com/
• https://www.datainmotion.dev/
• https://www.flankstack.dev/
• https://github.com/tspannhw
• https://medium.com/@tspann
• https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739
5d714
• https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str
eaming_Engineer.pdf
© 2023 Cloudera, Inc. All rights reserved. 29
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python,
Java and Open Source friends.
https://bit.ly/32dAJft
Generative AI
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
watsonx.ai
LLM USE CASE
Vector DB
AI Model
Unstructured file types
Data in Motion
on Cloudera Data
Platform (CDP)
Capture, process &
distribute any data,
anywhere
Other enterprise data Open Data Lakehouse
Materialized Views
Structured Sources
Applications/API’s
Streams
Meetup - Brasil - Data In Motion - 2023 September 19
33
© 2023 Cloudera, Inc. All rights reserved.
TH N Y U
1 of 33

Recommended

OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf by
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfTimothy Spann
23 views43 slides
GSJUG: Mastering Data Streaming Pipelines 09May2023 by
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
255 views80 slides
Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
109 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
100 views36 slides
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
193 views45 slides
ITPC Building Modern Data Streaming Apps by
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
797 views64 slides

More Related Content

Similar to Meetup - Brasil - Data In Motion - 2023 September 19

The Never Landing Stream with HTAP and Streaming by
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
254 views39 slides
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampTimothy Spann
163 views27 slides
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023 by
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023ssuser73434e
54 views79 slides
Best Practices For Workflow by
Best Practices For WorkflowBest Practices For Workflow
Best Practices For WorkflowTimothy Spann
89 views86 slides
Kafka for DBAs by
Kafka for DBAsKafka for DBAs
Kafka for DBAsGwen (Chen) Shapira
12.8K views41 slides
Introduction to Apache NiFi 1.10 by
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Timothy Spann
2K views24 slides

Similar to Meetup - Brasil - Data In Motion - 2023 September 19(20)

The Never Landing Stream with HTAP and Streaming by Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 views
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 views
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023 by ssuser73434e
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
ssuser73434e54 views
Best Practices For Workflow by Timothy Spann
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann89 views
Introduction to Apache NiFi 1.10 by Timothy Spann
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10
Timothy Spann2K views
Meet the Committers Webinar_ Lab Preparation by Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 views
CoC23_Utilizing Real-Time Transit Data for Travel Optimization by Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann28 views
Reinventing Kafka in the Data Streaming Era - Jun Rao by confluent
Reinventing Kafka in the Data Streaming Era - Jun RaoReinventing Kafka in the Data Streaming Era - Jun Rao
Reinventing Kafka in the Data Streaming Era - Jun Rao
confluent137 views
Part 2: A Visual Dive into Machine Learning and Deep Learning 
 by Cloudera, Inc.
Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.1.5K views
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends by Timothy Spann
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Timothy Spann986 views
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen by confluent
Best Practices for Building Hybrid-Cloud Architectures | Hans JespersenBest Practices for Building Hybrid-Cloud Architectures | Hans Jespersen
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen
confluent403 views
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka... by Timothy Spann
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Timothy Spann518 views
Spark One Platform Webinar by Cloudera, Inc.
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.2.5K views
Music city data Hail Hydrate! from stream to lake by Timothy Spann
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann708 views
Webinar | Better Together: Apache Cassandra and Apache Kafka by DataStax
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
DataStax880 views
End to End Streaming Architectures by Cloudera, Inc.
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
Cloudera, Inc.3.5K views
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre... by HostedbyConfluent
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
HostedbyConfluent324 views
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic... by HostedbyConfluent
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
HostedbyConfluent388 views

More from Timothy Spann

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
94 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
156 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
28 views56 slides
big data fest building modern data streaming apps by
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming appsTimothy Spann
317 views55 slides

More from Timothy Spann(17)

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann94 views
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann156 views
CoC23_ Looking at the New Features of Apache NiFi by Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 views
CoC23_ Let’s Monitor The Conditions at the Conference by Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 views
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 views
big data fest building modern data streaming apps by Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 views
BestInFlowCompetitionTutorials03May2023 by Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 views
Cloudera Sandbox Event Guidelines For Workflow by Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 views
DevNexus: Apache Pulsar Development 101 with Java by Timothy Spann
DevNexus:  Apache Pulsar Development 101 with JavaDevNexus:  Apache Pulsar Development 101 with Java
DevNexus: Apache Pulsar Development 101 with Java
Timothy Spann261 views
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices by Timothy Spann
Conf42 Python_ ML Enhanced Event Streaming Apps with Python MicroservicesConf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Timothy Spann443 views
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python by Timothy Spann
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with PythonPythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
Timothy Spann430 views
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java by Timothy Spann
PhillyJug  Getting Started With Real-time Cloud Native Streaming With JavaPhillyJug  Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann625 views
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) by Timothy Spann
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann17 views
Living the Stream Dream with Pulsar and Spring Boot by Timothy Spann
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring Boot
Timothy Spann92 views
Let's keep it simple and streaming by Timothy Spann
Let's keep it simple and streamingLet's keep it simple and streaming
Let's keep it simple and streaming
Timothy Spann19 views
Sink Your Teeth into Streaming at Any Scale by Timothy Spann
Sink Your Teeth into Streaming at Any ScaleSink Your Teeth into Streaming at Any Scale
Sink Your Teeth into Streaming at Any Scale
Timothy Spann12 views

Recently uploaded

Survey on Factuality in LLM's.pptx by
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptxNeethaSherra1
5 views9 slides
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docx by
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxRIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxJaysonGarabilesEspej
6 views3 slides
RuleBookForTheFairDataEconomy.pptx by
RuleBookForTheFairDataEconomy.pptxRuleBookForTheFairDataEconomy.pptx
RuleBookForTheFairDataEconomy.pptxnoraelstela1
67 views16 slides
Understanding Hallucinations in LLMs - 2023 09 29.pptx by
Understanding Hallucinations in LLMs - 2023 09 29.pptxUnderstanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptxGreg Makowski
13 views18 slides
3196 The Case of The East River by
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East RiverErickANDRADE90
11 views4 slides
Data structure and algorithm. by
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm. Abdul salam
18 views24 slides

Recently uploaded(20)

Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 views
RuleBookForTheFairDataEconomy.pptx by noraelstela1
RuleBookForTheFairDataEconomy.pptxRuleBookForTheFairDataEconomy.pptx
RuleBookForTheFairDataEconomy.pptx
noraelstela167 views
Understanding Hallucinations in LLMs - 2023 09 29.pptx by Greg Makowski
Understanding Hallucinations in LLMs - 2023 09 29.pptxUnderstanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptx
Greg Makowski13 views
3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9011 views
Data structure and algorithm. by Abdul salam
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
Abdul salam 18 views
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf by vikas12611618
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfVikas 500 BIG DATA TECHNOLOGIES LAB.pdf
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf
vikas126116188 views
Supercharging your Data with Azure AI Search and Azure OpenAI by Peter Gallagher
Supercharging your Data with Azure AI Search and Azure OpenAISupercharging your Data with Azure AI Search and Azure OpenAI
Supercharging your Data with Azure AI Search and Azure OpenAI
Peter Gallagher37 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0118 views
Organic Shopping in Google Analytics 4.pdf by GA4 Tutorials
Organic Shopping in Google Analytics 4.pdfOrganic Shopping in Google Analytics 4.pdf
Organic Shopping in Google Analytics 4.pdf
GA4 Tutorials10 views
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx by ayeshabaig2004
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx
ayeshabaig20045 views
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
CRIJ4385_Death Penalty_F23.pptx by yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 views
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx

Meetup - Brasil - Data In Motion - 2023 September 19

  • 1. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Tim Spann - Principal Developer Advocate Data In Motion
  • 3. 3 © 2023 Cloudera, Inc. All rights reserved. TODAY’S LEAD Who am I? @PaasDev DZone Zone Leader and Big Data MVB Princeton and NYC Future of Data Meetups ex-Pivotal Field Engineer ex-StreamNative ex-PwC https://github.com/tspannhw https://twitter.com/PaaSDev https://www.datainmotion.dev/ https://medium.com/@tspann Principal Data-in-Motion Developer Advocate
  • 4. 4 © 2023 Cloudera, Inc. All rights reserved. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate Intro to NiFi Intro to Kafka Intro to Flink Together as FLaNK Demos Q&A
  • 5. © 2023 Cloudera, Inc. All rights reserved. 5 REAL-TIME REQUIRES A PLATFORM SQL Stream Builder
  • 6. © 2023 Cloudera, Inc. All rights reserved. 6 REST API ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time INGEST PREPARE PUBLISH DATA SOURCES Internal Users (After Sales) External Systems ENTERPRISE LAKEHOUSE CAPABILITY VIEW INGESTION MESSAGE HUB STORAGE BATCH MANAGEMENT STREAM CONSUMPTION Closed Loop Systems SQL Stream Builder Machine Learning Data Visualization Workload Manager watsonx.data
  • 7. Cloudera DataFlow - Apache NiFi
  • 8. © 2019 Cloudera, Inc. All rights reserved. 8 CLOUDERA DATAFLOW - POWERED BY APACHE NiFi Ingest and manage data from edge-to-cloud using a no-code interface ● #1 data ingestion/movement engine ● Strong community ● Product maturity over 11 years ● Deploy on-premises or in the cloud ● Over 400+ pre-built processors ● Built-in data provenance ● Guaranteed delivery ● Throttling and Back pressure
  • 9. © 2023 Cloudera, Inc. All rights reserved. 9 PROVENANCE
  • 10. 10 © 2023 Cloudera, Inc. All rights reserved. RECORD-ORIENTED DATA WITH NIFI • Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet, Scripted, Syslog5424, Syslog, WindowsEvent, XML • Record Writers - Avro, CSV, FreeFromText, Json, Parquet, Scripted, XML • Record Reader and Writer support referencing a schema registry for retrieving schemas when necessary. • Enable processors that accept any data format without having to worry about the parsing and serialization logic. • Allows us to keep FlowFiles larger, each consisting of multiple records, which results in far better performance.
  • 11. 11 © 2023 Cloudera, Inc. All rights reserved. RUNNING SQL ON FLOWFILES • Evaluates one or more SQL queries against the contents of a FlowFile. • This can be used, for example, for field-specific filtering, transformation, and row-level filtering. • Columns can be renamed, simple calculations and aggregations performed. • The SQL statement must be valid ANSI SQL and is powered by Apache Calcite.
  • 12. 12 © 2023 Cloudera, Inc. All rights reserved. READYFLOW GALLERY • Cloudera provided flow definitions • Cover most common data flow use cases • Optimized to work with CDP sources/destinations • Can be deployed and adjusted as needed
  • 14. 14 © 2023 Cloudera, Inc. All rights reserved. STREAMS MESSAGING WITH KAFKA • Highly reliable distributed messaging system. • Decouple applications, enables many-to-many patterns. • Publish-Subscribe semantics. • Horizontal scalability. • Efficient implementation to operate at speed with big data volumes. • Organized by topic to support several use cases.
  • 15. Cloudera SQL Stream Builder - Flink SQL
  • 16. 16 © 2023 Cloudera, Inc. All rights reserved. DELIVERING STREAMING ANALYTICS 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. (second) SQL Parsing and Blending Data Streaming Analytics Both offline and streaming data Data Analysts Can Write Queries Across the Lines of Business Capture Events that Matter Low-latency analytics use cases Events Processing
  • 17. 17 © 2022 Cloudera, Inc. All rights reserved. SQL STREAM BUILDER (SSB) SQL STREAM BUILDER allows developers, analysts, and data scientists to write streaming applications with industry standard SQL. No Java or Scala code development required. Simplifies access to data in Kafka & Flink. Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more Enrich streaming data with batch data in a single tool Democratize access to real-time data with just SQL
  • 18. 18 © 2023 Cloudera, Inc. All rights reserved. SSB MATERIALIZED VIEWS Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
  • 19. Demo
  • 20. 20 © 2023 Cloudera, Inc. All rights reserved. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
  • 21. 21 © 2023 Cloudera, Inc. All rights reserved.
  • 23. 23 © 2023 Cloudera, Inc. All rights reserved. Cloudera Streams Processing - Community Edition • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker compose file of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ○ $> docker compose up ● Licensed under the Cloudera Community License ● Unsupported ● Community Group Hub for CSP ● Find it on docs.cloudera.com under Applications
  • 24. Open Source Edition • Apache NiFi in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker NiFi ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh vvgEvjnaLjFEB apache/nifi:latest ● Licensed under the ASF License ● Unsupported https://hub.docker.com/r/apache/nifi
  • 26. © 2023 Cloudera, Inc. All rights reserved. 26 Future of Data - NYC / Princeton + Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ https://www.meetup.com/futureofdata-newyork/ From Big Data to AI to Streaming to LLM to Cloud to Analytics to NLP to Fast Data to Machine Learning to Microservices to ...
  • 28. 28 © 2023 Cloudera, Inc. All rights reserved. Streaming Resources • https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an d-streamnative • https://flipstackweekly.com/ • https://www.datainmotion.dev/ • https://www.flankstack.dev/ • https://github.com/tspannhw • https://medium.com/@tspann • https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739 5d714 • https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str eaming_Engineer.pdf
  • 29. © 2023 Cloudera, Inc. All rights reserved. 29 FLaNK Stack Weekly This week in Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Python, Java and Open Source friends. https://bit.ly/32dAJft
  • 31. LLM USE CASE Vector DB AI Model Unstructured file types Data in Motion on Cloudera Data Platform (CDP) Capture, process & distribute any data, anywhere Other enterprise data Open Data Lakehouse Materialized Views Structured Sources Applications/API’s Streams
  • 33. 33 © 2023 Cloudera, Inc. All rights reserved. TH N Y U