SlideShare a Scribd company logo
1 of 80
Download to read offline
© 2023 Cloudera, Inc. All rights reserved.
Mastering Data Streaming Pipelines
Tim Spann
Principal Developer Advocate
09-May-2023
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved. 3
FLaNK Stack
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://medium.com/@tspann
https://github.com/tspannhw
Apache NiFi x Apache Kafka x Apache Flink x Java
© 2023 Cloudera, Inc. All rights reserved. 4
FLiP Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python,
Java and Open Source friends.
https://bit.ly/32dAJft
See details here: https://lnkd.in/gCUVSvM6 (Passcode: a.7eZl5N).
© 2023 Cloudera, Inc. All rights reserved. 6
Future of Data - Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
© 2023 Cloudera, Inc. All rights reserved.
FREE LEARNING ENVIRONMENT
© 2023 Cloudera, Inc. All rights reserved. 8
CSP Community
Edition
• Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker compose file of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
© 2023 Cloudera, Inc. All rights reserved.
STREAMING
© 2023 Cloudera, Inc. All rights reserved. 10
WHAT IS REAL-TIME?
© 2023 Cloudera, Inc. All rights reserved. 11
BUILDING REAL-TIME REQUIRES A TEAM
© 2023 Cloudera, Inc. All rights reserved. 12
Streaming for Java Developers
Multiple users, frameworks, languages, devices, data sources & clusters
• Expert in ETL (Eating, Ties
and Laziness)
• Deep SME in Buzzwords
• No Coding Skills
• R&D into Lasers
CAT AI
• Will Drive your Car?
• Will Fix Your Code?
• Will Beat You At Q-Bert
• Will Write my Next Talk
STREAMING ENGINEER
• Coding skills in Python,
Java
• Experience with Apache
Kafka or Pulsar
• Knowledge of database
query languages such as
SQL
• Knowledge of tools such as
Apache Flink, Apache
Spark and Apache NiFi
JAVA DEVELOPER
• Frameworks like Spring,
Quarkus and micronaut
• Relational Databases, SQL
• Cloud
• Dev and Build Tools
APACHE SPARK
© 2023 Cloudera, Inc. All rights reserved. 15
APACHE SPARK
Data Engineering at Scale
• Multi-language support with Java, Scala, Python and R
• Batch and Microbatch
• Strong SQL support
• Machine Learning
• Jupyter and Apache Zeppelin notebook support
© 2023 Cloudera, Inc. All rights reserved. 16
APACHE ICEBERG
A Flexible, Performant & Scalable Table Format
• Donated by Netflix to the Apache Foundation in 2018
• Flexibility
– Hidden partitioning
– Full schema evolution
• Data Warehouse Operations
– Atomic Consistent Isolated Durable (ACID)
Transactions
– Time travel and rollback
• Supports best in class SQL performance
– High performance at Petabyte scale
© 2023 Cloudera, Inc. All rights reserved.
APACHE KAFKA
I Can Haz Data?
© 2023 Cloudera, Inc. All rights reserved. 18
Yes, Franz, It’s Kafka
Let’s do a metamorphosis on your data. Don’t fear changing data.
You don’t need to be a brilliant writer to stream
data.
Franz Kafka was a German-speaking
Bohemian novelist and short-story writer,
widely regarded as one of the major figures of
20th-century literature. His work fuses
elements of realism and the fantastic.
Wikipedia
© 2023 Cloudera, Inc. All rights reserved.
© 2019 Cloudera, Inc. All rights reserved. 19
STREAMS MESSAGING WITH KAFKA
• Highly reliable distributed messaging system.
• Decouple applications, enables many-to-many
patterns.
• Publish-Subscribe semantics.
• Horizontal scalability.
• Efficient implementation to operate at speed with
big data volumes.
• Organized by topic to support several use cases.
© 2023 Cloudera, Inc. All rights reserved. 20
What is Can You Do With Apache Kafka?
• Web site activity: track page views, searches, etc. in real time
• Events & log aggregation: particularly in distributed systems where messages
come from multiple sources
• Monitoring and metrics: aggregate statistics from distributed applications and
build a dashboard application
• Stream processing: process raw data, clean it up, and forward it on to another
topic or messaging system
• Real-time data ingestion: fast processing of a very large volume of messages
© 2019 Cloudera, Inc. All rights reserved. 21
KAFKA CLUSTER GEO 2
DATA SYNDICATE SERVICES
Kafka Topic
syndicate-
transmission
Kafka Topic
syndicate-
temp
Kafka Topic
syndicate-
speed
Kafka Topic
syndicate-
geo
KAFKA CLUSTER GEO 1
DATA SYNDICATE SERVICES
Kafka Topic
syndicate-
transmission
Kafka Topic
syndicate-
temp
Kafka Topic
syndicate-
speed
Kafka Topic
syndicate-
geo
Apache Kafka
DATA COLLECTION
AT THE EDGE
C++ agent
US-West Fleet
C++ agent
US-Central Fleet
C++ agent
US-East Fleet
INGEST GATEWAY
POWERED BY
KAFKA
gateway-west-
raw-sensors
gateway-central-
raw-sensors
gateway-east-
raw-sensors
DATA FLOW APPS
POWERED BY NIFI
STREAMING
ANALYTICS APPS
Micro Batch Analytics
Stream Analytics App
Micro Services
Stream Analytics App
Complex Low Latent
Stream Analytics App
Apache
Flink
Structured
Streaming
Replication /
Data Deployment
MiNiFi Apache Kafka Apache NiFi Apache Kafka Apache Flink
© 2023 Cloudera, Inc. All rights reserved.
APACHE FLINK
© 2023 Cloudera, Inc. All rights reserved. 23
Flink SQL
https://www.datainmotion.dev/2021/04/cloudera-sql-stream-builder-ssb-updated.html
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
© 2023 Cloudera, Inc. All rights reserved. 24
CONTINUOUS SQL
● SSB is a Continuous SQL engine
● It’s SQL, but a slightly different mental model, but with big implications
Traditional Parse/Execute/Fetch model Continuous SQL Model
Hint: The query is boundless and never finishes, and time matters
AKA: SELECT * FROM foo WHERE 1=0 -- will run forever
25
© 2022 Cloudera, Inc. All rights reserved.
SQL STREAM BUILDER (SSB)
SQL STREAM BUILDER allows
developers, analysts, and data
scientists to write streaming
applications with industry
standard SQL.
No Java or Scala code
development required.
Simplifies access to data in Kafka
& Flink. Connectors to batch data in
HDFS, Kudu, Hive, S3, JDBC, CDC
and more
Enrich streaming data with batch
data in a single tool
Democratize access to real-time data with just SQL
© 2023 Cloudera, Inc. All rights reserved. 26
SSB MATERIALIZED VIEWS
Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
© 2023 Cloudera, Inc. All rights reserved. 27
Streaming ETL Data Pipeline Made Simple with SQL StreamBuilder
Write Streaming
Result to Kudu
Join 2 Streaming
User Event Topics
Enrich Stream from
Warehouse HR Table
Enrich Stream from RT
Mart Timesheet Table
Filter &
Transform
© 2023 Cloudera, Inc. All rights reserved. 28
Streaming Data Lineage with SDX
DATA GOVERNANCE FOR THE ENTIRE STREAMING PIPELINE
• Track Consumer, Producer, Topics
and Consumer Group Lineage
• No changes required to
Consumers or Producers
• End-To-End lineage from consumer
to producer
© 2023 Cloudera, Inc. All rights reserved. 29
Moving Beyond Draining of Streams Into Lakes: Analytics-in-Stream
Data Sources Streaming Storage
Substrate
Cloudera Stream Processing
Kafka + NiFi enables
real-time ingestion into
lakes / analytics services
Data Distribution
Service
Cloudera DataFlow
Warehouses & Operational DB
Data Lakes & Lake Houses
Data-At-Rest Analytics
Data Apps Powered by
Streaming Insights and used
by other Analytics Services
Kafka + Flink
enables streaming
analytics
Cloudera Stream Processing
Streaming
Analytics
Low Latency
Data Products
Data-In-Motion Streaming Analytics
© 2023 Cloudera, Inc. All rights reserved.
DATAFLOW
APACHE NIFI
© 2023 Cloudera, Inc. All rights reserved. 31
Cloudera DataFlow: Universal Data Distribution Service
Process
Route
Filter
Enrich
Transform
Distribute
Connectors
Any
destination
Deliver
Ingest
Active
Passive
Connectors
Gateway
Endpoint
Connect & Pull
Send
Data born in
the cloud
Data born
outside the
cloud
UNIVERSAL DATA DISTRIBUTION WITH CLOUDERA DATAFLOW (CDF)
Connect to Any Data Source Anywhere then Process and Deliver to Any Destination
© 2023 Cloudera, Inc. All rights reserved.
© 2019 Cloudera, Inc. All rights reserved. 32
CLOUDERA DATAFLOW - POWERED BY APACHE NiFi
Ingest and manage data from edge-to-cloud using a no-code interface
● #1 data ingestion/movement engine
● Strong community
● Product maturity over 11 years
● Deploy on-premises or in the cloud
● Over 400+ pre-built processors
● Built-in data provenance
● Guaranteed delivery
● Throttling and Back pressure
© 2023 Cloudera, Inc. All rights reserved. 33
CLOUDERA FLOW AND EDGE MANAGEMENT
Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data
center) to any downstream system with built in end-to-end security and provenance
Advanced tooling to industrialize
flow development (Flow Development
Life Cycle)
ACQUIRE
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merger & Route
• Throttle & Backpressure
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLOG
PROCESS
HASH
MERGE
EXTRACT
DUPLICATE
SPLIT
ENCRYPT
TALL
EVALUATE
EXECUTE
GEOENRICH
SCAN
REPLACE
TRANSLATE
CONVERT
ROUTE TEXT
ROUTE CONTENT
ROUTE CONTEXT
ROUTE RATE
DISTRIBUTE LOAD
DELIVER
• Guaranteed Delivery
• Full data provenance from
acquisition to delivery
• Diverse, Non-Traditional Sources
• Eco-system integration
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLOG
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved.
Apache NiFi Pulsar Connector
https://streamnative.io/apache-nifi-connector/
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved. 37
PROVENANCE
© 2023 Cloudera, Inc. All rights reserved. 38
EXTENSIBILITY
• Built from the ground up with extensions in mind
• Service-loader pattern for…
– Processors
– Controller Services
– Reporting Tasks
– Prioritizers
• Extensions packaged as NiFi Archives (NARs)
– Deploy NiFi lib directory and restart
– Same model as standard components
© 2019 Cloudera, Inc. All rights reserved. 39
NiFi Load Balancing
• Improve NiFi cluster throughput
• Defined at connection level
• Configurable balancing
strategies
• Critical for scale up paradigm in
Kubernetes
• Alleviates S2S balancing “hack”
customers use
© 2019 Cloudera, Inc. All rights reserved. 40
QUEUE CONFIGURATION
• FlowFile Expiration - Data that cannot be processed in a timely
fashion can be automatically removed from the flow.
• Back Pressure Thresholds - Thresholds indicate how much data
should be allowed to exist in the queue before the component
that is the source of the Connection is no longer scheduled to
run. This allows the system to avoid being overrun with data.
• Load Balance Strategy – Strategy to distribute the data in a flow
across the nodes in the cluster. When enabled, compression can
be configured on FlowFile contents and attributes.
• Prioritization – Determines the order in which flow files are
processed.
© 2019 Cloudera, Inc. All rights reserved. 41
RECORD-ORIENTED DATA WITH NIFI
• Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet,
Scripted, Syslog5424, Syslog, WindowsEvent, XML
• Record Writers - Avro, CSV, FreeFromText, Json, Parquet,
Scripted, XML
• Record Reader and Writer support referencing a schema registry
for retrieving schemas when necessary.
• Enable processors that accept any data format without having to
worry about the parsing and serialization logic.
• Allows us to keep FlowFiles larger, each consisting of multiple
records, which results in far better performance.
© 2019 Cloudera, Inc. All rights reserved. 42
RUNNING SQL ON FLOWFILES
• Evaluates one or more SQL queries against the contents of a
FlowFile.
• This can be used, for example, for field-specific filtering,
transformation, and row-level filtering.
• Columns can be renamed, simple calculations and aggregations
performed.
• The SQL statement must be valid ANSI SQL and is powered by
Apache Calcite.
© 2023 Cloudera, Inc. All rights reserved. 43
Processing one million events per second with
NiFi
Apache NiFi with Python Custom Processors
Python as a 1st class citizen
© 2023 Cloudera, Inc. All rights reserved.
CUSTOM JAVA APPS
© 2023 Cloudera, Inc. All rights reserved. 46
Custom Processors
https://github.com/tspannhw/nifi-extracttext-processor
https://github.com/tspannhw/nifi-tensorflow-processor
https://github.com/tspannhw/nifi-nlp-processor
https://github.com/tspannhw/nifi-convertjsontoddl-processor
https://github.com/tspannhw/nifi-corenlp-processor
https://github.com/tspannhw/nifi-imageextractor-processor
https://github.com/tspannhw/nifi-attributecleaner-processor
https://github.com/tspannhw/linkextractorprocessor
https://github.com/tspannhw/GetWebCamera
https://github.com/tspannhw/nifi-langdetect-processor
https://github.com/tspannhw/nifi-postimage-processor
© 2023 Cloudera, Inc. All rights reserved. 47
Custom Processors in Java
https://www.datainmotion.dev/2019/03/getting-started-with-custom-processor.html
https://www.datainmotion.dev/2019/03/posting-images-to-imgur-via-apache-nifi.html
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved. 48
https://github.com/tspannhw/nifi-tensorflow-processor
public class TensorFlowProcessor extends AbstractProcessor {
public static final PropertyDescriptor MODEL_DIR = new
PropertyDescriptor.Builder().name(MODEL_DIR_NAME)
.description("Model Directory").required(true).expressionLanguageSupported(true)
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build();
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws
ProcessException {
FlowFile flowFile = session.get();
if (flowFile == null) {
flowFile = session.create();
}
try {
flowFile.getAttributes();
}
© 2023 Cloudera, Inc. All rights reserved.
CLOUDERA FLOW DESIGNER
© 2023 Cloudera, Inc. All rights reserved. 50
51
© 2023 Cloudera, Inc. All rights reserved.
READYFLOW
GALLERY
• Cloudera provided flow
definitions
• Cover most common data flow
use cases
• Optimized to work with CDP
sources/destinations
• Can be deployed and adjusted
as needed
52
© 2023 Cloudera, Inc. All rights reserved.
FLOW CATALOG
• Central repository for flow
definitions
• Import existing NiFi flows
• Manage flow definitions
• Initiate flow deployments
53
© 2023 Cloudera, Inc. All rights reserved.
DEPLOYMENT
WIZARD
• Turns flow definitions into flow
deployments
• Guides users through providing
required configuration
• Choose NiFi runtime version
• Pick from pre-defined NiFi node sizes
• Define KPIs for the deployment
Start Deployment Wizard Provide Parameters
Configure Sizing & Scaling Define KPIs
54
© 2023 Cloudera, Inc. All rights reserved.
KEY
PERFORMANCE
INDICATORS
• Visibility into flow deployments
• Track high level flow
performance
• Track in-depth NiFi component
metrics
• Defined in Deployment Wizard
• Monitoring & Alerts in
Deployment Details
KPI Definition in Deployment Wizard KPI Monitoring
55
© 2023 Cloudera, Inc. All rights reserved.
DASHBOARD
• Central Monitoring View
• Monitors flow deployments
across CDP environments
• Monitors flow deployment
health & performance
• Drill into flow deployment to
monitor system metrics and
deployment events
56
© 2023 Cloudera, Inc. All rights reserved.
DEPLOYMENT
MANAGER
• Manage flow deployment
lifecycle
(Suspend/Start/Terminate)
• Add/Edit KPIs
• Change sizing configuration
• Update parameters
• Change NiFi version of the
deployment
• Gateway to NiFi canvas
57
© 2023 Cloudera, Inc. All rights reserved.
NIFI VERSION
UPGRADES
• Pick up NiFi hotfixes easily
• Upgrade (or downgrade) the
hotfix version of existing
deployments
• Rolling upgrade (if the
deployment has >1 NiFi nodes)
© 2019 Cloudera, Inc. All rights reserved. 58
DATA FLOW
DESIGN FOR
EVERYONE
• Cloud-native data flow
development
• Developers get their own
sandbox
• Start developing flows without
installing NiFi
• Redesigned visual canvas
• Optimized interaction patterns
• Integration into CDF-PC Catalog
for versioning
© 2019 Cloudera, Inc. All rights reserved. 59
No-Code/Low-Code User Experience
DataFlow Designer SQL Stream Builder
visual canvas + automatic provisioning=
Self-service pipeline development
Instant data access + unified SQL processing =
Self-service streaming event processing
Development & Runtime of DataFlow Functions
Step1. Develop functions
on local workstation or in
CDP Public Cloud using
no-code, UI designer
Step 2. Run functions on
serverless compute
services in AWS, Azure &
GCP
AWS Lambda Azure Functions Google Cloud Functions
© 2023 Cloudera, Inc. All rights reserved. 61
DataFlow Functions Use Cases
Trigger Based, Batch, Scheduled and Microservice Use Cases
Serverless Trigger-Based
File Processing Pipeline
Develop & run data processing pipelines when
files are created or updated in any of the cloud
object stores
Example: When a photo is uploaded to object
storage, a data flow is triggered which runs image
resizing code and delivers resized image to
different locations.
Serverless Workflows /
Orchestration
Chain different low-code functions to build
complex workflows
Example: Automate the handling of support
tickets in a call center or orchestrate data
movement across different cloud services.
Serverless
Scheduled Tasks
Develop and run scheduled tasks without any
code on pre-defined timed intervals
Example: Offload an external database running
on-premises into the cloud once a day every
morning at 4:00 a.m.
Serverless
Microservices
Build and deploy serverless independent modules
that power your applications microservices
architecture
Example: Event-driven functions for easy
communication between thousands of decoupled
services that power a ride-sharing application.
Serverless
Web APIs
Easily build endpoints for your web applications
with HTTP APIs without any code using DFF and
any of the cloud providers' function triggers
Example: Build high performant, scalable web
applications across multiple data centers.
Serverless
Customized Triggers
With the DFF State feature, build flows to create
customized triggers allowing access to
on-premises or external services
Example: Near real time offloading of files from a
remote SFTP server.
© 2023 Cloudera, Inc. All rights reserved.
DEMOS
© 2019 Cloudera, Inc. All rights reserved. 63
Cloudera Dataflow Demos
© 2019 Cloudera, Inc. All rights reserved. 64
https://github.com/tspannhw/FLaNK-TravelAdvisory
Examples
© 2023 Cloudera, Inc. All rights reserved. 65
© 2019 Cloudera, Inc. All rights reserved. 66
Examples - Spring Boot - Apache Kafka - Apache NiFi - Apache Flink
https://github.com/tspannhw/flank-airquality
© 2019 Cloudera, Inc. All rights reserved. 67
Examples - Python - Kafka - NiFi - Flink - Spark
© 2019 Cloudera, Inc. All rights reserved. 68
Examples - NiFi - Kafka - Flink
© 2023 Cloudera, Inc. All rights reserved. 69
© 2023 Cloudera, Inc. All rights reserved. 70
© 2023 Cloudera, Inc. All rights reserved.
BEST PRACTICES
© 2023 Cloudera, Inc. All rights reserved. 72
STREAMING TECH DEBT TIPS
• Version Control All Assets
• Managed Public Cloud like Cloudera
• Use DevOps and APIs
• Latest Java and Python
• Stream Sizing (NiFi, Kafka, Flink)
© 2023 Cloudera, Inc. All rights reserved.
RESOURCES AND WRAP-UP
© 2023 Cloudera, Inc. All rights reserved. 74
Resources
© 2023 Cloudera, Inc. All rights reserved.
© 2021 Cloudera, Inc. All rights reserved. 75
© 2023 Cloudera, Inc. All rights reserved.
© 2021 Cloudera, Inc. All rights reserved. 76
Upcoming Events
May 10 May 23
May 25
© 2023 Cloudera, Inc. All rights reserved. 77
Streaming Resources
• https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an
d-streamnative
• https://flipstackweekly.com/
• https://www.datainmotion.dev/
• https://www.flankstack.dev/
• https://github.com/tspannhw
• https://medium.com/@tspann
• https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739
5d714
• https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str
eaming_Engineer.pdf
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved. 78
● https://spring.io/guides/gs/spring-boot/
● https://spring.io/projects/spring-amqp/
● https://spring.io/projects/spring-kafka/
● https://github.com/spring-projects/spring-integration-kafka
● https://github.com/spring-projects/spring-integration
● https://github.com/spring-projects/spring-data-relational
● https://github.com/spring-projects/spring-kafka
● https://github.com/spring-projects/spring-amqp
Spring Things
© 2023 Cloudera, Inc. All rights reserved.
© 2023 Cloudera, Inc. All rights reserved. 79
80
TH N Y U

More Related Content

What's hot

The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Kai Wähner
 

What's hot (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptx
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdf
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
A Beginners Guide to noSQL
A Beginners Guide to noSQLA Beginners Guide to noSQL
A Beginners Guide to noSQL
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 

Similar to GSJUG: Mastering Data Streaming Pipelines 09May2023

Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
Timothy Spann
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
Timothy Spann
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
ssuser73434e
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
Timothy Spann
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
Timothy Spann
 

Similar to GSJUG: Mastering Data Streaming Pipelines 09May2023 (20)

Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
AIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AIAIDEVDAY_ Data-in-Motion to Supercharge AI
AIDEVDAY_ Data-in-Motion to Supercharge AI
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 

More from Timothy Spann

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
Timothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
Timothy Spann
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Timothy Spann
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
Timothy Spann
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
Timothy Spann
 
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann
 

More from Timothy Spann (19)

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
 
CoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
 
CoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
 
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
 

Recently uploaded

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 

Recently uploaded (20)

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 

GSJUG: Mastering Data Streaming Pipelines 09May2023

  • 1. © 2023 Cloudera, Inc. All rights reserved. Mastering Data Streaming Pipelines Tim Spann Principal Developer Advocate 09-May-2023
  • 2. © 2023 Cloudera, Inc. All rights reserved.
  • 3. © 2023 Cloudera, Inc. All rights reserved. 3 FLaNK Stack Tim Spann @PaasDev // Blog: www.datainmotion.dev Principal Developer Advocate. Princeton Future of Data Meetup. ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC https://medium.com/@tspann https://github.com/tspannhw Apache NiFi x Apache Kafka x Apache Flink x Java
  • 4. © 2023 Cloudera, Inc. All rights reserved. 4 FLiP Stack Weekly This week in Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Python, Java and Open Source friends. https://bit.ly/32dAJft
  • 5. See details here: https://lnkd.in/gCUVSvM6 (Passcode: a.7eZl5N).
  • 6. © 2023 Cloudera, Inc. All rights reserved. 6 Future of Data - Princeton + Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  • 7. © 2023 Cloudera, Inc. All rights reserved. FREE LEARNING ENVIRONMENT
  • 8. © 2023 Cloudera, Inc. All rights reserved. 8 CSP Community Edition • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker compose file of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ○ $> docker compose up ● Licensed under the Cloudera Community License ● Unsupported ● Community Group Hub for CSP ● Find it on docs.cloudera.com under Applications
  • 9. © 2023 Cloudera, Inc. All rights reserved. STREAMING
  • 10. © 2023 Cloudera, Inc. All rights reserved. 10 WHAT IS REAL-TIME?
  • 11. © 2023 Cloudera, Inc. All rights reserved. 11 BUILDING REAL-TIME REQUIRES A TEAM
  • 12. © 2023 Cloudera, Inc. All rights reserved. 12
  • 13. Streaming for Java Developers Multiple users, frameworks, languages, devices, data sources & clusters • Expert in ETL (Eating, Ties and Laziness) • Deep SME in Buzzwords • No Coding Skills • R&D into Lasers CAT AI • Will Drive your Car? • Will Fix Your Code? • Will Beat You At Q-Bert • Will Write my Next Talk STREAMING ENGINEER • Coding skills in Python, Java • Experience with Apache Kafka or Pulsar • Knowledge of database query languages such as SQL • Knowledge of tools such as Apache Flink, Apache Spark and Apache NiFi JAVA DEVELOPER • Frameworks like Spring, Quarkus and micronaut • Relational Databases, SQL • Cloud • Dev and Build Tools
  • 15. © 2023 Cloudera, Inc. All rights reserved. 15 APACHE SPARK Data Engineering at Scale • Multi-language support with Java, Scala, Python and R • Batch and Microbatch • Strong SQL support • Machine Learning • Jupyter and Apache Zeppelin notebook support
  • 16. © 2023 Cloudera, Inc. All rights reserved. 16 APACHE ICEBERG A Flexible, Performant & Scalable Table Format • Donated by Netflix to the Apache Foundation in 2018 • Flexibility – Hidden partitioning – Full schema evolution • Data Warehouse Operations – Atomic Consistent Isolated Durable (ACID) Transactions – Time travel and rollback • Supports best in class SQL performance – High performance at Petabyte scale
  • 17. © 2023 Cloudera, Inc. All rights reserved. APACHE KAFKA I Can Haz Data?
  • 18. © 2023 Cloudera, Inc. All rights reserved. 18 Yes, Franz, It’s Kafka Let’s do a metamorphosis on your data. Don’t fear changing data. You don’t need to be a brilliant writer to stream data. Franz Kafka was a German-speaking Bohemian novelist and short-story writer, widely regarded as one of the major figures of 20th-century literature. His work fuses elements of realism and the fantastic. Wikipedia
  • 19. © 2023 Cloudera, Inc. All rights reserved. © 2019 Cloudera, Inc. All rights reserved. 19 STREAMS MESSAGING WITH KAFKA • Highly reliable distributed messaging system. • Decouple applications, enables many-to-many patterns. • Publish-Subscribe semantics. • Horizontal scalability. • Efficient implementation to operate at speed with big data volumes. • Organized by topic to support several use cases.
  • 20. © 2023 Cloudera, Inc. All rights reserved. 20 What is Can You Do With Apache Kafka? • Web site activity: track page views, searches, etc. in real time • Events & log aggregation: particularly in distributed systems where messages come from multiple sources • Monitoring and metrics: aggregate statistics from distributed applications and build a dashboard application • Stream processing: process raw data, clean it up, and forward it on to another topic or messaging system • Real-time data ingestion: fast processing of a very large volume of messages
  • 21. © 2019 Cloudera, Inc. All rights reserved. 21 KAFKA CLUSTER GEO 2 DATA SYNDICATE SERVICES Kafka Topic syndicate- transmission Kafka Topic syndicate- temp Kafka Topic syndicate- speed Kafka Topic syndicate- geo KAFKA CLUSTER GEO 1 DATA SYNDICATE SERVICES Kafka Topic syndicate- transmission Kafka Topic syndicate- temp Kafka Topic syndicate- speed Kafka Topic syndicate- geo Apache Kafka DATA COLLECTION AT THE EDGE C++ agent US-West Fleet C++ agent US-Central Fleet C++ agent US-East Fleet INGEST GATEWAY POWERED BY KAFKA gateway-west- raw-sensors gateway-central- raw-sensors gateway-east- raw-sensors DATA FLOW APPS POWERED BY NIFI STREAMING ANALYTICS APPS Micro Batch Analytics Stream Analytics App Micro Services Stream Analytics App Complex Low Latent Stream Analytics App Apache Flink Structured Streaming Replication / Data Deployment MiNiFi Apache Kafka Apache NiFi Apache Kafka Apache Flink
  • 22. © 2023 Cloudera, Inc. All rights reserved. APACHE FLINK
  • 23. © 2023 Cloudera, Inc. All rights reserved. 23 Flink SQL https://www.datainmotion.dev/2021/04/cloudera-sql-stream-builder-ssb-updated.html ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite
  • 24. © 2023 Cloudera, Inc. All rights reserved. 24 CONTINUOUS SQL ● SSB is a Continuous SQL engine ● It’s SQL, but a slightly different mental model, but with big implications Traditional Parse/Execute/Fetch model Continuous SQL Model Hint: The query is boundless and never finishes, and time matters AKA: SELECT * FROM foo WHERE 1=0 -- will run forever
  • 25. 25 © 2022 Cloudera, Inc. All rights reserved. SQL STREAM BUILDER (SSB) SQL STREAM BUILDER allows developers, analysts, and data scientists to write streaming applications with industry standard SQL. No Java or Scala code development required. Simplifies access to data in Kafka & Flink. Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more Enrich streaming data with batch data in a single tool Democratize access to real-time data with just SQL
  • 26. © 2023 Cloudera, Inc. All rights reserved. 26 SSB MATERIALIZED VIEWS Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
  • 27. © 2023 Cloudera, Inc. All rights reserved. 27 Streaming ETL Data Pipeline Made Simple with SQL StreamBuilder Write Streaming Result to Kudu Join 2 Streaming User Event Topics Enrich Stream from Warehouse HR Table Enrich Stream from RT Mart Timesheet Table Filter & Transform
  • 28. © 2023 Cloudera, Inc. All rights reserved. 28 Streaming Data Lineage with SDX DATA GOVERNANCE FOR THE ENTIRE STREAMING PIPELINE • Track Consumer, Producer, Topics and Consumer Group Lineage • No changes required to Consumers or Producers • End-To-End lineage from consumer to producer
  • 29. © 2023 Cloudera, Inc. All rights reserved. 29 Moving Beyond Draining of Streams Into Lakes: Analytics-in-Stream Data Sources Streaming Storage Substrate Cloudera Stream Processing Kafka + NiFi enables real-time ingestion into lakes / analytics services Data Distribution Service Cloudera DataFlow Warehouses & Operational DB Data Lakes & Lake Houses Data-At-Rest Analytics Data Apps Powered by Streaming Insights and used by other Analytics Services Kafka + Flink enables streaming analytics Cloudera Stream Processing Streaming Analytics Low Latency Data Products Data-In-Motion Streaming Analytics
  • 30. © 2023 Cloudera, Inc. All rights reserved. DATAFLOW APACHE NIFI
  • 31. © 2023 Cloudera, Inc. All rights reserved. 31 Cloudera DataFlow: Universal Data Distribution Service Process Route Filter Enrich Transform Distribute Connectors Any destination Deliver Ingest Active Passive Connectors Gateway Endpoint Connect & Pull Send Data born in the cloud Data born outside the cloud UNIVERSAL DATA DISTRIBUTION WITH CLOUDERA DATAFLOW (CDF) Connect to Any Data Source Anywhere then Process and Deliver to Any Destination
  • 32. © 2023 Cloudera, Inc. All rights reserved. © 2019 Cloudera, Inc. All rights reserved. 32 CLOUDERA DATAFLOW - POWERED BY APACHE NiFi Ingest and manage data from edge-to-cloud using a no-code interface ● #1 data ingestion/movement engine ● Strong community ● Product maturity over 11 years ● Deploy on-premises or in the cloud ● Over 400+ pre-built processors ● Built-in data provenance ● Guaranteed delivery ● Throttling and Back pressure
  • 33. © 2023 Cloudera, Inc. All rights reserved. 33 CLOUDERA FLOW AND EDGE MANAGEMENT Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data center) to any downstream system with built in end-to-end security and provenance Advanced tooling to industrialize flow development (Flow Development Life Cycle) ACQUIRE • Over 300 Prebuilt Processors • Easy to build your own • Parse, Enrich & Apply Schema • Filter, Split, Merger & Route • Throttle & Backpressure FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG PROCESS HASH MERGE EXTRACT DUPLICATE SPLIT ENCRYPT TALL EVALUATE EXECUTE GEOENRICH SCAN REPLACE TRANSLATE CONVERT ROUTE TEXT ROUTE CONTENT ROUTE CONTEXT ROUTE RATE DISTRIBUTE LOAD DELIVER • Guaranteed Delivery • Full data provenance from acquisition to delivery • Diverse, Non-Traditional Sources • Eco-system integration FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG
  • 34. © 2023 Cloudera, Inc. All rights reserved.
  • 35. © 2023 Cloudera, Inc. All rights reserved. Apache NiFi Pulsar Connector https://streamnative.io/apache-nifi-connector/
  • 36. © 2023 Cloudera, Inc. All rights reserved.
  • 37. © 2023 Cloudera, Inc. All rights reserved. 37 PROVENANCE
  • 38. © 2023 Cloudera, Inc. All rights reserved. 38 EXTENSIBILITY • Built from the ground up with extensions in mind • Service-loader pattern for… – Processors – Controller Services – Reporting Tasks – Prioritizers • Extensions packaged as NiFi Archives (NARs) – Deploy NiFi lib directory and restart – Same model as standard components
  • 39. © 2019 Cloudera, Inc. All rights reserved. 39 NiFi Load Balancing • Improve NiFi cluster throughput • Defined at connection level • Configurable balancing strategies • Critical for scale up paradigm in Kubernetes • Alleviates S2S balancing “hack” customers use
  • 40. © 2019 Cloudera, Inc. All rights reserved. 40 QUEUE CONFIGURATION • FlowFile Expiration - Data that cannot be processed in a timely fashion can be automatically removed from the flow. • Back Pressure Thresholds - Thresholds indicate how much data should be allowed to exist in the queue before the component that is the source of the Connection is no longer scheduled to run. This allows the system to avoid being overrun with data. • Load Balance Strategy – Strategy to distribute the data in a flow across the nodes in the cluster. When enabled, compression can be configured on FlowFile contents and attributes. • Prioritization – Determines the order in which flow files are processed.
  • 41. © 2019 Cloudera, Inc. All rights reserved. 41 RECORD-ORIENTED DATA WITH NIFI • Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet, Scripted, Syslog5424, Syslog, WindowsEvent, XML • Record Writers - Avro, CSV, FreeFromText, Json, Parquet, Scripted, XML • Record Reader and Writer support referencing a schema registry for retrieving schemas when necessary. • Enable processors that accept any data format without having to worry about the parsing and serialization logic. • Allows us to keep FlowFiles larger, each consisting of multiple records, which results in far better performance.
  • 42. © 2019 Cloudera, Inc. All rights reserved. 42 RUNNING SQL ON FLOWFILES • Evaluates one or more SQL queries against the contents of a FlowFile. • This can be used, for example, for field-specific filtering, transformation, and row-level filtering. • Columns can be renamed, simple calculations and aggregations performed. • The SQL statement must be valid ANSI SQL and is powered by Apache Calcite.
  • 43. © 2023 Cloudera, Inc. All rights reserved. 43 Processing one million events per second with NiFi
  • 44. Apache NiFi with Python Custom Processors Python as a 1st class citizen
  • 45. © 2023 Cloudera, Inc. All rights reserved. CUSTOM JAVA APPS
  • 46. © 2023 Cloudera, Inc. All rights reserved. 46 Custom Processors https://github.com/tspannhw/nifi-extracttext-processor https://github.com/tspannhw/nifi-tensorflow-processor https://github.com/tspannhw/nifi-nlp-processor https://github.com/tspannhw/nifi-convertjsontoddl-processor https://github.com/tspannhw/nifi-corenlp-processor https://github.com/tspannhw/nifi-imageextractor-processor https://github.com/tspannhw/nifi-attributecleaner-processor https://github.com/tspannhw/linkextractorprocessor https://github.com/tspannhw/GetWebCamera https://github.com/tspannhw/nifi-langdetect-processor https://github.com/tspannhw/nifi-postimage-processor
  • 47. © 2023 Cloudera, Inc. All rights reserved. 47 Custom Processors in Java https://www.datainmotion.dev/2019/03/getting-started-with-custom-processor.html https://www.datainmotion.dev/2019/03/posting-images-to-imgur-via-apache-nifi.html
  • 48. © 2023 Cloudera, Inc. All rights reserved. © 2023 Cloudera, Inc. All rights reserved. 48 https://github.com/tspannhw/nifi-tensorflow-processor public class TensorFlowProcessor extends AbstractProcessor { public static final PropertyDescriptor MODEL_DIR = new PropertyDescriptor.Builder().name(MODEL_DIR_NAME) .description("Model Directory").required(true).expressionLanguageSupported(true) .addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build(); @Override public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException { FlowFile flowFile = session.get(); if (flowFile == null) { flowFile = session.create(); } try { flowFile.getAttributes(); }
  • 49. © 2023 Cloudera, Inc. All rights reserved. CLOUDERA FLOW DESIGNER
  • 50. © 2023 Cloudera, Inc. All rights reserved. 50
  • 51. 51 © 2023 Cloudera, Inc. All rights reserved. READYFLOW GALLERY • Cloudera provided flow definitions • Cover most common data flow use cases • Optimized to work with CDP sources/destinations • Can be deployed and adjusted as needed
  • 52. 52 © 2023 Cloudera, Inc. All rights reserved. FLOW CATALOG • Central repository for flow definitions • Import existing NiFi flows • Manage flow definitions • Initiate flow deployments
  • 53. 53 © 2023 Cloudera, Inc. All rights reserved. DEPLOYMENT WIZARD • Turns flow definitions into flow deployments • Guides users through providing required configuration • Choose NiFi runtime version • Pick from pre-defined NiFi node sizes • Define KPIs for the deployment Start Deployment Wizard Provide Parameters Configure Sizing & Scaling Define KPIs
  • 54. 54 © 2023 Cloudera, Inc. All rights reserved. KEY PERFORMANCE INDICATORS • Visibility into flow deployments • Track high level flow performance • Track in-depth NiFi component metrics • Defined in Deployment Wizard • Monitoring & Alerts in Deployment Details KPI Definition in Deployment Wizard KPI Monitoring
  • 55. 55 © 2023 Cloudera, Inc. All rights reserved. DASHBOARD • Central Monitoring View • Monitors flow deployments across CDP environments • Monitors flow deployment health & performance • Drill into flow deployment to monitor system metrics and deployment events
  • 56. 56 © 2023 Cloudera, Inc. All rights reserved. DEPLOYMENT MANAGER • Manage flow deployment lifecycle (Suspend/Start/Terminate) • Add/Edit KPIs • Change sizing configuration • Update parameters • Change NiFi version of the deployment • Gateway to NiFi canvas
  • 57. 57 © 2023 Cloudera, Inc. All rights reserved. NIFI VERSION UPGRADES • Pick up NiFi hotfixes easily • Upgrade (or downgrade) the hotfix version of existing deployments • Rolling upgrade (if the deployment has >1 NiFi nodes)
  • 58. © 2019 Cloudera, Inc. All rights reserved. 58 DATA FLOW DESIGN FOR EVERYONE • Cloud-native data flow development • Developers get their own sandbox • Start developing flows without installing NiFi • Redesigned visual canvas • Optimized interaction patterns • Integration into CDF-PC Catalog for versioning
  • 59. © 2019 Cloudera, Inc. All rights reserved. 59 No-Code/Low-Code User Experience DataFlow Designer SQL Stream Builder visual canvas + automatic provisioning= Self-service pipeline development Instant data access + unified SQL processing = Self-service streaming event processing
  • 60. Development & Runtime of DataFlow Functions Step1. Develop functions on local workstation or in CDP Public Cloud using no-code, UI designer Step 2. Run functions on serverless compute services in AWS, Azure & GCP AWS Lambda Azure Functions Google Cloud Functions
  • 61. © 2023 Cloudera, Inc. All rights reserved. 61 DataFlow Functions Use Cases Trigger Based, Batch, Scheduled and Microservice Use Cases Serverless Trigger-Based File Processing Pipeline Develop & run data processing pipelines when files are created or updated in any of the cloud object stores Example: When a photo is uploaded to object storage, a data flow is triggered which runs image resizing code and delivers resized image to different locations. Serverless Workflows / Orchestration Chain different low-code functions to build complex workflows Example: Automate the handling of support tickets in a call center or orchestrate data movement across different cloud services. Serverless Scheduled Tasks Develop and run scheduled tasks without any code on pre-defined timed intervals Example: Offload an external database running on-premises into the cloud once a day every morning at 4:00 a.m. Serverless Microservices Build and deploy serverless independent modules that power your applications microservices architecture Example: Event-driven functions for easy communication between thousands of decoupled services that power a ride-sharing application. Serverless Web APIs Easily build endpoints for your web applications with HTTP APIs without any code using DFF and any of the cloud providers' function triggers Example: Build high performant, scalable web applications across multiple data centers. Serverless Customized Triggers With the DFF State feature, build flows to create customized triggers allowing access to on-premises or external services Example: Near real time offloading of files from a remote SFTP server.
  • 62. © 2023 Cloudera, Inc. All rights reserved. DEMOS
  • 63. © 2019 Cloudera, Inc. All rights reserved. 63 Cloudera Dataflow Demos
  • 64. © 2019 Cloudera, Inc. All rights reserved. 64 https://github.com/tspannhw/FLaNK-TravelAdvisory Examples
  • 65. © 2023 Cloudera, Inc. All rights reserved. 65
  • 66. © 2019 Cloudera, Inc. All rights reserved. 66 Examples - Spring Boot - Apache Kafka - Apache NiFi - Apache Flink https://github.com/tspannhw/flank-airquality
  • 67. © 2019 Cloudera, Inc. All rights reserved. 67 Examples - Python - Kafka - NiFi - Flink - Spark
  • 68. © 2019 Cloudera, Inc. All rights reserved. 68 Examples - NiFi - Kafka - Flink
  • 69. © 2023 Cloudera, Inc. All rights reserved. 69
  • 70. © 2023 Cloudera, Inc. All rights reserved. 70
  • 71. © 2023 Cloudera, Inc. All rights reserved. BEST PRACTICES
  • 72. © 2023 Cloudera, Inc. All rights reserved. 72 STREAMING TECH DEBT TIPS • Version Control All Assets • Managed Public Cloud like Cloudera • Use DevOps and APIs • Latest Java and Python • Stream Sizing (NiFi, Kafka, Flink)
  • 73. © 2023 Cloudera, Inc. All rights reserved. RESOURCES AND WRAP-UP
  • 74. © 2023 Cloudera, Inc. All rights reserved. 74 Resources
  • 75. © 2023 Cloudera, Inc. All rights reserved. © 2021 Cloudera, Inc. All rights reserved. 75
  • 76. © 2023 Cloudera, Inc. All rights reserved. © 2021 Cloudera, Inc. All rights reserved. 76 Upcoming Events May 10 May 23 May 25
  • 77. © 2023 Cloudera, Inc. All rights reserved. 77 Streaming Resources • https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an d-streamnative • https://flipstackweekly.com/ • https://www.datainmotion.dev/ • https://www.flankstack.dev/ • https://github.com/tspannhw • https://medium.com/@tspann • https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739 5d714 • https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str eaming_Engineer.pdf
  • 78. © 2023 Cloudera, Inc. All rights reserved. © 2023 Cloudera, Inc. All rights reserved. 78 ● https://spring.io/guides/gs/spring-boot/ ● https://spring.io/projects/spring-amqp/ ● https://spring.io/projects/spring-kafka/ ● https://github.com/spring-projects/spring-integration-kafka ● https://github.com/spring-projects/spring-integration ● https://github.com/spring-projects/spring-data-relational ● https://github.com/spring-projects/spring-kafka ● https://github.com/spring-projects/spring-amqp Spring Things
  • 79. © 2023 Cloudera, Inc. All rights reserved. © 2023 Cloudera, Inc. All rights reserved. 79