Welcome to the
Flink SQL Hands-on Workshop
by
Workshop:
Stream Processing made easy with
Flink
27th February 2024, Madrid
Today's speakers and moderators
Juan Soto
Senior Customer Success
Technical Architect
Spain
Rui Fernandes
Senior Customer Success
Technical Architect
Spain
Tomas Dias Almeida
Customer Success Technical
Architect
Spain
Salvo Alessandro
Enterprise Solutions Engineer
Spain
Angelica Tacca
Solutions Engineer
Spain
Remember? Prerequisites?
We need a Confluent Cloud cluster on AWS running
● in an environment with Schema Registry enabled, where
● 3 topics exist and
● events are generated by our Datagen Source connector
See here:
https://github.com/griga23/shoe-store/blob/main/prereq.md
… and do not forget to clean up Confluent Cloud resources like cluster, connectors,
Flink pool etc. after the workshop (!)
Agenda: Workshop
09:00 Registration and networking
09:30 Introduction: What are real-time analytics and stream processing, and when are they used? Stream processing with Confluent
10:30 Hands-on: Intro to Flink SQL
12:00 Coffee break
12:30 Hands-on: Implementing use cases with Flink SQL
13:30 Recap, Roadmap, Q&A
14:00 Lunch and networking
Intro:
What is Real-Time Analytics and Stream Processing
with Confluent
Stream processing is a critical part of data streaming
Share: Enable frictionless access to up-to-date trustworthy data products
Stream: Reimagine data streaming everywhere, on-prem and in every major public cloud
Govern: Make data in motion self-service, secure, compliant and trustworthy
Process: Drive greater data reuse with always-on stream processing
Connect: Make it easy to on-ramp and off-ramp data from existing systems and apps
Stream processing acts as the compute layer to Kafka,
powering real-time applications & pipelines
Layer: DATA IN MOTION | DATA AT REST
Application Layer: Streaming Applications | Web Applications
Processing Layer: Apache Flink | Traditional Databases
Storage Layer: Apache Kafka | File Systems
[Diagram: sources (custom apps, 3rd party apps, databases), Kafka, and sinks (database, data warehouse, SaaS app), each sink with its own processing step, serving queries, analytics, and interactions]
Processing downstream of Kafka increases latency, adds costs and redundancy, and inhibits data reuse
Increased complexity from
redundant processing
Data systems & applications
built on stale data
Expensive & inefficient to clean
and enrich data multiple times
Processing data at
ingest improves
latency, data
portability, and cost
effectiveness
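[Diagram: the same sources (custom apps, 3rd party apps, databases) and sinks (database, data warehouse, SaaS app serving queries, analytics, and interactions), with stream processing in the middle: Kafka as storage and Flink as compute]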
Stream Processing
Process your data once, process your data right
Maximized data reusability &
consistency
Improved cost-efficiency from
cleaning & enriching data once
Real-time apps & data systems
reflect current state
Stream processing enables users to filter, join, and enrich
streams on-the-fly to drive greater data reuse
Heatmap service
Payment service
Supply chain systems
Watch lists
Profile mgmt
Incident mgmt
Customer
profile data
ITSM systems
Central log systems
Fraud & SIEM systems
Alerting systems
AI/ML engines
Visualization apps
Threat vector
Transactions
Payments
Mainframe data
Inventory
Weather
Telemetry
IoT data
Notification engine
Payroll systems
CRM systems
Mobile application
Personalization
Web application
Clickstreams
Customer loyalty
Change logs
Customer data
Recommendation
engine
Why Apache Flink is becoming
the de facto standard
Flink growth has
mirrored the growth of
Kafka, the de facto
standard for streaming
data
>75% of the Fortune 500 estimated to
be using Kafka
>100,000 orgs using Kafka
>41,000 Kafka meetup attendees
>750 Kafka Improvement Proposals
>12,000 Jiras for Apache Kafka
[Chart: Two Apache Projects, Born a Few Years Apart. Monthly Unique Users (0 to 150,000) for Kafka (2016 to 2018) and Flink (2020 to 2022)]
Innovative companies have adopted both Kafka & Flink
Digital natives leverage Flink to disrupt markets and gain
competitive advantage
UBER: Real-time Pricing
NETFLIX: Personalized Recs
STRIPE: Real-time Fraud Detection
Developers choose Flink because of its performance and rich feature set
Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale
Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability
Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice
Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
Flink is a top 5 Apache project and boasts a robust developer community
Flink’s powerful runtime offers limitless scalability
[Diagram: Client, Job Manager, and Task Slots. Labels: Submit Job, Results, Deploy/Stop/Cancel Tasks, Trigger Checkpoints, Data Streams]
Applications are parallelized into possibly
thousands of tasks that are distributed and
concurrently executed in a cluster
Leverage in-memory performance
[Diagram: tasks between input and output, each holding logic and state. Labels: In-Memory or On-Disk State, Local State Access, Durable Storage, Periodic, Asynchronous, Incremental Snapshots]
Stateful Flink applications are optimized for fast access to local state by maintaining task
state in memory or on-disk data structures, resulting in low latency processing.
Flink checkpoints and savepoints enable fault tolerance and
stateful processing
CHECKPOINTS
Automatic snapshot created by Flink periodically
● Used to recover from failures
● Optimized for quick recovery
● Automatically created and managed by Flink
SAVEPOINTS
User-triggered snapshot at a specific point in time
● Enables manual operational tasks, such as upgrades
● Optimized for operational flexibility
● Created and managed by the user
Flink recovers from failures in a timely and efficient manner
[Diagram: Client, Job Manager, and Task Slots. Labels: Submit Job, Results, Deploy/Stop/Cancel Tasks, Trigger Checkpoints, Data Streams]
If a task manager fails, the job manager will detect the failure and arrange for the job to be restarted from the most recent state snapshot
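Restarting from the most recent state snapshot can be illustrated with a toy simulation in plain Python. This is not Flink code: `run_job`, `run_with_recovery`, `TaskFailure`, the checkpoint interval, and the event list are hypothetical names and values chosen for illustration. The sketch shows why storing the operator state together with the input position lets a restarted job continue without losing or double-counting events.

```python
import copy

CHECKPOINT_INTERVAL = 3  # assumption: take a snapshot every 3 events


class TaskFailure(Exception):
    """Simulated task manager failure carrying the last completed snapshot."""

    def __init__(self, last_snapshot):
        super().__init__("task manager failed")
        self.last_snapshot = last_snapshot


def run_job(events, state, fail_at=None):
    """Count events per key, starting from state["offset"]."""
    last_snapshot = copy.deepcopy(state)
    while state["offset"] < len(events):
        if state["offset"] == fail_at:
            raise TaskFailure(last_snapshot)
        key = events[state["offset"]]
        state["counts"][key] = state["counts"].get(key, 0) + 1
        state["offset"] += 1
        if state["offset"] % CHECKPOINT_INTERVAL == 0:
            # The snapshot holds the state AND the input position together.
            last_snapshot = copy.deepcopy(state)
    return state


def run_with_recovery(events, fail_at):
    """Run the job and, on failure, restart it from the most recent snapshot."""
    state = {"offset": 0, "counts": {}}
    try:
        return run_job(events, state, fail_at=fail_at)
    except TaskFailure as failure:
        restored = copy.deepcopy(failure.last_snapshot)
        return run_job(events, restored)  # replay from the snapshot offset


events = ["a", "b", "a", "c", "a", "b", "c"]
result = run_with_recovery(events, fail_at=5)
print(result["counts"])
```

Events 3 and 4 are processed twice (once before the failure, once after the restore), but because the counts are rolled back to the snapshot as well, the final result matches a run without any failure.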
Flink offers layered APIs at different levels of abstraction to handle both common and specialized use cases
[Diagram: API layers on top of the Apache Flink Runtime: Flink SQL, Table API, DataStream API, ProcessFunction, Low-level Stream Operator API]
Flink SQL
High-level, declarative API that allows you to write SQL
queries to process data streams and batch data as
dynamic tables
Table API
Programmatic equivalent of Flink SQL, allowing you to
define your business logic in either Java or Python, or
combine it with SQL
DataStream API
Low-level, expressive API that exposes the building blocks
for stream processing, giving you direct access to things
like state and timers
ProcessFunction
The lowest-level API, allowing fine-grained processing of individual elements for complex event-driven processing logic and state management
Process real-time
data streams with
Flink SQL
Flink SQL is an ANSI-compliant SQL
engine that can define both simple and
complex queries, making it well-suited
for most stream processing use cases,
particularly building real-time data
products and pipelines.
[Diagram: a stream of colored events is filtered with WHERE color <> orange, grouped with GROUP BY color, and counted with COUNT, giving results of 4 and 3]
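The filter, group, and count query from this slide can be mimicked in plain Python to show how a continuous query updates its result as events arrive. This is a conceptual sketch, not Flink code: `continuous_count` and the input events are hypothetical, and the events were chosen only so that the final counts match the 4 and 3 shown on the slide.

```python
from collections import Counter


def continuous_count(events, excluded="orange"):
    """Yield (color, count) updates, roughly like:
    SELECT color, COUNT(*) FROM events WHERE color <> 'orange' GROUP BY color
    """
    counts = Counter()
    for color in events:
        if color == excluded:        # WHERE color <> 'orange'
            continue
        counts[color] += 1           # GROUP BY color with COUNT(*)
        yield color, counts[color]   # every event updates one result row


# Hypothetical input stream; the slide does not list the actual events.
events = ["blue", "orange", "green", "blue", "green",
          "blue", "orange", "blue", "green"]
updates = list(continuous_count(events))
final = dict(updates)  # later updates overwrite earlier ones per color
print(final)
```

Each yielded pair corresponds to one update of the dynamic result table; only the latest value per color is the current result.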
Flink supports unified stream and batch processing
Streaming
● Entire pipeline must always be running
● Input must be processed as it arrives
● Results are reported as they become ready
● Failure recovery resumes from a recent snapshot
● Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc.
Batch
● Execution proceeds in stages, running as needed
● Input may be pre-sorted by time and key
● Results are reported at the end of the job
● Failure recovery does a reset and full restart
● Effectively exactly-once guarantees are more straightforward
Flink SQL operators work across both stream and batch
processing modes
STREAMING AND BATCH
● SELECT FROM [WHERE]
● GROUP BY [HAVING] (includes time-based windowing)
● OVER aggregations (including Top-N and Deduplication queries)
● INNER + OUTER JOINs
● MATCH_RECOGNIZE (pattern matching)
● Set Operations
● User-Defined Functions
● Statement Sets
STREAMING ONLY
● ORDER BY time ascending only
● INNER JOIN with
○ Temporal (versioned) table
○ External lookup table
BATCH ONLY
● ORDER BY anything
Enhancing Apache Flink as a
cloud-native service
Operating Flink on your own (along with the Kafka storage
layer) is difficult
Deployment
Complexity
Setting up Flink requires a
deep understanding of
resource allocation and
management
Management &
Monitoring
Picking relevant metrics can
be overwhelming for a
DevOps team just starting
with stream processing
Limited
Ecosystem
Flink lacks pre-built integrations
with observability, metadata
management, data governance,
and security tooling
Cost &
Risk
Self-supporting Flink incurs
significant costs & resources
in terms of infra footprint
and Dev & Ops FTEs
Effortlessly filter, join, and enrich your
data streams with Flink, the de facto
standard for stream processing
Enable high-performance and efficient
stream processing at any scale, without
the complexities of infrastructure
management
Experience Kafka and Flink as a
unified platform, with fully integrated
monitoring, security, and governance
Apache Flink® on Confluent
Cloud
Simple, Serverless Stream Processing
Easily build high-quality,
reusable data streams with
the industry’s only cloud-
native, serverless Flink service
Effortlessly filter, join, and enrich your data streams with Apache Flink
Real-time processing
Power low-latency applications and pipelines that react to
real-time events and provide timely insights
Data reusability
Share consistent and reusable data streams widely with
downstream applications and systems
Data enrichment
Curate, filter, and augment data on-the-fly with additional
context to improve completeness, accuracy, & compliance
Efficiency
Improve resource utilization and cost-effectiveness by
avoiding redundant processing across silos
“With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart
cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This
enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security
incidents without any added operational burden.”
Recognize patterns
and react to events in
a timely manner
Develop applications using fine-
grained control over how time
progresses and data is grouped
together using:
● Hopping, tumbling, session windows
● OVER aggregations
● Pattern matching with
MATCH_RECOGNIZE
EVENT-DRIVEN APPLICATIONS
[Diagram: a Double Bottom pattern on a price chart (Price over Period & Volume), with pattern variables A, B, C, and D defined by conditions such as price<lag(price) and price>lag(price)]
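How tumbling and hopping windows group events can be sketched with a little arithmetic on timestamps. The Python below is an illustration only, not Flink's implementation: the function names are hypothetical, timestamps are assumed to be integers (for example seconds), and windows are treated as half-open intervals [start, end).

```python
from collections import Counter


def tumbling_window(ts, size):
    """Return the (start, end) of the single tumbling window containing ts."""
    start = ts - ts % size
    return start, start + size


def hopping_windows(ts, size, slide):
    """Return every (start, end) hopping window containing ts, oldest first."""
    windows = []
    start = ts - ts % slide          # latest window start at or before ts
    while start > ts - size:         # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# Hypothetical event timestamps, counted per 10-unit tumbling window.
event_times = [1, 4, 11, 12, 25]
per_window = Counter(tumbling_window(ts, 10) for ts in event_times)
print(dict(per_window))
print(hopping_windows(12, size=10, slide=5))
```

A tumbling window assigns each event to exactly one window, while a hopping window with a slide smaller than its size assigns each event to several overlapping windows.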
Analyze real-time data
streams to generate
important business
insights
Get up-to-date results to power
dashboards or applications requiring
continuous updates using:
● Materialized views
● Temporal analytic functions
● Interactive queries
[Diagram: account events over time and the resulting balance table]
Events: Account A, +$10; Account B, +$12; Account C, +$5; Account B, -$10; Account C, +$10; Account A, -$5; Account A, +$10
Account Balance: A $15, B $2, C $15
REAL-TIME ANALYTICS
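The account balance example on this slide behaves like a materialized view that is updated with every incoming event. A minimal Python sketch of that idea follows; `materialize_balances` is a hypothetical helper, and the relative order of the events in time is an assumption, since the extracted slide does not preserve it (the final balances do not depend on the order).

```python
def materialize_balances(events):
    """Fold (account, delta) events into a balance table,
    yielding a copy of the table after every update."""
    balances = {}
    for account, delta in events:
        balances[account] = balances.get(account, 0) + delta
        yield dict(balances)


# Events taken from the slide; their order in time is assumed.
events = [("A", 10), ("B", 12), ("C", 5), ("B", -10),
          ("C", 10), ("A", -5), ("A", 10)]
*_, latest = materialize_balances(events)
print(latest)  # {'A': 15, 'B': 2, 'C': 15}
```

A dashboard reading `latest` at any moment sees the current state, which is the property the slide calls up-to-date results.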
Build streaming data
pipelines to inform
real-time decision
making
Create new enriched and curated
streams of higher value using:
● Data transformations
● Streaming joins, temporal joins,
lookup joins, and versioned joins
● Fan out queries, multi-cluster queries
[Diagram: orders joined with the currency rate valid at each order's time]
Orders: t1, 21.5 USD; t3, 55 EUR; t5, 35.3 EUR
Currency rate: t0, EUR:USD=1.00; t2, EUR:USD=1.05; t4, EUR:USD=1.10
Result: t1, 21.5 USD; t3, 57.75 USD; t5, 38.83 USD
STREAMING DATA PIPELINES
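The currency example on this slide is a temporal join: each order is matched with the rate version that was valid at the order's time, not with the latest rate. The Python sketch below reproduces the slide's numbers. The helper names are hypothetical, times t0 to t5 are modeled as the integers 0 to 5, and passing USD orders through unchanged is an assumption read off the slide's result.

```python
import bisect

# Versioned currency-rate table from the slide: (valid_from, EUR-to-USD rate)
rates = [(0, 1.00), (2, 1.05), (4, 1.10)]
# Orders from the slide: (event_time, amount, currency)
orders = [(1, 21.5, "USD"), (3, 55.0, "EUR"), (5, 35.3, "EUR")]


def rate_as_of(ts):
    """Return the rate version that was valid at time ts."""
    valid_from_times = [valid_from for valid_from, _ in rates]
    idx = bisect.bisect_right(valid_from_times, ts) - 1
    if idx < 0:
        raise LookupError(f"no rate known at time {ts}")
    return rates[idx][1]


def to_usd(order):
    """Convert one order to USD using the rate valid at its event time."""
    ts, amount, currency = order
    if currency == "USD":
        return ts, amount  # assumed: USD orders pass through unchanged
    return ts, round(amount * rate_as_of(ts), 2)


converted = [to_usd(order) for order in orders]
print(converted)
```

The order at t3 uses the rate from t2 (1.05) even though a newer rate arrives at t4, which is what distinguishes a temporal join from a regular join against the latest value.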
Fully managed
Easily develop Flink applications with a serverless, SaaS-
based experience instantly available & without ops burden
Elastic scalability
Automatically scale up or down to meet the demands of
the most complex workloads without overprovisioning
Usage-based billing
Pay only for resources used instead of infrastructure
provisioned, with scale-to-zero pricing
Continuous, no touch updates
Build using an always up-to-date platform with
declarative, versionless APIs and interfaces
[Chart: Throughput/Data Traffic Over Time, comparing Capacity and Demand]
Enable high-performance and efficient stream processing at any scale
"Offloading that day-to-day burden of operations has been a huge help. A lot of overall operations-type work gets
offloaded when you move to Confluent Cloud… Where we’re saving time now is on the DevOps side of maintenance of
all those systems — patching underlying systems or upgrading(them) — those were big things to be able to offload."
Go from zero to production in minutes versus months
Months: Open Source Apache Flink
In-house development and maintenance without support
Weeks: Cloud-hosted Flink services
Manual Day 2 operations with basic tooling and/or support
Minutes: Apache Flink on Confluent Cloud
Fully managed, elastic, and automated product capabilities with zero overhead
Tap into a next-generation, serverless SQL experience …
SQL client in Confluent
Cloud CLI
Different teams with different skills and needs can access
stream processing using the interface of their choice
Rich SQL editing
user interface
"When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream
processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of
infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that
differentiate the business."
Enterprise-grade security
Secure stream processing with built-in identity and access
management, RBAC, and audit logs
Stream governance
Enforce data policies and avoid metadata duplication
leveraging native integration with Stream Governance
Monitoring
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Connectors
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Experience Kafka and Flink seamlessly integrated as a unified platform
Automate metadata
synchronization for
effortless data
exploration
Integration with Schema Registry
enables Flink to easily access and
process data from multiple Kafka
clusters and Confluent environments in
a consistent and unified way:
● Kafka topics → Flink tables
● Confluent environments → catalogs
● Kafka clusters → databases …
…
…
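The mapping above (environment to catalog, cluster to database, topic to table) means a topic can be addressed by a three-part table name. The small Python helper below builds such a name; it is a sketch, and the environment, cluster, and topic names are hypothetical placeholders rather than values from the workshop.

```python
def qualified_table(environment, cluster, topic):
    """Build a three-part table reference:
    catalog (environment) . database (cluster) . table (topic)."""
    parts = (environment, cluster, topic)
    for name in parts:
        if not name or "`" in name:
            raise ValueError(f"unsupported identifier: {name!r}")
    # Backticks quote identifiers, so names with dashes or dots stay intact.
    return ".".join(f"`{name}`" for name in parts)


# Hypothetical names, for illustration only.
table = qualified_table("workshop-env", "cluster_0", "shoe_orders")
query = f"SELECT * FROM {table} LIMIT 10;"
print(query)
```

With a default catalog and database selected (for example via USE CATALOG and USE), the shorter form with only the table name is enough.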
Connect your entire business with just a few clicks
70+
fully
managed
connectors
Amazon S3
Amazon Redshift
Amazon DynamoDB
Google Cloud
Spanner
AWS Lambda
Amazon SQS
Amazon Kinesis
Azure Service Bus
Azure Event Hubs
Azure Synapse
Analytics
Azure Blob
Storage
Azure Functions Azure Data Lake
Google
BigTable
The Hands-on Workshop Explained
We will create a loyalty program around shoes
● We will create a promotion program for our best customers based on the given data events
○ Giving free shoes to customers who buy a lot from our store
○ This is a typical business use case that helps minimize the customer churn rate
● The Architecture
○ We are running completely in Confluent Cloud
○ The data arrives in real time from our database via connectors (here simulated with Datagen)
○ We analyze the data in real time, look for the best-buying customers, and generate promotions for them based on their buying history
■ Get one pair of shoes for free after buying 10
■ etc.
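The "one free pair after buying 10" rule can be sketched as a running count per customer. The labs implement the use case in Flink SQL; the Python below is only a conceptual illustration with hypothetical customer ids, and it assumes that each order is one pair of shoes and that the promotion repeats every 10 purchases.

```python
from collections import defaultdict

PAIRS_PER_FREE_PAIR = 10  # from the slide: one free pair after buying 10


def promotions(orders):
    """Yield a customer id each time that customer completes another 10 orders."""
    bought = defaultdict(int)
    for customer_id in orders:
        bought[customer_id] += 1
        if bought[customer_id] % PAIRS_PER_FREE_PAIR == 0:
            yield customer_id


# Hypothetical order stream: one entry per purchased pair.
orders = ["alice"] * 9 + ["bob"] * 3 + ["alice"] + ["bob"] * 7 + ["alice"] * 10
print(list(promotions(orders)))
```

Because the count is updated per event, the promotion is emitted the moment the tenth purchase arrives, which is the real-time behaviour the workshop is after.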
The Hands-on Architecture
1: Basic Cluster with Schema Registry
2: Source Connectors
3: Flink SQL Pool
4: Flink SQL Stream Processing
5: Notification Client
Please be aware that all Flink SQL Jobs will stop after 4 hours (we are
working without Service Accounts)
The Mapping within Flink SQL - Step 1
The Mapping within Flink SQL - Step 2
The Mapping within Flink SQL - Step 3
The Mapping within Flink SQL - Step 4
The Mapping within Flink SQL - Step 5 - Finished
In our labs we are doing JOINs, mainly INNER JOINs
● Within the labs we are running INNER JOINs only
● We also do a lot of aggregations
○ GROUP BY column
■ HAVING COUNT(*) of records
● What if we used LEFT JOINs?
● Or OUTER JOINs?
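The difference the questions above point at can be seen on a tiny data set. The Python sketch below compares an inner join with a left join over hypothetical orders and customers; it shows the final result only and does not model how a streaming join updates its output when a matching row arrives later.

```python
def inner_join(orders, customers):
    """Keep only orders whose customer id has a match."""
    return [(order_id, cust_id, customers[cust_id])
            for order_id, cust_id in orders if cust_id in customers]


def left_join(orders, customers):
    """Keep every order; unmatched customers appear as None."""
    return [(order_id, cust_id, customers.get(cust_id))
            for order_id, cust_id in orders]


# Hypothetical data: order o3 refers to a customer that is not (yet) known.
orders = [("o1", "c1"), ("o2", "c2"), ("o3", "c9")]
customers = {"c1": "Ana", "c2": "Luis"}

print(inner_join(orders, customers))
print(left_join(orders, customers))
```

With an inner join, order o3 silently disappears from the result; with a left join it is kept with an empty customer, which matters when downstream counts must include every order.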
Tools we use: Console Workspace, Shell, Monitoring
Cloud Console Workspace: Flink Shell: Flink Monitoring:
HINT: We do not use a service account for our job execution (INSERT), therefore jobs will be stopped after 4 hours
Please read more here: https://docs.confluent.io/cloud/current/flink/index.html#security
Short Summary:
● We are completely working in Confluent Cloud
● You have already set up a cluster, Schema Registry, 3 topics and 3 connectors
○ manually or - https://github.com/griga23/shoe-store/blob/main/prereq.md
○ with terraform - https://github.com/griga23/shoe-store/blob/main/terraform/README.md
● We will now continue with:
○ Lab1 and
○ lab2
The main Workshop is described here: https://github.com/griga23/shoe-store
Hint:
With terraform-complete you deploy the finished workshop: everything is running, and the notification client can be started as well (after setting your token for Pushover). Note that terraform-complete runs its jobs with the APP-Manager service account, so these jobs do not stop.
HANDS-ON PART 1
Github Repo:
https://github.com/griga23/shoe-store/blob/main/lab1.md
Let’s code PART 1
Operations: Autoscale, Increase without Downtime
● Autoscale within CFUs
● Increase CFUs without downtime
● Delete Pool(s)
Our developed Data pipeline and products
Stream Lineage Data Portal
Flink SQL Limitations (Feb. 2024)
● Only available in specific AWS/Azure/GCP regions.
● Supported SQL statements:
○ CREATE TABLE (without the AS, PARTITION BY, and LIKE keywords)
○ ALTER TABLE (only for ADD/MODIFY WATERMARK; ADD COLUMN, DROP COLUMN, and other alterations aren’t supported)
○ DESCRIBE
○ DESCRIBE EXTENDED
○ INSERT INTO (persistent queries)
○ EXECUTE STATEMENT SET
○ SELECT
○ SHOW CATALOG / DATABASE / TABLE
○ SET
○ USE / USE CATALOG
○ SHOW CREATE TABLE
● Joins
○ Regular Joins
○ Interval Joins
○ Temporal Table Join between a non-compacted and compacted Apache Kafka® topic
○ Star Schema Denormalization (N-Way Join), as long as temporary tables are not used
○ Lateral Table Join, as long as temporary views are not used
● Unsupported features: No UDF, DROP Table and more…
● Unsupported Statements: Add JAR, etc.
Please see the complete list here.
HANDS-ON PART2
Let’s continue coding PART 2
Github Repo:
https://github.com/griga23/shoe-store/blob/main/lab2.md
Don’t forget to delete everything in Confluent Cloud
Recap
100% elasticity during the workshop: CFUs grow based on workload
[Chart: CFU usage over time, ranging from 0 CFU to max CFU]
Full elasticity based on workload and usage-based billing: if the service is not used, there are no costs.
Flink SQL is multi-tenant and capable of elastic scaling
● We run Flink SQL in HA
○ All components like Job
Manager and Task Manager
are redundant including
storage runtime infra
○ State Checkpoints are
written to storageDir
● The Adaptive Scheduler can
adjust the parallelism of a job
based on available slots. It will
automatically reduce the
parallelism if not enough slots
are available to run the job
with the originally configured
parallelism
See the docs: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/
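The Adaptive Scheduler behaviour described above boils down to capping the parallelism at what the cluster can currently offer. The Python function below is a deliberately simplified model, not Flink's scheduler: `effective_parallelism` is a hypothetical name, and the sketch assumes one slot per parallel instance and ignores details such as slot sharing.

```python
def effective_parallelism(configured, available_slots):
    """Return the parallelism a job would run with under this simplified model."""
    if configured < 1:
        raise ValueError("configured parallelism must be at least 1")
    if available_slots < 1:
        raise RuntimeError("no slots available, the job cannot run")
    # Run as configured when possible, otherwise scale down to fit.
    return min(configured, available_slots)


print(effective_parallelism(configured=8, available_slots=5))   # scaled down
print(effective_parallelism(configured=8, available_slots=12))  # as configured
```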
Confluent Cloud Flink at Open Preview
Serverless Flink SQL
Rich Experience
Complete and Secure
● ANSI-SQL with powerful streaming operators
● Rich CLI Experience
● SQL Editor with "workspaces" in CC UI
● Flink Shell
● Full terraform support
● Integration with Schema Registry and
Governance
● Support for user-authentication and Service
account
De-duplicate topic by key
Continuously copies a topic, only
emitting messages with unique
keys, see sample
Query this topic
Navigates to Flink SQL editor, pre-
populated with e.g.:
SELECT * FROM my_table LIMIT 10;
Join this topic with…
Joins one topic with another based
on join fields specified
Filter this topic
Filter a topic based on simple criteria,
ultimately generating a WHERE
clause.
Copy this topic
Specify a set of fields to copy,
emitting copied messages to a new
topic
Apply a transformation
Joins one topic with another based
on join fields specified
Flink for Topic Actions
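The "De-duplicate topic by key" action can be pictured as a copy that remembers which keys it has already emitted. The Python sketch below illustrates the idea only; `deduplicate_by_key` and the messages are hypothetical, and keeping the first message per key is an assumption, since the slide does not say which duplicate is kept.

```python
def deduplicate_by_key(messages):
    """Yield only the first (key, value) message seen for each key."""
    seen = set()
    for key, value in messages:
        if key in seen:
            continue  # duplicate key: drop the message
        seen.add(key)
        yield key, value


# Hypothetical topic content with repeated keys.
messages = [("k1", "created"), ("k2", "created"), ("k1", "created"),
            ("k3", "created"), ("k2", "updated")]
print(list(deduplicate_by_key(messages)))
```

The set of seen keys is the state such a job has to keep, which is why deduplication is a stateful operation.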
Advanced SQL Streaming Operators
Time windows
● Time-based windows
● Event-density windows
● Event-based windows: every single event can trigger a new window
Pattern Matching
● Complex Event Processing
● See sample
Streaming Joins
● Stream-to-stream joins
● Temporal joins
● Lookup joins
● Versioned joins
etc.
Be fully integrated into Confluent Cloud
Fully integrated out of the box
● Connected via Confluent
Connector
● Environments are Catalogs
● Kafka Clusters as Databases
● Topics are Tables
● RBAC for managing flink
Resources
○ Keep in mind: A statement’s
access level is determined
entirely by the permissions
that you attach to the
statement
● Schema Registry, Data Portal,
Lineage, Consumer/Producer
Monitoring, Metric API…
● Cluster and Pool need to be in
the same region and same CSP
● All over the Confluent
Organisation including all
environments and clusters
Flink SQL
And finally, we very easily implemented a promotion and loyalty use case
What’s Next
Our goals for Apache Flink on Confluent Cloud
Cloud-Native: Serverless experience
Eliminate the operational burden of managing Flink with a fully managed, cloud-native service that is simple, secure, and scalable
Complete: Integrated platform
Leverage Flink fully integrated with Confluent’s complete feature set, enabling developers to build stream processing applications quickly, reliably, and securely
Everywhere: Deployment flexibility
Seamlessly process your data everywhere it resides with a Flink service that spans across the three major cloud providers
Flink at GA
Production Ready
Autoscale
Everywhere
● 99.99% SLA
● Terraform support
● Powerful Autoscale
● Scale to zero (aka auto-pause)
● Available in AWS, Azure, GCP
● AVRO, JSON, Protobuf schemas
● Topic Actions
Apps
UDFs (Java, Python)
Programmatic Flink
APIs in addition to
SQL
(Java, Python)
Security
Private Networking
(AWS, Azure, GCP)
BYOK
Fast follow with additional features
Performance
Batch Execution
Materialized views
Data Serving
Intelligence
OpenAI integration
Flink ML
Enrich real-time data streams with Generative AI directly from
Flink SQL
INSERT INTO enriched_reviews
SELECT id,
       review,
       invoke_openai(prompt, review) AS score
FROM product_reviews;
[Diagram: product reviews flowing through the DATA STREAMING PLATFORM and coming out enriched with a star rating]
Kate, 4 hours ago: This was the worst decision ever.
Nikola, 1 day ago: Not bad. Could have been cheaper.
Brian, 3 days ago: Amazing! Game Changer!
The Prompt
“Score the following text on a scale of 1 and 5 where 1 is negative and 5 is positive returning only the number”
COMING SOON
Thank you!

More Related Content

What's hot

Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query BasicsIdo Green
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentationKaiwen Qi
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Databricks
 
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Databricks
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfHong Ong
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)kakugawa
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introductionAlexey Grigorev
 
代表的な型のグラフを読む
代表的な型のグラフを読む代表的な型のグラフを読む
代表的な型のグラフを読むNaoya Urata
 
Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafkaconfluent
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design PatternsMongoDB
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
 
pandas - Python Data Analysis
pandas - Python Data Analysispandas - Python Data Analysis
pandas - Python Data AnalysisAndrew Henshaw
 
Gain 3 Benefits with Delta Sharing
Gain 3 Benefits with Delta SharingGain 3 Benefits with Delta Sharing
Gain 3 Benefits with Delta SharingDatabricks
 
Unsupervised Learning - Teaching AI to Understand Our World
Unsupervised Learning - Teaching AI to Understand Our WorldUnsupervised Learning - Teaching AI to Understand Our World
Unsupervised Learning - Teaching AI to Understand Our WorldSakha Global
 

What's hot (20)

Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentation
 
Orange Data Mining & Data Visualization Tool
Orange Data Mining & Data Visualization ToolOrange Data Mining & Data Visualization Tool
Orange Data Mining & Data Visualization Tool
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Workshop híbrido: Stream Processing con Flink

  • 1. Welcome to the Flink SQL Hands-on Workshop by
  • 2. Workshop: Stream Processing made easy with Flink 27th February 2024, Madrid
  • 3. Today's speakers and moderators: Juan Soto (Senior Customer Success Technical Architect, Spain); Rui Fernandes (Senior Customer Success Technical Architect, Spain); Tomas Dias Almeida (Customer Success Technical Architect, Spain); Salvo Alessandro (Enterprise Solutions Engineer, Spain); Angelica Tacca (Solutions Engineer, Spain)
  • 4. Remember? Prerequisites? We need a Confluent Cloud cluster on AWS running ● in an environment with Schema Registry enabled, where ● 3 topics exist and ● events are generated by our Datagen Source connector See here: https://github.com/griga23/shoe-store/blob/main/prereq.md … and do not forget to clean up Confluent Cloud resources like cluster, connectors, Flink pool etc. after the workshop (!)
  • 5. Agenda (Workshop): 09:00 Registration and networking; 09:30 Introduction: What are real-time analytics and processing analytics, and when are they used? Stream Processing using Confluent; 10:30 Hands-on: Intro to Flink SQL; 12:00 Coffee break; 12:30 Hands-on: Implementing use cases with Flink SQL; 13:30 Recap, Roadmap, Q&A; 14:00 Lunch and networking
  • 6. Intro: What is Real-Time Analytics and Stream Processing with Confluent
  • 7. Stream processing is a critical part of data streaming. Share: Enable frictionless access to up-to-date trustworthy data products. Stream: Reimagine data streaming everywhere, on-prem and in every major public cloud. Govern: Make data in motion self-service, secure, compliant and trustworthy. Process: Drive greater data reuse with always-on stream processing. Connect: Make it easy to on-ramp and off-ramp data from existing systems and apps.
  • 8. Stream processing acts as the compute layer to Kafka, powering real-time applications & pipelines. [Diagram: DATA IN MOTION vs. DATA AT REST across the Application, Processing, and Storage layers. Data in motion: Streaming Applications, Apache Flink, Apache Kafka. Data at rest: Web Applications, Traditional Databases, File Systems.]
  • 9. Processing downstream of Kafka increases latency, adds costs and redundancy, and inhibits data reuse. [Diagram: custom apps, 3rd party apps, and databases feed Kafka; a database, a data warehouse, and a SaaS app each run their own processing to serve queries, analytics, and interactions.] Increased complexity from redundant processing. Data systems & applications built on stale data. Expensive & inefficient to clean and enrich data multiple times.
  • 10. Processing data at ingest improves latency, data portability, and cost effectiveness. [Diagram: custom apps, 3rd party apps, and databases feed Kafka (storage) with Flink (compute) for stream processing; a database, a data warehouse, and a SaaS app consume the results for queries, analytics, and interactions.] Process your data once, process your data right. Maximized data reusability & consistency. Improved cost-efficiency from cleaning & enriching data once. Real-time apps & data systems reflect current state.
  • 11. Stream processing enables users to filter, join, and enrich streams on-the-fly to drive greater data reuse. Systems and data shown: Heatmap service, Payment service, Supply chain systems, Watch lists, Profile mgmt, Incident mgmt, Customer profile data, ITSM systems, Central log systems, Fraud & SIEM systems, Alerting systems, AI/ML engines, Visualization apps, Threat vector, Transactions, Payments, Mainframe data, Inventory, Weather, Telemetry, IoT data, Notification engine, Payroll systems, CRM systems, Mobile application, Personalization, Web application, Clickstreams, Customer loyalty, Change logs, Customer data, Recommendation engine
  • 12. Why Apache Flink is becoming the de facto standard
  • 13. Flink growth has mirrored the growth of Kafka, the de facto standard for streaming data ● >75% of the Fortune 500 estimated to be using Kafka ● >100,000 orgs using Kafka ● >41,000 Kafka meetup attendees ● >750 Kafka Improvement Proposals ● >12,000 Jiras for Apache Kafka [Chart: Two Apache Projects, Born a Few Years Apart. Monthly unique users (0 to 150,000) for Kafka (2016 to 2018) and Flink (2020 to 2022)]
  • 14. Innovative companies have adopted both Kafka & Flink
  • 15. Digital natives leverage Flink to disrupt markets and gain competitive advantage ● UBER: Real-time Pricing ● NETFLIX: Personalized Recs ● STRIPE: Real-time Fraud Detection
  • 16. Developers choose Flink because of its performance and rich feature set. Flink is a top 5 Apache project and boasts a robust developer community ● Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale ● Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability ● Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice ● Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
  • 18. Flink’s powerful runtime offers limitless scalability. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. [Diagram: a client submits a job to the Job Manager and receives results; the Job Manager deploys, stops, and cancels tasks and triggers checkpoints; tasks run in task slots and exchange data streams]
  • 19. Leverage in-memory performance. Stateful Flink applications are optimized for fast access to local state by maintaining task state in memory or on-disk data structures, resulting in low latency processing. [Diagram: input flows through tasks to output; each task pairs its logic with local in-memory or on-disk state; periodic, asynchronous, incremental snapshots are written to durable storage]
  • 21. Flink checkpoints and savepoints enable fault tolerance and stateful processing ● CHECKPOINTS: Automatic snapshot created by Flink periodically ○ Used to recover from failures ○ Optimized for quick recovery ○ Automatically created and managed by Flink ● SAVEPOINTS: User-triggered snapshot at a specific point in time ○ Enables manual operational tasks, such as upgrades ○ Optimized for operational flexibility ○ Created and managed by the user
  • 22. Flink recovers from failures in a timely and efficient manner. If a task manager fails, the job manager will detect the failure and arrange for the job to be restarted from the most recent state snapshot. [Diagram: a client submits a job to the Job Manager and receives results; the Job Manager deploys, stops, and cancels tasks and triggers checkpoints; tasks run in task slots and exchange data streams]
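The recovery idea on this slide can be illustrated with a toy simulation. This is a minimal sketch in plain Python, not Flink's actual checkpointing mechanism: the function name `run_with_checkpoints`, the per-key counter state, and the `checkpoint_every` and `fail_at` parameters are all assumptions made for illustration.

```python
import copy

def run_with_checkpoints(events, checkpoint_every=3, fail_at=None):
    """Count events per key; optionally crash once at input index `fail_at`
    and recover from the most recent snapshot (toy model, not Flink itself)."""
    state, offset = {}, 0                        # operator state and input position
    snapshot = (copy.deepcopy(state), offset)    # last completed checkpoint
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Simulated failure: discard in-flight state, restore the snapshot,
            # and replay the input from the snapshot's offset.
            state, offset = copy.deepcopy(snapshot[0]), snapshot[1]
            continue
        key = events[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (copy.deepcopy(state), offset)
    return state

events = list("abacabb")
without_failure = run_with_checkpoints(events)
with_failure = run_with_checkpoints(events, fail_at=5)
```

Because the state and the input position are restored together, the run with a simulated failure ends with the same counts as the run without one.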
  • 24. Flink offers layered APIs at different levels of abstraction to handle both common and specialized use cases. From highest to lowest level, on top of the Apache Flink Runtime (low-level stream operator API): ● Flink SQL: High-level, declarative API that allows you to write SQL queries to process data streams and batch data as dynamic tables ● Table API: Programmatic equivalent of Flink SQL, allowing you to define your business logic in either Java or Python, or combine it with SQL ● DataStream API: Low-level, expressive API that exposes the building blocks for stream processing, giving you direct access to things like state and timers ● ProcessFunction: The most low-level API, allowing for fine-grained processing of individual elements for complex event-driven processing logic and state management
  • 25. Process real-time data streams with Flink SQL. Flink SQL is an ANSI-compliant SQL engine that can define both simple and complex queries, making it well-suited for most stream processing use cases, particularly building real-time data products and pipelines. [Diagram: events pass through WHERE color <> orange, then GROUP BY color with COUNT, producing result counts of 4 and 3]
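The filter-and-count query pictured on this slide can be mimicked in a few lines of Python. This is only a sketch of the query's semantics: the input events below are hypothetical (the slide's actual events are not recoverable), and a streaming engine would keep updating the counts as new events arrive rather than computing them once.

```python
from collections import Counter

def count_by_color(events, excluded="orange"):
    """Rough equivalent of:
    SELECT color, COUNT(*) FROM events WHERE color <> 'orange' GROUP BY color
    """
    return Counter(e["color"] for e in events if e["color"] != excluded)

# Hypothetical input events, chosen for illustration only.
colors = ["blue", "orange", "green", "blue", "green",
          "blue", "orange", "blue", "green"]
events = [{"color": c} for c in colors]
counts = dict(count_by_color(events))
```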
  • 27. Flink supports unified stream and batch processing ● Streaming ○ Entire pipeline must always be running ○ Input must be processed as it arrives ○ Results are reported as they become ready ○ Failure recovery resumes from a recent snapshot ○ Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc. ● Batch ○ Execution proceeds in stages, running as needed ○ Input may be pre-sorted by time and key ○ Results are reported at the end of the job ○ Failure recovery does a reset and full restart ○ Effectively exactly-once guarantees are more straightforward
  • 28. Flink SQL operators work across both stream and batch processing modes ● STREAMING ONLY ○ ORDER BY time ascending only ○ INNER JOIN with ■ Temporal (versioned) table ■ External lookup table ● STREAMING AND BATCH ○ SELECT FROM [WHERE] ○ GROUP BY [HAVING] (includes time-based windowing) ○ OVER aggregations (including Top-N and Deduplication queries) ○ INNER + OUTER JOINs ○ MATCH_RECOGNIZE (pattern matching) ○ Set Operations ○ User-Defined Functions ○ Statement Sets ● BATCH ONLY ○ ORDER BY anything
  • 29. Enhancing Apache Flink as a cloud-native service
  • 30. Operating Flink on your own (along with the Kafka storage layer) is difficult ● Deployment Complexity: Setting up Flink requires a deep understanding of resource allocation and management ● Management & Monitoring: Picking relevant metrics can be overwhelming for a DevOps team just starting with stream processing ● Limited Ecosystem: Flink lacks pre-built integrations with observability, metadata management, data governance, and security tooling ● Cost & Risk: Self-supporting Flink incurs significant costs & resources in terms of infra footprint and Dev & Ops FTEs
  • 31. Apache Flink® on Confluent Cloud: Simple, Serverless Stream Processing. Easily build high-quality, reusable data streams with the industry’s only cloud-native, serverless Flink service ● Effortlessly filter, join, and enrich your data streams with Flink, the de facto standard for stream processing ● Enable high-performance and efficient stream processing at any scale, without the complexities of infrastructure management ● Experience Kafka and Flink as a unified platform, with fully integrated monitoring, security, and governance
  • 32. Effortlessly filter, join, and enrich your data streams with Apache Flink ● Real-time processing: Power low-latency applications and pipelines that react to real-time events and provide timely insights ● Data reusability: Share consistent and reusable data streams widely with downstream applications and systems ● Data enrichment: Curate, filter, and augment data on-the-fly with additional context to improve completeness, accuracy, & compliance ● Efficiency: Improve resource utilization and cost-effectiveness by avoiding redundant processing across silos. Customer quote: “With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security incidents without any added operational burden.”
  • 33. EVENT-DRIVEN APPLICATIONS: Recognize patterns and react to events in a timely manner. Develop applications using fine-grained control over how time progresses and data is grouped together using: ● Hopping, tumbling, session windows ● OVER aggregations ● Pattern matching with MATCH_RECOGNIZE [Chart: a "Double Bottom" price pattern plotted over period & volume, with segments labeled A to D defined by price<lag(price) (falling) and price>lag(price) (rising)]
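To make the double-bottom pattern concrete, here is a small Python sketch that looks for a fall, a rise, a second fall, and a second rise in a price series, comparing each price with the previous one in the spirit of the slide's price<lag(price) and price>lag(price) conditions. It is a simplified illustration, not the MATCH_RECOGNIZE query from the workshop; the function name and sample prices are hypothetical.

```python
from itertools import groupby

def find_double_bottoms(prices):
    """Return (start, end) price-index pairs where prices fall, rise, fall, rise."""
    # Direction of each step, i.e. price compared with the previous price.
    steps = []
    for prev, cur in zip(prices, prices[1:]):
        steps.append("down" if cur < prev else "up" if cur > prev else "flat")

    # Collapse consecutive equal directions into runs: (direction, first_step, last_step).
    runs, pos = [], 0
    for direction, group in groupby(steps):
        length = len(list(group))
        runs.append((direction, pos, pos + length - 1))
        pos += length

    matches = []
    for i in range(len(runs) - 3):
        if [r[0] for r in runs[i:i + 4]] == ["down", "up", "down", "up"]:
            start = runs[i][1]          # step k connects prices[k] and prices[k + 1]
            end = runs[i + 3][2] + 1    # last price of the final rise
            matches.append((start, end))
    return matches

prices = [10, 8, 6, 9, 7, 5, 8, 11]    # hypothetical sample series
matches = find_double_bottoms(prices)
```

A real pattern definition would usually add constraints on depth and duration; this sketch only checks the direction of movement.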
  • 34. REAL-TIME ANALYTICS: Analyze real-time data streams to generate important business insights. Get up-to-date results to power dashboards or applications requiring continuous updates using: ● Materialized views ● Temporal analytic functions ● Interactive queries [Example: over time, the events Account A, +$10; Account B, +$12; Account C, +$5; Account B, -$10; Account C, +$10; Account A, -$5; Account A, +$10 yield the balances A $15, B $2, C $15]
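The account-balance example on this slide behaves like a continuously updated aggregate. A minimal Python sketch, using the slide's seven events and assuming they arrive in the order listed (the function name is hypothetical):

```python
def apply_transactions(transactions):
    """Maintain a running balance per account, like a continuously updated
    SUM(amount) ... GROUP BY account view. Returns the final balances and the
    view's contents after each event."""
    balances, history = {}, []
    for account, amount in transactions:
        balances[account] = balances.get(account, 0) + amount
        history.append(dict(balances))   # snapshot of the view after this event
    return balances, history

# Events from the slide, assumed to arrive in this order.
transactions = [("A", 10), ("B", 12), ("C", 5), ("B", -10),
                ("C", 10), ("A", -5), ("A", 10)]
balances, history = apply_transactions(transactions)
```

Each entry in `history` is what a dashboard reading the view would see at that moment; the last entry matches the balances shown on the slide.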
  • 35. STREAMING DATA PIPELINES: Build streaming data pipelines to inform real-time decision making. Create new enriched and curated streams of higher value using: ● Data transformations ● Streaming joins, temporal joins, lookup joins, and versioned joins ● Fan out queries, multi-cluster queries [Example: Orders (t1, 21.5 USD; t3, 55 EUR; t5, 35.3 EUR) are joined with the Currency rate stream (t0, EUR:USD=1.00; t2, EUR:USD=1.05; t4, EUR:USD=1.10) to produce t1, 21.5 USD; t3, 57.75 USD; t5, 38.83 USD]
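The currency example on this slide is a temporal join: each order is converted with the rate that was valid at the order's timestamp. The sketch below reproduces the slide's numbers in plain Python; the function name and tuple layout are assumptions, and `Decimal` is used only to keep the arithmetic exact.

```python
from bisect import bisect_right
from decimal import Decimal

# (timestamp, EUR:USD rate) versions, sorted by timestamp, as on the slide.
rates = [(0, Decimal("1.00")), (2, Decimal("1.05")), (4, Decimal("1.10"))]
# (timestamp, amount, currency) orders, as on the slide.
orders = [(1, Decimal("21.5"), "USD"), (3, Decimal("55"), "EUR"), (5, Decimal("35.3"), "EUR")]

def convert_to_usd(orders, rates):
    """Join each order with the rate version valid at the order's timestamp."""
    rate_times = [t for t, _ in rates]
    converted = []
    for ts, amount, currency in orders:
        if currency == "EUR":
            idx = bisect_right(rate_times, ts) - 1   # latest rate at or before ts
            if idx < 0:
                raise ValueError(f"no rate known at time {ts}")
            amount = (amount * rates[idx][1]).quantize(Decimal("0.01"))
        converted.append((ts, amount, "USD"))
    return converted

converted = convert_to_usd(orders, rates)
```

The order at t3 uses the rate from t2 (1.05) and the order at t5 uses the rate from t4 (1.10), which is what distinguishes a temporal join from a join against only the latest rate.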
  • 36. Enable high-performance and efficient stream processing at any scale ● Fully managed: Easily develop Flink applications with a serverless, SaaS-based experience instantly available & without ops burden ● Elastic scalability: Automatically scale up or down to meet the demands of the most complex workloads without overprovisioning ● Usage-based billing: Pay only for resources used instead of infrastructure provisioned, with scale-to-zero pricing ● Continuous, no touch updates: Build using an always up-to-date platform with declarative, versionless APIs and interfaces [Chart: throughput/data traffic over time, capacity vs. demand] Customer quote: "Offloading that day-to-day burden of operations has been a huge help. A lot of overall operations-type work gets offloaded when you move to Confluent Cloud… Where we’re saving time now is on the DevOps side of maintenance of all those systems — patching underlying systems or upgrading (them) — those were big things to be able to offload."
  • 37. Go from zero to production in minutes versus months ● Months: Open Source Apache Flink. In-house development and maintenance without support ● Weeks: Cloud-hosted Flink services. Manual Day 2 operations with basic tooling and/or support ● Minutes: Apache Flink on Confluent Cloud. Fully managed, elastic, and automated product capabilities with zero overhead
  • 38. Tap into a next-generation, serverless SQL experience. Different teams with different skills and needs can access stream processing using the interface of their choice ● SQL client in Confluent Cloud CLI ● Rich SQL editing user interface
  • 39. Experience Kafka and Flink seamlessly integrated as a unified platform ● Enterprise-grade security: Secure stream processing with built-in identity and access management, RBAC, and audit logs ● Stream governance: Enforce data policies and avoid metadata duplication leveraging native integration with Stream Governance ● Monitoring: Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services ● Connectors. Quote: "When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that differentiate the business."
  • 40. Automate metadata synchronization for effortless data exploration. Integration with Schema Registry enables Flink to easily access and process data from multiple Kafka clusters and Confluent environments in a consistent and unified way: ● Kafka topics → Flink tables ● Confluent environments → catalogs ● Kafka clusters → databases
  • 41. Connect your entire business with just a few clicks: 70+ fully managed connectors. Shown: Amazon S3, Amazon Redshift, Amazon DynamoDB, Google Cloud Spanner, AWS Lambda, Amazon SQS, Amazon Kinesis, Azure Service Bus, Azure Event Hubs, Azure Synapse Analytics, Azure Blob Storage, Azure Functions, Azure Data Lake, Google BigTable
  • 43. We will create a loyalty program around shoes ● We will create a promotion program for our best customers based on the given data events ○ Giving free shoes to customers who buy a lot from our store ○ This is a typical business use case that helps minimize the customer churn rate ● The architecture ○ We are running completely in Confluent Cloud ○ The data arrives in real time from our database via connectors (simulated here with Datagen) ○ We analyze the data in real time, look for the best-buying customers, and generate promotions for them based on their buying history ■ Get one pair of shoes for free after buying 10 ■ etc.
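The "one free pair after buying 10" rule can be sketched as a count per customer with a threshold. This is an illustration in plain Python rather than the workshop's Flink SQL; the `customer_id` field, the function name, and the sample orders are hypothetical.

```python
from collections import Counter

def customers_due_promotion(orders, threshold=10):
    """Return the customers who have placed at least `threshold` orders,
    similar in spirit to GROUP BY customer HAVING COUNT(*) >= threshold."""
    counts = Counter(order["customer_id"] for order in orders)
    return sorted(cid for cid, n in counts.items() if n >= threshold)

# Hypothetical order events: c1 ordered 12 times, c2 3 times, c3 exactly 10 times.
orders = (
    [{"customer_id": "c1"}] * 12
    + [{"customer_id": "c2"}] * 3
    + [{"customer_id": "c3"}] * 10
)
eligible = customers_due_promotion(orders)
```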
  • 44. The Hands-on Architecture ● 1: Basic Cluster with Schema Registry ● 2: Source Connectors ● 3: Flink SQL Pool ● 4: Flink SQL Stream Processing ● 5: Notification Client. Please be aware that all Flink SQL jobs will stop after 4 hours (we are working without Service Accounts)
  • 45. The Mapping within Flink SQL - Step 1
  • 46. The Mapping within Flink SQL - Step 2
  • 47. The Mapping within Flink SQL - Step 3
  • 48. The Mapping within Flink SQL - Step 4
  • 49. The Mapping within Flink SQL - Step 5 - Finished
  • 50. In our labs we are doing JOINs, mainly INNER JOINs ● Within the labs we are running INNER JOINs only ● We also do a lot of aggregations ○ GROUP BY column ■ HAVING COUNT(*) on the number of records ● What if we used LEFT JOINs? ● Or OUTER JOINs?
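To see what would change with a LEFT JOIN, the following Python sketch joins orders to customers both ways. The record layout, field names, and function names are hypothetical and chosen only to show the difference in results.

```python
def inner_join(orders, customers):
    """Keep only orders that have a matching customer."""
    by_id = {c["id"]: c for c in customers}
    return [
        {**o, "name": by_id[o["customer_id"]]["name"]}
        for o in orders
        if o["customer_id"] in by_id
    ]

def left_join(orders, customers):
    """Keep every order; the name is None when no customer matches."""
    by_id = {c["id"]: c for c in customers}
    return [
        {**o, "name": by_id.get(o["customer_id"], {}).get("name")}
        for o in orders
    ]

# Hypothetical data: order 3 refers to a customer we have no record for.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Luis"}]
orders = [
    {"order_id": 1, "customer_id": 1},
    {"order_id": 2, "customer_id": 2},
    {"order_id": 3, "customer_id": 9},
]
inner_rows = inner_join(orders, customers)
left_rows = left_join(orders, customers)
```

The inner join silently drops the unmatched order, while the left join keeps it with an empty name, which matters when counting orders per customer afterwards.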
  • 51. Tools we use: Cloud Console Workspace, Flink Shell, and Flink Monitoring
  • 52. HINT: We do not use a Service Account for our job execution (INSERT), so jobs will be stopped after 4 hours. Please read more here: https://docs.confluent.io/cloud/current/flink/index.html#security
  • 53. Short Summary: ● We are working completely in Confluent Cloud ● You have already set up a cluster, Schema Registry, 3 topics, and 3 connectors ○ manually - https://github.com/griga23/shoe-store/blob/main/prereq.md ○ or with terraform - https://github.com/griga23/shoe-store/blob/main/terraform/README.md ● We will now continue with: ○ Lab 1 and ○ Lab 2. The main workshop is described here: https://github.com/griga23/shoe-store Hint: With terraform-complete you deploy the finished workshop; everything is running, and the notification client can be started as well (after setting your token for Pushover). Note that terraform-complete runs the jobs with the APP-Manager Service Account, so those jobs are not stopped
  • 56. Operations: Autoscale, Increase without Downtime ● Autoscale within CFUs ● Increase CFUs without downtime ● Delete Pool(s)
  • 57. The data pipeline and data products we developed, shown in Stream Lineage and Data Portal
  • 58. Flink SQL limitations (Feb. 2024) ● Runs only in specific AWS/Azure/GCP regions. ● Supported SQL statements: ○ CREATE TABLE (without the AS, PARTITION BY, and LIKE keywords) ○ ALTER TABLE (only for ADD/MODIFY WATERMARK; ADD COLUMN, DROP COLUMN, and other alterations aren’t supported) ○ DESCRIBE ○ DESCRIBE EXTENDED ○ INSERT INTO (persistent queries) ○ EXECUTE STATEMENT SET ○ SELECT ○ SHOW CATALOG / DATABASE / TABLE ○ SET ○ USE / USE CATALOG ○ SHOW CREATE TABLE ● Supported joins: ○ Regular Joins ○ Interval Joins ○ Temporal Table Join between a non-compacted and compacted Apache Kafka® topic ○ Star Schema Denormalization (N-Way Join), as long as temporary tables are not used ○ Lateral Table Join, as long as temporary views are not used ● Unsupported features: no UDFs, no DROP TABLE, and more… ● Unsupported statements: ADD JAR, etc. Please see the complete list here.
  • 61. Let’s continue coding PART 2 Github Repo: https://github.com/griga23/shoe-store/blob/main/lab2.md
  • 62. Don’t forget to delete everything in Confluent Cloud
  • 63. Recap
  • 64. 100% elasticity during the workshop: CFUs grow based on workload. Full elasticity based on workloads, with usage-based billing: if the service is not used, there are no costs. [Chart: CFU usage ranging from 0 CFU to max CFU]
  • 65. Flink SQL is multi-tenant and capable of elastic scaling ● We run Flink SQL in HA ○ All components, such as the Job Manager and Task Managers, are redundant, including the storage runtime infrastructure ○ State checkpoints are written to storageDir ● The Adaptive Scheduler can adjust the parallelism of a job based on available slots. It will automatically reduce the parallelism if not enough slots are available to run the job with the originally configured parallelism. See the documentation: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/
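The slot-based adjustment described on this slide can be reduced to a one-line rule for illustration. This is a deliberately simplified sketch, not the Adaptive Scheduler's real logic (which also involves slot sharing and rescaling over time); the function and parameter names are hypothetical.

```python
def effective_parallelism(configured, available_slots):
    """Simplified model: run with the configured parallelism when enough slots
    exist, otherwise fall back to the number of available slots."""
    if configured < 1:
        raise ValueError("configured parallelism must be at least 1")
    if available_slots < 1:
        raise ValueError("at least one slot is required to run the job")
    return min(configured, available_slots)

enough_slots = effective_parallelism(configured=8, available_slots=12)
too_few_slots = effective_parallelism(configured=8, available_slots=5)
```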
  • 66. Confluent Cloud Flink at Open Preview. Pillars: Serverless Flink SQL, Rich Experience, Complete and Secure ● ANSI-SQL with powerful streaming operators ● Rich CLI Experience ● SQL Editor with "workspaces" in CC UI ● Flink Shell ● Full terraform support ● Integration with Schema Registry and Governance ● Support for user-authentication and Service account
  • 67. Flink for Topic Actions ● De-duplicate topic by key: Continuously copies a topic, only emitting messages with unique keys, see sample ● Query this topic: Navigates to Flink SQL editor, pre-populated with e.g.: SELECT * FROM my_table LIMIT 10; ● Join this topic with…: Joins one topic with another based on join fields specified ● Filter this topic: Filter a topic based on simple criteria, ultimately generating a WHERE clause ● Copy this topic: Specify a set of fields to copy, emitting copied messages to a new topic ● Apply a transformation
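The "de-duplicate topic by key" action can be pictured as a filter that remembers which keys it has already seen. The sketch below is a plain Python illustration that keeps the first message per key; that keep-first behavior, the function name, and the sample messages are assumptions, not the statement the topic action generates.

```python
def deduplicate_by_key(messages):
    """Yield only the first (key, value) message seen for each key."""
    seen = set()
    for key, value in messages:
        if key not in seen:
            seen.add(key)
            yield key, value

# Hypothetical messages with repeated keys.
messages = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d"), ("k2", "e")]
unique = list(deduplicate_by_key(messages))
```

Because the function is a generator, it processes messages one at a time as they arrive, which mirrors how a continuous copy of a topic would behave. The set of seen keys is the state that a streaming job would have to keep.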
  • 68. Advanced SQL Streaming Operators ● Time windows ○ Time-based windows ○ Event-density windows ○ Event-based windows: every single event can trigger a new window ● Pattern Matching ○ Complex Event Processing ○ See sample ● Streaming Joins ○ Stream-to-stream joins ○ Temporal joins ○ Lookup joins ○ Versioned joins etc.
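As an illustration of a time-based window, the sketch below counts events per fixed-size (tumbling) window in plain Python. The integer timestamps in seconds, the function name, and the sample events are hypothetical; a streaming engine would additionally use watermarks to decide when a window is complete.

```python
from collections import defaultdict

def tumbling_window_counts(events, size):
    """Count events per non-overlapping window of `size` time units.
    Each event is a (timestamp, payload) pair; windows are keyed by start time."""
    counts = defaultdict(int)
    for ts, _payload in events:
        window_start = ts - ts % size   # start of the window containing ts
        counts[window_start] += 1
    return dict(sorted(counts.items()))

# Hypothetical events with timestamps in seconds.
events = [(1, "a"), (4, "b"), (9, "c"), (10, "d"), (15, "e"), (21, "f")]
window_counts = tumbling_window_counts(events, size=10)
```

Each event belongs to exactly one window here; a hopping window would instead assign an event to every window that overlaps its timestamp.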
  • 69. Flink SQL is fully integrated into Confluent Cloud, out of the box ● Connected via Confluent Connector ● Environments are Catalogs ● Kafka Clusters are Databases ● Topics are Tables ● RBAC for managing Flink resources ○ Keep in mind: A statement’s access level is determined entirely by the permissions that you attach to the statement ● Schema Registry, Data Portal, Lineage, Consumer/Producer Monitoring, Metric API… ● Cluster and Pool need to be in the same region and same CSP ● Available across the Confluent organization, including all environments and clusters
  • 70. And finally, we very easily implemented a promotion and loyalty use case
  • 72. Our goals for Apache Flink on Confluent Cloud ● Cloud-Native (serverless experience): Eliminate the operational burden of managing Flink with a fully managed, cloud-native service that is simple, secure, and scalable ● Complete (integrated platform): Leverage Flink fully integrated with Confluent’s complete feature set, enabling developers to build stream processing applications quickly, reliably, and securely ● Everywhere (deployment flexibility): Seamlessly process your data everywhere it resides with a Flink service that spans across the three major cloud providers
  • 73. Flink at GA ● Production Ready ○ 99.99% SLA ○ Terraform support ● Autoscale ○ Powerful Autoscale ○ Scale to zero (aka auto-pause) ● Everywhere ○ Available in AWS, Azure, GCP ○ AVRO, JSON, Protobuf schemas ○ Topic Actions
  • 74. Fast follow with additional features ● Apps: UDFs (Java, Python); programmatic Flink APIs in addition to SQL (Java, Python) ● Security: Private Networking (AWS, Azure, GCP); BYOK ● Performance: Batch Execution; Materialized views; Data Serving ● Intelligence: OpenAI integration; Flink ML
  • 75. COMING SOON: Enrich real-time data streams with Generative AI directly from Flink SQL. Query: INSERT INTO enriched_reviews SELECT id, review, invoke_openai(prompt, review) AS score FROM product_reviews; The prompt: “Score the following text on a scale of 1 and 5 where 1 is negative and 5 is positive returning only the number” [Diagram: on the data streaming platform, incoming reviews (Kate, 4 hours ago: "This was the worst decision ever."; Nikola, 1 day ago: "Not bad. Could have been cheaper."; Brian, 3 days ago: "Amazing! Game Changer!") are enriched with a star-rating score]