SlideShare a Scribd company logo
1 of 60
Download to read offline
Cross the Streams!
Creating Data Pipelines with Apache Flink + Pulsar
Caito Scherr – Developer Advocate – Ververica
Agenda
00 Who am I?
01 Intro to Flink SQL
02 Flink SQL Demo
03 Flink + Pulsar
@CAITO_200_OK
Who am I?
00 Caito Scherr
01 Apache Flink
02 DevRel @ Ververica
03 Portland, Oregon
@CAITO_200_OK
Who am I?
00 Caito Scherr
01 Apache Flink
02 DevRel @ Ververica
03 Portland, Oregon
@CAITO_200_OK
Who am I?
00 Caito Scherr
01 Apache Flink
02 DevRel @ Ververica
03 Portland, Oregon
@CAITO_200_OK
Who am I?
00 Caito Scherr
01 Apache Flink
02 DevRel @ Ververica
03 Portland, Oregon
@CAITO_200_OK
Intro Flink SQL
Stream Processing
@CAITO_200_OK
Stream Processing > The Challenges
@CAITO_200_OK
● You can’t pause to fix it
● Lots of data, FAST
● Ingesting multiple formats
● Failure recovery
● Needs to scale
Flink > Addressing Stream Processing’s Challenges
@CAITO_200_OK
Flink > Addressing Stream Processing’s Challenges
@CAITO_200_OK
Flink > Basics
@CAITO_200_OK
12
Building Blocks (events, state, (event) time)
DataStream API (streams, windows)
Table API (dynamic tables)
Flink SQL
PyFlink
Ease of Use
Expressiven
ess
Streaming Analytics
& ML
Stateful Stream
Processing
Flink > Summary
@CAITO_200_OK
Flexible APIs
● Ease of use/Expressiveness
● Wide Range of Use Cases
High Performance
● Local State Access
● High Throughput/Low Latency
Stateful Processing
● State = First-class Citizen
● Event-time Support
Fault Tolerance
● Distributed State Snapshots
● Exactly-once Guarantees
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL
@CAITO_200_OK
● Stream processing: real-time processing
● Stream processing is complex
● Flink is highly performant streaming
● Flink solves many problems in streaming
● Flink is complex
● Flink SQL: access to Flink’s benefits
● Abstracts away the complexity
Flink SQL Demo
@CAITO_200_OK
● Making the complex simple
● You could start a data pipeline anywhere!
● Language agnostic
From: Free Guy movie
Flink SQL Demo > Regular SQL
@CAITO_200_OK
user cnt
Mary 2
Bob 1
SELECT user_id,
COUNT(url) AS cnt
FROM clicks
GROUP BY user_id;
Take a snapshot when the
query starts
A final result is
produced
A row that was added after the query
was started is not considered
user cTime url
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
The query
terminates
Image: Marta Paes @morsapaes
Flink SQL Demo > Flink SQL
@CAITO_200_OK
user cTime url
user cnt
SELECT user_id,
COUNT(url) AS cnt
FROM clicks
GROUP BY user_id;
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
Bob 1
Liz 1
Mary 1
Mary 2
Ingest all changes as
they happen
Continuously update the
result
The result is identical to the one-time query (at this point)
Image: Marta Paes @morsapaes
Flink SQL Demo
@CAITO_200_OK
Flink SQL Demo
@CAITO_200_OK
● Check Java version
● Download Flink Snapshot
● Un-tar it
26
What Next? >> Flink SQL Cookbook
27
28
29
30
31
Flink SQL Demo
@CAITO_200_OK
● Flink SQL + DataGen
● Same startup steps
● True stream processing example
© 2020 Ververica
© 2020 Ververica
Pulsar + Flink
Flink + Pulsar
@CAITO_200_OK
“Stream as a unified view
on data”
“Batch as a special case
of streaming”
Flink + Pulsar
@CAITO_200_OK
● Pub/Sub messaging layer (streaming)
● Durable storage layer (batch)
Flink + Pulsar > Unified Processing with Flink
@CAITO_200_OK
● Mix historic & real-time
● Reuse code & logic
● Simplify operations
now
bounded query
unbounded query
past future
bounded query
start of the stream
unbounded query
Flink + Pulsar > Unified data stack
@CAITO_200_OK
Unified Processing
Engine
(Batch / Streaming)
Unified Storage
(Segments / Pub/Sub)
Demo > Twier Firehose
@CAITO_200_OK
Demo: Marta Paes @morsapaes
Demo > Twier Firehose
@CAITO_200_OK
Demo: Marta Paes @morsapaes
Demo > Twier Firehose
@CAITO_200_OK
CREATE CATALOG pulsar WITH (
'type' = 'pulsar',
'service-url' = 'pulsar://pulsar:6650',
'admin-url' = 'http://pulsar:8080',
'format' = 'json'
);
Catalog DDL
Demo: Marta Paes @morsapaes
Demo > Twier Firehose
@CAITO_200_OK
Not cool. 👹
Demo: Marta Paes @morsapaes
Demo > Get Relevant Timestamps
@CAITO_200_OK
CREATE TABLE pulsar_tweets (
publishTime TIMESTAMP(3) METADATA,
WATERMARK FOR publishTime AS publishTime - INTERVAL '5'
SECOND
) WITH (
'connector' = 'pulsar',
'topic' = 'persistent://public/default/tweets',
'value.format' = 'json',
'service-url' = 'pulsar://pulsar:6650',
'admin-url' = 'http://pulsar:8080',
'scan.startup.mode' = 'earliest-offset'
)
LIKE tweets;
Derive schema from the original topic
Define the source connector (Pulsar)
Read and use Pulsar message metadata
Demo: Marta Paes @morsapaes
Demo > Windowed Aggregation
@CAITO_200_OK
CREATE TABLE pulsar_tweets_agg (
tmstmp TIMESTAMP(3),
tweet_cnt BIGINT
) WITH (
'connector'='pulsar',
'topic'='persistent://public/default/tweets_agg',
'value.format'='json',
'service-url'='pulsar://pulsar:6650',
'admin-url'='http://pulsar:8080'
);
Sink Table DDL
INSERT INTO pulsar_tweets_agg
SELECT TUMBLE_START(publishTime, INTERVAL '10'
SECOND) AS wStart,
COUNT(id) AS tweet_cnt
FROM pulsar_tweets
GROUP BY TUMBLE(publishTime, INTERVAL '10'
SECOND);
Continuous SQL Query
Demo: Marta Paes @morsapaes
Demo > Tweet Count in Windows
@CAITO_200_OK
Demo: Marta Paes @morsapaes
What’s Next?
What Next
@CAITO_200_OK
What Next
@CAITO_200_OK
What Next
@CAITO_200_OK
How to Get Involved
@CAITO_200_OK
● Geing involved page: one source for Flink
community resources
● hps://flink.apache.org/community.html
Contribute
@CAITO_200_OK
● Github
● Issue Tracker
● Becoming a Commier
@CAITO_200_OK
New Slack Space!
@CAITO_200_OK
● Go-to space for user troubleshooting
● 800 members in less than 2 months
● Members include most of the Flink commiers
+ PMC members
New Slack Space!
@CAITO_200_OK
Hangout With Us
@CAITO_200_OK
● Regional meetups
● Virtual and in person options
● hps://www.meetup.com/topics/apache-flink/
Stay Connected
@CAITO_200_OK
● Twier
● Website
● Blog - Flink
● Blog - Ververica
● Youtube
Thank you
info@ververica.com
www.ververica.com
@VervericaData
Questions?
● caito@ververica.com
● @CAITO_200_OK
info@ververica.com
www.ververica.com
@VervericaData
Resources
● Flink Ahead: What Comes After Batch & Streaming: https://youtu.be/h5OYmy9Yx7Y
● Apache Pulsar as one Storage System for Real Time & Historical Data Analysis:
https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017
● Flink Table API & SQL:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#ope
rations
● Flink SQL Cookbook: https://github.com/ververica/flink-sql-cookbook
● When Flink & Pulsar Come Together:
https://flink.apache.org/2019/05/03/pulsar-flink.html
● How to Query Pulsar Streams in Flink:
https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.ht
ml
● What’s New in the Flink/Pulsar Connector:
● https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html
● Marta’s Demo: https://github.com/morsapaes/flink-sql-pulsar
60
@Caito_200_OK

More Related Content

Similar to Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apache Pulsar - Pulsar Summit SF 2022

Scaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersScaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersoazabir
 
SQL Server PowerShell - Community Tools
SQL Server PowerShell - Community ToolsSQL Server PowerShell - Community Tools
SQL Server PowerShell - Community ToolsLars Platzdasch
 
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!Timo Walther
 
Reactive database access with Slick3
Reactive database access with Slick3Reactive database access with Slick3
Reactive database access with Slick3takezoe
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Jim Czuprynski
 
How to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushHow to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushMu Chun Wang
 
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastpandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastUwe Korn
 
PowerUpSQL - 2018 Blackhat USA Arsenal Presentation
PowerUpSQL - 2018 Blackhat USA Arsenal PresentationPowerUpSQL - 2018 Blackhat USA Arsenal Presentation
PowerUpSQL - 2018 Blackhat USA Arsenal PresentationScott Sutherland
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics HeroTechWell
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedJamie Grier
 
AWS CDK introduction
AWS CDK introductionAWS CDK introduction
AWS CDK introductionleo lapworth
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecFIRAT GULEC
 
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014Amazon Web Services
 
Being HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeBeing HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeAman Kohli
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...Natan Silnitsky
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Soroosh Khodami
 
Linux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure WebLinux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure WebAll Things Open
 

Similar to Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apache Pulsar - Pulsar Summit SF 2022 (20)

Scaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersScaling asp.net websites to millions of users
Scaling asp.net websites to millions of users
 
SQL Server PowerShell - Community Tools
SQL Server PowerShell - Community ToolsSQL Server PowerShell - Community Tools
SQL Server PowerShell - Community Tools
 
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
 
Reactive database access with Slick3
Reactive database access with Slick3Reactive database access with Slick3
Reactive database access with Slick3
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
 
How to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushHow to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & Push
 
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastpandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fast
 
SQL Server On SANs
SQL Server On SANsSQL Server On SANs
SQL Server On SANs
 
PowerUpSQL - 2018 Blackhat USA Arsenal Presentation
PowerUpSQL - 2018 Blackhat USA Arsenal PresentationPowerUpSQL - 2018 Blackhat USA Arsenal Presentation
PowerUpSQL - 2018 Blackhat USA Arsenal Presentation
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics Hero
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
 
AWS CDK introduction
AWS CDK introductionAWS CDK introduction
AWS CDK introduction
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
 
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
 
Being HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeBeing HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on Purpose
 
Rmoug ashmaster
Rmoug ashmasterRmoug ashmaster
Rmoug ashmaster
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...
8 Lessons Learned from Using Kafka in 1500 microservices - confluent streamin...
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
Linux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure WebLinux HTTPS/TCP/IP Stack for the Fast and Secure Web
Linux HTTPS/TCP/IP Stack for the Fast and Secure Web
 

More from StreamNative

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022StreamNative
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021StreamNative
 

More from StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 

Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apache Pulsar - Pulsar Summit SF 2022

  • 1. Cross the Streams! Creating Data Pipelines with Apache Flink + Pulsar Caito Scherr – Developer Advocate – Ververica
  • 2. Agenda 00 Who am I? 01 Intro to Flink SQL 02 Flink SQL Demo 03 Flink + Pulsar @CAITO_200_OK
  • 3. Who am I? 00 Caito Scherr 01 Apache Flink 02 DevRel @ Ververica 03 Portland, Oregon @CAITO_200_OK
  • 4. Who am I? 00 Caito Scherr 01 Apache Flink 02 DevRel @ Ververica 03 Portland, Oregon @CAITO_200_OK
  • 5. Who am I? 00 Caito Scherr 01 Apache Flink 02 DevRel @ Ververica 03 Portland, Oregon @CAITO_200_OK
  • 6. Who am I? 00 Caito Scherr 01 Apache Flink 02 DevRel @ Ververica 03 Portland, Oregon @CAITO_200_OK
  • 9. Stream Processing > The Challenges @CAITO_200_OK ● You can’t pause to fix it ● Lots of data, FAST ● Ingesting multiple formats ● Failure recovery ● Needs to scale
  • 10. Flink > Addressing Stream Processing’s Challenges @CAITO_200_OK
  • 11. Flink > Addressing Stream Processing’s Challenges @CAITO_200_OK
  • 12. Flink > Basics @CAITO_200_OK 12 Building Blocks (events, state, (event) time) DataStream API (streams, windows) Table API (dynamic tables) Flink SQL PyFlink Ease of Use Expressiven ess Streaming Analytics & ML Stateful Stream Processing
  • 13. Flink > Summary @CAITO_200_OK Flexible APIs ● Ease of use/Expressiveness ● Wide Range of Use Cases High Performance ● Local State Access ● High Throughput/Low Latency Stateful Processing ● State = First-class Citizen ● Event-time Support Fault Tolerance ● Distributed State Snapshots ● Exactly-once Guarantees
  • 14. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 15. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 16. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 17. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 18. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 19. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 20. Flink SQL @CAITO_200_OK ● Stream processing: real-time processing ● Stream processing is complex ● Flink is highly performant streaming ● Flink solves many problems in streaming ● Flink is complex ● Flink SQL: access to Flink’s benefits ● Abstracts away the complexity
  • 21. Flink SQL Demo @CAITO_200_OK ● Making the complex simple ● You could start a data pipeline anywhere! ● Language agnostic From: Free Guy movie
  • 22. Flink SQL Demo > Regular SQL @CAITO_200_OK user cnt Mary 2 Bob 1 SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id; Take a snapshot when the query starts A final result is produced A row that was added after the query was started is not considered user cTime url Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… The query terminates Image: Marta Paes @morsapaes
  • 23. Flink SQL Demo > Flink SQL @CAITO_200_OK user cTime url user cnt SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id; Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… Bob 1 Liz 1 Mary 1 Mary 2 Ingest all changes as they happen Continuously update the result The result is identical to the one-time query (at this point) Image: Marta Paes @morsapaes
  • 25. Flink SQL Demo @CAITO_200_OK ● Check Java version ● Download Flink Snapshot ● Un-tar it
  • 26. 26 What Next? >> Flink SQL Cookbook
  • 27. 27
  • 28. 28
  • 29. 29
  • 30. 30
  • 31. 31
  • 32. Flink SQL Demo @CAITO_200_OK ● Flink SQL + DataGen ● Same startup steps ● True stream processing example
  • 36. Flink + Pulsar @CAITO_200_OK “Stream as a unified view on data” “Batch as a special case of streaming”
  • 37. Flink + Pulsar @CAITO_200_OK ● Pub/Sub messaging layer (streaming) ● Durable storage layer (batch)
  • 38. Flink + Pulsar > Unified Processing with Flink @CAITO_200_OK ● Mix historic & real-time ● Reuse code & logic ● Simplify operations now bounded query unbounded query past future bounded query start of the stream unbounded query
  • 39. Flink + Pulsar > Unified data stack @CAITO_200_OK Unified Processing Engine (Batch / Streaming) Unified Storage (Segments / Pub/Sub)
  • 40. Demo > Twier Firehose @CAITO_200_OK Demo: Marta Paes @morsapaes
  • 41. Demo > Twier Firehose @CAITO_200_OK Demo: Marta Paes @morsapaes
  • 42. Demo > Twier Firehose @CAITO_200_OK CREATE CATALOG pulsar WITH ( 'type' = 'pulsar', 'service-url' = 'pulsar://pulsar:6650', 'admin-url' = 'http://pulsar:8080', 'format' = 'json' ); Catalog DDL Demo: Marta Paes @morsapaes
  • 43. Demo > Twier Firehose @CAITO_200_OK Not cool. 👹 Demo: Marta Paes @morsapaes
  • 44. Demo > Get Relevant Timestamps @CAITO_200_OK CREATE TABLE pulsar_tweets ( publishTime TIMESTAMP(3) METADATA, WATERMARK FOR publishTime AS publishTime - INTERVAL '5' SECOND ) WITH ( 'connector' = 'pulsar', 'topic' = 'persistent://public/default/tweets', 'value.format' = 'json', 'service-url' = 'pulsar://pulsar:6650', 'admin-url' = 'http://pulsar:8080', 'scan.startup.mode' = 'earliest-offset' ) LIKE tweets; Derive schema from the original topic Define the source connector (Pulsar) Read and use Pulsar message metadata Demo: Marta Paes @morsapaes
  • 45. Demo > Windowed Aggregation @CAITO_200_OK CREATE TABLE pulsar_tweets_agg ( tmstmp TIMESTAMP(3), tweet_cnt BIGINT ) WITH ( 'connector'='pulsar', 'topic'='persistent://public/default/tweets_agg', 'value.format'='json', 'service-url'='pulsar://pulsar:6650', 'admin-url'='http://pulsar:8080' ); Sink Table DDL INSERT INTO pulsar_tweets_agg SELECT TUMBLE_START(publishTime, INTERVAL '10' SECOND) AS wStart, COUNT(id) AS tweet_cnt FROM pulsar_tweets GROUP BY TUMBLE(publishTime, INTERVAL '10' SECOND); Continuous SQL Query Demo: Marta Paes @morsapaes
  • 46. Demo > Tweet Count in Windows @CAITO_200_OK Demo: Marta Paes @morsapaes
  • 51. How to Get Involved @CAITO_200_OK ● Geing involved page: one source for Flink community resources ● hps://flink.apache.org/community.html
  • 52. Contribute @CAITO_200_OK ● Github ● Issue Tracker ● Becoming a Commier
  • 54. New Slack Space! @CAITO_200_OK ● Go-to space for user troubleshooting ● 800 members in less than 2 months ● Members include most of the Flink commiers + PMC members
  • 56. Hangout With Us @CAITO_200_OK ● Regional meetups ● Virtual and in person options ● hps://www.meetup.com/topics/apache-flink/
  • 57. Stay Connected @CAITO_200_OK ● Twier ● Website ● Blog - Flink ● Blog - Ververica ● Youtube
  • 60. Resources ● Flink Ahead: What Comes After Batch & Streaming: https://youtu.be/h5OYmy9Yx7Y ● Apache Pulsar as one Storage System for Real Time & Historical Data Analysis: https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017 ● Flink Table API & SQL: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#ope rations ● Flink SQL Cookbook: https://github.com/ververica/flink-sql-cookbook ● When Flink & Pulsar Come Together: https://flink.apache.org/2019/05/03/pulsar-flink.html ● How to Query Pulsar Streams in Flink: https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.ht ml ● What’s New in the Flink/Pulsar Connector: ● https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html ● Marta’s Demo: https://github.com/morsapaes/flink-sql-pulsar 60 @Caito_200_OK