SlideShare a Scribd company logo
1 of 82
Download to read offline
Ā© 2023 Cloudera, Inc. All rights reserved.
Getting Started With Real-time
Cloud Native Streaming With Java
Apache Pulsar Development 101 with Java
Tim Spann
Principal Developer Advocate
15-March-2023
Ā© 2023 Cloudera, Inc. All rights reserved.
TOPICS
Ā© 2023 Cloudera, Inc. All rights reserved. 3
Topics
ā— Introduction to Streaming
ā— Introduction to Apache Pulsar
ā— Introduction to Apache Kafka
ā— FLaNK Stack
ā— Demos
Ā© 2023 Cloudera, Inc. All rights reserved.
Ā© 2023 Cloudera, Inc. All rights reserved.
FLiPN-FLaNK Stack
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://github.com/tspannhw/EverythingApacheNiFi
https://medium.com/@tspann
Apache NiFi x Apache Kafka x Apache Flink x Java
Ā© 2023 Cloudera, Inc. All rights reserved. 6
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark, Java and Open Source friends.
https://bit.ly/32dAJft
Ā© 2023 Cloudera, Inc. All rights reserved. 7
Largest Java Conference in the US!
12 tracks on Java, Cloud, Frameworks, Streaming, etcā€¦
Devnexus.com
Join me! Save with SEEMESPEAK
https://devnexus.com/presentations/apache-pulsar-development-101-with-java
Ā© 2023 Cloudera, Inc. All rights reserved.
STREAMING
Ā© 2023 Cloudera, Inc. All rights reserved. 9
STREAMING FROM ā€¦ TO .. WHILE ..
Data distribution as a ļ¬rst class citizen
IOT
Devices
LOG DATA
SOURCES
ON-PREM
DATA SOURCES
BIG DATA CLOUD
SERVICES
CLOUD BUSINESS
PROCESS SERVICES *
CLOUD DATA*
ANALYTICS /SERVICE
(Cloudera DW)
App
Logs
Laptops
/Servers Mobile
Apps
Security
Agents
CLOUD
WAREHOUSE
UNIVERSAL
DATA DISTRIBUTION
(Ingest, Transform, Deliver)
Ingest
Processors
Ingest
Gateway
Router, Filter &
Transform
Processors
Destination
Processors
Ā© 2023 Cloudera, Inc. All rights reserved. 10
End to End Streaming Pipeline Example
Enterprise
sources
Weather
Errors
Aggregates
Alerts
Stocks
ETL
Analytics
Clickstream Market data
Machine logs Social
SQL
Ā© 2023 Cloudera, Inc. All rights reserved. 11
Streaming for Java Developers
Multiple users, frameworks, languages, devices, data sources & clusters
ā€¢ Expert in ETL (Eating, Ties
and Laziness)
ā€¢ Deep SME in Buzzwords
ā€¢ No Coding Skills
ā€¢ R&D into Lasers
CAT AI
ā€¢ Will Drive your Car?
ā€¢ Will Fix Your Code?
ā€¢ Will Beat You At Q-Bert
ā€¢ Will Write my Next Talk
STREAMING ENGINEER
ā€¢ Coding skills in Python,
Java
ā€¢ Experience with Apache
Kafka
ā€¢ Knowledge of database
query languages such as
SQL
ā€¢ Knowledge of tools such
as Apache Flink, Apache
Spark and Apache NiFi
JAVA DEVELOPER
ā€¢ Frameworks like Spring,
Quarkus and micronaut
ā€¢ Relational Databases, SQL
ā€¢ Cloud
ā€¢ Dev and Build Tools
Ā© 2023 Cloudera, Inc. All rights reserved.
APACHE PULSAR
Run a Local Standalone Bare Metal
wget
https://archive.apache.org/dist/pulsar/pulsar-2.11.0/apache-pulsar-2.11.0-bin.
tar.gz
tar xvfz apache-pulsar-2.11.0-bin.tar.gz
cd apache-pulsar-2.11.0
bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone/
<or> Run in Docker
docker run -it 
-p 6650:6650 
-p 8080:8080 
--mount source=pulsardata,target=/pulsar/data 
--mount source=pulsarconf,target=/pulsar/conf 
apachepulsar/pulsar:2.11.0 
bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone-docker/
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create meetup
bin/pulsar-admin namespaces create meetup/philly
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list meetup
bin/pulsar-admin topics create persistent://meetup/philly/first
bin/pulsar-admin topics list meetup/philly
https://github.com/tspannhw/Meetup-YourFirstEventDrivenApp
CLI Message Producing & Consuming
bin/pulsar-client produce
"persistent://meetup/philly/first" --messages 'Hello
Pulsar!'
bin/pulsar-client consume
"persistent://meetup/philly/first" -s first-reader -n 0
Monitoring and Metrics Check
curl http://localhost:8080/admin/v2/persistent/meetup/philly/first/stats |
python3 -m json.tool
bin/pulsar-admin topics stats-internal persistent://meetup/philly/first
curl http://localhost:8080/metrics/
bin/pulsar-admin topics peek-messages --count 5 --subscription first-reader
persistent://meetup/philly/first
bin/pulsar-admin topics subscriptions persistent://meetup/philly/first
Cleanup
bin/pulsar-admin topics delete persistent://meetup/philly/first
bin/pulsar-admin namespaces delete meetup/philly
bin/pulsar-admin tenants delete meetup
101
Uniļ¬ed
Messaging
Platform
Guaranteed
Message
Delivery
Resiliency Inļ¬nite
Scalability
Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging
Tenants / Namespaces / Topics
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Cluster
Messages - The Basic Unit of Pulsar
Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data
can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like
topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer name, the
default name is used.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the
message is its order in that sequence.
Pulsar Cluster
ā— ā€œBookiesā€
ā— Stores messages and cursors
ā— Messages are grouped in
segments/ledgers
ā— A group of bookies form an
ā€œensembleā€ to store a ledger
ā— ā€œBrokersā€
ā— Handles message routing and
connections
ā— Stateless, but with caches
ā— Automatic load-balancing
ā— Topics are composed of
multiple segments
ā—
ā— Stores metadata for
both Pulsar and
BookKeeper
ā— Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Metadata
Storage
Producer-Consumer
Producer Consumer
Publisher sends data and
doesn't know about the
subscribers or their status.
All interactions go through
Pulsar and it handles all
communication.
Subscriber receives data
from publisher and never
directly interacts with it
Topic
Topic
Apache Pulsar: Messaging vs Streaming
Message Queueing - Queueing
systems are ideal for work queues
that do not require tasks to be
performed in a particular order.
Streaming - Streaming works
best in situations where the
order of messages is important.
Pulsar Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover -
guaranteed order, single active
consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer D-2
Key-Shared
<
K
1,
V
10
>
<
K
1,
V
11
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
2
,V
2
1>
<
K
2
,V
2
2
>
Subscription C
Consumer C-1
Consumer C-2
Shared
<
K
1,
V
10
>
<
K
2,
V
21
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
1,
V
11
>
<
K
2
,V
2
2
>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
Flexible Pub/Sub API for Pulsar - Shared
Consumer consumer =
client.newConsumer()
.topic("my-topic")
.subscriptionName("work-q-1")
.subscriptionType(SubType.Shared)
.subscribe();
Flexible Pub/Sub API for Pulsar - Failover
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("stream-1")
.subscriptionType(SubType.Failover)
.subscribe();
Data Offloaders
(Tiered Storage)
Client Libraries
StreamNative Pulsar Ecosystem
hub.streamnative.io
Connectors
(Sources & Sinks)
Protocol Handlers
Pulsar Functions
(Lightweight Stream
Processing)
Processing Engines
ā€¦ and more!
ā€¦ and more!
Kafka
On Pulsar
(KoP)
MQTT
On Pulsar
(MoP)
AMQP
On Pulsar
(AoP)
Schema Registry
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Building Real-Time Requires a Team
Pulsar - Spring
https://github.com/spring-projects-experimental/spring-pulsar
Pulsar - Spring - Code
@Autowired
private PulsarTemplate<Observation> pulsarTemplate;
this.pulsarTemplate.setSchema(Schema.
JSON(Observation.class));
MessageId msgid = pulsarTemplate.newMessage(observation)
.withMessageCustomizer((mb) -> mb.key(uuidKey.toString()))
.send();
@PulsarListener(subscriptionName = "aq-spring-reader", subscriptionType = Shared,
schemaType = SchemaType.
JSON, topics = "persistent://public/default/aq-pm25")
void echoObservation(Observation message) {
this.log.info("PM2.5 Message received: {}", message);
}
Pulsar - Spring - Conļ¬guration
spring:
pulsar:
client:
service-url: pulsar+ssl://sn-academy.sndevadvocate.snio.cloud:6651
auth-plugin-class-name: org.apache.pulsar.client.impl.auth.oauth2.AuthenticationOAuth2
authentication:
issuer-url: https://auth.streamnative.cloud/
private-key: file:///scr/sndevadvocate-tspann.json
audience: urn:sn:pulsar:sndevadvocate:my-instance
producer:
batching-enabled: false
send-timeout-ms: 90000
producer-name: airqualityjava
topic-name: persistent://public/default/airquality
Spring - Pulsar as Kafka
https://www.baeldung.com/spring-kafka
@Bean
public KafkaTemplate<String, Observation> kafkaTemplate() {
KafkaTemplate<String, Observation> kafkaTemplate =
new KafkaTemplate<String, Observation>(producerFactory());
return kafkaTemplate;
}
ProducerRecord<String, Observation> producerRecord = new ProducerRecord<>(topicName,
uuidKey.toString(),
message);
kafkaTemplate.send(producerRecord);
Spring - MQTT - Pulsar
https://roytuts.com/publish-subscribe-message-onto-mqtt-using-spring/
@Bean
public IMqttClient mqttClient(
@Value("${mqtt.clientId}") String clientId,
@Value("${mqtt.hostname}") String hostname,
@Value("${mqtt.port}") int port)
throws MqttException {
IMqttClient mqttClient = new MqttClient(
"tcp://" + hostname + ":" + port, clientId);
mqttClient.connect(mqttConnectOptions());
return mqttClient;
}
MqttMessage mqttMessage = new MqttMessage();
mqttMessage.setPayload(DataUtility.serialize(payload));
mqttMessage.setQos(0);
mqttMessage.setRetained(true);
mqttClient.publish(topicName, mqttMessage);
Spring - AMQP - Pulsar
https://www.baeldung.com/spring-amqp
rabbitTemplate.convertAndSend(topicName,
DataUtility.serializeToJSON(observation));
@Bean
public CachingConnectionFactory
connectionFactory() {
CachingConnectionFactory ccf =
new CachingConnectionFactory();
ccf.setAddresses(serverName);
return ccf;
}
Reactive Spring - Pulsar
Reactive Spring - Pulsar
The FLiPN kitten crosses the stream 4 ways with Apache Pulsar
Demo
Ā© 2023 Cloudera, Inc. All rights reserved. 47
REST + Spring Boot + Pulsar + Friends
Ā© 2023 Cloudera, Inc. All rights reserved.
APACHE PULSAR JAVA FUNCTION
https://github.com/tspannhw/pulsar-airquality-function
public class AirQualityFunction implements Function<byte[], Void> {
@Override
public Void process(byte[] input, Context context) {
if ( input == null || context == null ) {
return null;
}
//context.getInputTopics().toString()
if ( context.getLogger() != null && context.getLogger().isDebugEnabled() ) {
context.getLogger().debug("LOG:" + input.toString());
}
context.newOutputMessage(ā€œNewTopicNameā€, JSONSchema.of(Observation.class))
.key(UUID.randomUUID().toString())
.property(ā€œLanguageā€, ā€œjavaā€)
.value(observation)
.send();
}
}
Ā© 2023 Cloudera, Inc. All rights reserved.
APACHE KAFKA
Ā© 2023 Cloudera, Inc. All rights reserved. 51
Apache Kafka
ā€¢ Highly reliable distributed
messaging system
ā€¢ Decouple applications, enables
many-to-many patterns
ā€¢ Publish-Subscribe semantics
ā€¢ Horizontal scalability
ā€¢ Eļ¬ƒcient implementation to
operate at speed with big data
volumes
ā€¢ Organized by topic to support
several use cases
Source
System
Source
System
Source
System
Kafka
Fraud
Detection
Security
Systems
Real-Time
Monitoring
Source
System
Source
System
Source
System
Fraud
Detection
Security
Systems
Real-Time
Monitoring
Many-To-Many
Publish-Subscribe
Point-To-Point
Request-Response
Ā© 2023 Cloudera, Inc. All rights reserved.
STREAM TEAM
Ā© 2023 Cloudera, Inc. All rights reserved. 53
Clouderaā€™s Data-In-Motion Services
Cloudera Offers Two Core Data-In-Motion Services: DataFlow & Stream Processing
CLOUDERA SDX ā€” Secure, Monitor and Govern your
Streaming workloads with the same tooling using Apache
Ranger & Apache Atlas.
STREAM PROCESSING ā€” Powered by Apache Flink and
Kafka, it provides a complete, enterprise-grade stream
management and stateful processing solution. With
support for industry standard interfaces like SQL,
developers, data analysts, and data scientist can easily
build a wide variety of hybrid real-time applications.
DATAFLOW ā€” Powered by Apache NiFi, it enables
developers to connect to any data source anywhere with
any structure, process it, and deliver to any destination
using a low-code authoring experience.
Ā© 2023 Cloudera, Inc. All rights reserved. 54
CSP Community
Edition
ā€¢ Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
ā€¢ Runs in Docker
ā€¢ Try new features quickly
ā€¢ Develop applications locally
ā— Docker compose ļ¬le of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
ā—‹ $>docker compose up
ā— Licensed under the Cloudera Community License
ā— Unsupported
ā— Community Group Hub for CSP
ā— Find it on docs.cloudera.com under Applications
Ā© 2023 Cloudera, Inc. All rights reserved. 55
STREAMING DATA WITH CLOUDERA DATA FLOW (CDF)
01 03
04
05
Cloudera Flow
Management
Cloudera Stream
Processing
Streams
Replication
Manager
Cloudera Flow
Management
Cloudera
Machine
Learning
02
Collect
Buffer
Replicate
Distribute
Score
Regional Data Centers Global Public Cloud
Edge
Ā© 2023 Cloudera, Inc. All rights reserved. 56
ENABLING ANALYTICS AND INSIGHTS ANYWHERE
Driving enterprise business value
REAL-TIME
STREAMING
ENGINE
ANALYTICS &
DATA WAREHOUSE
DATA SCIENCE/
MACHINE LEARNING
CENTRALIZED DATA
PLATFORM
STORAGE & PROCESSING
ANALYTICS & INSIGHTS
Stream
Ingest
Ingest ā€“ Data
at Rest
Deploy
Models
BI
Solutions
SQL Predictive
Analytics
ā€¢ Model Building
ā€¢ Model Training
ā€¢ Model Scoring
Actions &
Alerts
[SQL]
Real-Time
Apps
STREAMING DATA
SOURCES
Clickstream Market data
Machine logs Social
ENTERPRISE DATA
SOURCES
CRM
Customer
history
Research
Compliance
Data
Risk Data
Lending
Ā© 2023 Cloudera, Inc. All rights reserved.
Ā© 2019 Cloudera, Inc. All rights reserved. 57
EVENT-DRIVEN ORGANIZATION
Modernize your data and applications
CDF Event Streaming Platform
Integration - Processing - Management - Cloud
Stream
ETL
Cloud
Storage
Application
Data Lake Data Stores
Make
Payment
ĀµServices
Streams
Edge - IoT Dashboard
Ā© 2023 Cloudera, Inc. All rights reserved.
DATAFLOW
APACHE NIFI
Ā© 2023 Cloudera, Inc. All rights reserved. 59
CLOUDERA FLOW AND EDGE MANAGEMENT
Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data
center) to any downstream system with built in end-to-end security and provenance
Advanced tooling to industrialize
ļ¬‚ow development (Flow Development
Life Cycle)
ACQUIRE
ā€¢ Over 300 Prebuilt Processors
ā€¢ Easy to build your own
ā€¢ Parse, Enrich & Apply Schema
ā€¢ Filter, Split, Merger & Route
ā€¢ Throttle & Backpressure
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLOG
PROCESS
HASH
MERGE
EXTRACT
DUPLICATE
SPLIT
ENCRYPT
TALL
EVALUATE
EXECUTE
GEOENRICH
SCAN
REPLACE
TRANSLATE
CONVERT
ROUTE TEXT
ROUTE CONTENT
ROUTE CONTEXT
ROUTE RATE
DISTRIBUTE LOAD
DELIVER
ā€¢ Guaranteed Delivery
ā€¢ Full data provenance from
acquisition to delivery
ā€¢ Diverse, Non-Traditional Sources
ā€¢ Eco-system integration
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLOG
Ā© 2023 Cloudera, Inc. All rights reserved. 60
Cloudera DataFlow: Universal Data Distribution Service
Process
Route
Filter
Enrich
Transform
Distribute
Connectors
Any
destination
Deliver
Ingest
Active
Passive
Connectors
Gateway
Endpoint
Connect & Pull
Send
Data born in
the cloud
Data born
outside the
cloud
UNIVERSAL DATA DISTRIBUTION WITH CLOUDERA DATAFLOW (CDF)
Connect to Any Data Source Anywhere then Process and Deliver to Any Destination
61
Ā© 2023 Cloudera, Inc. All rights reserved.
WHAT IS APACHE NIFI?
Apache NiFi is a scalable, real-time streaming data
platform that collects, curates, and analyzes data so
customers gain key insights for immediate
actionable intelligence.
Ā© 2023 Cloudera, Inc. All rights reserved. 62
APACHE NIFI
Enable easy ingestion, routing, management and delivery of
any data anywhere (Edge, cloud, data center) to any
downstream system with built in end-to-end security and
provenance
ACQUIRE PROCESS DELIVER
ā€¢ Over 300 Prebuilt Processors
ā€¢ Easy to build your own
ā€¢ Parse, Enrich & Apply Schema
ā€¢ Filter, Split, Merger & Route
ā€¢ Throttle & Backpressure
ā€¢ Guaranteed Delivery
ā€¢ Full data provenance from acquisition to
delivery
ā€¢ Diverse, Non-Traditional Sources
ā€¢ Eco-system integration
Advanced tooling to industrialize ļ¬‚ow development
(Flow Development Life Cycle)
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLO
G
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLO
G
HASH
MERGE
EXTRACT
DUPLICATE
SPLIT
ROUTE TEXT
ROUTE CONTENT
ROUTE CONTEXT
CONTROL RATE
DISTRIBUTE LOAD
GEOENRICH
SCAN
REPLACE
TRANSLATE
CONVERT
ENCRYPT
TALL
EVALUATE
EXECUTE
Ā© 2023 Cloudera, Inc. All rights reserved. 63
https://www.datainmotion.dev/2020/06/no-more-spaghetti-ļ¬‚ows.html
ā— Reduce, Reuse, Recycle. Use Parameters to reuse
common modules.
ā— Put ļ¬‚ows, reusable chunks into separate Process
Groups.
ā— Write custom processors if you need new or
specialized features
ā— Use Cloudera supported NiFi Processors
ā— Use Record Processors everywhere
No More Spaghetti Flows
Ā© 2023 Cloudera, Inc. All rights reserved.
JAVA DEV FOR NIFI
Ā© 2023 Cloudera, Inc. All rights reserved. 65
https://github.com/tspannhw/nifi-tensorflow-processor
public class TensorFlowProcessor extends AbstractProcessor {
public static final PropertyDescriptor MODEL_DIR = new
PropertyDescriptor.Builder().name(MODEL_DIR_NAME)
.description("Model Directory").required(true).expressionLanguageSupported(true)
.addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build();
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws
ProcessException {
FlowFile flowFile = session.get();
if (flowFile == null) {
flowFile = session.create();
}
try {
flowFile.getAttributes();
}
Ā© 2023 Cloudera, Inc. All rights reserved.
APACHE FLINK
Ā© 2023 Cloudera, Inc. All rights reserved. 67
Flink SQL
https://www.datainmotion.dev/2021/04/cloudera-sql-stream-builder-ssb-updated.html
ā— Streaming Analytics
ā— Continuous SQL
ā— Continuous ETL
ā— Complex Event Processing
ā— Standard SQL Powered by Apache Calcite
Ā© 2023 Cloudera, Inc. All rights reserved. 68
Flink SQL
-- specify Kafka partition key on output
SELECT foo AS _eventKey FROM sensors
-- use event time timestamp from kafka
-- exactly once compatible
SELECT eventTimestamp FROM sensors
-- nested structures access
SELECT foo.ā€™barā€™ FROM table; -- must quote nested
column
-- timestamps
SELECT * FROM payments
WHERE eventTimestamp > CURRENT_TIMESTAMP-interval
'10' second;
-- unnest
SELECT b.*, u.*
FROM bgp_avro b,
UNNEST(b.path) AS u(pathitem)
-- aggregations and windows
SELECT card,
MAX(amount) as theamount,
TUMBLE_END(eventTimestamp, interval '5' minute) as
ts
FROM payments
WHERE lat IS NOT NULL
AND lon IS NOT NULL
GROUP BY card,
TUMBLE(eventTimestamp, interval '5' minute)
HAVING COUNT(*) > 4 -- >4==fraud
-- try to do this ksql!
SELECT us_west.user_score+ap_south.user_score
FROM kafka_in_zone_us_west us_west
FULL OUTER JOIN kafka_in_zone_ap_south ap_south
ON us_west.user_id = ap_south.user_id;
Key Takeaway: Rich SQL grammar with advanced time and aggregation tools
69
Ā© 2022 Cloudera, Inc. All rights reserved.
SQL STREAM BUILDER (SSB)
SQL STREAM BUILDER allows
developers, analysts, and data
scientists to write streaming
applications with industry
standard SQL.
No Java or Scala code
development required.
Simpliļ¬es access to data in Kafka
& Flink. Connectors to batch data in
HDFS, Kudu, Hive, S3, JDBC, CDC
and more
Enrich streaming data with batch
data in a single tool
Democratize access to real-time data with just SQL
Ā© 2023 Cloudera, Inc. All rights reserved.
DEMO
Ā© 2023 Cloudera, Inc. All rights reserved. 71
End to End Streaming Demo Pipeline
Enterprise
sources
Weather
Errors
Aggregates
Alerts
Stocks
ETL
Analytics
Streaming SQL
Clickstream Market data
Machine logs Social
https://github.com/tspannhw/CloudDemo2021
AI BASED ENHANCEMENTS
SERVE
SOURCES
Data Warehouse
Report
Sensorid
Sensor conditions
Machine Learning
Predict, Automate
Control System
REPORT
Visualize
CLOUD
Collect
COLLECT
Message Broker
Data Flow
Distribute
Sensor id
Temperature
COLLECT and DISTRIBUTE DATA
1. Data is collected from sensors that use mqtt protocol via
CEM and sent to CDP Public Cloud
2. Two CDF flows run in the cloud to accomplish our two
goals: streaming analytics and batch analytics
ENRICH, REPORT
Report, Automate
Real time alerting
SQL
Stream Builder
Data Visualization
Edge
Management
Humidity
Timestamp
Visualize
Data Visualization
USE DATA
3. Streaming Use Case: some conditions of our greenhouse
must be avoided and have to be controlled in real time.
Some warnings have been defined to alert us in case
alerting conditions are met, control system is
automatically activated to adjust environmental
variables.
4. Batch analytics: to ensure the optimal growth of our
plants the ideal conditions have to be met for each
plant. Each 6 hours the plant conditions are monitored
and in case some control adjustment is required a ML
model gives a suggestion about getting the optimal
point minimizing the cost.
EDGE
Control System
Ā© 2023 Cloudera, Inc. All rights reserved. 73
Ā© 2023 Cloudera, Inc. All rights reserved.
RESOURCES AND WRAP-UP
Ā© 2023 Cloudera, Inc. All rights reserved. 75
ā— https://streamnative.io/blog/engineering/2022-11-29-spring-into-pulsar-part-2-spri
ng-based-microservices-for-multiple-protocols-with-apache-pulsar/
ā— https://streamnative.io/blog/release/2022-09-21-announcing-spring-for-apache-pu
lsar/
ā— https://docs.spring.io/spring-pulsar/docs/current-SNAPSHOT/reference/html/
ā— https://spring.io/blog/2022/08/16/introducing-experimental-spring-support-for-apa
che-pulsar
ā— https://medium.com/@tspann/using-the-new-spring-boot-apache-pulsar-integratio
n-8a38447dce7b
Spring + Pulsar References
Ā© 2023 Cloudera, Inc. All rights reserved. 76
ā— https://spring.io/guides/gs/spring-boot/
ā— https://spring.io/projects/spring-amqp/
ā— https://spring.io/projects/spring-kafka/
ā— https://github.com/spring-projects/spring-integration-kafka
ā— https://github.com/spring-projects/spring-integration
ā— https://github.com/spring-projects/spring-data-relational
ā— https://github.com/spring-projects/spring-kafka
ā— https://github.com/spring-projects/spring-amqp
Spring Things
Ā© 2023 Cloudera, Inc. All rights reserved. 77
STREAMING RESOURCES
ā€¢ https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an
d-streamnative
ā€¢ https://ļ¬‚ipstackweekly.com/
ā€¢ https://www.datainmotion.dev/
ā€¢ https://www.ļ¬‚ankstack.dev/
ā€¢ https://github.com/tspannhw
ā€¢ https://medium.com/@tspann
ā€¢ https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739
5d714
ā€¢ https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str
eaming_Engineer.pdf
Ā© 2023 Cloudera, Inc. All rights reserved. 78
78
Apache
Pulsar
in Action
Please enjoy Davidā€™s complete
book which is the ultimate
guide to Pulsar.
Ā© 2021 Cloudera, Inc. All rights reserved. 79
https://github.com/tspannhw https://www.datainmotion.dev/
Ā© 2021 Cloudera, Inc. All rights reserved. 80
March 16 - Virtual
March 17 - Trenton
April 4 - Atlanta
April 24 - San Francisco
Upcoming Events
Ā© 2023 Cloudera, Inc. All rights reserved. 81
Resources
82
TH N Y U

More Related Content

Similar to PhillyJug Getting Started With Real-time Cloud Native Streaming With Java

Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Timothy Spann
Ā 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Timothy Spann
Ā 
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
Ā 
Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022   apache pulsar development 101 with python (f li-...Python web conference 2022   apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
Ā 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
Ā 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsTimothy Spann
Ā 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
Ā 
Living the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring BootTimothy Spann
Ā 
Living the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring BootTimothy Spann
Ā 
Apache kafka
Apache kafkaApache kafka
Apache kafkaKumar Shivam
Ā 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
Ā 
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...StreamNative
Ā 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuideInexture Solutions
Ā 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19ssuser73434e
Ā 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Timothy Spann
Ā 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
Ā 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
Ā 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
Ā 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Timothy Spann
Ā 

Similar to PhillyJug Getting Started With Real-time Cloud Native Streaming With Java (20)

Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Ā 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
Ā 
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
Ā 
Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022   apache pulsar development 101 with python (f li-...Python web conference 2022   apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...
Ā 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Ā 
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Ā 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Ā 
Living the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring Boot
Ā 
Living the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring Boot
Ā 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Ā 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
Ā 
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ā 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
Ā 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Ā 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Ā 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
Ā 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Ā 
Apache Kafka - Strakin Technologies Pvt Ltd
Apache Kafka - Strakin Technologies Pvt LtdApache Kafka - Strakin Technologies Pvt Ltd
Apache Kafka - Strakin Technologies Pvt Ltd
Ā 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Ā 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Ā 

More from Timothy Spann

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
Ā 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
Ā 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
Ā 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
Ā 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
Ā 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-PipelinesTimothy Spann
Ā 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
Ā 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
Ā 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
Ā 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsTimothy Spann
Ā 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Timothy Spann
Ā 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
Ā 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
Ā 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...Timothy Spann
Ā 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
Ā 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
Ā 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
Ā 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
Ā 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101Timothy Spann
Ā 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
Ā 

More from Timothy Spann (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
Ā 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
Ā 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
Ā 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Ā 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
Ā 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
Ā 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
Ā 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
Ā 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Ā 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Ā 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Ā 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
Ā 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Ā 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
Ā 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Ā 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
Ā 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Ā 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Ā 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
Ā 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
Ā 

Recently uploaded

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
Ā 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
Ā 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
Ā 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
Ā 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
Ā 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
Ā 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
Ā 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
Ā 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
Ā 
Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”
Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”
Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”soniya singh
Ā 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
Ā 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
Ā 
(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...
(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...
(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...gurkirankumar98700
Ā 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
Ā 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
Ā 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
Ā 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
Ā 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
Ā 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
Ā 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
Ā 

Recently uploaded (20)

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Ā 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
Ā 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
Ā 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
Ā 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
Ā 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
Ā 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
Ā 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Ā 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
Ā 
Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”
Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”
Call Girls in Naraina Delhi šŸ’ÆCall Us šŸ”8264348440šŸ”
Ā 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Ā 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
Ā 
(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...
(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...
(Genuine) Escort Service Lucknow | Starting ā‚¹,5K To @25k with A/C šŸ§‘šŸ½ā€ā¤ļøā€šŸ§‘šŸ» 89...
Ā 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
Ā 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
Ā 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
Ā 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Ā 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
Ā 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Ā 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Ā 

PhillyJug Getting Started With Real-time Cloud Native Streaming With Java

  • 1. Ā© 2023 Cloudera, Inc. All rights reserved. Getting Started With Real-time Cloud Native Streaming With Java Apache Pulsar Development 101 with Java Tim Spann Principal Developer Advocate 15-March-2023
  • 2. Ā© 2023 Cloudera, Inc. All rights reserved. TOPICS
  • 3. Ā© 2023 Cloudera, Inc. All rights reserved. 3 Topics ā— Introduction to Streaming ā— Introduction to Apache Pulsar ā— Introduction to Apache Kafka ā— FLaNK Stack ā— Demos
  • 4. Ā© 2023 Cloudera, Inc. All rights reserved.
  • 5. Ā© 2023 Cloudera, Inc. All rights reserved. FLiPN-FLaNK Stack Tim Spann @PaasDev // Blog: www.datainmotion.dev Principal Developer Advocate. Princeton Future of Data Meetup. ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC https://github.com/tspannhw/EverythingApacheNiFi https://medium.com/@tspann Apache NiFi x Apache Kafka x Apache Flink x Java
  • 6. Ā© 2023 Cloudera, Inc. All rights reserved. 6 FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark, Java and Open Source friends. https://bit.ly/32dAJft
  • 7. Ā© 2023 Cloudera, Inc. All rights reserved. 7 Largest Java Conference in the US! 12 tracks on Java, Cloud, Frameworks, Streaming, etcā€¦ Devnexus.com Join me! Save with SEEMESPEAK https://devnexus.com/presentations/apache-pulsar-development-101-with-java
  • 8. Ā© 2023 Cloudera, Inc. All rights reserved. STREAMING
  • 9. Ā© 2023 Cloudera, Inc. All rights reserved. 9 STREAMING FROM ā€¦ TO .. WHILE .. Data distribution as a ļ¬rst class citizen IOT Devices LOG DATA SOURCES ON-PREM DATA SOURCES BIG DATA CLOUD SERVICES CLOUD BUSINESS PROCESS SERVICES * CLOUD DATA* ANALYTICS /SERVICE (Cloudera DW) App Logs Laptops /Servers Mobile Apps Security Agents CLOUD WAREHOUSE UNIVERSAL DATA DISTRIBUTION (Ingest, Transform, Deliver) Ingest Processors Ingest Gateway Router, Filter & Transform Processors Destination Processors
  • 10. Ā© 2023 Cloudera, Inc. All rights reserved. 10 End to End Streaming Pipeline Example Enterprise sources Weather Errors Aggregates Alerts Stocks ETL Analytics Clickstream Market data Machine logs Social SQL
  • 11. Ā© 2023 Cloudera, Inc. All rights reserved. 11
  • 12. Streaming for Java Developers Multiple users, frameworks, languages, devices, data sources & clusters ā€¢ Expert in ETL (Eating, Ties and Laziness) ā€¢ Deep SME in Buzzwords ā€¢ No Coding Skills ā€¢ R&D into Lasers CAT AI ā€¢ Will Drive your Car? ā€¢ Will Fix Your Code? ā€¢ Will Beat You At Q-Bert ā€¢ Will Write my Next Talk STREAMING ENGINEER ā€¢ Coding skills in Python, Java ā€¢ Experience with Apache Kafka ā€¢ Knowledge of database query languages such as SQL ā€¢ Knowledge of tools such as Apache Flink, Apache Spark and Apache NiFi JAVA DEVELOPER ā€¢ Frameworks like Spring, Quarkus and micronaut ā€¢ Relational Databases, SQL ā€¢ Cloud ā€¢ Dev and Build Tools
  • 13. Ā© 2023 Cloudera, Inc. All rights reserved. APACHE PULSAR
  • 14.
  • 15. Run a Local Standalone Bare Metal wget https://archive.apache.org/dist/pulsar/pulsar-2.11.0/apache-pulsar-2.11.0-bin. tar.gz tar xvfz apache-pulsar-2.11.0-bin.tar.gz cd apache-pulsar-2.11.0 bin/pulsar standalone https://pulsar.apache.org/docs/en/standalone/
  • 16. <or> Run in Docker docker run -it -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.11.0 bin/pulsar standalone https://pulsar.apache.org/docs/en/standalone-docker/
  • 17. Building Tenant, Namespace, Topics bin/pulsar-admin tenants create meetup bin/pulsar-admin namespaces create meetup/philly bin/pulsar-admin tenants list bin/pulsar-admin namespaces list meetup bin/pulsar-admin topics create persistent://meetup/philly/first bin/pulsar-admin topics list meetup/philly https://github.com/tspannhw/Meetup-YourFirstEventDrivenApp
  • 18. CLI Message Producing & Consuming bin/pulsar-client produce "persistent://meetup/philly/first" --messages 'Hello Pulsar!' bin/pulsar-client consume "persistent://meetup/philly/first" -s first-reader -n 0
  • 19. Monitoring and Metrics Check curl http://localhost:8080/admin/v2/persistent/meetup/philly/first/stats | python3 -m json.tool bin/pulsar-admin topics stats-internal persistent://meetup/philly/first curl http://localhost:8080/metrics/ bin/pulsar-admin topics peek-messages --count 5 --subscription first-reader persistent://meetup/philly/first bin/pulsar-admin topics subscriptions persistent://meetup/philly/first
  • 20. Cleanup bin/pulsar-admin topics delete persistent://meetup/philly/first bin/pulsar-admin namespaces delete meetup/philly bin/pulsar-admin tenants delete meetup
  • 22. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging
  • 23. Tenants / Namespaces / Topics Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster
  • 24. Messages - The Basic Unit of Pulsar Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
  • 25. Pulsar Cluster ā— ā€œBookiesā€ ā— Stores messages and cursors ā— Messages are grouped in segments/ledgers ā— A group of bookies form an ā€œensembleā€ to store a ledger ā— ā€œBrokersā€ ā— Handles message routing and connections ā— Stateless, but with caches ā— Automatic load-balancing ā— Topics are composed of multiple segments ā— ā— Stores metadata for both Pulsar and BookKeeper ā— Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Metadata Storage
  • 26. Producer-Consumer Producer Consumer Publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar and it handles all communication. Subscriber receives data from publisher and never directly interacts with it Topic Topic
  • 27. Apache Pulsar: Messaging vs Streaming Message Queueing - Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order. Streaming - Streaming works best in situations where the order of messages is important.
  • 28. Pulsar Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  • 29. Flexible Pub/Sub API for Pulsar - Shared Consumer consumer = client.newConsumer() .topic("my-topic") .subscriptionName("work-q-1") .subscriptionType(SubType.Shared) .subscribe();
  • 30. Flexible Pub/Sub API for Pulsar - Failover Consumer consumer = client.newConsumer() .topic("my-topic") .subscriptionName("stream-1") .subscriptionType(SubType.Failover) .subscribe();
  • 31. Data Offloaders (Tiered Storage) Client Libraries StreamNative Pulsar Ecosystem hub.streamnative.io Connectors (Sources & Sinks) Protocol Handlers Pulsar Functions (Lightweight Stream Processing) Processing Engines ā€¦ and more! ā€¦ and more!
  • 35. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 38. Pulsar - Spring - Code @Autowired private PulsarTemplate<Observation> pulsarTemplate; this.pulsarTemplate.setSchema(Schema. JSON(Observation.class)); MessageId msgid = pulsarTemplate.newMessage(observation) .withMessageCustomizer((mb) -> mb.key(uuidKey.toString())) .send(); @PulsarListener(subscriptionName = "aq-spring-reader", subscriptionType = Shared, schemaType = SchemaType. JSON, topics = "persistent://public/default/aq-pm25") void echoObservation(Observation message) { this.log.info("PM2.5 Message received: {}", message); }
  • 39. Pulsar - Spring - Conļ¬guration spring: pulsar: client: service-url: pulsar+ssl://sn-academy.sndevadvocate.snio.cloud:6651 auth-plugin-class-name: org.apache.pulsar.client.impl.auth.oauth2.AuthenticationOAuth2 authentication: issuer-url: https://auth.streamnative.cloud/ private-key: file:///scr/sndevadvocate-tspann.json audience: urn:sn:pulsar:sndevadvocate:my-instance producer: batching-enabled: false send-timeout-ms: 90000 producer-name: airqualityjava topic-name: persistent://public/default/airquality
  • 40. Spring - Pulsar as Kafka https://www.baeldung.com/spring-kafka @Bean public KafkaTemplate<String, Observation> kafkaTemplate() { KafkaTemplate<String, Observation> kafkaTemplate = new KafkaTemplate<String, Observation>(producerFactory()); return kafkaTemplate; } ProducerRecord<String, Observation> producerRecord = new ProducerRecord<>(topicName, uuidKey.toString(), message); kafkaTemplate.send(producerRecord);
  • 41. Spring - MQTT - Pulsar https://roytuts.com/publish-subscribe-message-onto-mqtt-using-spring/ @Bean public IMqttClient mqttClient( @Value("${mqtt.clientId}") String clientId, @Value("${mqtt.hostname}") String hostname, @Value("${mqtt.port}") int port) throws MqttException { IMqttClient mqttClient = new MqttClient( "tcp://" + hostname + ":" + port, clientId); mqttClient.connect(mqttConnectOptions()); return mqttClient; } MqttMessage mqttMessage = new MqttMessage(); mqttMessage.setPayload(DataUtility.serialize(payload)); mqttMessage.setQos(0); mqttMessage.setRetained(true); mqttClient.publish(topicName, mqttMessage);
  • 42. Spring - AMQP - Pulsar https://www.baeldung.com/spring-amqp rabbitTemplate.convertAndSend(topicName, DataUtility.serializeToJSON(observation)); @Bean public CachingConnectionFactory connectionFactory() { CachingConnectionFactory ccf = new CachingConnectionFactory(); ccf.setAddresses(serverName); return ccf; }
  • 45. The FLiPN kitten crosses the stream 4 ways with Apache Pulsar
  • 46. Demo
  • 47. Ā© 2023 Cloudera, Inc. All rights reserved. 47 REST + Spring Boot + Pulsar + Friends
  • 48. Ā© 2023 Cloudera, Inc. All rights reserved. APACHE PULSAR JAVA FUNCTION
  • 49. https://github.com/tspannhw/pulsar-airquality-function public class AirQualityFunction implements Function<byte[], Void> { @Override public Void process(byte[] input, Context context) { if ( input == null || context == null ) { return null; } //context.getInputTopics().toString() if ( context.getLogger() != null && context.getLogger().isDebugEnabled() ) { context.getLogger().debug("LOG:" + input.toString()); } context.newOutputMessage(ā€œNewTopicNameā€, JSONSchema.of(Observation.class)) .key(UUID.randomUUID().toString()) .property(ā€œLanguageā€, ā€œjavaā€) .value(observation) .send(); } }
  • 50. Ā© 2023 Cloudera, Inc. All rights reserved. APACHE KAFKA
  • 51. Ā© 2023 Cloudera, Inc. All rights reserved. 51 Apache Kafka ā€¢ Highly reliable distributed messaging system ā€¢ Decouple applications, enables many-to-many patterns ā€¢ Publish-Subscribe semantics ā€¢ Horizontal scalability ā€¢ Eļ¬ƒcient implementation to operate at speed with big data volumes ā€¢ Organized by topic to support several use cases Source System Source System Source System Kafka Fraud Detection Security Systems Real-Time Monitoring Source System Source System Source System Fraud Detection Security Systems Real-Time Monitoring Many-To-Many Publish-Subscribe Point-To-Point Request-Response
  • 52. Ā© 2023 Cloudera, Inc. All rights reserved. STREAM TEAM
  • 53. Ā© 2023 Cloudera, Inc. All rights reserved. 53 Clouderaā€™s Data-In-Motion Services Cloudera Offers Two Core Data-In-Motion Services: DataFlow & Stream Processing CLOUDERA SDX ā€” Secure, Monitor and Govern your Streaming workloads with the same tooling using Apache Ranger & Apache Atlas. STREAM PROCESSING ā€” Powered by Apache Flink and Kafka, it provides a complete, enterprise-grade stream management and stateful processing solution. With support for industry standard interfaces like SQL, developers, data analysts, and data scientist can easily build a wide variety of hybrid real-time applications. DATAFLOW ā€” Powered by Apache NiFi, it enables developers to connect to any data source anywhere with any structure, process it, and deliver to any destination using a low-code authoring experience.
  • 54. Ā© 2023 Cloudera, Inc. All rights reserved. 54 CSP Community Edition ā€¢ Kafka, KConnect, SMM, SR, Flink, and SSB in Docker ā€¢ Runs in Docker ā€¢ Try new features quickly ā€¢ Develop applications locally ā— Docker compose ļ¬le of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ā—‹ $>docker compose up ā— Licensed under the Cloudera Community License ā— Unsupported ā— Community Group Hub for CSP ā— Find it on docs.cloudera.com under Applications
  • 55. Ā© 2023 Cloudera, Inc. All rights reserved. 55 STREAMING DATA WITH CLOUDERA DATA FLOW (CDF) 01 03 04 05 Cloudera Flow Management Cloudera Stream Processing Streams Replication Manager Cloudera Flow Management Cloudera Machine Learning 02 Collect Buffer Replicate Distribute Score Regional Data Centers Global Public Cloud Edge
  • 56. Ā© 2023 Cloudera, Inc. All rights reserved. 56 ENABLING ANALYTICS AND INSIGHTS ANYWHERE Driving enterprise business value REAL-TIME STREAMING ENGINE ANALYTICS & DATA WAREHOUSE DATA SCIENCE/ MACHINE LEARNING CENTRALIZED DATA PLATFORM STORAGE & PROCESSING ANALYTICS & INSIGHTS Stream Ingest Ingest ā€“ Data at Rest Deploy Models BI Solutions SQL Predictive Analytics ā€¢ Model Building ā€¢ Model Training ā€¢ Model Scoring Actions & Alerts [SQL] Real-Time Apps STREAMING DATA SOURCES Clickstream Market data Machine logs Social ENTERPRISE DATA SOURCES CRM Customer history Research Compliance Data Risk Data Lending
  • 57. Ā© 2023 Cloudera, Inc. All rights reserved. Ā© 2019 Cloudera, Inc. All rights reserved. 57 EVENT-DRIVEN ORGANIZATION Modernize your data and applications CDF Event Streaming Platform Integration - Processing - Management - Cloud Stream ETL Cloud Storage Application Data Lake Data Stores Make Payment ĀµServices Streams Edge - IoT Dashboard
  • 58. Ā© 2023 Cloudera, Inc. All rights reserved. DATAFLOW APACHE NIFI
  • 59. Ā© 2023 Cloudera, Inc. All rights reserved. 59 CLOUDERA FLOW AND EDGE MANAGEMENT Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data center) to any downstream system with built in end-to-end security and provenance Advanced tooling to industrialize ļ¬‚ow development (Flow Development Life Cycle) ACQUIRE ā€¢ Over 300 Prebuilt Processors ā€¢ Easy to build your own ā€¢ Parse, Enrich & Apply Schema ā€¢ Filter, Split, Merger & Route ā€¢ Throttle & Backpressure FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG PROCESS HASH MERGE EXTRACT DUPLICATE SPLIT ENCRYPT TALL EVALUATE EXECUTE GEOENRICH SCAN REPLACE TRANSLATE CONVERT ROUTE TEXT ROUTE CONTENT ROUTE CONTEXT ROUTE RATE DISTRIBUTE LOAD DELIVER ā€¢ Guaranteed Delivery ā€¢ Full data provenance from acquisition to delivery ā€¢ Diverse, Non-Traditional Sources ā€¢ Eco-system integration FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG
  • 60. Ā© 2023 Cloudera, Inc. All rights reserved. 60 Cloudera DataFlow: Universal Data Distribution Service Process Route Filter Enrich Transform Distribute Connectors Any destination Deliver Ingest Active Passive Connectors Gateway Endpoint Connect & Pull Send Data born in the cloud Data born outside the cloud UNIVERSAL DATA DISTRIBUTION WITH CLOUDERA DATAFLOW (CDF) Connect to Any Data Source Anywhere then Process and Deliver to Any Destination
  • 61. 61 Ā© 2023 Cloudera, Inc. All rights reserved. WHAT IS APACHE NIFI? Apache NiFi is a scalable, real-time streaming data platform that collects, curates, and analyzes data so customers gain key insights for immediate actionable intelligence.
  • 62. Ā© 2023 Cloudera, Inc. All rights reserved. 62 APACHE NIFI Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data center) to any downstream system with built in end-to-end security and provenance ACQUIRE PROCESS DELIVER ā€¢ Over 300 Prebuilt Processors ā€¢ Easy to build your own ā€¢ Parse, Enrich & Apply Schema ā€¢ Filter, Split, Merger & Route ā€¢ Throttle & Backpressure ā€¢ Guaranteed Delivery ā€¢ Full data provenance from acquisition to delivery ā€¢ Diverse, Non-Traditional Sources ā€¢ Eco-system integration Advanced tooling to industrialize ļ¬‚ow development (Flow Development Life Cycle) FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLO G FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLO G HASH MERGE EXTRACT DUPLICATE SPLIT ROUTE TEXT ROUTE CONTENT ROUTE CONTEXT CONTROL RATE DISTRIBUTE LOAD GEOENRICH SCAN REPLACE TRANSLATE CONVERT ENCRYPT TALL EVALUATE EXECUTE
  • 63. Ā© 2023 Cloudera, Inc. All rights reserved. 63 https://www.datainmotion.dev/2020/06/no-more-spaghetti-ļ¬‚ows.html ā— Reduce, Reuse, Recycle. Use Parameters to reuse common modules. ā— Put ļ¬‚ows, reusable chunks into separate Process Groups. ā— Write custom processors if you need new or specialized features ā— Use Cloudera supported NiFi Processors ā— Use Record Processors everywhere No More Spaghetti Flows
  • 64. Ā© 2023 Cloudera, Inc. All rights reserved. JAVA DEV FOR NIFI
  • 65. Ā© 2023 Cloudera, Inc. All rights reserved. 65 https://github.com/tspannhw/nifi-tensorflow-processor public class TensorFlowProcessor extends AbstractProcessor { public static final PropertyDescriptor MODEL_DIR = new PropertyDescriptor.Builder().name(MODEL_DIR_NAME) .description("Model Directory").required(true).expressionLanguageSupported(true) .addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build(); @Override public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException { FlowFile flowFile = session.get(); if (flowFile == null) { flowFile = session.create(); } try { flowFile.getAttributes(); }
  • 66. Ā© 2023 Cloudera, Inc. All rights reserved. APACHE FLINK
  • 67. Ā© 2023 Cloudera, Inc. All rights reserved. 67 Flink SQL https://www.datainmotion.dev/2021/04/cloudera-sql-stream-builder-ssb-updated.html ā— Streaming Analytics ā— Continuous SQL ā— Continuous ETL ā— Complex Event Processing ā— Standard SQL Powered by Apache Calcite
  • 68. Ā© 2023 Cloudera, Inc. All rights reserved. 68 Flink SQL -- specify Kafka partition key on output SELECT foo AS _eventKey FROM sensors -- use event time timestamp from kafka -- exactly once compatible SELECT eventTimestamp FROM sensors -- nested structures access SELECT foo.ā€™barā€™ FROM table; -- must quote nested column -- timestamps SELECT * FROM payments WHERE eventTimestamp > CURRENT_TIMESTAMP-interval '10' second; -- unnest SELECT b.*, u.* FROM bgp_avro b, UNNEST(b.path) AS u(pathitem) -- aggregations and windows SELECT card, MAX(amount) as theamount, TUMBLE_END(eventTimestamp, interval '5' minute) as ts FROM payments WHERE lat IS NOT NULL AND lon IS NOT NULL GROUP BY card, TUMBLE(eventTimestamp, interval '5' minute) HAVING COUNT(*) > 4 -- >4==fraud -- try to do this ksql! SELECT us_west.user_score+ap_south.user_score FROM kafka_in_zone_us_west us_west FULL OUTER JOIN kafka_in_zone_ap_south ap_south ON us_west.user_id = ap_south.user_id; Key Takeaway: Rich SQL grammar with advanced time and aggregation tools
  • 69. 69 Ā© 2022 Cloudera, Inc. All rights reserved. SQL STREAM BUILDER (SSB) SQL STREAM BUILDER allows developers, analysts, and data scientists to write streaming applications with industry standard SQL. No Java or Scala code development required. Simpliļ¬es access to data in Kafka & Flink. Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more Enrich streaming data with batch data in a single tool Democratize access to real-time data with just SQL
  • 70. Ā© 2023 Cloudera, Inc. All rights reserved. DEMO
  • 71. Ā© 2023 Cloudera, Inc. All rights reserved. 71 End to End Streaming Demo Pipeline Enterprise sources Weather Errors Aggregates Alerts Stocks ETL Analytics Streaming SQL Clickstream Market data Machine logs Social https://github.com/tspannhw/CloudDemo2021
  • 72. AI BASED ENHANCEMENTS SERVE SOURCES Data Warehouse Report Sensorid Sensor conditions Machine Learning Predict, Automate Control System REPORT Visualize CLOUD Collect COLLECT Message Broker Data Flow Distribute Sensor id Temperature COLLECT and DISTRIBUTE DATA 1. Data is collected from sensors that use mqtt protocol via CEM and sent to CDP Public Cloud 2. Two CDF flows run in the cloud to accomplish our two goals: streaming analytics and batch analytics ENRICH, REPORT Report, Automate Real time alerting SQL Stream Builder Data Visualization Edge Management Humidity Timestamp Visualize Data Visualization USE DATA 3. Streaming Use Case: some conditions of our greenhouse must be avoided and have to be controlled in real time. Some warnings have been defined to alert us in case alerting conditions are met, control system is automatically activated to adjust environmental variables. 4. Batch analytics: to ensure the optimal growth of our plants the ideal conditions have to be met for each plant. Each 6 hours the plant conditions are monitored and in case some control adjustment is required a ML model gives a suggestion about getting the optimal point minimizing the cost. EDGE Control System
  • 73. Ā© 2023 Cloudera, Inc. All rights reserved. 73
  • 74. Ā© 2023 Cloudera, Inc. All rights reserved. RESOURCES AND WRAP-UP
  • 75. Ā© 2023 Cloudera, Inc. All rights reserved. 75 ā— https://streamnative.io/blog/engineering/2022-11-29-spring-into-pulsar-part-2-spri ng-based-microservices-for-multiple-protocols-with-apache-pulsar/ ā— https://streamnative.io/blog/release/2022-09-21-announcing-spring-for-apache-pu lsar/ ā— https://docs.spring.io/spring-pulsar/docs/current-SNAPSHOT/reference/html/ ā— https://spring.io/blog/2022/08/16/introducing-experimental-spring-support-for-apa che-pulsar ā— https://medium.com/@tspann/using-the-new-spring-boot-apache-pulsar-integratio n-8a38447dce7b Spring + Pulsar References
  • 76. Ā© 2023 Cloudera, Inc. All rights reserved. 76 ā— https://spring.io/guides/gs/spring-boot/ ā— https://spring.io/projects/spring-amqp/ ā— https://spring.io/projects/spring-kafka/ ā— https://github.com/spring-projects/spring-integration-kafka ā— https://github.com/spring-projects/spring-integration ā— https://github.com/spring-projects/spring-data-relational ā— https://github.com/spring-projects/spring-kafka ā— https://github.com/spring-projects/spring-amqp Spring Things
  • 77. Ā© 2023 Cloudera, Inc. All rights reserved. 77 STREAMING RESOURCES ā€¢ https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an d-streamnative ā€¢ https://ļ¬‚ipstackweekly.com/ ā€¢ https://www.datainmotion.dev/ ā€¢ https://www.ļ¬‚ankstack.dev/ ā€¢ https://github.com/tspannhw ā€¢ https://medium.com/@tspann ā€¢ https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739 5d714 ā€¢ https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str eaming_Engineer.pdf
  • 78. Ā© 2023 Cloudera, Inc. All rights reserved. 78 78 Apache Pulsar in Action Please enjoy Davidā€™s complete book which is the ultimate guide to Pulsar.
  • 79. Ā© 2021 Cloudera, Inc. All rights reserved. 79 https://github.com/tspannhw https://www.datainmotion.dev/
  • 80. Ā© 2021 Cloudera, Inc. All rights reserved. 80 March 16 - Virtual March 17 - Trenton April 4 - Atlanta April 24 - San Francisco Upcoming Events
  • 81. Ā© 2023 Cloudera, Inc. All rights reserved. 81 Resources