ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar

Timothy Spann, Developer Advocate
Deep Dive into Building
Streaming Applications with
Apache Pulsar
Tim Spann
Developer Advocate
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems/ Data Architect
● Experience:
○ 15+ years of experience with batch and streaming
technologies including Pulsar, Flink, Spark, NiFi, Spring,
Java, Big Data, Cloud, MXNet, Hadoop, Datalakes, IoT
and more.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Apache Pulsar is a Cloud-Native
Messaging and Event-Streaming Platform.
Why Apache Pulsar?
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
● Building Microservices
● Asynchronous Communication
● Building Real Time Applications
● Highly Resilient
● Tiered storage
Pulsar Benefits
Key Pulsar Concepts: Architecture
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata storage (metadata & service discovery)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
Messages - the basic unit of Pulsar
Component: Description
● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
● Properties: An optional key/value map of user-defined properties.
● Producer name: The name of the producer that produces the message. If you do not specify a producer name, the default name is used. Used for message de-duplication.
● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used for message de-duplication.
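The producer name and sequence ID are the two fields that make de-duplication possible. The sketch below shows the idea in plain Python with a toy in-memory log. The `Message` and `DedupLog` names are hypothetical and this is not Pulsar's actual implementation: a message is simply dropped when its sequence ID does not advance past the last one accepted from the same producer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """Toy model of the message fields listed above (not the Pulsar client class)."""
    value: bytes
    producer_name: str
    sequence_id: int
    key: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


class DedupLog:
    """Accepts a message only if its sequence ID advances for that producer."""

    def __init__(self) -> None:
        self.last_seq: Dict[str, int] = {}
        self.accepted: List[Message] = []

    def publish(self, msg: Message) -> bool:
        last = self.last_seq.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # duplicate, for example a retry after a timeout
        self.last_seq[msg.producer_name] = msg.sequence_id
        self.accepted.append(msg)
        return True


log = DedupLog()
first = Message(b"reading-1", producer_name="thrm", sequence_id=0, key="sensor-a")
retry = Message(b"reading-1", producer_name="thrm", sequence_id=0, key="sensor-a")
second = Message(b"reading-2", producer_name="thrm", sequence_id=1, key="sensor-a")
results = [log.publish(m) for m in (first, retry, second)]
print(results)
```

The retried message carries the same producer name and sequence ID as the first one, so only two of the three messages end up in the log.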
Key Pulsar Concepts:
Messaging vs Streaming
Message Queueing - Queueing systems are ideal
for work queues that do not require tasks to be
performed in a particular order.
Streaming - Streaming works best in situations
where the order of messages is important.
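A rough way to picture the difference, as a plain-Python sketch with hypothetical helper names (no broker involved): a work queue spreads messages across workers, so no single worker sees the full order, while a stream hands every message to one reader in order.

```python
from itertools import cycle
from typing import Dict, List


def dispatch_queue(messages: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Work-queue style: messages are spread round-robin across workers."""
    assigned: Dict[str, List[str]] = {w: [] for w in workers}
    for msg, worker in zip(messages, cycle(workers)):
        assigned[worker].append(msg)
    return assigned


def dispatch_stream(messages: List[str], consumer: str) -> Dict[str, List[str]]:
    """Streaming style: a single consumer reads every message in order."""
    return {consumer: list(messages)}


events = ["m1", "m2", "m3", "m4"]
queued = dispatch_queue(events, ["worker-a", "worker-b"])
streamed = dispatch_stream(events, "reader")
print(queued)
print(streamed)
```

In Pulsar the choice is made per subscription: a Shared subscription behaves like the queue case, while an Exclusive subscription behaves like the stream case.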
Connectivity
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
Schema Registry
● The registry holds schema-1, schema-2, schema-3, ... (value=Avro/Protobuf/JSON); each entry pairs a schema ID with its schema data.
● Producers keep a local cache for schemas. They send (register) a schema if it is not in the local cache, then send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID.
● Consumers keep a local cache for schemas. They get a schema by ID if it is not in the local cache, then read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID.
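The register-once, look-up-once flow can be sketched with a toy in-memory registry. All class names here are hypothetical, and JSON stands in for the real Avro/Protobuf encoding. The point is only that each side touches the registry on a cache miss and then works from its local cache.

```python
import json
from typing import Dict, Tuple


class SchemaRegistry:
    """Toy in-memory registry mapping schema IDs to schema definitions."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self.lookups = 0

    def register(self, definition: str) -> int:
        for schema_id, existing in self._by_id.items():
            if existing == definition:
                return schema_id
        schema_id = len(self._by_id) + 1
        self._by_id[schema_id] = definition
        return schema_id

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[str, int] = {}  # definition -> schema ID

    def send(self, definition: str, record: dict) -> Tuple[int, bytes]:
        if definition not in self.cache:  # register only on a cache miss
            self.cache[definition] = self.registry.register(definition)
        return self.cache[definition], json.dumps(record).encode("utf-8")


class Consumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[int, str] = {}  # schema ID -> definition

    def read(self, schema_id: int, payload: bytes) -> dict:
        if schema_id not in self.cache:  # fetch by ID only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
        return json.loads(payload.decode("utf-8"))


registry = SchemaRegistry()
producer, consumer = Producer(registry), Consumer(registry)
schema_1 = '{"type": "record", "name": "thermal", "fields": [{"name": "uuid", "type": "string"}]}'
decoded = [consumer.read(*producer.send(schema_1, {"uuid": f"id-{i}"})) for i in range(3)]
print(decoded, registry.lookups)
```

Three records are sent and read, but the consumer asks the registry for the schema only once.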
● Kafka on Pulsar (KoP)
● MQTT on Pulsar (MoP)
● AMQP on Pulsar (AoP)
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram labels: Producer, Consumer; Broker 1 (Topic1-Part1), Broker 2 (Topic1-Part2), Broker 3 (Topic1-Part3); Segments 1, 2, 3, 4, ... X spread across Bookie 1, Bookie 2 and Bookie 3; Query Coordinator; SQL Workers; Query; Topic Metadata]
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
Pulsar Functions
● Lightweight computation
similar to AWS Lambda.
● Specifically designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
A serverless event streaming
framework
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions
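The consume, apply, publish cycle listed above can be tried without a broker. This is a minimal sketch: `process` is the kind of plain Python function the Functions runtime can call per message, while `run_locally` is a hypothetical stand-in for the runtime itself, with Python lists playing the role of topics.

```python
from typing import Callable, Iterable, List, Optional


def process(input: str) -> str:
    """User-supplied logic, applied to each message from the input topic."""
    return input.strip().upper()


def run_locally(fn: Callable[[str], Optional[str]], input_topic: Iterable[str]) -> List[str]:
    """Hypothetical harness: consume each message, apply fn, publish the result."""
    output_topic: List[str] = []
    for message in input_topic:
        result = fn(message)
        if result is not None:  # this harness skips messages with no result
            output_topic.append(result)
    return output_topic


published = run_locally(process, ["hello pulsar ", "apachecon"])
print(published)
```

Keeping the logic in a function with no broker dependency makes it easy to unit test before deploying it with `pulsar-admin functions create`.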
Run a Local Standalone Bare Metal
wget https://archive.apache.org/dist/pulsar/pulsar-2.10.1/apache-pulsar-2.10.1-bin.tar.gz
tar xvfz apache-pulsar-2.10.1-bin.tar.gz
cd apache-pulsar-2.10.1
bin/pulsar standalone
(For Pulsar SQL Support)
bin/pulsar sql-worker start
https://pulsar.apache.org/docs/en/standalone/
<or> Run in Docker
docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.10.1 \
  bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone-docker/
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create conf
bin/pulsar-admin namespaces create conf/europe
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list conf
bin/pulsar-admin topics create persistent://conf/europe/first
bin/pulsar-admin topics list conf/europe
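The commands above build up the hierarchy that every fully qualified topic name encodes: domain://tenant/namespace/topic. A small helper makes that structure explicit. This is a sketch; the `TopicName` and `parse_topic` names are hypothetical and not part of the Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/")
    if not sep or len(parts) != 3 or not all(parts):
        raise ValueError(f"expected domain://tenant/namespace/topic, got {name!r}")
    return TopicName(domain, *parts)


first = parse_topic("persistent://conf/europe/first")
print(first.tenant, first.namespace, first.topic)
print(str(first))
```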
Install Python 3 Pulsar Client
pip3 install 'pulsar-client[all]==2.10.1'
Includes AARCH64, ARM, M2, INTEL, …
For Python on Pulsar on Pi https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
https://pypi.org/project/pulsar-client/2.10.0/#files
Building a Python 3 Producer
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://conf/europe/first')
producer.send('Simple Text Message'.encode('utf-8'))
client.close()
Building a Python 3 Cloud Producer with OAuth
python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 \
  -t persistent://public/default/pyth \
  --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'
import argparse
from pulsar import Client, AuthenticationOauth2
parse = argparse.ArgumentParser(prog='prod.py')
parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
parse.add_argument('-t', '--topic', dest='topic', type=str, required=True)
parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
args = parse.parse_args()
client = Client(args.service_url,
    authentication=AuthenticationOauth2(args.auth_params))
https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
Example Avro Schema Usage
import pulsar
from pulsar.schema import *
from pulsar.schema import AvroSchema

class thermal(Record):
    uuid = String()

client = pulsar.Client('pulsar://pulsar1:6650')
thermalschema = AvroSchema(thermal)
producer = client.create_producer(
    topic='persistent://public/default/pi-thermal-avro',
    schema=thermalschema, properties={"producer-name": "thrm"})
thermalRec = thermal()
thermalRec.uuid = "unique-name"
# use the record's uuid as the partition key
producer.send(thermalRec, partition_key=thermalRec.uuid)
https://github.com/tspannhw/FLiP-Pi-Thermal
Example Json Schema Usage
import pulsar
from pulsar.schema import *
from pulsar.schema import JsonSchema

class weather(Record):
    uuid = String()

client = pulsar.Client('pulsar://pulsar1:6650')
wsc = JsonSchema(weather)
producer = client.create_producer(
    topic='persistent://public/default/wthr',
    schema=wsc, properties={"producer-name": "wthr"})
weatherRec = weather()
weatherRec.uuid = "unique-name"
# use the record's uuid as the partition key
producer.send(weatherRec, partition_key=weatherRec.uuid)
https://github.com/tspannhw/FLiP-Pi-Weather
https://github.com/tspannhw/FLiP-PulsarDevPython101
Building a Python3 Consumer
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('persistent://conf/europe/first', subscription_name='mine')
try:
    while True:
        msg = consumer.receive()
        print("Received message: '%s'" % msg.data())
        consumer.acknowledge(msg)
finally:
    # reached on Ctrl-C or on an error, so the connection is always released
    client.close()
MQTT from Python
pip3 install paho-mqtt
import json
import paho.mqtt.client as mqtt
client = mqtt.Client("rpi4iot")
row = { }
row['gasKO'] = str(readings)  # readings: sensor value collected elsewhere
json_string = json.dumps(row).strip()
client.connect("pulsar-server.com", 1883, 180)
client.publish("persistent://public/default/mqtt-2",
    payload=json_string, qos=0, retain=True)
https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
Web Sockets from Python
pip3 install websocket-client
import websocket, base64, json
topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/topic1'
ws = websocket.create_connection(topic)
message = "Hello Philly ETE Conference"
message_bytes = message.encode('ascii')
base64_bytes = base64.b64encode(message_bytes)
base64_message = base64_bytes.decode('ascii')
ws.send(json.dumps({'payload' : base64_message,'properties': {'device' :
'macbook'},'context' : 5}))
response = json.loads(ws.recv())
https://pulsar.apache.org/docs/en/client-libraries-websocket/
https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
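The WebSocket producer endpoint expects the payload as base64 text inside a JSON frame, which is why the slide encodes the message before sending. The round trip can be checked offline. This is a sketch; the helper names `encode_frame` and `decode_frame` are hypothetical, and no connection is opened.

```python
import base64
import json


def encode_frame(text: str, properties: dict) -> str:
    """Build the JSON frame sent to the WebSocket producer endpoint."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"payload": payload, "properties": properties})


def decode_frame(frame: str) -> str:
    """Recover the original text from a frame's base64 payload."""
    return base64.b64decode(json.loads(frame)["payload"]).decode("utf-8")


frame = encode_frame("Hello ApacheCon", {"device": "macbook"})
print(frame)
print(decode_frame(frame))
```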
Kafka from Python
pip3 install kafka-python
import json
from kafka import KafkaProducer
from kafka.errors import KafkaError
row = { }
row['gasKO'] = str(readings)  # readings: sensor value collected elsewhere
json_string = json.dumps(row).strip()
producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
producer.send('topic-kafka-1', json_string.encode('utf-8'))
producer.flush()
https://github.com/streamnative/kop
https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
Deploy Python Functions
bin/pulsar-admin functions create --auto-ack true --py py/src/sentiment.py \
  --classname "sentiment.Chat" --inputs "persistent://public/default/chat" \
  --log-topic "persistent://public/default/logs" --name Chat \
  --output "persistent://public/default/chatresult"
https://github.com/tspannhw/pulsar-pychat-function
Pulsar IO Function in Python 3.9+
from pulsar import Function
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        logger = context.get_logger()
        msg_id = context.get_message_id()
        fields = json.loads(input)
https://github.com/tspannhw/pulsar-pychat-function
Building a Golang Pulsar App
http://pulsar.apache.org/docs/en/client-libraries-go/
go get -u "github.com/apache/pulsar-client-go/pulsar"
package main

import (
    "log"
    "time"

    "github.com/apache/pulsar-client-go/pulsar"
)

func main() {
    client, err := pulsar.NewClient(pulsar.ClientOptions{
        URL:               "pulsar://localhost:6650",
        OperationTimeout:  30 * time.Second,
        ConnectionTimeout: 30 * time.Second,
    })
    if err != nil {
        log.Fatalf("Could not instantiate Pulsar client: %v", err)
    }
    defer client.Close()
}
Pulsar Producer
import java.util.UUID;
import java.net.URL;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.ProducerBuilder;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2;
PulsarClient client = PulsarClient.builder()
    .serviceUrl(serviceUrl)
    .authentication(
        AuthenticationFactoryOAuth2.clientCredentials(
            new URL(issuerUrl), new URL(credentialsUrl), audience))
    .build();
Spring RabbitMQ/AMQP Producer
rabbitTemplate.convertAndSend(topicName,
DataUtility.serializeToJSON(observation));
Spring MQTT Producer
MqttMessage mqttMessage = new MqttMessage();
mqttMessage.setPayload(DataUtility.serialize(payload));
mqttMessage.setQos(1);
mqttMessage.setRetained(true);
mqttClient.publish(topicName, mqttMessage);
Spring Kafka Producer
ProducerRecord<String, String> producerRecord = new
ProducerRecord<>(topicName, uuidKey.toString(),
DataUtility.serializeToJSON(message));
kafkaTemplate.send(producerRecord);
Pulsar Simple Producer
String pulsarKey = UUID.randomUUID().toString();
String OS = System.getProperty("os.name").toLowerCase();
ProducerBuilder<byte[]> producerBuilder = client.newProducer().topic(topic)
.producerName("demo");
Producer<byte[]> producer = producerBuilder.create();
MessageId msgID = producer.newMessage().key(pulsarKey).value("msg".getBytes())
.property("device",OS).send();
producer.close();
client.close();
Pulsar Function Java
import java.util.function.Function;
public class MyFunction implements Function<String, String> {
    public String apply(String input) {
        return doBusinessLogic(input);  // your code here
    }
}
Pulsar Function SDK
import java.util.UUID;
import org.apache.pulsar.client.impl.schema.JSONSchema;
import org.apache.pulsar.functions.api.*;
public class AirQualityFunction implements Function<byte[], Void> {
    @Override
    public Void process(byte[] input, Context context) throws Exception {
        context.getLogger().debug("File:" + new String(input));
        // your code here: build `observation` (an Observation) from the input
        context.newOutputMessage("topicname",
                JSONSchema.of(Observation.class))
            .key(UUID.randomUUID().toString())
            .property("prop1", "value1")
            .value(observation)
            .send();
        return null;
    }
}
Setting Subscription Type Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
    .topic(topic)
    .subscriptionName("subscriptionName")
    .subscriptionType(SubscriptionType.Shared)
    .subscribe();
Subscribing to a Topic and Setting Subscription Name Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
    .topic(topic)
    .subscriptionName("subscriptionName")
    .subscribe();
Producing Object Events From Java
ProducerBuilder<Observation> producerBuilder =
pulsarClient.newProducer(JSONSchema.of(Observation.class))
.topic(topicName)
.producerName(producerName).sendTimeout(60,
TimeUnit.SECONDS);
Producer<Observation> producer = producerBuilder.create();
msgID = producer.newMessage()
.key(someUniqueKey)
.value(observation)
.send();
Monitoring and Metrics Check
curl http://pulsar1:8080/admin/v2/persistent/conf/europe/first/stats | python3 -m json.tool
bin/pulsar-admin topics stats-internal persistent://conf/europe/first
curl http://pulsar1:8080/metrics/
bin/pulsar-admin topics peek-messages --count 5 --subscription ete-reader persistent://conf/europe/first
bin/pulsar-admin topics subscriptions persistent://conf/europe/first
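The stats endpoint returns JSON, so it is easy to post-process. The sketch below works on a hypothetical, heavily trimmed sample with made-up values. It assumes the response carries a `subscriptions` map with a `msgBacklog` count per subscription, and reports which subscriptions are lagging.

```python
import json

# Hypothetical, trimmed sample of a topic stats response (values are made up).
SAMPLE_STATS = """
{
  "msgRateIn": 12.5,
  "msgRateOut": 10.0,
  "subscriptions": {
    "ete-reader": {"msgBacklog": 42},
    "mine": {"msgBacklog": 0}
  }
}
"""


def backlog_by_subscription(stats: dict) -> dict:
    """Map each subscription name to its message backlog."""
    return {name: sub.get("msgBacklog", 0)
            for name, sub in stats.get("subscriptions", {}).items()}


stats = json.loads(SAMPLE_STATS)
backlogs = backlog_by_subscription(stats)
lagging = [name for name, backlog in backlogs.items() if backlog > 0]
print(backlogs, lagging)
```

Against a live broker, the same function could be fed the parsed body of the `curl` request shown above.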
Metrics: Broker
Broker metrics are exposed under "/metrics" at port 8080.
You can change the port by updating webServicePort to a different port
in the broker.conf configuration file.
All the metrics exposed by a broker are labeled with
cluster=${pulsar_cluster}.
The name of Pulsar cluster is the value of ${pulsar_cluster},
configured in the broker.conf file.
For more information: https://pulsar.apache.org/docs/en/reference-metrics/#broker
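Because every broker metric carries the cluster label, the /metrics output can be filtered per cluster with a few lines of parsing. This is a simplified sketch of a parser for the Prometheus text format: the sample lines and their values are illustrative, `parse_metrics` is a hypothetical helper, and label values containing commas are not handled.

```python
from typing import Dict, List, Tuple

# Illustrative sample in the Prometheus text format; the values are made up.
SAMPLE = """\
# TYPE pulsar_topics_count gauge
pulsar_topics_count{cluster="standalone",namespace="conf/europe"} 1
pulsar_rate_in{cluster="standalone",namespace="conf/europe"} 12.5
"""

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse 'name{label="v",...} value [timestamp]' lines, skipping comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, _, rest = line.partition("{")
            raw_labels, _, tail = rest.partition("}")
        else:
            name, _, tail = line.partition(" ")
            raw_labels = ""
        labels: Dict[str, str] = {}
        for pair in raw_labels.split(","):
            if pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        # the first token after the labels is the value; a timestamp may follow
        samples.append((name, labels, float(tail.split()[0])))
    return samples


samples = parse_metrics(SAMPLE)
by_cluster = [name for name, labels, _ in samples if labels.get("cluster") == "standalone"]
print(by_cluster)
```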
Metrics: Broker
These metrics are available for brokers:
● Namespace metrics
○ Replication metrics
● Topic metrics
○ Replication metrics
● ManagedLedgerCache metrics
● ManagedLedger metrics
● LoadBalancing metrics
○ BundleUnloading metrics
○ BundleSplit metrics
● Subscription metrics
● Consumer metrics
● ManagedLedger bookie client metrics
Cleanup
bin/pulsar-admin topics delete persistent://conf/europe/first
bin/pulsar-admin namespaces delete conf/europe
bin/pulsar-admin tenants delete conf
Java for Pulsar
● https://github.com/tspannhw/airquality
● https://github.com/tspannhw/FLiPN-AirQuality-REST
● https://github.com/tspannhw/pulsar-airquality-function
● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
● https://github.com/tspannhw/FLiP-Py-ADS-B
● https://github.com/tspannhw/pulsar-adsb-function
● https://github.com/tspannhw/airquality-amqp-consumer
● https://github.com/tspannhw/airquality-mqtt-consumer
● https://github.com/tspannhw/airquality-consumer
● https://github.com/tspannhw/airquality-kafka-consumer
Python For Pulsar on Pi
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
● https://github.com/tspannhw/FLiP-PulsarDevPython101
● https://github.com/tspannhw/airquality
Let’s Keep
in Touch!
Tim Spann
Developer Advocate
PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi217 views
Understanding HTML terminology by artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar58 views
Supercharging your Python Development Environment with VS Code and Dev Contai... by Dawn Wages
Supercharging your Python Development Environment with VS Code and Dev Contai...Supercharging your Python Development Environment with VS Code and Dev Contai...
Supercharging your Python Development Environment with VS Code and Dev Contai...
Dawn Wages5 views
Top-5-production-devconMunich-2023.pptx by Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app10 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert35 views
How Workforce Management Software Empowers SMEs | TraQSuite by TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuite
TraQSuite7 views
Mobile App Development Company by Richestsoft
Mobile App Development CompanyMobile App Development Company
Mobile App Development Company
Richestsoft 5 views
How to build dyanmic dashboards and ensure they always work by Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom16 views
Introduction to Git Source Control by John Valentino
Introduction to Git Source ControlIntroduction to Git Source Control
Introduction to Git Source Control
John Valentino8 views

ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar

  • 1. Deep Dive into Building Streaming Applications with Apache Pulsar
  • 2. Tim Spann Developer Advocate ● FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems/ Data Architect ● Experience: ○ 15+ years of experience with batch and streaming technologies including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud, MXNet, Hadoop, Datalakes, IoT and more.
  • 3. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 4. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 5. Why Apache Pulsar? Unified Messaging Platform Guaranteed Message Delivery Resiliency Infinite Scalability
  • 7. Key Pulsar Concepts: Architecture
      ● “Bookies” (store messages)
        ○ Store messages and cursors
        ○ Messages are grouped in segments/ledgers
        ○ A group of bookies forms an “ensemble” to store a ledger
      ● “Brokers”
        ○ Handle message routing and connections
        ○ Stateless, but with caches
        ○ Automatic load-balancing
        ○ Topics are composed of multiple segments
      ● Metadata storage (metadata & service discovery)
        ○ Stores metadata for both Pulsar and BookKeeper
        ○ Service discovery
  • 8. Messages - the basic unit of Pulsar
      Component | Description
      Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
      Key | Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
      Properties | An optional key/value map of user-defined properties.
      Producer name | The name of the producer that produces the message. If you do not specify a producer name, the default name is used. Used in message de-duplication.
      Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used in message de-duplication.
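The table ties both the producer name and the sequence ID to message de-duplication. A minimal sketch of that idea, assuming the log simply remembers the highest sequence ID seen per producer name; `Message` and `DedupLog` are hypothetical names for illustration, not classes from the Pulsar client or broker:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """Illustrative stand-in for a Pulsar message (not the client's class)."""
    value: bytes
    producer_name: str
    sequence_id: int
    key: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


class DedupLog:
    """Hypothetical topic log that drops sequence IDs it has already seen."""

    def __init__(self) -> None:
        self.messages: List[Message] = []
        self._last_seq: Dict[str, int] = {}  # producer name -> highest sequence ID stored

    def append(self, msg: Message) -> bool:
        """Store the message unless it repeats or precedes a stored sequence ID."""
        last = self._last_seq.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # duplicate, for example a resend after a timeout
        self._last_seq[msg.producer_name] = msg.sequence_id
        self.messages.append(msg)
        return True


log = DedupLog()
first = log.append(Message(b"reading-1", "thrm", 0))
retry = log.append(Message(b"reading-1", "thrm", 0))  # same producer, same sequence ID
second = log.append(Message(b"reading-2", "thrm", 1))
print(first, retry, second, len(log.messages))
```

The resend is rejected because its (producer name, sequence ID) pair has already been stored, which is why a stable producer name matters when de-duplication is wanted.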
  • 9. Key Pulsar Concepts: Messaging vs Streaming Message Queueing - Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order. Streaming - Streaming works best in situations where the order of messages is important.
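The difference between the two models can be sketched without a broker. This is a simulation under stated assumptions: `shared_dispatch` and `exclusive_dispatch` are hypothetical helpers, and round-robin is used as a simple stand-in for how a work queue spreads messages across consumers.

```python
from itertools import cycle
from typing import Dict, List


def shared_dispatch(messages: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Queueing: spread messages round-robin, so no consumer sees the full order."""
    assignment: Dict[str, List[str]] = {name: [] for name in consumers}
    for msg, name in zip(messages, cycle(consumers)):
        assignment[name].append(msg)
    return assignment


def exclusive_dispatch(messages: List[str], consumer: str) -> Dict[str, List[str]]:
    """Streaming: a single consumer receives every message in topic order."""
    return {consumer: list(messages)}


events = ["m0", "m1", "m2", "m3"]
queue_view = shared_dispatch(events, ["worker-a", "worker-b"])
stream_view = exclusive_dispatch(events, "reader")
print(queue_view)
print(stream_view)
```

With two workers, each one processes half of the messages and neither observes the complete sequence, which is acceptable for independent tasks but not for order-sensitive processing.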
  • 10. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 11. Schema Registry
      [Diagram] The registry holds schema-1, schema-2 and schema-3 (value = Avro/Protobuf/JSON), each stored as schema data with an ID.
      Producers: keep a local cache of schemas; send (register) the schema if it is not in the local cache; send schema-1 (value = Avro/Protobuf/JSON) data serialized per schema ID.
      Consumers: keep a local cache of schemas; get the schema by ID if it is not in the local cache; read schema-1 (value = Avro/Protobuf/JSON) data deserialized per schema ID.
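The "get schema by ID (if not in local cache)" step on the consumer side can be sketched as follows. Everything here is an assumption made for illustration: `SchemaRegistry` and `CachingConsumer` are hypothetical in-memory classes, and JSON decoding stands in for Avro or Protobuf.

```python
import json
from typing import Dict


class SchemaRegistry:
    """Hypothetical in-memory registry mapping schema IDs to definitions."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self._ids: Dict[str, int] = {}
        self.lookups = 0  # counts how often a client had to ask the registry

    def register(self, definition: str) -> int:
        """Return the existing ID for a known schema, or assign a new one."""
        if definition not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[definition] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[definition]

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class CachingConsumer:
    """Fetches a schema by ID only when it is missing from the local cache."""

    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[int, str] = {}

    def read(self, schema_id: int, payload: bytes) -> dict:
        if schema_id not in self.cache:
            self.cache[schema_id] = self.registry.get(schema_id)
        # A real client would decode using the cached schema definition.
        return json.loads(payload)


registry = SchemaRegistry()
schema_id = registry.register('{"type": "record", "name": "thermal"}')
consumer = CachingConsumer(registry)
rec1 = consumer.read(schema_id, b'{"uuid": "a"}')
rec2 = consumer.read(schema_id, b'{"uuid": "b"}')
print(rec1, rec2, registry.lookups)
```

The second read is served from the local cache, so the registry is contacted only once per schema ID.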
  • 15. Pulsar SQL
      Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
      [Diagram] A producer and a consumer connect to Broker 1 (Topic1-Part1), Broker 2 (Topic1-Part2) and Broker 3 (Topic1-Part3). Segments 1 through X are stored on Bookie 1, Bookie 2 and Bookie 3. A query coordinator queries topic metadata and hands work to the SQL workers.
  • 16. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  • 17. Pulsar Functions: A serverless event streaming framework ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker.
  • 20. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  • 21. Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go). ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
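The consume, process, publish cycle can be pictured with the simplest style of Python function, a module-level `process` that takes the input message and returns the output. This is a sketch rather than a deployable project: the transformation is made up, and the runtime wiring of input and output topics is not shown.

```python
def process(input):
    """Normalize one incoming message; the return value is what would be published."""
    text = input.strip()
    if not text:
        # Assumption: returning None is treated as "nothing to publish".
        return None
    return text.upper()


# Local check without a broker: call the function the way a runtime would.
print(process("  hello pulsar  "))
```

Keeping the logic in a plain function like this makes it testable locally before it is handed to `pulsar-admin functions create`.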
  • 22. Run a Local Standalone Bare Metal
      wget https://archive.apache.org/dist/pulsar/pulsar-2.10.1/apache-pulsar-2.10.1-bin.tar.gz
      tar xvfz apache-pulsar-2.10.1-bin.tar.gz
      cd apache-pulsar-2.10.1
      bin/pulsar standalone
      (For Pulsar SQL Support) bin/pulsar sql-worker start
      https://pulsar.apache.org/docs/en/standalone/
  • 23. <or> Run in Docker
      docker run -it -p 6650:6650 -p 8080:8080 \
        --mount source=pulsardata,target=/pulsar/data \
        --mount source=pulsarconf,target=/pulsar/conf \
        apachepulsar/pulsar:2.10.1 bin/pulsar standalone
      https://pulsar.apache.org/docs/en/standalone-docker/
  • 24. Building Tenant, Namespace, Topics
      bin/pulsar-admin tenants create conf
      bin/pulsar-admin namespaces create conf/europe
      bin/pulsar-admin tenants list
      bin/pulsar-admin namespaces list conf
      bin/pulsar-admin topics create persistent://conf/europe/first
      bin/pulsar-admin topics list conf/europe
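The commands above build up the three levels that appear in every topic name: tenant, namespace, topic. A small sketch that splits a full topic name into those parts; `TopicName` and `parse_topic` are hypothetical helpers written for this illustration, not part of the Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split domain://tenant/namespace/topic into its four parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic domain in {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://conf/europe/first")
print(parsed)
```

For the topic created on this slide, the tenant is `conf`, the namespace is `europe` and the topic is `first`.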
  • 25. Install Python 3 Pulsar Client
      pip3 install 'pulsar-client[all]==2.10.1'
      Includes AARCH64, ARM, M2, INTEL, …
      For Python on Pulsar on Pi: https://github.com/tspannhw/PulsarOnRaspberryPi
      https://pulsar.apache.org/docs/en/client-libraries-python/
      https://pypi.org/project/pulsar-client/2.10.0/#files
  • 26. Building a Python 3 Producer
      import pulsar
      client = pulsar.Client('pulsar://localhost:6650')
      producer = client.create_producer('persistent://conf/ete/first')
      producer.send(('Simple Text Message').encode('utf-8'))
      client.close()
  • 27. Building a Python 3 Cloud Producer OAuth
      python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 -t persistent://public/default/pyth --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'
      import argparse
      import pulsar
      from pulsar import AuthenticationOauth2
      parse = argparse.ArgumentParser(prog='prod.py')
      parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
      parse.add_argument('-t', '--topic', dest='topic', type=str, required=True)
      parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
      args = parse.parse_args()
      client = pulsar.Client(args.service_url, authentication=AuthenticationOauth2(args.auth_params))
      https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
      https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
  • 28. Example Avro Schema Usage
      import pulsar
      from pulsar.schema import *
      from pulsar.schema import AvroSchema
      class thermal(Record):
          uuid = String()
      client = pulsar.Client('pulsar://pulsar1:6650')
      thermalschema = AvroSchema(thermal)
      producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro', schema=thermalschema, properties={"producer-name": "thrm"})
      uniqueid = "unique-name"  # placeholder key; the slide leaves uniqueid undefined
      thermalRec = thermal()
      thermalRec.uuid = uniqueid
      producer.send(thermalRec, partition_key=uniqueid)
      https://github.com/tspannhw/FLiP-Pi-Thermal
  • 29. Example JSON Schema Usage
      import pulsar
      from pulsar.schema import *
      from pulsar.schema import JsonSchema
      class weather(Record):
          uuid = String()
      client = pulsar.Client('pulsar://pulsar1:6650')
      wsc = JsonSchema(weather)
      producer = client.create_producer(topic='persistent://public/default/wthr', schema=wsc, properties={"producer-name": "wthr"})
      uniqueid = "unique-name"  # placeholder key; the slide leaves uniqueid undefined
      weatherRec = weather()
      weatherRec.uuid = uniqueid
      producer.send(weatherRec, partition_key=uniqueid)
      https://github.com/tspannhw/FLiP-Pi-Weather
      https://github.com/tspannhw/FLiP-PulsarDevPython101
  • 30. Building a Python3 Consumer
      import pulsar
      client = pulsar.Client('pulsar://localhost:6650')
      consumer = client.subscribe('persistent://conf/ete/first', subscription_name='mine')
      try:
          while True:
              msg = consumer.receive()
              print("Received message: '%s'" % msg.data())
              consumer.acknowledge(msg)
      finally:
          client.close()  # reached when the loop is interrupted, for example with Ctrl-C
  • 31. MQTT from Python
      pip3 install paho-mqtt
      import json
      import paho.mqtt.client as mqtt
      client = mqtt.Client("rpi4iot")
      row = { }
      row['gasKO'] = str(readings)  # readings: sensor value captured elsewhere in the script
      json_string = json.dumps(row)
      json_string = json_string.strip()
      client.connect("pulsar-server.com", 1883, 180)
      client.publish("persistent://public/default/mqtt-2", payload=json_string, qos=0, retain=True)
      https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
  • 32. Web Sockets from Python
      pip3 install websocket-client
      import websocket, base64, json
      topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/topic1'
      ws = websocket.create_connection(topic)
      message = "Hello Philly ETE Conference"
      message_bytes = message.encode('ascii')
      base64_bytes = base64.b64encode(message_bytes)
      base64_message = base64_bytes.decode('ascii')
      ws.send(json.dumps({'payload' : base64_message, 'properties': {'device' : 'macbook'}, 'context' : 5}))
      response = json.loads(ws.recv())
      https://pulsar.apache.org/docs/en/client-libraries-websocket/
      https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
      https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
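The WebSocket producer frame is plain JSON with a base64-encoded payload, so the encoding step can be checked without a server. A sketch using only the standard library; `build_ws_message` and `decode_ws_payload` are hypothetical helper names, and the frame fields mirror the ones used on the slide.

```python
import base64
import json
from typing import Dict


def build_ws_message(text: str, properties: Dict[str, str], context: int) -> str:
    """Return the JSON frame: base64 payload plus properties and context."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"payload": payload, "properties": properties, "context": context})


def decode_ws_payload(frame: str) -> str:
    """Reverse the encoding, as a reader of the topic would have to."""
    return base64.b64decode(json.loads(frame)["payload"]).decode("utf-8")


frame = build_ws_message("Hello Philly ETE Conference", {"device": "macbook"}, 5)
print(frame)
print(decode_ws_payload(frame))
```

Encoding as UTF-8 rather than ASCII is a deliberate difference from the slide, so that non-ASCII text also survives the round trip.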
  • 33. Kafka from Python
      pip3 install kafka-python
      import json
      from kafka import KafkaProducer
      from kafka.errors import KafkaError
      row = { }
      row['gasKO'] = str(readings)  # readings: sensor value captured elsewhere in the script
      json_string = json.dumps(row)
      json_string = json_string.strip()
      producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
      producer.send('topic-kafka-1', json.dumps(row).encode('utf-8'))
      producer.flush()
      https://github.com/streamnative/kop
      https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
  • 34. Deploy Python Functions
      bin/pulsar-admin functions create --auto-ack true \
        --py py/src/sentiment.py --classname "sentiment.Chat" \
        --inputs "persistent://public/default/chat" \
        --log-topic "persistent://public/default/logs" \
        --name Chat \
        --output "persistent://public/default/chatresult"
      https://github.com/tspannhw/pulsar-pychat-function
  • 35. Pulsar IO Function in Python 3.9+
      from pulsar import Function
      import json
      class Chat(Function):
          def __init__(self):
              pass
          def process(self, input, context):
              logger = context.get_logger()
              msg_id = context.get_message_id()
              fields = json.loads(input)
      https://github.com/tspannhw/pulsar-pychat-function
  • 36. Building a Golang Pulsar App
      http://pulsar.apache.org/docs/en/client-libraries-go/
      go get -u "github.com/apache/pulsar-client-go/pulsar"
      package main
      import (
          "log"
          "time"
          "github.com/apache/pulsar-client-go/pulsar"
      )
      func main() {
          client, err := pulsar.NewClient(pulsar.ClientOptions{
              URL:               "pulsar://localhost:6650",
              OperationTimeout:  30 * time.Second,
              ConnectionTimeout: 30 * time.Second,
          })
          if err != nil {
              log.Fatalf("Could not instantiate Pulsar client: %v", err)
          }
          defer client.Close()
      }
  • 37. Pulsar Producer
      import java.util.UUID;
      import java.net.URL;
      import org.apache.pulsar.client.api.Producer;
      import org.apache.pulsar.client.api.ProducerBuilder;
      import org.apache.pulsar.client.api.PulsarClient;
      import org.apache.pulsar.client.api.MessageId;
      import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2;
      PulsarClient client = PulsarClient.builder()
          .serviceUrl(serviceUrl)
          .authentication(
              AuthenticationFactoryOAuth2.clientCredentials(
                  new URL(issuerUrl), new URL(credentialsUrl), audience))
          .build();
  • 39. Spring MQTT Producer
      MqttMessage mqttMessage = new MqttMessage();
      mqttMessage.setPayload(DataUtility.serialize(payload));
      mqttMessage.setQos(1);
      mqttMessage.setRetained(true);
      mqttClient.publish(topicName, mqttMessage);
  • 40. Spring Kafka Producer
      ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topicName, uuidKey.toString(), DataUtility.serializeToJSON(message));
      kafkaTemplate.send(producerRecord);
  • 41. Pulsar Simple Producer
      String pulsarKey = UUID.randomUUID().toString();
      String OS = System.getProperty("os.name").toLowerCase();
      ProducerBuilder<byte[]> producerBuilder = client.newProducer().topic(topic)
          .producerName("demo");
      Producer<byte[]> producer = producerBuilder.create();
      MessageId msgID = producer.newMessage().key(pulsarKey).value("msg".getBytes())
          .property("device", OS).send();
      producer.close();
      client.close();
  • 42. Pulsar Function Java
      import java.util.function.Function;
      public class MyFunction implements Function<String, String> {
          public String apply(String input) {
              return doBusinessLogic(input);  // your code here
          }
      }
  • 43. Pulsar Function SDK
      import java.util.UUID;
      import org.apache.pulsar.client.impl.schema.JSONSchema;
      import org.apache.pulsar.functions.api.*;
      public class AirQualityFunction implements Function<byte[], Void> {
          @Override
          public Void process(byte[] input, Context context) throws Exception {
              context.getLogger().debug("File:" + new String(input));
              // Your code here: build the Observation instance `observation` from the input
              context.newOutputMessage("topicname", JSONSchema.of(Observation.class))
                  .key(UUID.randomUUID().toString())
                  .property("prop1", "value1")
                  .value(observation)
                  .send();
              return null;
          }
      }
  • 44. Setting Subscription Type Java
      Consumer<byte[]> consumer = pulsarClient.newConsumer()
          .topic(topic)
          .subscriptionName("subscriptionName")
          .subscriptionType(SubscriptionType.Shared)
          .subscribe();
  • 45. Subscribing to a Topic and Setting Subscription Name Java
      Consumer<byte[]> consumer = pulsarClient.newConsumer()
          .topic(topic)
          .subscriptionName("subscriptionName")
          .subscribe();
  • 46. Producing Object Events From Java
      ProducerBuilder<Observation> producerBuilder = pulsarClient.newProducer(JSONSchema.of(Observation.class))
          .topic(topicName)
          .producerName(producerName).sendTimeout(60, TimeUnit.SECONDS);
      Producer<Observation> producer = producerBuilder.create();
      MessageId msgID = producer.newMessage()
          .key(someUniqueKey)
          .value(observation)
          .send();
  • 47. Monitoring and Metrics Check
      curl http://pulsar1:8080/admin/v2/persistent/conf/europe/first/stats | python3 -m json.tool
      bin/pulsar-admin topics stats-internal persistent://conf/europe/first
      curl http://pulsar1:8080/metrics/
      bin/pulsar-admin topics peek-messages --count 5 --subscription ete-reader persistent://conf/europe/first
      bin/pulsar-admin topics subscriptions persistent://conf/europe/first
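Once the stats call returns JSON, a few lines of Python can pull out the numbers worth watching. This sketch runs against a hand-written sample, not real broker output: the field names `msgRateIn`, `msgInCounter`, `subscriptions` and `msgBacklog` are assumed to match what the stats endpoint returns for your Pulsar version, and `summarize` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Hand-written sample for illustration only; values are invented.
SAMPLE_STATS = """
{
  "msgRateIn": 2.5,
  "msgInCounter": 120,
  "subscriptions": {
    "ete-reader": {"msgBacklog": 5},
    "mine": {"msgBacklog": 2}
  }
}
"""


def summarize(stats: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the inbound rate and the backlog per subscription."""
    subs = stats.get("subscriptions", {})
    backlog = {name: sub.get("msgBacklog", 0) for name, sub in subs.items()}
    return {
        "msgRateIn": stats.get("msgRateIn", 0.0),
        "totalBacklog": sum(backlog.values()),
        "backlogBySubscription": backlog,
    }


summary = summarize(json.loads(SAMPLE_STATS))
print(summary)
```

A growing backlog on one subscription while the inbound rate stays flat usually points at a slow or stopped consumer rather than at the producer side.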
  • 48. Metrics: Broker
      Broker metrics are exposed under "/metrics" on port 8080. You can change the port by setting webServicePort to a different value in the broker.conf configuration file.
      All metrics exposed by a broker are labeled with cluster=${pulsar_cluster}, where ${pulsar_cluster} is the name of the Pulsar cluster configured in the broker.conf file.
      For more information: https://pulsar.apache.org/docs/en/reference-metrics/#broker
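The `/metrics` endpoint serves plain text in the Prometheus exposition format, one sample per line with labels such as `cluster` in braces. A minimal parsing sketch, assuming simple label values without escaped quotes; the sample text is hand-written for illustration and `parse_metrics` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# name{label="value",...} value   (labels are optional)
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]

# Invented values; metric names are used only as examples.
SAMPLE_TEXT = """\
# TYPE pulsar_topics_count gauge
pulsar_topics_count{cluster="standalone",namespace="conf/europe"} 1
pulsar_rate_in{cluster="standalone",namespace="conf/europe"} 12.5
"""


def parse_metrics(text: str) -> List[Sample]:
    """Return (metric name, labels, value) for every sample line."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE_TEXT)
for name, labels, value in samples:
    print(name, labels.get("cluster"), value)
```

In practice a Prometheus server scrapes this endpoint directly; parsing by hand is mainly useful for quick checks and smoke tests.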
  • 49. Metrics: Broker
      These metrics are available for brokers:
      ● Namespace metrics
        ○ Replication metrics
      ● Topic metrics
        ○ Replication metrics
      ● ManagedLedgerCache metrics
      ● ManagedLedger metrics
      ● LoadBalancing metrics
        ○ BundleUnloading metrics
        ○ BundleSplit metrics
      ● Subscription metrics
      ● Consumer metrics
      ● ManagedLedger bookie client metrics
  • 50. Cleanup
      bin/pulsar-admin topics delete persistent://conf/europe/first
      bin/pulsar-admin namespaces delete conf/europe
      bin/pulsar-admin tenants delete conf
  • 51. Java for Pulsar
      ● https://github.com/tspannhw/airquality
      ● https://github.com/tspannhw/FLiPN-AirQuality-REST
      ● https://github.com/tspannhw/pulsar-airquality-function
      ● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
      ● https://github.com/tspannhw/FLiP-Py-ADS-B
      ● https://github.com/tspannhw/pulsar-adsb-function
      ● https://github.com/tspannhw/airquality-amqp-consumer
      ● https://github.com/tspannhw/airquality-mqtt-consumer
      ● https://github.com/tspannhw/airquality-consumer
      ● https://github.com/tspannhw/airquality-kafka-consumer
  • 52. Python For Pulsar on Pi
      ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
      ● https://github.com/tspannhw/FLiP-Pi-Thermal
      ● https://github.com/tspannhw/FLiP-Pi-Weather
      ● https://github.com/tspannhw/FLiP-RP400
      ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
      ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
      ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
      ● https://github.com/tspannhw/PythonPulsarExamples
      ● https://github.com/tspannhw/pulsar-pychat-function
      ● https://github.com/tspannhw/FLiP-PulsarDevPython101
      ● https://github.com/tspannhw/airquality
  • 53. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark, Java and Open Source friends. https://bit.ly/32dAJft
  • 54. Let’s Keep in Touch! Tim Spann Developer Advocate PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw