We examine real-world architectural patterns involving Apache Pulsar to automate the creation of function and pub/sub flows for improved operational scalability and ease of management. We’ll cover CI/CD automation patterns and our approach of leveraging streaming data to create a self-service platform that automates the provisioning of new users. We will also demonstrate how function flows can be created through patterns and configuration, enabling non-developer users to build entire function flows simply by changing configuration. These patterns take the automation of managing Pulsar to a whole new level. We also cover CI/CD for on-prem, GCP, and AWS users.
This is Part 2 of this presentation: https://www.youtube.com/watch?v=pmaCG...
In summary, we will cover:
CI/CD for on-prem, GCP, and AWS users
Automated creation of function flows by configuration
Automated provisioning of pub/sub users and topics
Architectural patterns and best practices that enable automation
Overstock has leveraged Pulsar as the backbone of a self-service data fabric, a unified data platform to enable users to publish and consume data across the company and integrate with other services. We utilized Pulsar to solve a data governance problem, and Pulsar has performed marvelously. To support our real-world production use cases, we have developed message flows, integrations, and architectural patterns to solve common use cases, maximize value, simplify ease-of-use, automate management, and unify company data and services around this new platform.
At Clever Cloud, we are working on extremely light virtual machines to run WebAssembly binaries. Since it’s WASM, we can write code in many languages. We use a custom unikernel to run this WASM as Function-as-a-Service, using one VM per function execution. These VMs can run on events from messages coming through Pulsar, or from HTTP invocation; execution is on-demand, as only the consumers stay up. This can be a new model: Pulsar functions with real isolation for multi-tenancy use cases. This talk will show the use case, explain the virtualization underneath, and demonstrate the multi-tenancy use case.
Securing your Pulsar Cluster with Vault (Chris Kellogg, StreamNative)
Learn how to secure a Pulsar cluster with Hashicorp Vault and deploy it on Kubernetes. Vault provides a secure way to generate tokens and store sensitive data and Pulsar has a pluggable architecture for authentication, authorization and secret management. This talk will walk through how to create custom plugins for Vault, integrate them with Pulsar and then deploy a Pulsar cluster on Kubernetes.
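The token flow the talk describes can be sketched with a small, hedged example (the Vault response shape, secret key, and broker URL are illustrative assumptions, not from the talk): pull a JWT out of a Vault-style KV response and build the settings a Pulsar client would be constructed with, where the token would ultimately be passed to something like `pulsar.AuthenticationToken`.

```python
# Sketch: turning a Vault KV secret into Pulsar client auth settings.
# The Vault response shape and secret key are illustrative assumptions;
# the talk itself covers custom Vault plugins, not a static KV secret.

def pulsar_auth_from_vault_secret(vault_response: dict) -> dict:
    """Extract a JWT from a Vault KV-v2 style response and return the
    settings you would use to construct an authenticated Pulsar client."""
    token = vault_response["data"]["data"]["pulsar-token"]
    return {
        "service_url": "pulsar://localhost:6650",  # assumed broker URL
        "auth_params": token,  # would be passed to an AuthenticationToken
    }

# Example Vault KV-v2 response (shape assumed for illustration):
resp = {"data": {"data": {"pulsar-token": "eyJhbGciOiJIUzI1NiJ9.demo.sig"}}}
cfg = pulsar_auth_from_vault_secret(resp)
print(cfg["service_url"])
```

In a real deployment the token would be short-lived and fetched per pod, which is exactly the kind of workflow the custom Vault plugins in the talk automate.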
Lessons from managing a Pulsar cluster (Nutanix, StreamNative)
In this presentation, we will cover:
- How to performance-test and optimize a Pulsar cluster. We will present how we load-tested Pulsar with Locust and, following this, how we tuned our configurations for our use cases.
- The event sourcing pattern with Apache Pulsar: Avro schema usage, compatibility choices, and schema evolution on Pulsar topics that worked for us.
- Bonus: how we source Apache Flink from Apache Pulsar and run our workflows.
By attending this webinar, you can expect to come away with:
- How to performance test a Pulsar cluster for your use case.
- How to leverage the highly configurable broker and BookKeeper to suit your needs.
- Event sourcing patterns on top of Apache Pulsar.
- Avro schema usage, compatibility choices, and evolution.
- Familiarity with the Pulsar connector for Flink and possible use cases.
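The schema-evolution point above can be made concrete with a minimal, hedged sketch (not from the talk): under backward-compatible Avro evolution, any field added in a new record schema must carry a default, so consumers on the new schema can still read messages written with the old one.

```python
# Minimal backward-compatibility check for Avro record schemas:
# every field present in the new schema but absent from the old one
# must declare a default value. This mirrors, in simplified form, what
# a schema registry enforces under a BACKWARD compatibility policy.

def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    old_fields = {f["name"] for f in old_schema["fields"]}
    return all(
        f["name"] in old_fields or "default" in f
        for f in new_schema["fields"]
    )

order_v1 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"}]}

# OK: the new field has a default, so old messages still decode.
order_v2 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"},
                       {"name": "note", "type": "string", "default": ""}]}

# Not OK: a new required field with no default breaks new readers
# consuming old messages.
order_bad = {"type": "record", "name": "Order",
             "fields": [{"name": "id", "type": "string"},
                        {"name": "total", "type": "double"}]}

print(is_backward_compatible(order_v1, order_v2))   # True
print(is_backward_compatible(order_v1, order_bad))  # False
```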
Stream-Native Processing with Pulsar Functions (Streamlio)
The Apache Pulsar messaging solution can perform lightweight, extensible processing on messages as they stream through the system. This presentation provides an overview of this new functionality.
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pulsar (StreamNative)
Kafka-on-Pulsar has been one of the most anticipated features in the Pulsar ecosystem. The Kafka-on-Pulsar project was initiated by StreamNative and the OVHCloud team quickly joined the project to collaborate on its development. Kafka-on-Pulsar enables Kafka applications to leverage Pulsar’s powerful features, such as streamlined operations with enterprise-grade multi-tenancy, without modifying code.
In this webinar, Sijie Guo, from StreamNative, and Pierre Zemb, from OVHCloud, will introduce KoP and discuss the following:
1. What are the key benefits?
2. What is the protocol handler and how does it work?
3. How is KoP implemented?
4. What are the new use cases it unlocks?
5. Watch a Live Demo!
No Surprises Geo Replication - Pulsar Virtual Summit Europe 2021 (StreamNative)
The session will cover how geo-replicated topics work under the hood, while also touching lightly on replicated subscriptions. We will then look at Pulsar’s behaviour in various scenarios, such as updating a replicated topic, changes in cluster topology, and outages, and end with the metrics and configurations to watch. We will also look into configurations for predictable failover of replicated subscriptions when dealing with unbounded cross-region lag or subscription lag itself.
Pulsar is a great technology, but it is also a new, less well-known technology competing against incumbent technologies, which is always a bit of a tough sell.
In this talk, we will go over the whole end-to-end process of how we researched, advocated for, built, integrated, and established Apache Pulsar at Instructure in less than a year. We will share details of how Pulsar's capabilities differentiate it, how we deploy Pulsar, and how we focused on an ecosystem of tools to accelerate adoption. We will also discuss one major motivating use case: change data capture for hundreds of database servers at scale.
Strata London 2018: Multi-everything with Apache Pulsar (Streamlio)
Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and only accessible by those who should have access. Ivan explores the features built into Pulsar that will help your organization stay in compliance with key requirements and regulations: multi-data-center replication, multi-tenancy, role-based access control, and end-to-end encryption. Ivan concludes by explaining why Pulsar’s multi-data-center story will alleviate headaches for the operations teams ensuring compliance with GDPR.
Scaling customer engagement with Apache Pulsar (StreamNative)
Iterable's platform is used by marketers to reach hundreds of millions of users every day, and those numbers are quickly growing. Iterable's infrastructure is built with pub-sub messaging at its core, so the reliability, scalability, and flexibility provided by that system are business critical.
In this talk we'll discuss why Iterable chose Pulsar as a pub-sub messaging system, as well as how Iterable is taking advantage of some of the more recently added features in Pulsar. We'll also talk about some of the challenges we encountered, where we think Pulsar can improve, and some contributions we've made to the open-source community around Pulsar.
Building a Messaging Solution for OVHcloud with Apache Pulsar (Pierre Zemb, StreamNative)
OVHcloud is the biggest European cloud provider. From dedicated servers to Managed Kubernetes, from VMware® based Hosted Private Cloud to OpenStack-based Public Cloud, we have over 1.4 million customers worldwide.
Internally, we have been running Apache Kafka for years, and despite all the skills gained operating multiple clusters with millions of messages per second, we decided to shift and build the foundation of our 'topic-as-a-service' product, ioStream, on Apache Pulsar.
In this talk, you will get insight into why we decided to use Apache Pulsar instead of Apache Kafka as the core of ioStream. We will tell you about our journey with Apache Pulsar, from deployment to management, what worked and what did not.
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka (Lightbend)
Since its stable release in 2016, Akka Streams is quickly becoming the de facto standard integration layer between various streaming systems and products. Enterprises like PayPal, Intel, Samsung and Norwegian Cruise Lines see this as a game changer in terms of designing Reactive streaming applications by connecting pipelines of back-pressured asynchronous processing stages.
This comes from the Reactive Streams initiative in part, which has been long led by Lightbend and others, allowing multiple streaming libraries to inter-operate between each other in a performant and resilient fashion, providing back-pressure all the way. But perhaps even more so thanks to the various integration drivers that have sprung up in the community and the Akka team—including drivers for Apache Kafka, Apache Cassandra, Streaming HTTP, Websockets and much more.
In this webinar for JVM Architects, Konrad Malawski explores the what and why of Reactive integrations, with examples featuring technologies like Akka Streams, Apache Kafka, and Alpakka, a new community project for building Streaming connectors that seeks to “back-pressurize” traditional Apache Camel endpoints.
* An overview of Reactive Streams and what it will look like in JDK 9, and the Akka Streams API implementation for Java and Scala.
* Introduction to Alpakka, a modern, Reactive version of Apache Camel, and its growing community of Streams connectors (e.g. Akka Streams Kafka, MQTT, AMQP, Streaming HTTP/TCP/FileIO and more).
* How Akka Streams and Akka HTTP work with WebSockets, HTTP and TCP, with examples in both Java and Scala.
Nozomi from Yahoo! Japan gave a presentation on how Yahoo! Japan uses Apache Pulsar to build its internal messaging platform, processing tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, the use cases of Apache Pulsar, and their best practices.
#PulsarBeijingMeetup
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbend) (confluent)
Do you ever feel that your stream processor gets in the way of expressing business requirements? Most processors are frameworks, which are highly opinionated in the design and implementation of apps. Performing Complex Event Processing invariably leads to calling out to other technologies, but what if that integration didn’t require an RPC call or could be modeled into your stream itself? This talk will explore how to build rich domain, low latency, back-pressured, and stateful streaming applications that require very little infrastructure, using Akka Streams and the Alpakka Kafka connector.
We will explore how Alpakka Kafka maps to Kafka features in order to provide a comprehensive understanding of how to build a robust streaming platform. We’ll explore transactional message delivery, defensive consumer group rebalancing, stateful stages, and state durability/persistence. Akka Streams is built on top of Akka, an asynchronous messaging-driven middleware toolkit that can be used to build Erlang-like Actor Systems in Java or Scala. It is used as a JVM library to facilitate common streaming semantics within an existing or standalone application. It’s different from other stream processors in several ways. It natively supports back-pressure flow control inside a single JVM instance or across distributed systems to help prevent overloading downstream infrastructure. It’s perfect for modeling Complex Event Processing with its easy integration into existing apps and Akka Actor systems. Also, unlike most acyclic stream processors, Akka Streams can support sophisticated pipelines, or Graphs, by allowing the user to model cycles (loops) when there’s a need.
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europe 2021 (StreamNative)
Suppose you want to run analytics on your Pulsar topics, debug those hard corner cases where messages fail to be sent, or even monitor your Pulsar deployment: how do you do it?
A tool exists to do this and more: Pulsar SQL. Since the 2.2.0 release, Pulsar SQL has provided an abstraction layer to run any SQL query we may want against Pulsar, effortlessly and without affecting performance. There is nothing like it in the pub-sub ecosystem.
In this short session, we will revisit what Pulsar SQL is, how to make the best out of it, how to deploy it, and how to use it!
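As a hedged illustration of how Pulsar SQL exposes topics (the cluster details here are assumptions; the `pulsar."tenant/namespace".topic` naming scheme follows the Pulsar SQL documentation), here is a small helper that builds the fully qualified table name the Presto/Trino connector uses, plus a sample query over the `__publish_time__` metadata column:

```python
# Sketch: Pulsar SQL addresses each topic as a table in the "pulsar"
# catalog, with the schema being the quoted "tenant/namespace" pair.
# Tenant/namespace/topic values below are illustrative.

def pulsar_sql_table(tenant: str, namespace: str, topic: str) -> str:
    """Return the fully qualified Pulsar SQL table name for a topic."""
    return f'pulsar."{tenant}/{namespace}"."{topic}"'

table = pulsar_sql_table("public", "default", "orders")

# A sample query counting messages per publish time, using the
# __publish_time__ metadata column Pulsar SQL exposes on every topic.
query = f"SELECT __publish_time__, COUNT(*) FROM {table} GROUP BY 1"
print(table)
```

The query string itself would be submitted through any Presto/Trino client pointed at the Pulsar SQL workers.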
Building Out Your Kafka Developer CDC Ecosystem (confluent)
Building Out Your Kafka Developer CDC Ecosystem, Neil Buesing, VP of Streaming Technologies for Object Partners (OPI)
Meetup Link: https://www.meetup.com/TwinCities-Apache-Kafka/events/272944023/
ApacheCon 2021: Apache BookKeeper Key-Value Store and use cases (Shivji Kumar Jha)
In order to leverage the best performance characteristics of your data or stream backend, it is important to understand the nitty-gritty details of how your backend stores and computes: how data is stored, how it is indexed, and what the read path looks like. Understanding this empowers you to design your solution to make the best use of the resources at hand, and to get the optimum consistency, availability, latency, and throughput for a given amount of resources.
With this underlying philosophy, in this slide deck we will get to the bottom of Pulsar's storage tier (Apache BookKeeper): the barebones of BookKeeper's storage semantics, how it is used in different use cases (even outside Pulsar), the object models of storage in Pulsar, the different kinds of data structures and algorithms Pulsar uses, and how those map to the semantics of the storage class shipped with Pulsar by default. Oh yes, you can change the storage backend too, with some additional code!
The focus will be more on the storage backend, so as not to keep this tailored to Pulsar specifically but to be able to apply it to different data stores and streams.
My talk at Scala Bay Meetup at Netflix about Powering the Partner APIs with Scalatra and Netflix OSS. This talk was delivered on September 9th 2013, at 8 PM at Netflix, Los Gatos.
CCI2018 - Automating resource creation with ARM templates and PowerShell (walk2talk srl)
On Azure, resources can be created quickly and in a standardized way using JSON templates that describe the resources to be created on the platform. Let's look together at what they can do, and how they can be extended with custom script extensions and PowerShell Desired State Configuration.
By Marco Obinu
Packer and Terraform are fundamental components of Infrastructure as Code. I recently gave a talk at a DevOps meetup, which gave me the opportunity to discuss the basics of these two tools and how DevOps teams should be using them.
Altitude SF 2017: Nomad and next-gen application architectures (Fastly)
Armon Dadgar offers an overview of Nomad, an application scheduler designed for both long-running services and batch jobs. Along the way, Armon explores the benefits of using schedulers for empowering developers and increasing resource utilization and how schedulers enable new next-generation application architectures.
Node Interactive: Node.js Performance and Highly Scalable Micro-Services (Chris Bailey)
The fundamental performance characteristics of Node.js, along with the improvements driven through the community benchmarking workgroup, make Node.js ideal for high-performing micro-service workloads. Translating that into highly responsive, scalable solutions, however, is still far from easy. This session will discuss why Node.js is right for micro-services, introduce the best practices for building scalable deployments, and show you how to monitor and profile your applications to identify and resolve performance bottlenecks.
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re:Invent (Amazon Web Services)
With AWS, companies now have the ability to develop and run their applications with speed and flexibility like never before. Working with an infrastructure that can be 100 percent API driven enables businesses to use lean methodologies and realize these benefits. This in turn leads to greater success for those who make use of these practices. In this session, we talk about some key concepts and design patterns for continuous deployment and continuous integration, two elements of lean development of applications and infrastructures.
Self Service Agile Infrastructure for Product Teams - Pop-up Loft Tel Aviv (Amazon Web Services)
Today’s modern infrastructure allows product teams to take full advantage of “infrastructure as code” and deliver value to their customers faster through a seamless, smart delivery pipeline. This delivery pipeline is built using AWS and third-party tools such as CloudFormation, Lambda, Terraform, Jenkins, Beanstalk, CodeDeploy, Ansible, and Docker. In the presentation we will walk you through the best practices of combining all of the above into a “smart delivery pipeline” for your team. By Oron Adam, Emind CTO
Docker Online Meetup: InfraKit update and Q&A (Docker, Inc.)
While working on Docker for AWS and Azure, we realized the need for a standard way to create and manage infrastructure state that was portable across any type of infrastructure, from different cloud providers to on-prem. One challenge is that each vendor has differentiated IP invested in how they handle certain aspects of their cloud infrastructure. It is not enough to just provision five servers; what IT ops teams need is a simple and consistent way to declare the number of servers, what size they should be, and what sort of base software configuration is required. And in the case of server failures (especially unplanned ones), that sudden change needs to be reconciled against the desired state to ensure that any required servers are re-provisioned with the necessary configuration. We started InfraKit to solve these problems and to provide the ability to create a self-healing infrastructure for distributed systems.
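The declare-and-reconcile idea described above can be sketched in a few lines (a toy model, not InfraKit's actual API): compare the declared server count against what is observed, and emit the provisioning actions needed to close the gap.

```python
# Toy reconciliation loop: given a declared desired count and the list
# of servers actually observed, produce the actions that restore the
# desired state. Real systems like InfraKit also handle sizing and
# base configuration; this sketch only covers the count.

def reconcile(desired: int, actual: list) -> list:
    """Return provisioning actions needed to reach the desired count."""
    if len(actual) >= desired:
        return []  # desired state already met (or exceeded)
    return [f"provision server-{i}" for i in range(len(actual), desired)]

# Two servers survive an outage; the declaration says five.
actions = reconcile(5, ["server-0", "server-1"])
print(actions)  # three provisioning actions
```

Running such a loop continuously is what turns a static declaration into the self-healing behaviour the paragraph describes.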
Building a serverless company on AWS Lambda and the Serverless framework (Luciano Mammino)
Planet9energy.com is a new electricity company building a sophisticated analytics and energy trading platform for the UK market. Since the earliest draft of the platform, we took the unconventional decision to go serverless and build the product on top of AWS Lambda and the Serverless framework using Node.js. In this talk, I want to discuss why we took this radical decision, what the pros and cons of this approach are, and what the main issues were that we faced as a tech team in our design and development experience. We will discuss how normal things like testing and deployment need to be rethought to work in a serverless fashion, but also the benefits of (almost) infinite self-scalability and the peace of mind of not having to manage hundreds of servers. Finally, we will underline how Node.js seems to fit naturally in this scenario and how it makes developing serverless applications extremely convenient.
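The talk's platform is Node.js, but the Lambda handler contract has the same shape across runtimes; here is a minimal Python equivalent for illustration (the event fields and tariff are invented for this sketch, not from Planet9energy's API):

```python
# Minimal AWS Lambda-style handler: take an event dict, return a
# response dict with an HTTP status code and a JSON-serialized body.
# The "kwh" field and the 0.15 GBP/kWh tariff are illustrative only.

import json

def handler(event, context=None):
    """Echo back a meter reading with a computed cost field."""
    reading = float(event.get("kwh", 0.0))
    return {
        "statusCode": 200,
        "body": json.dumps({"kwh": reading,
                            "cost_gbp": round(reading * 0.15, 4)}),
    }

result = handler({"kwh": 12.0})
print(result["statusCode"])  # 200
```

With the Serverless framework, a function like this is wired to an HTTP endpoint purely through configuration, which is part of what makes the deployment story so light.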
Technologies:
Backend
Frontend
Application architecture
Javascript
cloud computing
AWS Lambda with Serverless Framework and Java (Manish Pandit)
Serverless is a Node.js-based framework that makes creating, deploying, and managing serverless functions a breeze. We will use AWS Lambda as our FaaS (Function-as-a-Service) provider, although Serverless supports IBM OpenWhisk and Microsoft Azure as well.
In this session, we will talk about serverless applications, and create and deploy a Java/Maven-based AWS Lambda API. We will also explore the command-line interface for managing Lambdas, which is provided out of the box by the Serverless framework.
Sebastien Thomas, System Architect at Coyote Amerique, gave a presentation on operator frameworks. His talk covered how Operator SDK can be used to create Kubernetes Operators with Go.
OpenSource API Server based on the Node.js API framework, built on the supported Node.js platform with tooling and DevOps. Use cases are omni-channel API server, Mobile Backend as a Service (mBaaS), or next-generation Enterprise Service Bus. Key functionality includes built-in enterprise connectors, ORM, offline sync, mobile and JS SDKs, isomorphic JavaScript, and a graphical API creation tool.
Do you know what your Drupal is doing? Observe it! (Luca Lusso)
Our Drupal 8 websites are true applications, often very complex ones.
More and more workload is being delegated to external systems, usually microservices, that are used for many different tasks.
Software architectures are becoming more distributed and fragmented.
To track down problems and optimize for performance, it will become mandatory to trace the lifecycle of a single request as it originates from a client, passes through all Drupal subsystems, reaches external (micro)services and comes back.
This is often time-consuming and, without the right tools, may become very difficult.
A simple, unstructured log stream isn't enough anymore; we need to find a way to observe the details of what is going on.
Observability is what it’s all about. This is based on structured logs, metrics and traces. In this talk you will see how to implement these techniques in Drupal, which tools and which modules to use to trace and log all requests that reach our website and how to expose and display useful metrics.
We will integrate Drupal with OpenTracing, Prometheus, Monolog, Grafana and many more.
nuclio is iguazio's open source serverless project. nuclio is 100x faster, brings significant new functionality and works with data and event sources to accelerate performance and development.
Similar to Pulsar Architectural Patterns for CI/CD Automation and Self-Service_Devin Bost (20)
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022 (StreamNative)
So, you are a responsible software engineer building microservices for Apache Kafka, and life is good. Eventually, you hear the community talking about the outstanding experience they are having with Apache Pulsar features. They talk about infinite event stream retention, a rebalance-free architecture, native support for event processing, and multi-tenancy. Exciting, right? Most people would want to migrate their code to Pulsar, especially when you know that Pulsar also supports Kafka clients natively via the protocol handler known as KoP, which enables the Kafka client APIs on Pulsar. But, as said before, you are responsible; and you don't believe in fairy tales, just like you don't believe that migrations like this happen effortlessly. This session will discuss the architecture behind protocol handlers, what it means to have one enabled on Pulsar, and how KoP works. It will detail the effort required to migrate a microservice written for Kafka to Pulsar, and whether the code needs to change for this.
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa... (StreamNative)
This talk describes Klaviyo’s internal messaging system, an asynchronous application framework built around Pulsar that provides a set of high-quality tools for building business-critical asynchronous data flows in unreliable environments. This framework includes: a Pulsar ORM and schema migrator for topic configuration; a retry/replay system; a versioned schema registry; a consumer framework oriented around preventing message loss in hostile environments while maximizing observability; an experimental “online schema change” for topics; and more. Development of this system was informed by lessons learned during heavy use of datastores like RabbitMQ and Kafka, and frameworks like Celery, Spark, and Flink. In addition to the capabilities of this system, this talk will also cover (sometimes painful) lessons learned about the process of converting a heterogeneous async-computing environment onto Pulsar and a unified model.
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys... (StreamNative)
In this talk, learn how Toast leverages our Envoy control-plane to manage blue-green deploys of Pulsar consumers, and how this has helped drive adoption across the engineering organization. Dive into the history of Pulsar at Toast, starting from its introduction in 2019 to provide event-driven architecture across a rapidly scaling restaurant software platform. We will detail some of the hurdles that we encountered gaining buy-in across a diverse set of teams, and dive deep into how we enforce best practices and integrate with our service control plane.
Distributed Database Design Decisions to Support High Performance Event Strea... (StreamNative)
Event streaming architectures launched a reexamination of applications and systems architectures across the board. We live in a world where answers are needed now in a constant real-time flow. Yet beyond the event streaming system itself, what are the corequisites to ensure our large scale distributed database systems can keep pace with this always-on, always-current real time flow of data? What are the requirements and expectations for this next tech cycle?
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022 (StreamNative)
Pulsar Functions is a succinct framework provided by Apache Pulsar to conduct real-time data processing. Its use cases include ETL pipeline, event-driven applications, and simple data analytics. While Pulsar Functions already provides an extremely simple programming interface, we want to further lower the barrier for users to access real-time data. Since SQL is one of the universal languages in the technology world and well accepted by the vast majority of data engineers, we decided to add a SQL expressing layer on top of Pulsar Functions runtime. In this talk, we will discuss the architecture and implementation of this new service. We will see how SQL syntax, Pulsar Functions, and Function Mesh can work together to deliver a unique user development experience for real-time data jobs in the cloud environment. We will also walk through use cases like filtering, routing, and projecting messages as well as integrating with the Pulsar IO Connectors framework.
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022 (StreamNative)
Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that enables you to reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems based on your deployment environment. In this talk, walk through the steps required to utilize the existing etcd service running inside Kubernetes to act as Pulsar's metadata store, thereby eliminating the need to run ZooKeeper entirely, leaving you with a Zookeeper-less Pulsar.
Apache Pulsar is a highly available, distributed messaging system that provides guarantees of no message loss and strong message ordering with predictable read and write latency. In this talk, learn how this can be validated for Apache Pulsar Kubernetes deployments. Various failures are injected using Chaos Mesh to simulate network and other infrastructure failure conditions. There are many questions that are asked about failure scenarios, but it could be hard to find answers to these important questions. When a failure happens, how long does it take to recover? Does it cause unavailability? How does it impact throughput and latency? Are the guarantees of no message loss and strong message ordering kept, even when components fail? If a complete availability zone fails, is the system configured correctly to handle AZ failures? This talk will help you find answers to these questions and apply the tooling and practices to your own testing and validation.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... (StreamNative)
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, maximizing Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink, (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022 (StreamNative)
Apache Pulsar depends upon message acknowledgments to provide at-least-once or exactly-once processing guarantees. With these guarantees, any transmission between the broker and its producers and consumers requires an acknowledgment. But what happens if an acknowledgment is not received? Resending the message introduces the potential of duplicate processing and increases the likelihood of out-of-order processing. Therefore, it is critical to understand the Pulsar message redelivery semantics in order to prevent either of these conditions. In this talk, we will walk you through the redelivery semantics of Apache Pulsar and highlight some of the control mechanisms available to application developers to control this behavior. Finally, we will present best practices for configuring message redelivery to suit various use cases.
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ... (StreamNative)
Lakehouses are quickly growing in popularity as a new approach to Data Platform Architecture bringing some of the long-established benefits from OLTP world to OLAP, including transactions, record-level updates/deletes, and changes streaming. In this talk, we will discuss Apache Hudi and how it unlocks possibilities of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present: - What Lakehouses are, and why they are needed. - What Apache Hudi is and how it works. - Provide a use-case and demo that applies Apache Hudi’s DeltaStreamer tool to ingest data from Apache Pulsar.
Understanding Broker Load Balancing - Pulsar Summit SF 2022 (StreamNative)
Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, in order to ensure full utilization of the broker layer. You can use multiple settings and tools to control the traffic distribution, which requires a bit of context to understand how the traffic is managed in Pulsar. In this talk, we will walk you through the load balancing capabilities of Apache Pulsar and highlight some of the control mechanisms available to control the distribution of load across the Pulsar brokers. Finally, we will discuss the various load shedding strategies that are available. At the end of the talk, you will have a better understanding of how Pulsar's broker-level auto-balancing works and how to properly configure it to meet your workload demands.
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022 (StreamNative)
In today’s world, we are seeing a big shift toward the Cloud. With this shift comes a big shift in the expectations we have for a messaging system, especially when the messaging system is presented as a managed service in a large-scale, multi-tenant environment. For any large-scale enterprise, it’s very important to evaluate a messaging system and be confident before expanding complex distributed data systems like Apache Pulsar from on-premise to elastically scalable, fully managed services on cloud platforms. We must consider aspects such as migration from and integration with large-scale on-premise clusters, security, cost efficiency, the cloud friendliness of the architecture, modeling cost and capacity, tenant isolation, deployment robustness, availability, monitoring, etc. Not every messaging system is built to be cloud-native and run as a managed service with cost efficiency. We have been running large-scale Apache Pulsar at Yahoo for the last 8 years on various platforms and hardware configurations while meeting application SLAs and serving more than 1M topics in a cluster. In this talk, we will cover Pulsar’s journey in Yahoo! from an on-premise platform to a hybrid cloud and on-premise system, and discuss the architecture and features that make Pulsar a good cloud-native messaging-system choice for any enterprise.
Event-Driven Applications Done Right - Pulsar Summit SF 2022 (StreamNative)
Pulsar Summit San Francisco is the event dedicated to Apache Pulsar. This one-day, action-packed event will include 5 keynotes, 12 breakout sessions, and 1 amazing happy hour. Speakers are from top companies, including Google, AWS, Databricks, Onehouse, StarTree, Intel, ScyllaDB, and more! It’s the perfect opportunity to network with Pulsar thought leaders in person.
Join developers, architects, data engineers, DevOps professionals, and anyone who wants to learn about messaging and event streaming for this one-day, in-person event. Pulsar Summit San Francisco brings the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022 (StreamNative)
Our services team creates, builds, and maintains the as-a-service offerings for base platform services within our organization. Several thousand applications use these custom services daily, generating more than 700 million requests per minute. One of these services was our publish/subscribe offering, BQ, with a custom SDK and custom metrics based on Apache Pulsar. BQ is the core communication service within our organization, handling more than 200M RPM. All the core processes of the organization depend on this service for operation: the CDC of any of our RDBMS or NoSQL offerings, all the eventing efforts of the organization, async communication between apps, notification systems, etc. The backend of the solution was Apache Pulsar running on EC2 on AWS, and on top of that we built several components as wrappers of the actual backend, creating our own SDKs and abstractions and in many ways extending the features provided by Pulsar. We had a multi-cluster setup 100% on AWS, with custom Pulsar Docker images running on large ASG setups, along with our own wrapping and admin APIs and DBs. All of this, in turn, made the solution volatile.
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022 (StreamNative)
There is an increasing need to unleash analytical capabilities directly to the end-users to democratize decision-making. User-Facing Analytics is a new frontier that will shape the products of tomorrow and push the limits of existing technology. It demands a solution that will scale to millions of users to provide fast, real-time insights. In this session, Xiang will talk about his journey to build Apache Pinot to tackle the analytics problem space with the architectural changes and technology inventions made over the past decade. He will also talk about how other big data companies such as LinkedIn, Uber, and Stripe power their user-facing analytical applications.
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022 (StreamNative)
Welcome and Opening Remarks - Pulsar Summit SF 2022 (StreamNative)
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa... (StreamNative)
Milvus is an open-source vector database that leverages a novel data fabric to build and manage vector similarity search applications. As the world's most popular vector database, it has already been adopted in production by thousands of companies around the world, including Lucidworks, Shutterstock, and Cloudinary. With the launch of Milvus 2.0, the community aims to introduce a cloud-native, highly scalable and extendable vector similarity solution, and the key design concept is log as data.
Milvus relies on Pulsar as the log pub/sub system. Pulsar helps Milvus reduce system complexity by loosely decoupling each microservice, and makes the system stateless by disaggregating log storage and computation, which also makes the system further extendable. We will introduce the overall design, the implementation details of Milvus, and its roadmap in this talk.
Takeaways:
1) Get a general idea of what a vector database is and its real-world use cases.
2) Understand the major design principles of Milvus 2.0.
3) Learn how to build a complex system with the help of a modern log system like Pulsar.
MoP (MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi... (StreamNative)
MQTT (Message Queuing Telemetry Transport) is a message protocol based on the pub/sub model, with the advantages of a compact message structure, low resource consumption, and high efficiency, which make it suitable for IoT applications with low bandwidth and unstable network environments.
This session will introduce MQTT on Pulsar, which allows users of the MQTT transport protocol to use Apache Pulsar. I will share the architecture, principles, and future plans of MoP to help you understand Apache Pulsar's capabilities and practices in the IoT industry.
Pulsar Architectural Patterns for CI/CD Automation and Self-Service_Devin Bost
1. Pulsar Architectural Patterns for CI/CD
Every pattern shown here has been developed and implemented with my team at Overstock
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/
By Devin Bost, Senior Data Engineer at Overstock
Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
+
Includes on-prem, AWS, and GCP architectures
2. Legend & Referenced Technologies
Pulsar Beam
Pulsar Topic
AWS CodePipeline
Pulsar Brokers
Kubernetes
Golang
Amazon S3
CouchDB
ReactJS
Docker
AWS IAM
GCP Cloud Build
GCP IAM
GCP Cloud Storage
Google Cloud Functions
Pulsar Function
Flink Job
Sonatype Nexus
24. Might need to manually satisfy the contract at first, until you can get to where the data originates
25. [Diagram: two parallel CI/CD paths, (1) and (2)] Build tool → Artifact Storage (build data / storage data). Each path then follows the same steps: filter to artifact data, store, push to gate-keeping system, and push to the deployment pipeline for the desired environment.
33. [Diagram: deploy to test / deploy to prod] A Router sends deployment messages to fast-deploy-go instances, one calling the Test Pulsar REST Admin API and one calling the Prod Pulsar REST Admin API.
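As a rough illustration of the fast-deploy-go step, a deployer would call the Pulsar REST Admin API to register a function. The sketch below only constructs the admin endpoint URL and a JSON payload; the base URL is an assumption, and the real v3 functions API expects multipart form data (a functionConfig part plus the artifact or its URL), so treat this as a shape sketch rather than a working client:

```python
import json
from urllib import request

def create_function_request(admin_base: str, config: dict) -> request.Request:
    """Build (but do not send) an HTTP request targeting the Pulsar
    admin v3 functions endpoint for the function described by config."""
    url = (f"{admin_base}/admin/v3/functions/"
           f"{config['tenant']}/{config['namespace']}/{config['name']}")
    # The real API takes multipart/form-data; sending plain JSON here is a
    # simplification to show the endpoint shape and payload contents.
    body = json.dumps(config).encode("utf-8")
    return request.Request(url, data=body, method="POST",
                           headers={"Content-Type": "application/json"})

# "test-pulsar" is a hypothetical Test Pulsar admin address.
req = create_function_request(
    "http://test-pulsar:8080",
    {"tenant": "ops", "namespace": "deployment",
     "name": "pubSubConfigDeploymentRouter"})
print(req.full_url)
```

A deployer like fast-deploy-go would build one such request per config in the deployment message and send it to the admin API of the target environment.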
34. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
{
"type": "function",
"artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
"tenant": "ops",
"namespace": "deployment",
"name": "pubSubConfigDeploymentRouter",
"className": "com.yourcompany.pulsar.functions.GenericRouterFunction",
"userConfig": {
"key": "environment",
"tenant": "ops",
"namespace" : "deployment-automation"
},
"inputs": [
"persistent://ops/deployment/pre-deployment-configs-output"
],
"logTopic": "persistent://ops/deployment/pubSubConfigDeploymentRouter-log"
}
Creates /ops/deployment-automation/[environment]
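The routing behavior described on this slide can be sketched in plain Python. This is an illustrative helper, not the actual GenericRouterFunction (which is a Java Pulsar Function); the function and variable names are hypothetical:

```python
import json

def destination_topic(message_json: str, user_config: dict) -> str:
    """Derive the destination topic from a routed message.

    Reads the configured key (e.g. "environment") out of the message and
    builds persistent://{tenant}/{namespace}/{value}, mirroring the
    slide's description of the router.
    """
    message = json.loads(message_json)
    value = message[user_config["key"]]
    return f"persistent://{user_config['tenant']}/{user_config['namespace']}/{value}"

user_config = {"key": "environment", "tenant": "ops",
               "namespace": "deployment-automation"}
msg = '{"environment": "test", "configs": []}'
print(destination_topic(msg, user_config))
# persistent://ops/deployment-automation/test
```

In the real function, the message would then be published to this derived topic, which is what makes the router generic: changing the configured key or namespace changes the routing without code changes.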
36. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
Creates /ops/deployment-automation/[environment]
{
"type": "function",
"artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
"tenant": "ops",
"namespace": "deployment",
"name": "pubSubConfigDeploymentRouter",
"className": "com.yourcompany.pulsar.functions.GenericRouterFunction",
"userConfig": {
"key": "generator-type",
"tenant": "ops",
"namespace" : "deployment-automation"
},
"inputs": [
"persistent://ops/deployment/pre-deployment-configs-output"
],
"logTopic": "persistent://ops/deployment/pubSubConfigDeploymentRouter-log"
}
37. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
{
"environment": "test",
"configs": [{
"type": "function",
"artifactPathOrUrl": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
"tenant": "exampleTenant",
"namespace": "exampleNamespace",
"name": "exampleIgniteFunction",
"className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
"inputs": [
"persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite"
],
"output": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite",
"logTopic": "persistent://public/default/function-log-topic"
}]
}
From the message above, the router creates /ops/deployment-automation/test and routes the message there.
43. Build System / Storage
The build system writes the artifact to storage. A WebHook then triggers the Filter/Transform step, which gets our artifact URL (and any necessary metadata, if applicable).
44. Build System / Storage on AWS
Build/storage data flows as follows: a GitHub web hook (1) triggers AWS CodePipeline, which builds the artifact and stores it in S3; then (2) metadata and a reference to the S3 artifact are passed on and written as JSON to Pulsar via Pulsar Beam (or an equivalent HTTP endpoint for Pulsar) in front of the Pulsar brokers. Access must be granted to download the artifacts in S3 so the downstream step can get our artifact URL (and any necessary metadata, if applicable).
45. Build System / Storage on GCP
Build/storage data flows the same way on GCP: a GitHub web hook (1) triggers GCP Cloud Build; then (2) metadata and a reference to the stored artifact are passed on and written as JSON to Pulsar via Pulsar Beam (or an equivalent HTTP endpoint for Pulsar) in front of the Pulsar brokers. GCP IAM handles granting access to download the artifacts so the downstream step can get our artifact URL (and any necessary metadata, if applicable).
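The webhook-to-Pulsar hop is just an HTTP POST of a small JSON document. Here is a hedged Python sketch (the URL, endpoint path, and event field names are all assumptions; check the Pulsar Beam documentation for its actual topic endpoint and required auth headers):

```python
import json
import urllib.request

# Hypothetical Pulsar Beam (or equivalent HTTP endpoint for Pulsar) topic URL.
BEAM_URL = "https://pulsar-beam.example.com/v2/topic/persistent/ops/deployment/build-events"

def build_event_to_message(event: dict) -> bytes:
    """Filter/transform: keep only the artifact reference and metadata we need."""
    return json.dumps({
        "artifactUrl": event["artifact"]["url"],  # the S3/GCS object the build produced
        "commit": event["commit"],
        "repository": event["repository"],
    }).encode()

def publish(event: dict) -> None:
    """Write the JSON to Pulsar via the HTTP endpoint in front of the brokers."""
    req = urllib.request.Request(
        BEAM_URL,
        data=build_event_to_message(event),
        headers={"Content-Type": "application/json"},  # plus auth headers in practice
    )
    urllib.request.urlopen(req)

sample = {"artifact": {"url": "s3://builds/example-fn-1.0.1.jar"},
          "commit": "abc123", "repository": "project-name"}
message = build_event_to_message(sample)
```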
46. Option 1 - Basic function CI/CD flow
A WebHook triggers the Filter/Transform step (this was best done in Scala), which runs security-checking logic, such as package vulnerability checks, performs a synchronous artifact download/upload, and downloads the artifact to store in CouchDB. (You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI.) The UI Tool receives pushes for real-time updates via Server-Sent Events (SSEs) and pulls to get all data. To deploy to test and deploy to prod, a Router feeds fast-deploy-go instances that call the Test and Prod Pulsar REST Admin APIs.
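Under the hood, fast-deploy-go maps each JSON config onto the Pulsar REST Admin API; functions are managed under /admin/v3/functions. A Python sketch of that URL mapping (the admin host is hypothetical, and the real call is a multipart POST carrying the function config and, for new artifacts, the jar):

```python
import json

ADMIN_BASE = "http://pulsar-admin.example.com:8080"  # hypothetical admin address

def function_admin_url(config: dict) -> str:
    """Pulsar functions live at /admin/v3/functions/{tenant}/{namespace}/{name}."""
    return "{}/admin/v3/functions/{}/{}/{}".format(
        ADMIN_BASE, config["tenant"], config["namespace"], config["name"]
    )

cfg = json.loads("""{
  "type": "function",
  "tenant": "ops",
  "namespace": "deployment",
  "name": "pubSubConfigDeploymentRouter"
}""")
url = function_admin_url(cfg)
```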
47. Option 2 - more advanced function CI/CD flow for reusable functions
As in Option 1, a WebHook triggers the Filter/Transform step (this was best done in Scala) with a synchronous artifact download/upload, and the artifact is downloaded to store in CouchDB. (You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI.) For a reusable function's new artifact: (1) query to get all places where the artifact has been used, enrich the JSON with this data, and update the configs to use the new artifact, updating the configs in CouchDB by writing them as staged; (2) synchronously stage the changes in the DB (add to the stage set). Once staged configs are approved, push them into the test or prod environments: the UI Tool (Server-Sent Events push for real-time updates, pull to get all data) drives a Router feeding fast-deploy-go instances that call the Test and Prod Pulsar REST Admin APIs for deploy to test and deploy to prod.
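The stage-set idea in Option 2 reduces to a small data structure. An in-memory Python sketch (the real store is CouchDB; class and method names here are hypothetical):

```python
class StagedConfigStore:
    """In-memory sketch of the staged-config pattern; CouchDB-backed in the real flow."""

    def __init__(self):
        self.live = {}    # env -> {function name: config}
        self.staged = {}  # function name -> config (the "stage set")

    def stage(self, config: dict) -> None:
        """Write an updated config as staged rather than deploying it directly."""
        self.staged[config["name"]] = config

    def commit_staged(self, env: str) -> list:
        """Once staged configs are approved, push them into test or prod."""
        committed = list(self.staged.values())
        self.live.setdefault(env, {}).update(self.staged)
        self.staged.clear()
        return committed
```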
48. Option 3 - more advanced function CI/CD flow for reusable functions with more decoupling from DB
This is Option 2 with the DB access pulled behind a command interface. The WebHook/Filter/Transform side is unchanged: (1) query to get all places where the artifact has been used, enrich the JSON with this data, and update the configs to use the new artifact, updating the configs in CouchDB by writing them as staged; (2) synchronously stage the changes in the DB (add to the stage set). The UI Tool (Server-Sent Events push for real-time updates) no longer talks to CouchDB directly; instead it (1) passes a command, e.g. "merge-stage-sets", "commit-staged-to-test", "commit-staged-to-prod", "un-stage", "rollback", "get-all-data", etc. (in a JSON object with any additional parameters), a service synchronously executes the CouchDB command, and (2) the result is returned. Be careful to avoid creating security risks with how you implement this. Deploys still go through the Router and fast-deploy-go to the Test and Prod Pulsar REST Admin APIs.
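The command interface in Option 3 is where the security caution matters most: dispatch only from an explicit allowlist, never by executing arbitrary strings against CouchDB. A Python sketch of that pattern (the handler wiring is hypothetical):

```python
# Only these commands may reach the DB layer; anything else is rejected.
ALLOWED_COMMANDS = {
    "merge-stage-sets", "commit-staged-to-test", "commit-staged-to-prod",
    "un-stage", "rollback", "get-all-data",
}

def execute(command: dict, handlers: dict):
    """Dispatch a JSON command object like {"command": "get-all-data", "params": {...}}."""
    name = command.get("command")
    if name not in ALLOWED_COMMANDS or name not in handlers:
        raise PermissionError("command not allowed: {!r}".format(name))
    return handlers[name](command.get("params", {}))

# Hypothetical handler table mapping command names to DB operations.
handlers = {"get-all-data": lambda params: {"rows": []}}
```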
51. End-to-end automated provisioning of pub/sub users and topics
A User files a SNOW Request (SNOW = ServiceNow; this could be modified to use a custom UI instead) to request a new topic for the SNOW Request feed and to request a datasource. An Approval Gate (ACL approver and DataEng) saves back to the SNOW table, and the workflow is triggered on write.
A template is populated with the configs for the request ID: generate function configs, role configs, token configs, tap function configs, validation function configs, and passthrough function configs, then add them into a single JSON array of function configs. Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment! Note: one request ID represents all configs produced by this template.
A Router fans the configs out (the Router removes the routing envelope since it won't be needed downstream): Fast-Deploy reports the functions deployed for the topic, the Role Generator reports the roles created for the topic, and the Token Generator reports the tokens created for the topic. (Note: we created the token generator as a producer/consumer due to a lack of an available API to generate tokens. So, we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.)
A Flink job keys by request ID (keyBy) with a 60-second window timeout, saves the configs of what was created, and checks if all required objects were created or if anything is missing. It reports any problems to DataEng; else, a notification function that sends Email, UI, and/or Slack notifications tells the user that their topic is ready and provides them with the tokens and connection details.
52. The request side in detail: the User files a SNOW Request (SNOW = ServiceNow; could be modified to use a custom UI instead) to request a new topic for the SNOW Request feed and to request a datasource. The Approval Gate (ACL approver and DataEng) saves back to the SNOW table, and the workflow is triggered on write.
55. Config generation in detail: a template is populated with the configs for the request ID (generate function configs, role configs, token configs, tap function configs, validation function configs, and passthrough function configs), and they are added into a single JSON array of function configs. Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment! Note: one request ID represents all configs produced by this template.
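The template-population step can be sketched in a few lines of Python (the config shapes are simplified; the real templates carry full function configs like the earlier JSON examples):

```python
import json

# Kinds of configs the template produces for each request (simplified).
CONFIG_KINDS = [
    "function", "role", "token",
    "tap-function", "validation-function", "passthrough-function",
]

def populate_template(request_id: str, topic: str) -> str:
    """Return the single JSON array of configs for one request.

    Every object carries the request ID so all configs can be joined
    back to the user request after deployment.
    """
    configs = [
        {"requestId": request_id, "kind": kind, "topic": topic}
        for kind in CONFIG_KINDS
    ]
    return json.dumps(configs)
```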
56. Deployment in detail: the Router (which removes the routing envelope since it won't be needed downstream) feeds Fast-Deploy, the Role Generator, and the Token Generator, which respectively report the functions deployed, the roles created, and the tokens created for the topic. A Flink job keys by request ID (keyBy) with a 60-second window timeout to join these reports. Note: we created the token generator as a producer/consumer due to a lack of an available API to generate tokens. So, we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.
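The Flink stage is essentially a keyed join with a timeout: collect the three reports for one request ID until the set is complete or 60 seconds pass. A pure-Python sketch of that windowing semantics (the production job is Flink; names here are illustrative):

```python
class RequestJoiner:
    """Sketch of the Flink keyBy(requestId) + 60 s window: collect the reports
    (functions, roles, tokens) for one request until complete or timed out."""

    REQUIRED = {"functions", "roles", "tokens"}

    def __init__(self, timeout_s: float = 60.0):
        self.timeout_s = timeout_s
        self.windows = {}  # request_id -> {"t0": open time, "reports": {kind: report}}

    def add(self, request_id, kind, report, now):
        """Add one report; return the joined reports if the set is now complete."""
        w = self.windows.setdefault(request_id, {"t0": now, "reports": {}})
        w["reports"][kind] = report
        if self.REQUIRED <= w["reports"].keys():
            return self.windows.pop(request_id)["reports"]
        return None

    def expire(self, now):
        """Emit incomplete windows past the timeout so missing objects get reported."""
        out = []
        for rid in list(self.windows):
            if now - self.windows[rid]["t0"] >= self.timeout_s:
                out.append((rid, self.windows.pop(rid)["reports"]))
        return out
```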
57. Completion in detail: save the configs of what was created, then check if all required objects were created or if anything is missing. Report any problems to DataEng; else, notify the user that their topic is ready and provide them with the tokens and connection details, via a notification function that sends Email, UI, and/or Slack notifications.
59. Why Streaming and Pulsar – Ammunition for the Business Case: https://www.youtube.com/watch?v=qsz-FruOGoo&feature=youtu.be
Performance Architecture Deep Dive:
https://streamnative.io/whitepaper/taking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning/
How Pulsar works: https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
2020 Apache Pulsar User Survey: https://streamnative.io/whitepaper/sn-apache-pulsar-user-survey-report-2020/
Basics of Pulsar architecture: https://www.youtube.com/watch?v=vlU9UegYab8&feature=youtu.be
Common Pulsar Architectural Patterns: https://www.youtube.com/watch?v=pmaCG1SHAW8&feature=youtu.be
(my most popular video yet!)
You can learn more about Pulsar Beam here: https://kafkaesque.io/introducing-pulsar-beam-http-for-apache-pulsar/
63. Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
Pulsar Architectural Patterns for CI/CD (includes on-prem, AWS, and GCP architectures)
By Devin Bost, Senior Data Engineer at Overstock
Every pattern shown here has been developed and implemented with my team at Overstock.
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/