We examine real-world architectural patterns involving Apache Pulsar to automate the creation of function and pub/sub flows for improved operational scalability and ease of management. We’ll cover CI/CD automation patterns and our approach of leveraging streaming data to create a self-service platform that automates the provisioning of new users. We will also demonstrate how function flows can be created through patterns and configuration, enabling non-developer users to build entire function flows simply by changing configuration. These patterns take the automation of managing Pulsar to a whole new level. We also cover CI/CD for on-prem, GCP, and AWS users.
This is Part 2 of this presentation: https://www.youtube.com/watch?v=pmaCG...
In summary, we will cover:
CI/CD for on-prem, GCP, and AWS users
Automated creation of function flows by configuration
Automated provisioning of pub/sub users and topics
Architectural patterns and best practices that enable automation
Overstock has leveraged Pulsar as the backbone of a self-service data fabric, a unified data platform to enable users to publish and consume data across the company and integrate with other services. We utilized Pulsar to solve a data governance problem, and Pulsar has performed marvelously. To support our real-world production use cases, we have developed message flows, integrations, and architectural patterns to solve common use cases, maximize value, simplify ease-of-use, automate management, and unify company data and services around this new platform.
At Clever Cloud, we are working on extremely light virtual machines to run WebAssembly binaries. Since it’s WASM, we can write code in many languages. We use a custom unikernel to run this WASM as Function-as-a-Service, using one VM per function execution. These VMs can run on events from messages coming through Pulsar, or from HTTP invocation; execution is on-demand, as only the consumers stay up. This can be a new model: Pulsar functions with real isolation for multi-tenancy use cases. This talk will show the use case, explain the virtualization underneath, and demonstrate the multi-tenancy use case.
Securing your Pulsar Cluster with Vault (Chris Kellogg, StreamNative)
Learn how to secure a Pulsar cluster with Hashicorp Vault and deploy it on Kubernetes. Vault provides a secure way to generate tokens and store sensitive data and Pulsar has a pluggable architecture for authentication, authorization and secret management. This talk will walk through how to create custom plugins for Vault, integrate them with Pulsar and then deploy a Pulsar cluster on Kubernetes.
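The token flow the talk describes can be sketched with a small, hedged example (the Vault response shape, secret key, and broker URL are illustrative assumptions, not from the talk): pull a JWT out of a Vault-style KV response and build the settings a Pulsar client would be constructed with, where the token would ultimately be passed to something like `pulsar.AuthenticationToken`.

```python
# Sketch: turning a Vault KV secret into Pulsar client auth settings.
# The Vault response shape and secret key are illustrative assumptions;
# the talk itself covers custom Vault plugins, not a static KV secret.

def pulsar_auth_from_vault_secret(vault_response: dict) -> dict:
    """Extract a JWT from a Vault KV-v2 style response and return the
    settings you would use to construct an authenticated Pulsar client."""
    token = vault_response["data"]["data"]["pulsar-token"]
    return {
        "service_url": "pulsar://localhost:6650",  # assumed broker URL
        "auth_params": token,  # would be passed to an AuthenticationToken
    }

# Example Vault KV-v2 response (shape assumed for illustration):
resp = {"data": {"data": {"pulsar-token": "eyJhbGciOiJIUzI1NiJ9.demo.sig"}}}
cfg = pulsar_auth_from_vault_secret(resp)
print(cfg["service_url"])
```

In a real deployment the token would be short-lived and fetched per pod, which is exactly the kind of workflow the custom Vault plugins in the talk automate.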
Lessons from managing a Pulsar cluster (Nutanix, StreamNative)
In this presentation, we will cover:
- How to performance-test and optimize a Pulsar cluster. We will present how we load-tested Pulsar with Locust and, following this, how we tuned our configurations for our use cases.
- The event sourcing pattern with Apache Pulsar: Avro schema usage, compatibility choices, and schema evolution on Pulsar topics that worked for us.
- Bonus: how we source Apache Flink from Apache Pulsar and run our workflows.
By attending this webinar, you can expect to come away with:
- How to performance test a Pulsar cluster for your use case.
- How to leverage the highly configurable broker and BookKeeper to suit your needs.
- Event sourcing patterns on top of Apache Pulsar.
- Avro schema usage, compatibility choices, and evolution.
- Familiarity with the Pulsar connector for Flink and possible use cases.
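The schema-evolution point above can be made concrete with a minimal, hedged sketch (not from the talk): under backward-compatible Avro evolution, any field added in a new record schema must carry a default, so consumers on the new schema can still read messages written with the old one.

```python
# Minimal backward-compatibility check for Avro record schemas:
# every field present in the new schema but absent from the old one
# must declare a default value. This mirrors, in simplified form, what
# a schema registry enforces under a BACKWARD compatibility policy.

def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    old_fields = {f["name"] for f in old_schema["fields"]}
    return all(
        f["name"] in old_fields or "default" in f
        for f in new_schema["fields"]
    )

order_v1 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"}]}

# OK: the new field has a default, so old messages still decode.
order_v2 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"},
                       {"name": "note", "type": "string", "default": ""}]}

# Not OK: a new required field with no default breaks new readers
# consuming old messages.
order_bad = {"type": "record", "name": "Order",
             "fields": [{"name": "id", "type": "string"},
                        {"name": "total", "type": "double"}]}

print(is_backward_compatible(order_v1, order_v2))   # True
print(is_backward_compatible(order_v1, order_bad))  # False
```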
Stream-Native Processing with Pulsar Functions (Streamlio)
The Apache Pulsar messaging solution can perform lightweight, extensible processing on messages as they stream through the system. This presentation provides an overview of this new functionality.
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pulsar (StreamNative)
Kafka-on-Pulsar has been one of the most anticipated features in the Pulsar ecosystem. The Kafka-on-Pulsar project was initiated by StreamNative and the OVHCloud team quickly joined the project to collaborate on its development. Kafka-on-Pulsar enables Kafka applications to leverage Pulsar’s powerful features, such as streamlined operations with enterprise-grade multi-tenancy, without modifying code.
In this webinar, Sijie Guo, from StreamNative, and Pierre Zemb, from OVHCloud, will introduce KoP and discuss the following:
1. What are the key benefits?
2. What is the protocol handler and how does it work?
3. How is KoP implemented?
4. What are the new use cases it unlocks?
5. Watch a Live Demo!
No Surprises Geo Replication - Pulsar Virtual Summit Europe 2021 (StreamNative)
The session will cover how geo-replicated topics work under the hood, while also touching lightly on replicated subscriptions. We will then look at Pulsar’s behaviour in various scenarios, such as updating a replicated topic, changes in cluster topology, and outages, and end with the metrics and configurations to watch. We will also look into configurations for predictable failover of replicated subscriptions when dealing with unbounded cross-region lag or subscription lag itself.
Pulsar is a great technology, but it is also a new, less well-known technology competing against incumbent technologies, which is always a bit of a tough sell.
In this talk, we will go over the whole end-to-end process of how we researched, advocated for, built, integrated, and established Apache Pulsar at Instructure in less than a year. We will share details of how Pulsar's capabilities differentiate it, how we deploy Pulsar, and how we focused on an ecosystem of tools to accelerate adoption. We will also discuss one major motivating use case: change data capture for hundreds of database servers at scale.
Strata London 2018: Multi-everything with Apache Pulsar (Streamlio)
Ivan Kelly offers an overview of Apache Pulsar, a durable, distributed messaging system, underpinned by Apache BookKeeper, that provides the enterprise features necessary to guarantee that your data is where it should be and only accessible by those who should have access. Ivan explores the features built into Pulsar that will help your organization stay in compliance with key requirements and regulations: multi-data-center replication, multi-tenancy, role-based access control, and end-to-end encryption. Ivan concludes by explaining why Pulsar’s multi-data-center story will alleviate headaches for the operations teams ensuring compliance with GDPR.
Scaling customer engagement with Apache Pulsar (StreamNative)
Iterable's platform is used by marketers to reach hundreds of millions of users every day, and those numbers are quickly growing. Iterable's infrastructure is built with pub-sub messaging at its core, so the reliability, scalability, and flexibility provided by that system are business critical.
In this talk we'll discuss why Iterable chose Pulsar as a pub-sub messaging system, as well as how Iterable is taking advantage of some of the more recently added features in Pulsar. We'll also talk about some of the challenges we encountered, where we think Pulsar can improve, and some contributions we've made to the open-source community around Pulsar.
Building a Messaging Solution for OVHcloud with Apache Pulsar (Pierre Zemb, StreamNative)
OVHcloud is the biggest European cloud provider. From dedicated servers to Managed Kubernetes, from VMware® based Hosted Private Cloud to OpenStack-based Public Cloud, we have over 1.4 million customers worldwide.
Internally, we have been running Apache Kafka for years, and despite all the skills gained operating multiple clusters with millions of messages per second, we decided to shift and build the foundation of our 'topic-as-a-service' product, ioStream, on Apache Pulsar.
In this talk, you will get insight into why we decided to use Apache Pulsar instead of Apache Kafka as the core of ioStream. We will tell you about our journey with Apache Pulsar, from deployment to management, what worked and what did not.
Exploring Reactive Integrations With Akka Streams, Alpakka And Apache Kafka (Lightbend)
Since its stable release in 2016, Akka Streams is quickly becoming the de facto standard integration layer between various streaming systems and products. Enterprises like PayPal, Intel, Samsung and Norwegian Cruise Lines see this as a game changer in terms of designing Reactive streaming applications by connecting pipelines of back-pressured asynchronous processing stages.
This comes from the Reactive Streams initiative in part, which has been long led by Lightbend and others, allowing multiple streaming libraries to inter-operate between each other in a performant and resilient fashion, providing back-pressure all the way. But perhaps even more so thanks to the various integration drivers that have sprung up in the community and the Akka team—including drivers for Apache Kafka, Apache Cassandra, Streaming HTTP, Websockets and much more.
In this webinar for JVM Architects, Konrad Malawski explores the what and why of Reactive integrations, with examples featuring technologies like Akka Streams, Apache Kafka, and Alpakka, a new community project for building Streaming connectors that seeks to “back-pressurize” traditional Apache Camel endpoints.
* An overview of Reactive Streams and what it will look like in JDK 9, and the Akka Streams API implementation for Java and Scala.
* Introduction to Alpakka, a modern, Reactive version of Apache Camel, and its growing community of Streams connectors (e.g. Akka Streams Kafka, MQTT, AMQP, Streaming HTTP/TCP/FileIO and more).
* How Akka Streams and Akka HTTP work with WebSockets, HTTP and TCP, with examples in both Java and Scala.
Nozomi from Yahoo! Japan gave a presentation on how Yahoo! Japan uses Apache Pulsar to build its internal messaging platform, processing tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, the use cases of Apache Pulsar, and their best practices.
#PulsarBeijingMeetup
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbend) (confluent)
Do you ever feel that your stream processor gets in the way of expressing business requirements? Most processors are frameworks, which are highly opinionated in the design and implementation of apps. Performing Complex Event Processing invariably leads to calling out to other technologies, but what if that integration didn’t require an RPC call or could be modeled into your stream itself? This talk will explore how to build rich domain, low latency, back-pressured, and stateful streaming applications that require very little infrastructure, using Akka Streams and the Alpakka Kafka connector.
We will explore how Alpakka Kafka maps to Kafka features in order to provide a comprehensive understanding of how to build a robust streaming platform. We’ll explore transactional message delivery, defensive consumer group rebalancing, stateful stages, and state durability/persistence. Akka Streams is built on top of Akka, an asynchronous messaging-driven middleware toolkit that can be used to build Erlang-like Actor Systems in Java or Scala. It is used as a JVM library to facilitate common streaming semantics within an existing or standalone application. It’s different from other stream processors in several ways. It natively supports back-pressure flow control inside a single JVM instance or across distributed systems to help prevent overloading downstream infrastructure. It’s perfect for modeling Complex Event Processing with its easy integration into existing apps and Akka Actor systems. Also, unlike most acyclic stream processors, Akka Streams can support sophisticated pipelines, or Graphs, by allowing the user to model cycles (loops) when there’s a need.
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europe 2021 (StreamNative)
Suppose you want to run analytics on your Pulsar topics, debug those hard corner cases where messages fail to be sent, or even monitor your Pulsar deployment: how do you do it?
A tool exists to do this and more: Pulsar SQL. Since the 2.2.0 release, Pulsar SQL has provided an abstraction layer to run any SQL query we may want against Pulsar, effortlessly and without affecting performance. There is nothing like it in the pub-sub ecosystem.
In this short session, we will revisit what Pulsar SQL is, how to make the best out of it, how to deploy it, and how to use it!
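As a hedged illustration of how Pulsar SQL exposes topics (the cluster details here are assumptions; the `pulsar."tenant/namespace".topic` naming scheme follows the Pulsar SQL documentation), here is a small helper that builds the fully qualified table name the Presto/Trino connector uses, plus a sample query over the `__publish_time__` metadata column:

```python
# Sketch: Pulsar SQL addresses each topic as a table in the "pulsar"
# catalog, with the schema being the quoted "tenant/namespace" pair.
# Tenant/namespace/topic values below are illustrative.

def pulsar_sql_table(tenant: str, namespace: str, topic: str) -> str:
    """Return the fully qualified Pulsar SQL table name for a topic."""
    return f'pulsar."{tenant}/{namespace}"."{topic}"'

table = pulsar_sql_table("public", "default", "orders")

# A sample query counting messages per publish time, using the
# __publish_time__ metadata column Pulsar SQL exposes on every topic.
query = f"SELECT __publish_time__, COUNT(*) FROM {table} GROUP BY 1"
print(table)
```

The query string itself would be submitted through any Presto/Trino client pointed at the Pulsar SQL workers.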
Building Out Your Kafka Developer CDC Ecosystem (confluent)
Building Out Your Kafka Developer CDC Ecosystem, Neil Buesing, VP of Streaming Technologies for Object Partners (OPI)
Meetup Link: https://www.meetup.com/TwinCities-Apache-Kafka/events/272944023/
ApacheCon 2021: Apache BookKeeper Key-Value Store and use cases (Shivji Kumar Jha)
In order to leverage the best performance characteristics of your data or stream backend, it is important to understand the nitty-gritty details of how your backend stores and computes: how data is stored, how it is indexed, and what the read path looks like. Understanding this empowers you to design your solution to make the best use of the resources at hand, and to get the optimum consistency, availability, latency, and throughput for a given amount of resources.
With this underlying philosophy, in this slide deck we will get to the bottom of Pulsar's storage tier (Apache BookKeeper): the barebones of BookKeeper's storage semantics, how it is used in different use cases (even outside Pulsar), the object models of storage in Pulsar, the different kinds of data structures and algorithms Pulsar uses, and how those map to the semantics of the storage class shipped with Pulsar by default. Oh yes, you can change the storage backend too, with some additional code!
The focus will be more on the storage backend, so as not to keep this tailored to Pulsar specifically but to be able to apply it to different data stores and streams.
My talk at Scala Bay Meetup at Netflix about Powering the Partner APIs with Scalatra and Netflix OSS. This talk was delivered on September 9th 2013, at 8 PM at Netflix, Los Gatos.
CCI2018 - Automating resource creation with ARM templates and PowerShell (walk2talk srl)
On Azure, resources can be created quickly and in a standardized way using JSON templates that describe the resources to be created on the platform. Let's look together at what they can do, and how they can be extended with custom script extensions and PowerShell Desired State Configuration.
By Marco Obinu
Packer and Terraform are fundamental components of Infrastructure as Code. I recently gave a talk at a DevOps meetup, which gave me the opportunity to discuss the basics of these two tools and how DevOps teams should be using them.
Altitude SF 2017: Nomad and next-gen application architectures (Fastly)
Armon Dadgar offers an overview of Nomad, an application scheduler designed for both long-running services and batch jobs. Along the way, Armon explores the benefits of using schedulers for empowering developers and increasing resource utilization and how schedulers enable new next-generation application architectures.
Node Interactive: Node.js Performance and Highly Scalable Micro-Services (Chris Bailey)
The fundamental performance characteristics of Node.js, along with the improvements driven through the community benchmarking workgroup, make Node.js ideal for high-performing micro-service workloads. Translating that into highly responsive, scalable solutions, however, is still far from easy. This session will discuss why Node.js is right for micro-services, introduce the best practices for building scalable deployments, and show you how to monitor and profile your applications to identify and resolve performance bottlenecks.
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re:Invent (Amazon Web Services)
With AWS, companies now have the ability to develop and run their applications with speed and flexibility like never before. Working with an infrastructure that can be 100 percent API driven enables businesses to use lean methodologies and realize these benefits. This in turn leads to greater success for those who make use of these practices. In this session, we talk about some key concepts and design patterns for continuous deployment and continuous integration, two elements of lean development of applications and infrastructures.
Self Service Agile Infrastructure for Product Teams - Pop-up Loft Tel Aviv (Amazon Web Services)
Today’s modern infrastructure allows product teams to take full advantage of “infrastructure as code” and deliver value to their customers faster through a seamless, smart delivery pipeline. This delivery pipeline is built using AWS and third-party tools such as CloudFormation, Lambda, Terraform, Jenkins, Beanstalk, CodeDeploy, Ansible, and Docker. In the presentation we will walk you through the best practices of combining all of the above into a “smart delivery pipeline” for your team. By Oron Adam, Emind CTO
Docker Online Meetup: InfraKit update and Q&A (Docker, Inc.)
While working on Docker for AWS and Azure, we realized the need for a standard way to create and manage infrastructure state that was portable across any type of infrastructure, from different cloud providers to on-prem. One challenge is that each vendor has differentiated IP invested in how they handle certain aspects of their cloud infrastructure. It is not enough to just provision five servers; what IT ops teams need is a simple and consistent way to declare the number of servers, what size they should be, and what sort of base software configuration is required. And in the case of server failures (especially unplanned ones), that sudden change needs to be reconciled against the desired state to ensure that any required servers are re-provisioned with the necessary configuration. We started InfraKit to solve these problems and to provide the ability to create a self-healing infrastructure for distributed systems.
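The declare-and-reconcile idea described above can be sketched in a few lines (a toy model, not InfraKit's actual API): compare the declared server count against what is observed, and emit the provisioning actions needed to close the gap.

```python
# Toy reconciliation loop: given a declared desired count and the list
# of servers actually observed, produce the actions that restore the
# desired state. Real systems like InfraKit also handle sizing and
# base configuration; this sketch only covers the count.

def reconcile(desired: int, actual: list) -> list:
    """Return provisioning actions needed to reach the desired count."""
    if len(actual) >= desired:
        return []  # desired state already met (or exceeded)
    return [f"provision server-{i}" for i in range(len(actual), desired)]

# Two servers survive an outage; the declaration says five.
actions = reconcile(5, ["server-0", "server-1"])
print(actions)  # three provisioning actions
```

Running such a loop continuously is what turns a static declaration into the self-healing behaviour the paragraph describes.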
Building a serverless company on AWS Lambda and the Serverless framework (Luciano Mammino)
Planet9energy.com is a new electricity company building a sophisticated analytics and energy trading platform for the UK market. Since the earliest draft of the platform, we took the unconventional decision to go serverless and build the product on top of AWS Lambda and the Serverless framework using Node.js. In this talk, I want to discuss why we took this radical decision, what the pros and cons of this approach are, and what the main issues were that we faced as a tech team in our design and development experience. We will discuss how normal things like testing and deployment need to be rethought to work in a serverless fashion, but also the benefits of (almost) infinite self-scalability and the peace of mind of not having to manage hundreds of servers. Finally, we will underline how Node.js seems to fit naturally in this scenario and how it makes developing serverless applications extremely convenient.
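The talk's platform is Node.js, but the Lambda handler contract has the same shape across runtimes; here is a minimal Python equivalent for illustration (the event fields and tariff are invented for this sketch, not from Planet9energy's API):

```python
# Minimal AWS Lambda-style handler: take an event dict, return a
# response dict with an HTTP status code and a JSON-serialized body.
# The "kwh" field and the 0.15 GBP/kWh tariff are illustrative only.

import json

def handler(event, context=None):
    """Echo back a meter reading with a computed cost field."""
    reading = float(event.get("kwh", 0.0))
    return {
        "statusCode": 200,
        "body": json.dumps({"kwh": reading,
                            "cost_gbp": round(reading * 0.15, 4)}),
    }

result = handler({"kwh": 12.0})
print(result["statusCode"])  # 200
```

With the Serverless framework, a function like this is wired to an HTTP endpoint purely through configuration, which is part of what makes the deployment story so light.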
Technologies:
Backend
Frontend
Application architecture
Javascript
cloud computing
AWS Lambda with Serverless Framework and Java (Manish Pandit)
Serverless is a Node.js-based framework that makes creating, deploying, and managing serverless functions a breeze. We will use AWS Lambda as our FaaS (Function-as-a-Service) provider, although Serverless supports IBM OpenWhisk and Microsoft Azure as well.
In this session, we will talk about serverless applications, and create and deploy a Java/Maven-based AWS Lambda API. We will also explore the command-line interface for managing Lambdas, which is provided out of the box by the Serverless framework.
Sebastien Thomas, System Architect at Coyote Amerique, gave a presentation on operator frameworks. His talk covered how Operator SDK can be used to create Kubernetes Operators with Go.
OpenSource API Server based on the Node.js API framework, built on the supported Node.js platform with tooling and DevOps. Use cases are omni-channel API server, Mobile Backend as a Service (mBaaS), or next-generation Enterprise Service Bus. Key functionality includes built-in enterprise connectors, ORM, offline sync, mobile and JS SDKs, isomorphic JavaScript, and a graphical API creation tool.
Do you know what your Drupal is doing? Observe it! (Luca Lusso)
Our Drupal 8 websites are true applications, often very complex ones.
More and more workload is being delegated to external systems, usually microservices, that are used for many different tasks.
Software architectures are becoming more distributed and fragmented.
To track down problems and optimize for performance, it will become mandatory to trace the lifecycle of a single request as it originates from a client, passes through all Drupal subsystems, reaches external (micro)services and comes back.
This is often time-consuming and, without the right tools, may become very difficult.
A simple, unstructured log stream isn't enough anymore; we need to find a way to observe the details of what is going on.
Observability is what it’s all about. This is based on structured logs, metrics and traces. In this talk you will see how to implement these techniques in Drupal, which tools and which modules to use to trace and log all requests that reach our website and how to expose and display useful metrics.
We will integrate Drupal with OpenTracing, Prometheus, Monolog, Grafana and many more.
nuclio is iguazio's open source serverless project. nuclio is 100x faster, brings significant new functionality and works with data and event sources to accelerate performance and development.
Similar to Pulsar Architectural Patterns for CI/CD Automation and Self-Service_Devin Bost (20)
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022 (StreamNative)
So, you are a responsible software engineer building microservices for Apache Kafka, and life is good. Eventually, you hear the community talking about the outstanding experience they are having with Apache Pulsar features. They talk about infinite event stream retention, a rebalance-free architecture, native support for event processing, and multi-tenancy. Exciting, right? Most people would want to migrate their code to Pulsar, especially when you know that Pulsar also supports Kafka clients natively via the protocol handler known as KoP, which enables the Kafka client APIs on Pulsar. But, as said before, you are responsible; and you don't believe in fairy tales, just like you don't believe that migrations like this happen effortlessly. This session will discuss the architecture behind protocol handlers, what it means to have one enabled on Pulsar, and how KoP works. It will detail the effort required to migrate a microservice written for Kafka to Pulsar, and whether the code needs to change for this.
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa... (StreamNative)
This talk describes Klaviyo’s internal messaging system, an asynchronous application framework built around Pulsar that provides a set of high-quality tools for building business-critical asynchronous data flows in unreliable environments. This framework includes: a Pulsar ORM and schema migrator for topic configuration; a retry/replay system; a versioned schema registry; a consumer framework oriented around preventing message loss in hostile environments while maximizing observability; an experimental “online schema change” for topics; and more. Development of this system was informed by lessons learned during heavy use of datastores like RabbitMQ and Kafka, and frameworks like Celery, Spark, and Flink. In addition to the capabilities of this system, this talk will also cover (sometimes painful) lessons learned about the process of converting a heterogeneous async-computing environment onto Pulsar and a unified model.
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys... (StreamNative)
In this talk, learn how Toast leverages our Envoy control-plane to manage blue-green deploys of Pulsar consumers, and how this has helped drive adoption across the engineering organization. Dive into the history of Pulsar at Toast, starting from its introduction in 2019 to provide event-driven architecture across a rapidly scaling restaurant software platform. We will detail some of the hurdles that we encountered gaining buy-in across a diverse set of teams, and dive deep into how we enforce best practices and integrate with our service control plane.
Distributed Database Design Decisions to Support High Performance Event Strea... (StreamNative)
Event streaming architectures launched a reexamination of applications and systems architectures across the board. We live in a world where answers are needed now in a constant real-time flow. Yet beyond the event streaming system itself, what are the corequisites to ensure our large scale distributed database systems can keep pace with this always-on, always-current real time flow of data? What are the requirements and expectations for this next tech cycle?
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022 (StreamNative)
Pulsar Functions is a succinct framework provided by Apache Pulsar to conduct real-time data processing. Its use cases include ETL pipeline, event-driven applications, and simple data analytics. While Pulsar Functions already provides an extremely simple programming interface, we want to further lower the barrier for users to access real-time data. Since SQL is one of the universal languages in the technology world and well accepted by the vast majority of data engineers, we decided to add a SQL expressing layer on top of Pulsar Functions runtime. In this talk, we will discuss the architecture and implementation of this new service. We will see how SQL syntax, Pulsar Functions, and Function Mesh can work together to deliver a unique user development experience for real-time data jobs in the cloud environment. We will also walk through use cases like filtering, routing, and projecting messages as well as integrating with the Pulsar IO Connectors framework.
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022 (StreamNative)
Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that enables you to reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems based on your deployment environment. In this talk, walk through the steps required to utilize the existing etcd service running inside Kubernetes to act as Pulsar's metadata store, thereby eliminating the need to run ZooKeeper entirely, leaving you with a Zookeeper-less Pulsar.
Apache Pulsar is a highly available, distributed messaging system that provides guarantees of no message loss and strong message ordering with predictable read and write latency. In this talk, learn how this can be validated for Apache Pulsar Kubernetes deployments. Various failures are injected using Chaos Mesh to simulate network and other infrastructure failure conditions. There are many questions that are asked about failure scenarios, but it could be hard to find answers to these important questions. When a failure happens, how long does it take to recover? Does it cause unavailability? How does it impact throughput and latency? Are the guarantees of no message loss and strong message ordering kept, even when components fail? If a complete availability zone fails, is the system configured correctly to handle AZ failures? This talk will help you find answers to these questions and apply the tooling and practices to your own testing and validation.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... (StreamNative)
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, maximizing Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink, (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022 (StreamNative)
Apache Pulsar depends upon message acknowledgments to provide at-least-once or exactly-once processing guarantees. With these guarantees, any transmission between the broker and its producers and consumers requires an acknowledgment. But what happens if an acknowledgment is not received? Resending the message introduces the potential of duplicate processing and increases the likelihood of out-of-order processing. Therefore, it is critical to understand the Pulsar message redelivery semantics in order to prevent either of these conditions. In this talk, we will walk you through the redelivery semantics of Apache Pulsar and highlight some of the control mechanisms available to application developers to control this behavior. Finally, we will present best practices for configuring message redelivery to suit various use cases.
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ... (StreamNative)
Lakehouses are quickly growing in popularity as a new approach to Data Platform Architecture bringing some of the long-established benefits from OLTP world to OLAP, including transactions, record-level updates/deletes, and changes streaming. In this talk, we will discuss Apache Hudi and how it unlocks possibilities of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present: - What Lakehouses are, and why they are needed. - What Apache Hudi is and how it works. - Provide a use-case and demo that applies Apache Hudi’s DeltaStreamer tool to ingest data from Apache Pulsar.
Understanding Broker Load Balancing - Pulsar Summit SF 2022 (StreamNative)
Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible, in order to ensure full utilization of the broker layer. You can use multiple settings and tools to control the traffic distribution, which requires a bit of context to understand how the traffic is managed in Pulsar. In this talk, we will walk you through the load balancing capabilities of Apache Pulsar and highlight some of the control mechanisms available to control the distribution of load across the Pulsar brokers. Finally, we will discuss the various load shedding strategies that are available. At the end of the talk, you will have a better understanding of how Pulsar's broker-level auto-balancing works and how to properly configure it to meet your workload demands.
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022 (StreamNative)
In today’s world, we are seeing a big shift toward the Cloud. With this shift comes a big shift in the expectations we have for a messaging system, especially when the messaging system is presented as a managed service in a large-scale, multi-tenant environment. For any large-scale enterprise, it’s very important to evaluate a messaging system and be confident before expanding complex distributed data systems like Apache Pulsar from on-premise to elastically scalable, fully managed services on cloud platforms. We must consider aspects such as migration from and integration with large-scale on-premise clusters, security, cost efficiency, the cloud friendliness of the architecture, modeling cost and capacity, tenant isolation, deployment robustness, availability, monitoring, etc. Not every messaging system is built to be cloud-native and run as a managed service with cost efficiency. We have been running large-scale Apache Pulsar at Yahoo for the last 8 years on various platforms and hardware configurations while meeting application SLAs and serving more than 1M topics in a cluster. In this talk, we will cover Pulsar’s journey in Yahoo! from an on-premise platform to a hybrid cloud and on-premise system, and discuss the architecture and features that make Pulsar a good cloud-native messaging-system choice for any enterprise.
Event-Driven Applications Done Right - Pulsar Summit SF 2022 (StreamNative)
Pulsar Summit San Francisco is the event dedicated to Apache Pulsar. This one-day, action-packed event will include 5 keynotes, 12 breakout sessions, and 1 amazing happy hour. Speakers are from top companies, including Google, AWS, Databricks, Onehouse, StarTree, Intel, ScyllaDB, and more! It’s the perfect opportunity to network with Pulsar thought leaders in person.
Join developers, architects, data engineers, DevOps professionals, and anyone who wants to learn about messaging and event streaming for this one-day, in-person event. Pulsar Summit San Francisco brings the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022 (StreamNative)
Our services team creates, builds, and maintains the as-a-service offerings for base platform services within our organization. Several thousand applications use these custom services daily, generating more than 700 million requests per minute. One of these services was our publish/subscribe offering, BQ, with a custom SDK and custom metrics based on Apache Pulsar. BQ is the core communication service within our organization, handling more than 200M RPM. All the core processes of the organization depend on this service for operation: the CDC of any of our RDBMS or NoSQL offerings, all the eventing efforts of the organization, async communication between apps, notification systems, etc. The backend of the solution was Apache Pulsar running on EC2 on AWS, and on top of that we built several components as wrappers of the actual backend, creating our own SDKs and abstractions and in many ways extending the features provided by Pulsar. We had a multi-cluster setup 100% on AWS, with custom Pulsar Docker images running on large ASG setups, along with our own wrapping and admin APIs and DBs. All of this, in turn, made the solution volatile.
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022 (StreamNative)
There is an increasing need to unleash analytical capabilities directly to the end-users to democratize decision-making. User-Facing Analytics is a new frontier that will shape the products of tomorrow and push the limits of existing technology. It demands a solution that will scale to millions of users to provide fast, real-time insights. In this session, Xiang will talk about his journey to build Apache Pinot to tackle the analytics problem space with the architectural changes and technology inventions made over the past decade. He will also talk about how other big data companies such as LinkedIn, Uber, and Stripe power their user-facing analytical applications.
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022 (StreamNative)
Welcome and Opening Remarks - Pulsar Summit SF 2022 (StreamNative)
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa... (StreamNative)
Milvus is an open-source vector database that leverages a novel data fabric to build and manage vector similarity search applications. As the world's most popular vector database, it has already been adopted in production by thousands of companies around the world, including Lucidworks, Shutterstock, and Cloudinary. With the launch of Milvus 2.0, the community aims to introduce a cloud-native, highly scalable and extendable vector similarity solution, and the key design concept is log as data.
Milvus relies on Pulsar as the log pub/sub system. Pulsar helps Milvus reduce system complexity by loosely decoupling each microservice, and makes the system stateless by disaggregating log storage and computation, which also makes the system further extendable. We will introduce the overall design, the implementation details of Milvus, and its roadmap in this talk.
Takeaways:
1) Get a general idea of what a vector database is and its real-world use cases.
2) Understand the major design principles of Milvus 2.0.
3) Learn how to build a complex system with the help of a modern log system like Pulsar.
MoP (MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi... (StreamNative)
MQTT (Message Queuing Telemetry Transport) is a message protocol based on the pub/sub model, with the advantages of a compact message structure, low resource consumption, and high efficiency, which make it suitable for IoT applications with low bandwidth and unstable network environments.
This session will introduce MQTT on Pulsar, which allows users of the MQTT transport protocol to use Apache Pulsar. I will share the architecture, principles, and future plans of MoP to help you understand Apache Pulsar's capabilities and practices in the IoT industry.
Pulsar Architectural Patterns for CI/CD Automation and Self-Service_Devin Bost
1. Pulsar Architectural Patterns for CI/CD
Every pattern shown here has been developed and implemented with my team at Overstock
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/
By Devin Bost, Senior Data Engineer at Overstock
Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
+
Includes on-prem, AWS, and GCP architectures
2. Legend & Referenced Technologies
Pulsar Beam
Pulsar Topic
AWS CodePipeline
Pulsar Brokers
Kubernetes
Golang
Amazon S3
CouchDB
ReactJS
Docker
AWS IAM
GCP Cloud Build
GCP IAM
GCP Cloud Storage
Google Cloud Functions
Pulsar Function
Flink Job
Sonatype Nexus
24. Might need to manually satisfy the contract at first, until you can get to where the data originates
25. [Diagram: two parallel CI/CD paths, (1) and (2)] Build tool → Artifact Storage (build data / storage data). Each path then follows the same steps: filter to artifact data, store, push to gate-keeping system, and push to the deployment pipeline for the desired environment.
33. [Diagram: deploy to test / deploy to prod] A Router sends deployment messages to fast-deploy-go instances, one calling the Test Pulsar REST Admin API and one calling the Prod Pulsar REST Admin API.
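As a rough illustration of the fast-deploy-go step, a deployer would call the Pulsar REST Admin API to register a function. The sketch below only constructs the admin endpoint URL and a JSON payload; the base URL is an assumption, and the real v3 functions API expects multipart form data (a functionConfig part plus the artifact or its URL), so treat this as a shape sketch rather than a working client:

```python
import json
from urllib import request

def create_function_request(admin_base: str, config: dict) -> request.Request:
    """Build (but do not send) an HTTP request targeting the Pulsar
    admin v3 functions endpoint for the function described by config."""
    url = (f"{admin_base}/admin/v3/functions/"
           f"{config['tenant']}/{config['namespace']}/{config['name']}")
    # The real API takes multipart/form-data; sending plain JSON here is a
    # simplification to show the endpoint shape and payload contents.
    body = json.dumps(config).encode("utf-8")
    return request.Request(url, data=body, method="POST",
                           headers={"Content-Type": "application/json"})

# "test-pulsar" is a hypothetical Test Pulsar admin address.
req = create_function_request(
    "http://test-pulsar:8080",
    {"tenant": "ops", "namespace": "deployment",
     "name": "pubSubConfigDeploymentRouter"})
print(req.full_url)
```

A deployer like fast-deploy-go would build one such request per config in the deployment message and send it to the admin API of the target environment.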
34. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
{
"type": "function",
"artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
"tenant": "ops",
"namespace": "deployment",
"name": "pubSubConfigDeploymentRouter",
"className": "com.yourcompany.pulsar.functions.GenericRouterFunction",
"userConfig": {
"key": "environment",
"tenant": "ops",
"namespace" : "deployment-automation"
},
"inputs": [
"persistent://ops/deployment/pre-deployment-configs-output"
],
"logTopic": "persistent://ops/deployment/pubSubConfigDeploymentRouter-log"
}
Creates /ops/deployment-automation/[environment]
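The routing behavior described on this slide can be sketched in plain Python. This is an illustrative helper, not the actual GenericRouterFunction (which is a Java Pulsar Function); the function and variable names are hypothetical:

```python
import json

def destination_topic(message_json: str, user_config: dict) -> str:
    """Derive the destination topic from a routed message.

    Reads the configured key (e.g. "environment") out of the message and
    builds persistent://{tenant}/{namespace}/{value}, mirroring the
    slide's description of the router.
    """
    message = json.loads(message_json)
    value = message[user_config["key"]]
    return f"persistent://{user_config['tenant']}/{user_config['namespace']}/{value}"

user_config = {"key": "environment", "tenant": "ops",
               "namespace": "deployment-automation"}
msg = '{"environment": "test", "configs": []}'
print(destination_topic(msg, user_config))
# persistent://ops/deployment-automation/test
```

In the real function, the message would then be published to this derived topic, which is what makes the router generic: changing the configured key or namespace changes the routing without code changes.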
36. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
Creates /ops/deployment-automation/[environment]
{
"type": "function",
"artifactPathOrUrl": "http://pulsar/reusable-functions/generic-router-function-1.0.1-8-jar-with-dependencies.jar",
"tenant": "ops",
"namespace": "deployment",
"name": "pubSubConfigDeploymentRouter",
"className": "com.yourcompany.pulsar.functions.GenericRouterFunction",
"userConfig": {
"key": "generator-type",
"tenant": "ops",
"namespace" : "deployment-automation"
},
"inputs": [
"persistent://ops/deployment/pre-deployment-configs-output"
],
"logTopic": "persistent://ops/deployment/pubSubConfigDeploymentRouter-log"
}
37. The Router Function
Router’s Function Config specifies a key in the message, such as “environment”, along with a tenant and namespace name.
The router then gets the value of this key in the message and creates a destination topic name from the value.
{
"environment": "test",
"configs": [{
"type": "function",
"artifactPathOrUrl": "http://repo-name/project-name/example-ignite-function-1.0.1-3-jar-with-dependencies.jar",
"tenant": "exampleTenant",
"namespace": "exampleNamespace",
"name": "exampleIgniteFunction",
"className": "com.yourcompany.pulsar.functions.ExampleIgniteFunction",
"inputs": [
"persistent://exampleTenant/exampleNamespace/data-to-dump-into-ignite"
],
"output": "persistent://exampleTenant/exampleNamespace/data-enriched-from-ignite",
"logTopic": "persistent://public/default/function-log-topic"
}]
}
From the message above, the router creates /ops/deployment-automation/test and routes the message there.
43. Build System / Storage
The build system writes the artifact to storage. A WebHook then triggers the Filter/Transform step, which gets our artifact URL (and any necessary metadata, if applicable).
44. Build System / Storage on AWS
Build/storage data flows as follows: a GitHub web hook (1) triggers AWS CodePipeline, which builds the artifact and stores it in S3; then (2) metadata and a reference to the S3 artifact are passed on and written as JSON to Pulsar via Pulsar Beam (or an equivalent HTTP endpoint for Pulsar) in front of the Pulsar brokers. Access must be granted to download the artifacts in S3 so the downstream step can get our artifact URL (and any necessary metadata, if applicable).
45. Build System / Storage on GCP
Build/storage data flows the same way on GCP: a GitHub web hook (1) triggers GCP Cloud Build; then (2) metadata and a reference to the stored artifact are passed on and written as JSON to Pulsar via Pulsar Beam (or an equivalent HTTP endpoint for Pulsar) in front of the Pulsar brokers. GCP IAM handles granting access to download the artifacts so the downstream step can get our artifact URL (and any necessary metadata, if applicable).
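The webhook-to-Pulsar hop is just an HTTP POST of a small JSON document. Here is a hedged Python sketch (the URL, endpoint path, and event field names are all assumptions; check the Pulsar Beam documentation for its actual topic endpoint and required auth headers):

```python
import json
import urllib.request

# Hypothetical Pulsar Beam (or equivalent HTTP endpoint for Pulsar) topic URL.
BEAM_URL = "https://pulsar-beam.example.com/v2/topic/persistent/ops/deployment/build-events"

def build_event_to_message(event: dict) -> bytes:
    """Filter/transform: keep only the artifact reference and metadata we need."""
    return json.dumps({
        "artifactUrl": event["artifact"]["url"],  # the S3/GCS object the build produced
        "commit": event["commit"],
        "repository": event["repository"],
    }).encode()

def publish(event: dict) -> None:
    """Write the JSON to Pulsar via the HTTP endpoint in front of the brokers."""
    req = urllib.request.Request(
        BEAM_URL,
        data=build_event_to_message(event),
        headers={"Content-Type": "application/json"},  # plus auth headers in practice
    )
    urllib.request.urlopen(req)

sample = {"artifact": {"url": "s3://builds/example-fn-1.0.1.jar"},
          "commit": "abc123", "repository": "project-name"}
message = build_event_to_message(sample)
```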
46. Option 1 - Basic function CI/CD flow
A WebHook triggers the Filter/Transform step (this was best done in Scala), which runs security-checking logic, such as package vulnerability checks, performs a synchronous artifact download/upload, and downloads the artifact to store in CouchDB. (You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI.) The UI Tool receives pushes for real-time updates via Server-Sent Events (SSEs) and pulls to get all data. To deploy to test and deploy to prod, a Router feeds fast-deploy-go instances that call the Test and Prod Pulsar REST Admin APIs.
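Under the hood, fast-deploy-go maps each JSON config onto the Pulsar REST Admin API; functions are managed under /admin/v3/functions. A Python sketch of that URL mapping (the admin host is hypothetical, and the real call is a multipart POST carrying the function config and, for new artifacts, the jar):

```python
import json

ADMIN_BASE = "http://pulsar-admin.example.com:8080"  # hypothetical admin address

def function_admin_url(config: dict) -> str:
    """Pulsar functions live at /admin/v3/functions/{tenant}/{namespace}/{name}."""
    return "{}/admin/v3/functions/{}/{}/{}".format(
        ADMIN_BASE, config["tenant"], config["namespace"], config["name"]
    )

cfg = json.loads("""{
  "type": "function",
  "tenant": "ops",
  "namespace": "deployment",
  "name": "pubSubConfigDeploymentRouter"
}""")
url = function_admin_url(cfg)
```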
47. Option 2 - more advanced function CI/CD flow for reusable functions
As in Option 1, a WebHook triggers the Filter/Transform step (this was best done in Scala) with a synchronous artifact download/upload, and the artifact is downloaded to store in CouchDB. (You could do the download asynchronously at a different point in the flow, but then you will need to ensure it's fully downloaded before pushing the deployment from the UI.) For a reusable function's new artifact: (1) query to get all places where the artifact has been used, enrich the JSON with this data, and update the configs to use the new artifact, updating the configs in CouchDB by writing them as staged; (2) synchronously stage the changes in the DB (add to the stage set). Once staged configs are approved, push them into the test or prod environments: the UI Tool (Server-Sent Events push for real-time updates, pull to get all data) drives a Router feeding fast-deploy-go instances that call the Test and Prod Pulsar REST Admin APIs for deploy to test and deploy to prod.
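The stage-set idea in Option 2 reduces to a small data structure. An in-memory Python sketch (the real store is CouchDB; class and method names here are hypothetical):

```python
class StagedConfigStore:
    """In-memory sketch of the staged-config pattern; CouchDB-backed in the real flow."""

    def __init__(self):
        self.live = {}    # env -> {function name: config}
        self.staged = {}  # function name -> config (the "stage set")

    def stage(self, config: dict) -> None:
        """Write an updated config as staged rather than deploying it directly."""
        self.staged[config["name"]] = config

    def commit_staged(self, env: str) -> list:
        """Once staged configs are approved, push them into test or prod."""
        committed = list(self.staged.values())
        self.live.setdefault(env, {}).update(self.staged)
        self.staged.clear()
        return committed
```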
48. Option 3 - more advanced function CI/CD flow for reusable functions with more decoupling from DB
This is Option 2 with the DB access pulled behind a command interface. The WebHook/Filter/Transform side is unchanged: (1) query to get all places where the artifact has been used, enrich the JSON with this data, and update the configs to use the new artifact, updating the configs in CouchDB by writing them as staged; (2) synchronously stage the changes in the DB (add to the stage set). The UI Tool (Server-Sent Events push for real-time updates) no longer talks to CouchDB directly; instead it (1) passes a command, e.g. "merge-stage-sets", "commit-staged-to-test", "commit-staged-to-prod", "un-stage", "rollback", "get-all-data", etc. (in a JSON object with any additional parameters), a service synchronously executes the CouchDB command, and (2) the result is returned. Be careful to avoid creating security risks with how you implement this. Deploys still go through the Router and fast-deploy-go to the Test and Prod Pulsar REST Admin APIs.
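The command interface in Option 3 is where the security caution matters most: dispatch only from an explicit allowlist, never by executing arbitrary strings against CouchDB. A Python sketch of that pattern (the handler wiring is hypothetical):

```python
# Only these commands may reach the DB layer; anything else is rejected.
ALLOWED_COMMANDS = {
    "merge-stage-sets", "commit-staged-to-test", "commit-staged-to-prod",
    "un-stage", "rollback", "get-all-data",
}

def execute(command: dict, handlers: dict):
    """Dispatch a JSON command object like {"command": "get-all-data", "params": {...}}."""
    name = command.get("command")
    if name not in ALLOWED_COMMANDS or name not in handlers:
        raise PermissionError("command not allowed: {!r}".format(name))
    return handlers[name](command.get("params", {}))

# Hypothetical handler table mapping command names to DB operations.
handlers = {"get-all-data": lambda params: {"rows": []}}
```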
51. End-to-end automated provisioning of pub/sub users and topics
A User files a SNOW Request (SNOW = ServiceNow; this could be modified to use a custom UI instead) to request a new topic for the SNOW Request feed and to request a datasource. An Approval Gate (ACL approver and DataEng) saves back to the SNOW table, and the workflow is triggered on write.
A template is populated with the configs for the request ID: generate function configs, role configs, token configs, tap function configs, validation function configs, and passthrough function configs, then add them into a single JSON array of function configs. Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment! Note: one request ID represents all configs produced by this template.
A Router fans the configs out (the Router removes the routing envelope since it won't be needed downstream): Fast-Deploy reports the functions deployed for the topic, the Role Generator reports the roles created for the topic, and the Token Generator reports the tokens created for the topic. (Note: we created the token generator as a producer/consumer due to a lack of an available API to generate tokens. So, we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.)
A Flink job keys by request ID (keyBy) with a 60-second window timeout, saves the configs of what was created, and checks if all required objects were created or if anything is missing. It reports any problems to DataEng; else, a notification function that sends Email, UI, and/or Slack notifications tells the user that their topic is ready and provides them with the tokens and connection details.
52. The request side in detail: the User files a SNOW Request (SNOW = ServiceNow; could be modified to use a custom UI instead) to request a new topic for the SNOW Request feed and to request a datasource. The Approval Gate (ACL approver and DataEng) saves back to the SNOW table, and the workflow is triggered on write.
55. Config generation in detail: a template is populated with the configs for the request ID (generate function configs, role configs, token configs, tap function configs, validation function configs, and passthrough function configs), and they are added into a single JSON array of function configs. Be sure to pass the request ID with each JSON object to allow all configs to be joined to the user request after deployment! Note: one request ID represents all configs produced by this template.
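The template-population step can be sketched in a few lines of Python (the config shapes are simplified; the real templates carry full function configs like the earlier JSON examples):

```python
import json

# Kinds of configs the template produces for each request (simplified).
CONFIG_KINDS = [
    "function", "role", "token",
    "tap-function", "validation-function", "passthrough-function",
]

def populate_template(request_id: str, topic: str) -> str:
    """Return the single JSON array of configs for one request.

    Every object carries the request ID so all configs can be joined
    back to the user request after deployment.
    """
    configs = [
        {"requestId": request_id, "kind": kind, "topic": topic}
        for kind in CONFIG_KINDS
    ]
    return json.dumps(configs)
```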
56. Deployment in detail: the Router (which removes the routing envelope since it won't be needed downstream) feeds Fast-Deploy, the Role Generator, and the Token Generator, which respectively report the functions deployed, the roles created, and the tokens created for the topic. A Flink job keys by request ID (keyBy) with a 60-second window timeout to join these reports. Note: we created the token generator as a producer/consumer due to a lack of an available API to generate tokens. So, we needed to use the Pulsar CLI, which meant that we needed a disk location to save the token.
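The Flink stage is essentially a keyed join with a timeout: collect the three reports for one request ID until the set is complete or 60 seconds pass. A pure-Python sketch of that windowing semantics (the production job is Flink; names here are illustrative):

```python
class RequestJoiner:
    """Sketch of the Flink keyBy(requestId) + 60 s window: collect the reports
    (functions, roles, tokens) for one request until complete or timed out."""

    REQUIRED = {"functions", "roles", "tokens"}

    def __init__(self, timeout_s: float = 60.0):
        self.timeout_s = timeout_s
        self.windows = {}  # request_id -> {"t0": open time, "reports": {kind: report}}

    def add(self, request_id, kind, report, now):
        """Add one report; return the joined reports if the set is now complete."""
        w = self.windows.setdefault(request_id, {"t0": now, "reports": {}})
        w["reports"][kind] = report
        if self.REQUIRED <= w["reports"].keys():
            return self.windows.pop(request_id)["reports"]
        return None

    def expire(self, now):
        """Emit incomplete windows past the timeout so missing objects get reported."""
        out = []
        for rid in list(self.windows):
            if now - self.windows[rid]["t0"] >= self.timeout_s:
                out.append((rid, self.windows.pop(rid)["reports"]))
        return out
```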
57. Completion in detail: save the configs of what was created, then check if all required objects were created or if anything is missing. Report any problems to DataEng; else, notify the user that their topic is ready and provide them with the tokens and connection details, via a notification function that sends Email, UI, and/or Slack notifications.
59. Why Streaming and Pulsar – Ammunition for the Business Case: https://www.youtube.com/watch?v=qsz-FruOGoo&feature=youtu.be
Performance Architecture Deep Dive:
https://streamnative.io/whitepaper/taking-a-deep-dive-into-apache-pulsar-architecture-for-performance-tuning/
How Pulsar works: https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
2020 Apache Pulsar User Survey: https://streamnative.io/whitepaper/sn-apache-pulsar-user-survey-report-2020/
Basics of Pulsar architecture: https://www.youtube.com/watch?v=vlU9UegYab8&feature=youtu.be
Common Pulsar Architectural Patterns: https://www.youtube.com/watch?v=pmaCG1SHAW8&feature=youtu.be
(my most popular video yet!)
You can learn more about Pulsar Beam here: https://kafkaesque.io/introducing-pulsar-beam-http-for-apache-pulsar/
63. Data-Driven CI/CD Automation for Pulsar Function Flows and Pub/Sub
Pulsar Architectural Patterns for CI/CD (includes on-prem, AWS, and GCP architectures)
By Devin Bost, Senior Data Engineer at Overstock
Every pattern shown here has been developed and implemented with my team at Overstock.
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/