Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021 - StreamNative
KoP (Kafka on Pulsar) is a protocol handler that lets Kafka clients connect to Pulsar without any code changes.
Since the first GA version (2.8.0) of KoP was released several months ago, there have been many significant bug fixes and enhancements, such as support for authorization and for older Kafka client versions. This talk will walk through the basics of KoP, then introduce the improvements in version 2.9.0.
The Evolution History of RoP (RocketMQ-on-Pulsar) - Pulsar Summit Asia 2021 - StreamNative
Recently, the Tencent Cloud MQ team open-sourced version 0.1.0 of RoP (RocketMQ on Pulsar). However, many problems surfaced in production, such as message ID overflow, incomplete message consumption, unbalanced load in the consumption model, and invalid consumption requests. We made a series of optimizations for these problems in RoP 0.2.0 and improved the overall performance and stability of the current RoP version.
In this session, we will introduce the RoP MessageID refactoring, the design and implementation of RoP delayed messages and the routing protocol, the RoP ACL design and implementation, RoP performance optimization, and the application of RoP in Tencent Cloud.
Key takeaways:
1. Common design and implementation ideas for delayed messages in message queues
2. Implementation ideas and practical applications of Apache Pulsar Broker Entry metadata
3. How message routing is implemented differently in Apache RocketMQ and Apache Pulsar
4. The similarities, differences, and combination of the authentication schemes of Apache Pulsar and Apache RocketMQ
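To make the first takeaway concrete: one common design for delayed messages keeps pending messages in a priority queue ordered by delivery time and releases those whose time has come. The sketch below is a generic, in-memory illustration only and is not RoP's implementation; the class `DelayQueue` and its methods are hypothetical names chosen for this example.

```python
import heapq
import itertools


class DelayQueue:
    """In-memory delayed-message queue ordered by delivery timestamp (hypothetical)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so messages with equal timestamps keep insertion order.
        self._seq = itertools.count()

    def publish(self, payload, deliver_at):
        """Schedule payload for delivery at or after deliver_at (seconds)."""
        heapq.heappush(self._heap, (deliver_at, next(self._seq), payload))

    def poll(self, now):
        """Return every payload whose delivery time is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, payload = heapq.heappop(self._heap)
            due.append(payload)
        return due


queue = DelayQueue()
queue.publish("order-timeout", deliver_at=30)
queue.publish("welcome-email", deliver_at=10)
queue.publish("retry-payment", deliver_at=20)

first = queue.poll(now=15)   # only the message scheduled for t=10 is due
second = queue.poll(now=30)  # the two remaining messages, earliest first
```

A production system would typically persist the index and bucket messages by time range rather than hold everything in memory; the heap only shows the ordering idea.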
RabbitMQ on Pulsar's Practice in Tencent Cloud - Pulsar Summit Asia 2021 - StreamNative
RabbitMQ on Pulsar is a protocol handler developed by Tencent Cloud that is fully equivalent to the open source RabbitMQ on Pulsar. Functionally, it supports RabbitMQ's authentication, dynamic topology, financial level, dynamic routing, and other features. The architecture supports cloud-native features such as multi-tenancy, resource isolation, and automatic load balancing. It allows RabbitMQ users to move to a cloud-native message queue without changing their code and to enjoy the benefits that cloud native brings.
In this session, we will share how RabbitMQ on Pulsar is used in practice at Tencent Cloud.
How do we manage more than one thousand of Pegasus clusters - backend part - acelyc1112009
A presentation by Wang Dan at the Apache Pegasus meetup in 2021.
Learn more about Pegasus: https://pegasus.apache.org, https://github.com/apache/incubator-pegasus
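The GA version of MySQL 5.6 has been released and contains a large number of new features. Understanding these features not only helps with database kernel development but also matters a great deal for making better use of MySQL. This talk dissects the implementation details of the new features in MySQL 5.6 in two parts: the InnoDB engine and MySQL Server. This is the first part, covering the performance optimizations and feature enhancements in the MySQL 5.6 InnoDB engine.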
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022 - StreamNative
So, you are a responsible software engineer building microservices for Apache Kafka, and life is good. Eventually, you hear the community talking about the outstanding experience they are having with Apache Pulsar features. They talk about infinite event stream retention, a rebalance-free architecture, native support for event processing, and multi-tenancy. Exciting, right? Most people would want to migrate their code to Pulsar, especially when you know that Pulsar also supports Kafka clients natively via the protocol handler known as KoP, which enables the Kafka client APIs on Pulsar. But, as said before, you are responsible; you don't believe in fairy tales, just like you don't believe that migrations like this happen effortlessly. This session will discuss the architecture behind protocol handlers, what it means to have one enabled on Pulsar, and how KoP works. It will detail the effort required to migrate a microservice written for Kafka to Pulsar, and whether the code needs to change for this.
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa... - StreamNative
This talk describes Klaviyo’s internal messaging system, an asynchronous application framework built around Pulsar that provides a set of high-quality tools for building business-critical asynchronous data flows in unreliable environments. This framework includes: a Pulsar ORM and schema migrator for topic configuration; a retry/replay system; a versioned schema registry; a consumer framework oriented around preventing message loss in hostile environments while maximizing observability; an experimental “online schema change” for topics; and more. Development of this system was informed by lessons learned during heavy use of datastores like RabbitMQ and Kafka, and frameworks like Celery, Spark, and Flink. In addition to the capabilities of this system, this talk will also cover (sometimes painful) lessons learned about the process of converting a heterogeneous async-computing environment onto Pulsar and a unified model.
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys... - StreamNative
In this talk, learn how Toast leverages our Envoy control-plane to manage blue-green deploys of Pulsar consumers, and how this has helped drive adoption across the engineering organization. Dive into the history of Pulsar at Toast, starting from its introduction in 2019 to provide event-driven architecture across a rapidly scaling restaurant software platform. We will detail some of the hurdles that we encountered gaining buy-in across a diverse set of teams, and dive deep into how we enforce best practices and integrate with our service control plane.
Distributed Database Design Decisions to Support High Performance Event Strea... - StreamNative
Event streaming architectures launched a reexamination of applications and systems architectures across the board. We live in a world where answers are needed now in a constant real-time flow. Yet beyond the event streaming system itself, what are the corequisites to ensure our large scale distributed database systems can keep pace with this always-on, always-current real time flow of data? What are the requirements and expectations for this next tech cycle?
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022 - StreamNative
Pulsar Functions is a succinct framework provided by Apache Pulsar to conduct real-time data processing. Its use cases include ETL pipeline, event-driven applications, and simple data analytics. While Pulsar Functions already provides an extremely simple programming interface, we want to further lower the barrier for users to access real-time data. Since SQL is one of the universal languages in the technology world and well accepted by the vast majority of data engineers, we decided to add a SQL expressing layer on top of Pulsar Functions runtime. In this talk, we will discuss the architecture and implementation of this new service. We will see how SQL syntax, Pulsar Functions, and Function Mesh can work together to deliver a unique user development experience for real-time data jobs in the cloud environment. We will also walk through use cases like filtering, routing, and projecting messages as well as integrating with the Pulsar IO Connectors framework.
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022 - StreamNative
Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that enables you to reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems based on your deployment environment. In this talk, walk through the steps required to use the existing etcd service running inside Kubernetes as Pulsar's metadata store, thereby eliminating the need to run ZooKeeper entirely and leaving you with a ZooKeeper-less Pulsar.
Apache Pulsar is a highly available, distributed messaging system that provides guarantees of no message loss and strong message ordering with predictable read and write latency. In this talk, learn how this can be validated for Apache Pulsar Kubernetes deployments. Various failures are injected using Chaos Mesh to simulate network and other infrastructure failure conditions. There are many questions that are asked about failure scenarios, but it could be hard to find answers to these important questions. When a failure happens, how long does it take to recover? Does it cause unavailability? How does it impact throughput and latency? Are the guarantees of no message loss and strong message ordering kept, even when components fail? If a complete availability zone fails, is the system configured correctly to handle AZ failures? This talk will help you find answers to these questions and apply the tooling and practices to your own testing and validation.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... - StreamNative
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, maximizing Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink, (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022 - StreamNative
Apache Pulsar depends upon message acknowledgments to provide at-least-once or exactly-once processing guarantees. With these guarantees, any transmission between the broker and its producers and consumers requires an acknowledgment. But what happens if an acknowledgment is not received? Resending the message introduces the potential for duplicate processing and increases the likelihood of out-of-order processing. Therefore, it is critical to understand the Pulsar message redelivery semantics in order to prevent either of these conditions. In this talk, we will walk you through the redelivery semantics of Apache Pulsar and highlight some of the mechanisms available to application developers to control this behavior. Finally, we will present best practices for configuring message redelivery to suit various use cases.
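The duplicate-processing risk described above can be illustrated with a small simulation. This is a toy model under stated assumptions, not Pulsar's client API: `Broker` and `IdempotentConsumer` are hypothetical classes, and the "lost" acknowledgment is simulated by simply not calling `ack`. It shows why an unacknowledged message comes back and how a consumer that remembers processed ids avoids handling it twice.

```python
class Broker:
    """Toy broker: a message stays pending until it is acknowledged."""

    def __init__(self, messages):
        # message id -> payload, kept in publish order (dicts preserve order)
        self._pending = dict(messages)

    def deliver(self):
        """Return a snapshot of all unacknowledged messages."""
        return list(self._pending.items())

    def ack(self, msg_id):
        """Mark a message as acknowledged so it is never delivered again."""
        self._pending.pop(msg_id, None)


class IdempotentConsumer:
    """Consumer that remembers processed ids so redeliveries are skipped."""

    def __init__(self):
        self.processed = []
        self._seen = set()

    def handle(self, msg_id, payload):
        if msg_id in self._seen:
            return  # duplicate caused by redelivery; do not process twice
        self._seen.add(msg_id)
        self.processed.append(payload)


broker = Broker([(1, "a"), (2, "b"), (3, "c")])
consumer = IdempotentConsumer()

# First pass: every message is processed, but the ack for id 2 is "lost".
for msg_id, payload in broker.deliver():
    consumer.handle(msg_id, payload)
    if msg_id != 2:
        broker.ack(msg_id)

# Second pass: the broker redelivers id 2 because it was never acknowledged.
redelivered = broker.deliver()
for msg_id, payload in redelivered:
    consumer.handle(msg_id, payload)
    broker.ack(msg_id)
```

Without the `_seen` check, payload "b" would be processed twice; with it, the redelivery is absorbed and each payload is handled once.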
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ... - StreamNative
Lakehouses are quickly growing in popularity as a new approach to data platform architecture, bringing some of the long-established benefits of the OLTP world to OLAP, including transactions, record-level updates/deletes, and change streaming. In this talk, we will discuss Apache Hudi and how it unlocks the possibility of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present:
- What Lakehouses are, and why they are needed.
- What Apache Hudi is and how it works.
- A use case and demo that applies Apache Hudi’s DeltaStreamer tool to ingest data from Apache Pulsar.
Understanding Broker Load Balancing - Pulsar Summit SF 2022 - StreamNative
Pulsar is a horizontally scalable messaging system, so the traffic in a logical cluster must be balanced across all the available Pulsar brokers as evenly as possible in order to ensure full utilization of the broker layer. You can use multiple settings and tools to control the traffic distribution, but doing so requires some context on how traffic is managed in Pulsar. In this talk, we will walk you through the load balancing capabilities of Apache Pulsar and highlight some of the mechanisms available to control the distribution of load across the Pulsar brokers. Finally, we will discuss the various load shedding strategies that are available. At the end of the talk, you will have a better understanding of how Pulsar's broker-level auto-balancing works and how to properly configure it to meet your workload demands.
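As a rough illustration of what a load shedding strategy decides, the sketch below flags brokers whose resource usage exceeds the cluster average by more than a fixed margin. It is a simplified model of threshold-based shedding in general, not Pulsar's actual code; the function name `pick_brokers_to_shed`, the usage figures, and the 10-point default margin are all assumptions made for this example.

```python
def pick_brokers_to_shed(usage_by_broker, threshold=10.0):
    """Return brokers whose usage (%) exceeds the cluster average by more than threshold."""
    if not usage_by_broker:
        return []
    average = sum(usage_by_broker.values()) / len(usage_by_broker)
    return sorted(
        broker
        for broker, usage in usage_by_broker.items()
        if usage > average + threshold
    )


# Hypothetical usage percentages; the cluster average here is 50.0.
usage = {"broker-1": 85.0, "broker-2": 40.0, "broker-3": 35.0, "broker-4": 40.0}
overloaded = pick_brokers_to_shed(usage)
```

In a real cluster, the flagged brokers would then unload some of their topics so that less-loaded brokers can pick them up; which topics to move, and how much, is where the strategies differ.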
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022 - StreamNative
In today’s world, we are seeing a big shift toward the cloud. With this shift comes a big shift in the expectations we have for a messaging system, especially when the messaging system is offered as a managed service in a large-scale, multi-tenant environment. For any large-scale enterprise, it is very important to evaluate a messaging system and be confident in it before expanding complex distributed data systems like Apache Pulsar from on-premise to elastically scalable, fully managed cloud services. We must consider aspects such as migration from and integration with large-scale on-premise clusters, security, cost efficiency, the cloud friendliness of the architecture, cost and capacity modeling, tenant isolation, deployment robustness, availability, and monitoring. Not every messaging system is built to be cloud-native and to run as a managed service with cost efficiency. We have been running large-scale Apache Pulsar at Yahoo for the last 8 years on various platforms and hardware configurations while meeting application SLAs and serving more than 1M topics in a cluster. In this talk, we will cover Pulsar’s journey in Yahoo! from an on-premise platform to a hybrid cloud and on-premise system, along with the architecture and features that make Pulsar a good cloud-native messaging system choice for any enterprise.
Event-Driven Applications Done Right - Pulsar Summit SF 2022 - StreamNative
Pulsar Summit San Francisco is the event dedicated to Apache Pulsar. This one-day, action-packed event will include 5 keynotes, 12 breakout sessions, and 1 amazing happy hour. Speakers are from top companies, including Google, AWS, Databricks, Onehouse, StarTree, Intel, ScyllaDB, and more! It’s the perfect opportunity to network with Pulsar thought leaders in person.
Join developers, architects, data engineers, DevOps professionals, and anyone who wants to learn about messaging and event streaming for this one-day, in-person event. Pulsar Summit San Francisco brings the Apache Pulsar Community together to share best practices and discuss the future of streaming technologies.
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022 - StreamNative
Our services team creates, builds, and maintains the as-a-service offerings for base platform services within our organization. Several thousand applications use these custom services daily, generating more than 700 million requests per minute. One of these services was our publish/subscribe offering, BQ, with a custom SDK and custom metrics based on Apache Pulsar. BQ is the core communication service within our organization, handling more than 200M RPM. All the core processes of the organization depend on this service for operation: the CDC of any of our RDBMS or NoSQL offerings, all the eventing efforts of the organization, async communication between apps, notification systems, etc. The backend of the solution was Apache Pulsar running on EC2 on AWS, and on top of that we built several components as wrappers of the actual backend, creating our own SDKs and abstractions and in many ways extending the features provided by Pulsar. We had a multi-cluster setup 100% on AWS, with custom Pulsar Docker images running on large ASG setups, along with our own wrapping and admin APIs and DBs. All of this in turn made the solution volatile.
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022 - StreamNative
There is an increasing need to unleash analytical capabilities directly to the end-users to democratize decision-making. User-Facing Analytics is a new frontier that will shape the products of tomorrow and push the limits of existing technology. It demands a solution that will scale to millions of users to provide fast, real-time insights. In this session, Xiang will talk about his journey to build Apache Pinot to tackle the analytics problem space with the architectural changes and technology inventions made over the past decade. He will also talk about how other big data companies such as LinkedIn, Uber, and Stripe power their user-facing analytical applications.
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022 - StreamNative
Welcome and Opening Remarks - Pulsar Summit SF 2022 - StreamNative
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa... - StreamNative
Milvus is an open-source vector database that leverages a novel data fabric to build and manage vector similarity search applications. As the world's most popular vector database, it has already been adopted in production by thousands of companies around the world, including Lucidworks, Shutterstock, and Cloudinary. With the launch of Milvus 2.0, the community aims to introduce a cloud-native, highly scalable and extendable vector similarity solution, and the key design concept is log as data.
Milvus relies on Pulsar as its log pub/sub system. Pulsar helps Milvus reduce system complexity by loosely coupling each microservice and by making the system stateless through the disaggregation of log storage and computation, which also makes the system more extensible. In this talk, we will introduce the overall design and implementation details of Milvus and its roadmap.
Takeaways:
1) Get a general idea of what a vector database is and its real-world use cases.
2) Understand the major design principles of Milvus 2.0.
3) Learn how to build a complex system with the help of a modern log system like Pulsar.
MoP (MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi... - StreamNative
MQTT (Message Queuing Telemetry Transport) is a messaging protocol based on the pub/sub model. Its compact message structure, low resource consumption, and high efficiency make it suitable for IoT applications with low bandwidth and unstable network environments.
This session will introduce MQTT on Pulsar, which allows developers who use the MQTT protocol to use Apache Pulsar. I will share the architecture, principles, and future plans of MoP to help you understand Apache Pulsar's capabilities and practices in the IoT industry.