Matteo Merli, the tech lead for the Cloud Messaging Service at Yahoo, walks through their design decisions, how they reached them, and how they leverage Apache BookKeeper to implement a multi-tenant messaging service.
Apache BookKeeper: A High Performance and Low Latency Storage Service (Sijie Guo)
Apache BookKeeper is a high performance, low latency storage service optimized for storing immutable, append-only data (such as logs, streaming events, and objects). Sijie Guo and JV share their experience with Apache BookKeeper. This talk covers the motivation and overview of BookKeeper, dives into implementation details, and describes the use cases built upon it.
Introduction to Apache BookKeeper Distributed Storage (Streamlio)
A brief technical introduction to Apache BookKeeper, the scalable, fault-tolerant, and low-latency storage service optimized for real-time and streaming workloads.
October 2016 HUG: Pulsar, a highly scalable, low latency pub-sub messaging system (Yahoo Developer Network)
Yahoo recently open-sourced Pulsar, a highly scalable, low latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. Pulsar is used across various Yahoo applications for large scale data pipelines. Learn more about Pulsar architecture and use-cases in this talk.
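The at-least-once delivery and per-subscriber cursor management described above can be illustrated with a minimal in-memory sketch. This is not Pulsar's actual client API; all names here are invented for illustration:

```python
# Minimal in-memory sketch of a topic with per-subscriber cursors.
# At-least-once: a message counts as consumed only once acknowledged;
# an unacknowledged message is delivered again on the next read.

class Topic:
    def __init__(self):
        self.log = []       # append-only message log
        self.cursors = {}   # subscriber name -> index of next unacked message

    def publish(self, msg):
        self.log.append(msg)

    def subscribe(self, name):
        self.cursors.setdefault(name, 0)

    def receive(self, name):
        """Return the next message past the cursor, or None if caught up."""
        pos = self.cursors[name]
        return self.log[pos] if pos < len(self.log) else None

    def ack(self, name):
        """Advance the cursor: the message is now durably consumed."""
        self.cursors[name] += 1

topic = Topic()
topic.subscribe("billing")
topic.publish("order-1")
topic.publish("order-2")

first = topic.receive("billing")        # "order-1"
redelivered = topic.receive("billing")  # still "order-1": not yet acked
topic.ack("billing")
second = topic.receive("billing")       # "order-2"
print(first, redelivered, second)
```

The cursor is the broker-side state that lets a subscriber crash and resume without losing unacknowledged messages.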
Speakers:
Matteo Merli from Pulsar team at Yahoo
Matteo Merli and Sijie Guo from Streamlio gave a hands-on workshop on Apache Pulsar: a fast, durable pub-sub messaging system and a low-latency alternative to Kafka.
High performance messaging with Apache Pulsar (Matteo Merli)
Apache Pulsar is being used for an increasingly broad array of data ingestion tasks. When operating at scale, it's very important to ensure that the system can make use of all the available resources. Karthik Ramasamy and Matteo Merli share insights into the design decisions and the implementation techniques that allow Pulsar to achieve high performance with strong durability guarantees.
Scalability, fault tolerance, distributed log… these are terms we hear more and more these days. Making them happen can be quite a challenge, especially if our business needs to be data-intensive, agile, and fast to market.
One way to answer this challenge is microservices: small services that communicate with each other to deliver business value. The key word here is _communication_. Without communication, all the power of microservices falls apart. And communication is not trivial when it involves multiple data systems talking to one another over many channels, each channel requiring its own protocol and communication methods. This is where communication can become a bottleneck if not handled properly.
One answer to this problem is Kafka, a distributed messaging system providing fast, highly scalable, and redundant message exchange using a publish-subscribe model. And when we say fast, we mean one of the fastest messaging systems out there.
This presentation will show you an alternative way of doing microservices with event-driven architecture through Kafka.
Presenters:
Laszlo-Robert Albert (albertlaszlorobert [at] gmail [dot] com)
Dan Balescu (dfbalescu [at] gmail [dot] com)
Pulsar - flexible pub-sub for internet scale (Matteo Merli)
Pub-sub messaging is a very convenient abstraction that allows system and application developers to decouple components and let them communicate, by acting as a durable buffer for transient data, or as a persistent log from which to recover after crashes. This talk will present an overview of Apache Pulsar, the reasons that led to its development, and how it has enabled many teams at Yahoo to build scalable and reliable applications. Apache Pulsar has become the de facto pub-sub messaging system at Yahoo, serving 100+ applications and processing hundreds of billions of messages for over three years.
In this talk, we will explore in detail different categories of use cases that highlight how Pulsar can be applied to solve a broad range of problems thanks to its flexible messaging model that supports both queuing and streaming semantics with a focus on durability and transaction guarantees.
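The distinction between queuing and streaming semantics can be sketched as two dispatch modes over the same topic. This is a conceptual illustration, not Pulsar's actual subscription API (Pulsar calls these "shared" and "exclusive" subscriptions; the function names are invented):

```python
import itertools

def dispatch(messages, consumers, mode):
    """Sketch of two subscription modes over one topic:
    'exclusive' -> streaming: one consumer receives every message, in order;
    'shared'    -> queuing: messages are round-robined across consumers."""
    out = {c: [] for c in consumers}
    if mode == "exclusive":
        out[consumers[0]] = list(messages)
    elif mode == "shared":
        rr = itertools.cycle(consumers)
        for m in messages:
            out[next(rr)].append(m)
    return out

msgs = ["m1", "m2", "m3", "m4"]
streaming = dispatch(msgs, ["c1", "c2"], "exclusive")
queuing = dispatch(msgs, ["c1", "c2"], "shared")
print(streaming)  # c1 gets the whole ordered stream, c2 gets nothing
print(queuing)    # work is spread across c1 and c2, ordering per consumer only
```

Streaming preserves total order for a single reader; queuing trades per-topic ordering for parallel consumption, which is why a system supporting both can cover such a broad range of use cases.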
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar (StreamNative)
Nowadays, real-time computation is heavily used in cases such as online product recommendation and online payment fraud detection. In the streaming pipeline, Kafka is normally used to store a day's or a week's worth of data, but not years of data for historical trend analysis. So a batch pipeline is needed for historical data computation, and this is where the Lambda architecture comes in. Lambda has proven effective, striking a good balance between speed and reliability, and we have been running many systems with the Lambda architecture for many years. But the biggest drawback of the Lambda architecture is the need to maintain two distinct (and possibly complex) systems to generate the batch and streaming layers. With that, we have to split our business logic into many segments across different places, which is a challenge to maintain as the business grows and also increases communication overhead. Secondly, the data is duplicated in two different systems, and we have to move data between systems for processing. With those challenges, we searched for alternatives and found Apache Pulsar a great fit. In this talk, I will show how we solve those problems by making Pulsar a unified storage backend for both the batch and streaming pipelines, a solution that simplifies the software stack, improves our work efficiency, and lowers cost at the same time.
Effectively-once semantics in Apache Pulsar (Matteo Merli)
“Exactly-once” is a controversial term in the messaging landscape. In this presentation we offer a detailed look at effectively-once delivery semantics in Apache Pulsar and how this is achieved without sacrificing performance.
Building High-Throughput, Low-Latency Pipelines in Kafka (confluent)
William Hill is one of the UK’s largest, most well-established gaming companies, with a global presence across 9 countries and over 16,000 employees. In recent years the gaming industry, and in particular sports betting, has been revolutionised by technology. Customers now demand a wide range of events and markets to bet on, both pre-game and in-play, 24/7. This has driven a business need to process more data, provide more updates, and offer more markets and prices in real time.
At William Hill, we have invested in a completely new trading platform using Apache Kafka. We process vast quantities of data from a variety of feeds; this data is fed through a variety of odds compilation models before being piped out to UI apps used by our trading teams to provide events, markets, and pricing data to various endpoints across the whole of William Hill. We deal with thousands of sporting events, each with sometimes hundreds of betting markets, and each market receiving hundreds of updates. This scales up to vast numbers of messages flowing through our system, which we have to process, transform, and route in real time. Using Apache Kafka, we have built a high-throughput, low-latency pipeline based on cloud-hosted microservices. When we started, we were on a steep learning curve with Kafka, microservices, and associated technologies. This led to fast learnings and fast failings.
In this session, we will tell the story of what we built, what went well, what didn’t go so well and what we learnt. This is a story of how a team of developers learnt (and are still learning) how to use Kafka. We hope that you will be able to take away lessons and learnings of how to build a data processing pipeline with Apache Kafka.
Reducing Microservice Complexity with Kafka and Reactive Streams (jimriecken)
My talk from ScalaDays 2016 in New York on May 11, 2016:
Transitioning from a monolithic application to a set of microservices can help increase performance and scalability, but it can also drastically increase complexity. Layers of inter-service network calls add latency and an increasing risk of failure where previously only local function calls existed. In this talk, I'll speak about how to tame this complexity using Apache Kafka and Reactive Streams to:
- Extract non-critical processing from the critical path of your application to reduce request latency
- Provide back-pressure to handle both slow and fast producers/consumers
- Maintain high availability, high performance, and reliable messaging
- Evolve message payloads while maintaining backwards and forwards compatibility.
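The back-pressure bullet above is the heart of the Reactive Streams model: the consumer signals demand and the producer emits at most that much. A minimal sketch of the idea (the names are illustrative, not the actual Reactive Streams interfaces):

```python
# Pull-based demand: a consumer calls request(n) and the producer emits at
# most n items, so a slow consumer is never flooded by a fast producer.

class Publisher:
    def __init__(self, items):
        self.items = iter(items)

    def request(self, n):
        """Emit at most n items -- the back-pressure contract."""
        out = []
        for _ in range(n):
            try:
                out.append(next(self.items))
            except StopIteration:
                break   # source exhausted; emit what we have
        return out

pub = Publisher(range(10))
batch1 = pub.request(3)   # consumer asks for 3 items
batch2 = pub.request(4)   # then, when ready, asks for 4 more
print(batch1, batch2)
```

Because production is driven by explicit demand rather than by the producer's own pace, buffering stays bounded on both sides of the channel.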
Apache Kafka is becoming the message bus for transferring huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to pick topic partitions, upgrading to newer versions, and migrating to the new Kafka Producer and Consumer APIs.
We will also cover best practices for running producers and consumers.
In the Kafka 0.9 release, we added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now supports authentication of users and access control over who can read and write to a Kafka topic. Apache Ranger also uses a pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase an open-sourced Kafka REST API and an Admin UI that help users create topics, reassign partitions, issue Kafka ACLs, and monitor consumer offsets.
Having used Apache Pulsar in production for a year for our pub-sub use cases, such as stream analytics and event sourcing, this slide deck presents lessons learned about understanding the architecture, tuning the cluster, keeping it highly available and fault tolerant, and much more.
While the slides are presented in terms of Apache Pulsar, many of the concepts extend easily to other distributed systems.
The views here are my own and do not represent the views of Nutanix Corporation.
Kafka Tutorial - basics of the Kafka streaming platform (Jean-Paul Azar)
Introduction to Kafka streaming platform. Covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have started to expand on the Java examples to correlate with the design discussion of Kafka. We have also expanded on the Kafka design section and added references.
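The core abstraction behind the tutorial's producer and consumer examples is Kafka's partitioned log: producers append records at increasing offsets, and each consumer group tracks its own committed offset. A broker-free sketch of that model (illustrative class and method names, not the Kafka client API):

```python
# Tiny sketch of a Kafka partition: an append-only log plus, per consumer
# group, a committed offset marking how far that group has read.

class Partition:
    def __init__(self):
        self.log = []
        self.offsets = {}   # consumer group -> committed offset

    def produce(self, record):
        self.log.append(record)
        return len(self.log) - 1              # offset of the new record

    def poll(self, group, max_records=10):
        """Read records starting at the group's committed offset."""
        start = self.offsets.get(group, 0)
        return self.log[start:start + max_records]

    def commit(self, group, offset):
        """Record that the group has processed everything before `offset`."""
        self.offsets[group] = offset

p = Partition()
for r in ["a", "b", "c"]:
    p.produce(r)

batch = p.poll("analytics", max_records=2)    # first two records
p.commit("analytics", 2)
rest = p.poll("analytics")                    # resumes after the commit
print(batch, rest)
```

Because offsets are per group, two independent applications can read the same log at their own pace without interfering with each other.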
How Pulsar stores data - Pulsar NA Summit 2021 (Shivji Kumar Jha)
In order to get the best performance characteristics out of your stream backend, it is important to understand the nitty-gritty details of how Pulsar stores your data. Understanding this empowers you to design your use-case solutions to make the best use of the resources at hand, and to get the optimal consistency, availability, latency, and throughput for a given amount of resources.
With this underlying philosophy, in this talk we will get to the bottom of Pulsar's storage tier (Apache BookKeeper): the barebones of BookKeeper's storage semantics, how it is used in different use cases (even outside Pulsar), the object models of storage in Pulsar, the different kinds of data structures and algorithms Pulsar uses, and how they map to the semantics of the storage class shipped with Pulsar by default. And yes, you can change the storage backend too, with some additional code!
This session will empower you with the right background to map your data correctly with Pulsar.
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M... (Peter Broadhurst)
An introduction to one possible MQ architecture - an active/active multiple queue manager client<->server environment.
Summary of detailed topology articles available here:
http://ow.ly/vrUUV
And MQDev blog+discussion on client attachment here:
http://ibm.co/MM8rMl
The 100% open source WSO2 Message Broker is a lightweight, easy-to-use, distributed message-brokering server. It features high availability (HA) support with a complete hot-to-hot continuous availability mode, the ability to scale up to several servers in a cluster, and no single point of failure. It is designed to manage persistent messaging and large numbers of queues, subscribers and messages.
LinkedIn Stream Processing Meetup - Apache Pulsar (Karthik Ramasamy)
Apache Pulsar is a next-generation messaging system that uses a fundamentally different architecture to achieve durability, performance, scalability, efficiency, multi-tenancy, and geo-replication.
Citi Tech Talk: Monitoring and Performance (confluent)
The objective of the engagement is for Citi to gain an understanding of, and a path forward for, monitoring their Confluent Platform, covering:
- Platform Monitoring
- Maintenance and Upgrades
IBM MQ - better application performance (MarkTaylorIBM)
Presented in Feb 2015 at Interconnect
This presentation is aimed at helping application developers understand how to best use MQ features for higher performance.
Topic: Speedtest: Benchmark Your Apache Kafka®️
Abstract: In this session, Mark will talk about running benchmarking utilities for Apache Kafka: determining how many MB/s a cluster can handle, setting up automated benchmark runs (including the repo), and using the results to find and optimize client-side producer configuration properties.
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth (NETWAYS)
Roland Hochmuth is the Project Tech Lead (PTL) and software architect for Monasca, the open source Monitoring-as-a-Service (at-scale) OpenStack project (https://wiki.openstack.org/wiki/Monasca). He focuses on developing a powerful, scalable, and reliable turn-key monitoring solution that influences leading industry trends and innovations in data streaming, analytics, and big data. He is also responsible for the metrics processing pipeline for HP's public cloud. He has experience across several software areas and domains, from 3-D computer graphics to remote desktop visualization, cloud computing, and monitoring.
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth (NETWAYS)
Monasca (monasca.io) is a turn-key open source OpenStack Monitoring-as-a-Service platform that supports authentication and multi-tenancy via the OpenStack Keystone identity service. Monasca is a highly scalable, performant, and fault-tolerant Monitoring-as-a-Service solution that supports push-based streaming metrics, health/status, alarming/thresholding, and notifications. Logging-as-a-Service is under development, with the goal of providing a comprehensive, integrated monitoring solution for OpenStack clouds that covers metrics, events, and logs.
Slow things down to make them go faster [FOSDEM 2022] (Jimmy Angelakos)
Talk from FOSDEM 2022
It's easy to get misled into overconfidence based on the performance of powerful servers, given today's monster core counts and RAM sizes. However, the reality of high concurrency usage is often disappointing, with less throughput than one would expect. Because of its internals and its multi-process architecture, PostgreSQL is very particular about how it likes to deal with high concurrency and in some cases it can slow down to the point where it looks like it's not performing as it should. In this talk we'll take a look at potential pitfalls when you throw a lot of work at your database. Specifically, very high concurrency and resource contention can cause problems with lock waits in Postgres. Very high transaction rates can also cause problems of a different nature. Finally, we will be looking at ways to mitigate these by examining our queries and connection parameters, leveraging connection pooling and replication, or adapting the workload.
Topics:
1. Understand what we mean by high concurrency.
2. Understand ACID & MVCC in Postgres.
3. Understand how high concurrency affects Postgres performance.
4. Understand how locks/latches affect Postgres performance.
5. Understand how high transaction rates can affect Postgres.
6. Mitigation strategies for high concurrency scenarios.
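One of the mitigations the talk mentions, connection pooling, amounts to capping how many queries can hit the database at once instead of letting every client open its own connection. A sketch of the idea, with a semaphore standing in for a pooler such as PgBouncer (the names and pool size are illustrative):

```python
import threading

# Cap concurrent database work with a small "pool": callers beyond the cap
# queue up instead of piling more contention onto Postgres.

POOL_SIZE = 4
pool = threading.BoundedSemaphore(POOL_SIZE)
in_flight = 0
peak = 0
lock = threading.Lock()

def run_query(qid):
    global in_flight, peak
    with pool:                      # blocks while all "connections" are busy
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # ... the actual query would execute here ...
        with lock:
            in_flight -= 1

threads = [threading.Thread(target=run_query, args=(i,)) for i in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds POOL_SIZE, however many clients arrive
```

Counter-intuitively, slowing admission down like this often raises total throughput, because the database spends its time doing work rather than arbitrating lock and resource contention among hundreds of backends.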
Capital One: Why Stream Data as Part of Data Transformation? (ScyllaDB)
Event-driven architectures are increasingly part of a complete data transformation solution. Learn how to employ Apache Kafka, Cloud Native Computing Foundation’s NATS, Amazon SQS, or other message queueing technologies. This talk covers the details of each, their advantages and disadvantages, and how to select the best for your company’s needs.
Learnings from the Field. Lessons from Working with Dozens of Small & Large D... (HostedbyConfluent)
If your data platform is powered only by batch data processing, you know you are always trailing your customer. Your databases aren’t always up to date. Your inability to have a synchronized data flow across systems leads to operational inefficiencies. And, your dreams of running advanced real-time AI and ML applications can’t be fulfilled. However, you might be wary of the implications of turning your product into an event-driven one. In this presentation we’ll share our experience transforming our CDP-based marketing orchestration engine to be both real-time and highly scalable with the Kafka ecosystem. We will look into how we saved resources with Connect when ingesting and syncing data with NoSQL databases, data warehouses and third-party platforms. What we did to turn ksqlDB into our data transformation, aggregation and querying hub, reducing latency and costs. How Streams helps us activate multiple real-time applications such as building identity graphs, updating materialized views in high frequency for efficient real-time lookups and inferencing machine learning models. Finally, we will look at how Confluent Cloud solved our pre-rollout sizing and scaling questions, significantly reducing time-to-market.
IoT is becoming more pervasive, and it is a crucial part of digital transformation initiatives across industries. Knowing the essential building blocks can help you in your journey.
The first step towards this journey is learning about the de-facto standard for IoT messaging, MQTT, which helps reliably connect devices and efficiently move data bidirectionally between them even in unreliable networks.
Here are the webinar slides in which Mary Grygleski walks you through MQTT's history, some of the key features of this lightweight IoT protocol, and how it can be used for several IoT and IIoT use cases.
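A defining MQTT feature is its hierarchical topic names with wildcard subscriptions: `+` matches exactly one topic level and `#` matches all remaining levels. A simplified sketch of those matching rules (it ignores `$`-prefixed system topics and filter validation covered by the full spec):

```python
def mqtt_match(filter_, topic):
    """Check an MQTT topic filter against a concrete topic name.
    '+' matches exactly one level; '#' matches all remaining levels."""
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True                  # matches everything from here on
        if i >= len(t_parts):
            return False                 # topic is shorter than the filter
        if f != "+" and f != t_parts[i]:
            return False                 # literal level must match exactly
    return len(f_parts) == len(t_parts)  # no unmatched trailing levels

print(mqtt_match("home/+/temperature", "home/kitchen/temperature"))  # True
print(mqtt_match("home/#", "home/kitchen/humidity"))                 # True
print(mqtt_match("home/+", "home/kitchen/humidity"))                 # False
```

This level-by-level matching is what lets a single subscription such as `home/#` efficiently cover an entire device hierarchy, one reason MQTT suits bandwidth-constrained IoT deployments.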
About the Speaker.
Mary Grygleski is a Senior Developer Advocate at HiveMQ. Based out of Chicago, Mary is a Java Champion and President and Executive Board Member of the Chicago Java Users Group (CJUG). She is also a co-organizer of the Data, Cloud and AI In Chicago, Chicago Cloud, and IBM Cloud Chicago meetup groups. She has extensive experience in product and application design, development, integration, and deployment, and specializes in Reactive Java, open source, and cloud-enabled distributed systems.
To watch the webinar recording:
https://www.hivemq.com/webinars/back-to-the-basics-an-introduction-to-mqtt/
Similar to Cloud Messaging Service: Technical Overview (20)
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesSanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/ CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/ deploying these solutions.
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
1.Wireless Communication System_Wireless communication is a broad term that i...JeyaPerumal1
Wireless communication involves the transmission of information over a distance without the help of wires, cables or any other forms of electrical conductors.
Wireless communication is a broad term that incorporates all procedures and forms of connecting and communicating between two or more devices using a wireless signal through wireless communication technologies and devices.
Features of Wireless Communication
The evolution of wireless technology has brought many advancements with its effective features.
The transmitted distance can be anywhere between a few meters (for example, a television's remote control) and thousands of kilometers (for example, radio communication).
Wireless communication can be used for cellular telephony, wireless access to the internet, wireless home networking, and so on.
This 7-second Brain Wave Ritual Attracts Money To You.!nirahealhty
Discover the power of a simple 7-second brain wave ritual that can attract wealth and abundance into your life. By tapping into specific brain frequencies, this technique helps you manifest financial success effortlessly. Ready to transform your financial future? Try this powerful ritual and start attracting money today!
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
# Internet Security: Safeguarding Your Digital World
In the contemporary digital age, the internet is a cornerstone of our daily lives. It connects us to vast amounts of information, provides platforms for communication, enables commerce, and offers endless entertainment. However, with these conveniences come significant security challenges. Internet security is essential to protect our digital identities, sensitive data, and overall online experience. This comprehensive guide explores the multifaceted world of internet security, providing insights into its importance, common threats, and effective strategies to safeguard your digital world.
## Understanding Internet Security
Internet security encompasses the measures and protocols used to protect information, devices, and networks from unauthorized access, attacks, and damage. It involves a wide range of practices designed to safeguard data confidentiality, integrity, and availability. Effective internet security is crucial for individuals, businesses, and governments alike, as cyber threats continue to evolve in complexity and scale.
### Key Components of Internet Security
1. **Confidentiality**: Ensuring that information is accessible only to those authorized to access it.
2. **Integrity**: Protecting information from being altered or tampered with by unauthorized parties.
3. **Availability**: Ensuring that authorized users have reliable access to information and resources when needed.
## Common Internet Security Threats
Cyber threats are numerous and constantly evolving. Understanding these threats is the first step in protecting against them. Some of the most common internet security threats include:
### Malware
Malware, or malicious software, is designed to harm, exploit, or otherwise compromise a device, network, or service. Common types of malware include:
- **Viruses**: Programs that attach themselves to legitimate software and replicate, spreading to other programs and files.
- **Worms**: Standalone malware that replicates itself to spread to other computers.
- **Trojan Horses**: Malicious software disguised as legitimate software.
- **Ransomware**: Malware that encrypts a user's files and demands a ransom for the decryption key.
- **Spyware**: Software that secretly monitors and collects user information.
### Phishing
Phishing is a social engineering attack that aims to steal sensitive information such as usernames, passwords, and credit card details. Attackers often masquerade as trusted entities in email or other communication channels, tricking victims into providing their information.
### Man-in-the-Middle (MitM) Attacks
MitM attacks occur when an attacker intercepts and potentially alters communication between two parties without their knowledge. This can lead to the unauthorized acquisition of sensitive information.
### Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) Attacks
3. What is CMS
• Hosted Pub/Sub
• Multi-tenant (auth / quotas / load balancer)
• Horizontally scalable
• Highly available, durable and consistent storage
• Geo-replication
• In production since 2013
CMS - Technical Overview
[Diagram: a CMS cluster with producers and consumers attached to a broker; the broker talks to bookies, local ZK and a global ZK, with replication to other clusters]
4. CMS key features
• Multi-tenancy / hosted
  • Operating a system at scale is hard and requires deep understanding of internals
  • Authentication / self-service provisioning / quotas
• SLAs (write latency 2 ms avg, 5 ms 99th percentile)
  • Maintain the same latencies and throughput under backlog-draining scenarios
• Simple high-level API with clear ordering, durability and consistency semantics
• Geo-replication
  • Single API call to configure the regions to replicate to
• Load balancer: dynamically optimize topic assignment to brokers
• Support a large number of topics
• Store the subscription position
  • Apps don't need to store it
  • Data can be deleted as soon as it's consumed
• Support round-robin distribution across multiple consumers
5. Workload examples

Challenge              # Topics   # Producers   # Subscriptions   Produced msg
                                  / topic       / topic           rate/s/topic
Fan-out                1          1             1 K               1 K
Throughput & latency   1          1             1                 100 K
# Topics & latency     1 M        1             10                10
Fan-in                 1          1 K           1                 > 100 K

• Designed to support a wide range of use cases
• Needs to be cost effective in every case
7. Messaging model
• Producers can attach to a topic and send messages to it
• A subscription is a durable resource that receives all messages sent to the topic after its creation
• Subscriptions have a type:
  • "Exclusive": only one consumer is allowed to attach to the subscription. The first consumer decides the type.
  • "Shared": multiple consumers are allowed. Messages are dispatched round-robin among them, with no ordering guarantees.
  • "Failover": multiple consumers are allowed, but only one receives messages at a given point in time, while the others are in standby mode.
[Diagram: Producer-X and Producer-Y publish to a topic; Subscription-A (Exclusive) feeds Consumer-1, Subscription-B (Shared) feeds Consumers 2–4, Subscription-C (Failover) feeds Consumer-5]
8. Client API
▪ Exposes the messaging model concepts (producer/consumer)
▪ C++ and Java
▪ Connection pooling
▪ Handles recoverable failures transparently (reconnect / resend messages) without compromising ordering guarantees
▪ Sync / async version of every operation
9. Java producer example

CmsClient client = CmsClient.create("http://<broker vip>:4080");
Producer producer = client.createProducer("my-topic");

// Handles retries in case of failure
producer.send("my-message".getBytes());

// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
10. Java consumer example

CmsClient client = CmsClient.create("http://<broker vip>:4080");
Consumer consumer = client.subscribe(
        "my-topic",
        "my-subscription-name",
        SubscriptionType.Exclusive);

// Blocks until a message is available
Message msg = consumer.receive();
// Do something...
consumer.acknowledge(msg);
11. System overview
Broker
• Stateless
• Maintains an in-memory cache of messages
• Reads from BookKeeper on a cache miss
BookKeeper
• Distributed write-ahead log
• Create many ledgers
• Append entries
• Read entries
• Delete ledgers
• Consistent reads
• Single writer per ledger (the broker)
[Diagram: producer and consumer apps use the CMS client to reach a broker; inside the broker: native dispatcher, managed ledger, BK client, cache, load balancer and global replicators; the broker talks to bookies, local ZK and global ZK, with replication to other clusters]
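The broker read path described above (serve from the in-memory cache, fall back to BookKeeper on a miss) can be sketched roughly as follows. `EntryCache` and `BookieStore` are hypothetical stand-ins for illustration, not the actual CMS classes:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal read-through cache sketch. BookieStore stands in for the
// (slower, durable) BookKeeper read path.
class EntryCache {
    interface BookieStore {
        byte[] readEntry(long entryId);
    }

    private final Map<Long, byte[]> cache = new HashMap<>();
    private final BookieStore store;

    EntryCache(BookieStore store) {
        this.store = store;
    }

    // On publish, the entry is written to BookKeeper elsewhere and also
    // kept in the cache, so tailing consumers never touch the bookies.
    void insert(long entryId, byte[] data) {
        cache.put(entryId, data);
    }

    // Serve from memory when possible; fall back to the bookies, e.g.
    // for a consumer draining an old backlog.
    byte[] read(long entryId) {
        byte[] data = cache.get(entryId);
        return data != null ? data : store.readEntry(entryId);
    }
}
```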
12. System overview
Native dispatcher
• Async Netty server
Global replicators
• If a topic is global, republish its messages in the other regions
Global ZooKeeper
• ZK instance with participants in multiple US regions
• Consistent data store for customer configuration
• Accepts writes with one region down
[Diagram: same cluster view as the previous slide, highlighting the global replicators and the global ZK]
13. Partitioned topics
▪ The client lib has a wrapper producer/consumer implementation
▪ No API changes
▪ Producers can decide how to assign messages to partitions:
  ▪ Single partition
  ▪ Round robin
  ▪ Provide a key on the message
    ▪ The hash of the key determines the partition
  ▪ Custom routing
[Diagram: a producer app publishes to topic T1 with partitions P0–P4; the partitions T1-P0 … T1-P4 are spread across Broker 1, Broker 2 and Broker 3]
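Key-based routing, as described above, can be sketched in a few lines. The class and method names are illustrative, not the actual CMS client API:

```java
// Sketch of key-hash partition routing, assuming a fixed partition
// count per topic.
class KeyHashRouter {
    private final int numPartitions;

    KeyHashRouter(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Messages with the same key always land on the same partition,
    // preserving per-key ordering.
    int choosePartition(String key) {
        // Mask the sign bit so the index stays non-negative even when
        // hashCode() returns Integer.MIN_VALUE.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```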
14. Partitioned topics
▪ Consumers can use all subscription types with the same semantics
▪ In the "Failover" subscription type, the election is done per partition
▪ Partition assignment is spread evenly across all available consumers
▪ No need for ZK coordination
[Diagram: two consumer apps each hold consumers C0–C4 on topic T1; partitions T1-P0 … T1-P4 on Brokers 1–3 are assigned across the two apps]
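One way a per-partition failover election can avoid ZK coordination is to make it deterministic: every broker derives the same active consumer for a partition from the sorted list of connected consumers. The slides don't spell out the exact algorithm CMS uses, so this is just one simple scheme that spreads partitions evenly:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Coordination-free failover election sketch: sort the consumer
// names so every broker sees the same order, then assign partition p
// to consumer p mod N. When a consumer joins or leaves, each broker
// recomputes the same new assignment independently.
class FailoverElection {
    static String activeConsumer(int partition, List<String> consumers) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted); // identical order on every broker
        return sorted.get(partition % sorted.size());
    }
}
```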
16. CMS BookKeeper usage
▪ CMS uses BookKeeper through a higher-level interface, the ManagedLedger:
  › A single managed ledger represents the storage of a single topic
  › Maintains the list of currently active BK ledgers
  › Maintains the subscription positions, using an additional ledger to checkpoint the last acknowledged message in the stream
  › Caches data
  › Deletes ledgers when all cursors are done with them
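The ledger bookkeeping described above can be sketched as a toy model: entries append to the current BK ledger, a new ledger is rolled over when the current one fills up, and sealed ledgers are deleted once fully acknowledged. This is a deliberate simplification (a single cursor, no persistence); the real ManagedLedger tracks many cursors and checkpoints their positions:

```java
import java.util.ArrayList;
import java.util.List;

// Toy ManagedLedger sketch: a topic is a chain of ledgers, trimmed
// from the front as the cursor advances. Names are illustrative.
class ManagedLedgerSketch {
    private final int maxEntriesPerLedger;
    private final List<long[]> ledgers = new ArrayList<>(); // {ledgerId, entryCount}
    private long nextLedgerId = 0;
    private long trimmedEntries = 0; // entries in already-deleted ledgers

    ManagedLedgerSketch(int maxEntriesPerLedger) {
        this.maxEntriesPerLedger = maxEntriesPerLedger;
    }

    // Append an entry, rolling over to a new BK ledger when full.
    void append() {
        if (ledgers.isEmpty() || ledgers.get(ledgers.size() - 1)[1] == maxEntriesPerLedger) {
            ledgers.add(new long[] { nextLedgerId++, 0 });
        }
        ledgers.get(ledgers.size() - 1)[1]++;
    }

    // The (single, simplified) cursor acknowledges the first `position`
    // entries; sealed ledgers that are fully consumed get deleted.
    void markDelete(long position) {
        while (!ledgers.isEmpty()
                && ledgers.get(0)[1] == maxEntriesPerLedger // only sealed ledgers
                && position - trimmedEntries >= ledgers.get(0)[1]) {
            trimmedEntries += ledgers.get(0)[1];
            ledgers.remove(0); // delete from BookKeeper
        }
    }

    int activeLedgers() {
        return ledgers.size();
    }
}
```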
17. Bookie internal structure
• Writes go both to the journal and to ledger storage (on different devices)
• Ledger storage writes are fsynced periodically
• Reads are served only from ledger storage
• Entries are interleaved in entry log files
• Ledger indexes are used to find entry offsets
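The interleaved entry-log layout with per-ledger indexes can be sketched like this; an in-memory list stands in for the on-disk entry log, and the names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Entry-log sketch: entries from many ledgers are appended to one
// shared log, and a per-ledger index maps (ledgerId, entryId) to the
// offset in that log, so a read is one index lookup plus one log read.
class EntryLogSketch {
    private final List<byte[]> log = new ArrayList<>();
    private final Map<Long, Map<Long, Integer>> index = new HashMap<>(); // ledger -> entry -> offset

    void addEntry(long ledgerId, long entryId, byte[] data) {
        int offset = log.size(); // append-only: offset is the current end of the log
        log.add(data);
        index.computeIfAbsent(ledgerId, k -> new HashMap<>()).put(entryId, offset);
    }

    byte[] readEntry(long ledgerId, long entryId) {
        return log.get(index.get(ledgerId).get(entryId));
    }
}
```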
18. BookKeeper issues
▪ Performance degrades when writing to many ledgers at the same time
▪ Under heavy reads, the ledger storage device gets slow and impacts writes
▪ Ledger storage flushes need to fsync many ledger index files each time
19. Bookie storage improvements
• Writes go both to the journal and to an in-memory write cache
• Entries are periodically flushed
• Entries are sorted by ledger so they are sequential on disk (per flush period)
• Since entries are sequential, we added a read-ahead cache
• The location index is mostly kept in memory and only updated during flush
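The sorted write cache above can be sketched in a few lines; in-memory lists stand in for the cache and the on-disk layout, and the names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sorted write cache sketch: entries accumulate in arrival order
// (interleaved across ledgers) and are sorted by (ledgerId, entryId)
// at flush time, so each ledger's entries land sequentially on disk
// and sequential reads plus read-ahead become cheap.
class SortedWriteCache {
    private final List<long[]> cache = new ArrayList<>(); // {ledgerId, entryId}

    void add(long ledgerId, long entryId) {
        cache.add(new long[] { ledgerId, entryId });
    }

    // Flush: sort so entries of the same ledger are contiguous.
    List<long[]> flush() {
        List<long[]> sorted = new ArrayList<>(cache);
        sorted.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0])
                                           : Long.compare(a[1], b[1]));
        cache.clear();
        return sorted;
    }
}
```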
20. BookKeeper write latency
▪ After hardware, the next limit to achieving low latency is JVM GC
▪ GC pauses are unavoidable. Try to keep them around ~50 ms and as infrequent as possible
  › Switched BK client and servers to Netty pooled ref-counted buffers and direct memory, hiding the payloads from the GC and eliminating copies
  › Extensively profiled allocations and substantially reduced per-entry object allocations
    • Recycler pattern to pool objects (very efficient for same-thread allocate/release)
    • Primitive collections
    • Array queues instead of linked queues in executors
    • Open-hash maps instead of linked hash maps
    • BTree instead of ConcurrentSkipList
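The recycler pattern mentioned above can be sketched as a plain thread-local object pool. Netty's actual `Recycler` also handles cross-thread recycling and capacity limits; this minimal sketch covers only the cheap same-thread allocate/release case:

```java
import java.util.ArrayDeque;

// Thread-local object pool in the spirit of Netty's Recycler: reusing
// instances on the same thread avoids per-entry garbage and keeps GC
// pressure low on the hot publish path. Field names are illustrative.
class PooledEntry {
    private static final ThreadLocal<ArrayDeque<PooledEntry>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    long ledgerId;
    long entryId;

    private PooledEntry() {}

    static PooledEntry get(long ledgerId, long entryId) {
        PooledEntry e = POOL.get().pollFirst();
        if (e == null) {
            e = new PooledEntry(); // pool empty: fall back to allocation
        }
        e.ledgerId = ledgerId;
        e.entryId = entryId;
        return e;
    }

    void recycle() {
        POOL.get().offerFirst(this); // return to this thread's free list
    }
}
```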
23. Auto batching
▪ Send messages in batches throughout the system
▪ Transparent to the application
▪ Configurable grouping time and size, e.g. 1 ms / 128 KB
▪ For the same bytes/s throughput, lowers the txn/s through the system
  › Less CPU usage in brokers/bookies
  › Lower GC pressure
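Size/time-based batching like the 1 ms / 128 KB grouping above can be sketched as follows. The `Flusher` callback and the explicit clock parameter are illustrative simplifications; a real client would arm a timer rather than check the clock on each add:

```java
import java.util.ArrayList;
import java.util.List;

// Batching sketch: buffer messages and flush when either the byte
// threshold or the maximum grouping delay is reached, trading a tiny
// bit of latency for far fewer requests through brokers and bookies.
class Batcher {
    interface Flusher { void flush(List<byte[]> batch); }

    private final int maxBytes;
    private final long maxDelayMs;
    private final Flusher flusher;
    private final List<byte[]> buffer = new ArrayList<>();
    private int bufferedBytes = 0;
    private long firstMessageAtMs = 0;

    Batcher(int maxBytes, long maxDelayMs, Flusher flusher) {
        this.maxBytes = maxBytes;
        this.maxDelayMs = maxDelayMs;
        this.flusher = flusher;
    }

    void add(byte[] msg, long nowMs) {
        if (buffer.isEmpty()) {
            firstMessageAtMs = nowMs; // start the grouping window
        }
        buffer.add(msg);
        bufferedBytes += msg.length;
        if (bufferedBytes >= maxBytes || nowMs - firstMessageAtMs >= maxDelayMs) {
            flusher.flush(new ArrayList<>(buffer)); // one request for the whole batch
            buffer.clear();
            bufferedBytes = 0;
        }
    }
}
```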
24. Low durability
▪ The current throughput bottleneck for bookie writes is journal syncs
▪ Could add more bookies, but at a bigger cost
▪ Some use cases can tolerate losing data on rare occasions
▪ Solution
  › Store data in bookies
    • No memory limitation, can build a big backlog
  › Don't write to the bookie journal
    • Data is held in the write cache of 2 bookies + the broker cache
  › Can lose < 1 min of data if 1 broker and 2 bookies crash
▪ Higher throughput with fewer bookies
▪ Lower publish latency