Matteo Merli and Sijie Guo from Streamlio gave a hands-on workshop on Apache Pulsar: a #fast #durable #pubsub #messaging system and a low-latency alternative to #kafka.
Messaging, storage, or both? The real time story of Pulsar and Apache Distri... (Streamlio)
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. This presentation from Strata 2017 in New York provides an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
High performance messaging with Apache Pulsar (Matteo Merli)
Apache Pulsar is being used for an increasingly broad array of data ingestion tasks. When operating at scale, it's very important to ensure that the system can make use of all the available resources. Karthik Ramasamy and Matteo Merli share insights into the design decisions and the implementation techniques that allow Pulsar to achieve high performance with strong durability guarantees.
Effectively-once semantics in Apache Pulsar (Matteo Merli)
“Exactly-once” is a controversial term in the messaging landscape. In this presentation we offer a detailed look at effectively-once delivery semantics in Apache Pulsar and how this is achieved without sacrificing performance.
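The mechanism behind effectively-once publishing in Pulsar combines broker-side message deduplication with a stable producer name and per-message sequence IDs, so retried sends are discarded rather than duplicated. Below is a minimal, hedged sketch with the Pulsar Java client; the service URL, topic, producer name, and sequence ID source are illustrative, and deduplication must also be enabled on the namespace or broker.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class EffectivelyOncePublisher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed local broker
                .build();

        // A stable producer name plus per-message sequence IDs let the broker
        // recognize and drop retried duplicates (namespace deduplication assumed on).
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/payments")
                .producerName("payments-publisher-1")
                .sendTimeout(0, TimeUnit.SECONDS)   // retry indefinitely instead of timing out
                .create();

        long sequenceId = 42L;  // e.g. derived from the source offset being published
        producer.newMessage()
                .sequenceId(sequenceId)
                .value("charge:order-1001".getBytes())
                .send();

        producer.close();
        client.close();
    }
}
```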
LinkedIn Stream Processing Meetup - Apache Pulsar (Karthik Ramasamy)
Apache Pulsar is a next-generation messaging system that uses a fundamentally different architecture to achieve durability, performance, scalability, efficiency, multi-tenancy, and geo-replication.
Data processing use cases, from transformation to analytics, perform tasks that require various combinations of queuing, streaming, and lightweight processing steps. Until now, supporting all of those needs has required a different system for each task: stream processing engines, message queuing middleware, and streaming messaging systems. That has led to increased complexity for development and operations.
In this session, we'll discuss the need to unify these capabilities in a single system and how Apache Pulsar was designed to address that. Apache Pulsar is a next-generation distributed pub-sub system that was developed and deployed at Yahoo. Streamlio's Karthik Ramasamy will explain how the architecture and design of Pulsar provide the flexibility to support developers and applications needing any combination of queuing, messaging, streaming, and lightweight compute.
Pulsar - flexible pub-sub for internet scale (Matteo Merli)
Pub-sub messaging is a very convenient abstraction that allows system and application developers to decouple components and let them communicate, by acting as a durable buffer for transient data or as a persistent log from which to recover after crashes. This talk will present an overview of Apache Pulsar, the reasons that led to its development, and how it enabled many teams at Yahoo to build scalable and reliable applications. Apache Pulsar has become the de facto pub-sub messaging system at Yahoo, serving 100+ applications and processing hundreds of billions of messages for over 3 years.
In this talk, we will explore in detail different categories of use cases that highlight how Pulsar can be applied to solve a broad range of problems thanks to its flexible messaging model that supports both queuing and streaming semantics with a focus on durability and transaction guarantees.
Apache Pulsar, Supporting the Entire Lifecycle of Streaming Data (StreamNative)
We have long stressed that there is a growing need for unified messaging and streaming, and that Apache Pulsar is the platform that best supports this vision and makes it possible at large scale. In his talk, Matteo Merli will show how we can take this unified messaging and streaming paradigm one step further, to fully take advantage of their integration. The result is a drastically simplified architecture: a single system that is able to support the data throughout its entire lifecycle, from the moment the event happens down to historical archiving. The ramifications of this shift are big: Pulsar is in the perfect spot to enable tighter integration between the online and offline worlds.
This is an overview of interesting features from Apache Pulsar. Keep in mind that at the time I gave this presentation I had not used Pulsar yet. These are just my first impressions from the list of features.
Scalability, fault tolerance, distributed log… these are terms we hear more and more these days. Making them happen is quite a challenge sometimes, especially if our business needs to be data-intensive, agile, and fast to market.
One way to answer this challenge is microservices. These are small services that communicate with each other to deliver business value. The key word here is _communication_. Without communication, all the power of microservices falls apart. And communication is not trivial when it involves multiple data systems talking to one another over many channels, each channel requiring its own protocol and communication methods. This is where communication can become a bottleneck if not handled properly.
One answer to this problem is Kafka, a distributed messaging system providing fast, highly scalable, and redundant message exchange using a publish-subscribe model. And when we say fast, we mean one of the fastest messaging systems out there.
This presentation will show you an alternative way of building microservices with an event-driven architecture using Kafka.
Presenters:
Laszlo-Robert Albert (albertlaszlorobert [at] gmail [dot] com)
Dan Balescu (dfbalescu [at] gmail [dot] com)
I Heart Log: Real-time Data and Apache Kafka (Jay Kreps)
This presentation discusses how logs and stream-processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.
This presentation provides an introduction to Apache Kafka and describes best practices for working with fast data streams in Kafka and MapR Streams.
The code examples used during this talk are available at github.com/iandow/design-patterns-for-fast-data.
Author:
Ian Downard
Presented at the Portland Java User Group on Tuesday, October 18 2016.
Reducing Microservice Complexity with Kafka and Reactive Streams (jimriecken)
My talk from ScalaDays 2016 in New York on May 11, 2016:
Transitioning from a monolithic application to a set of microservices can help increase performance and scalability, but it can also drastically increase complexity. Layers of inter-service network calls add latency and an increasing risk of failure where previously only local function calls existed. In this talk, I'll speak about how to tame this complexity using Apache Kafka and Reactive Streams to:
- Extract non-critical processing from the critical path of your application to reduce request latency
- Provide back-pressure to handle both slow and fast producers/consumers
- Maintain high availability, high performance, and reliable messaging
- Evolve message payloads while maintaining backwards and forwards compatibility.
Fundamentals and Architecture of Apache Kafka (Angelo Cesaro)
Fundamentals and Architecture of Apache Kafka.
This presentation explains Apache Kafka's architecture and internal design giving an overview of Kafka internal functions, including:
Brokers, replication, partitions, producers, consumers, the commit log, and a comparison with traditional message queues.
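For readers new to these terms, here is a minimal, hedged sketch using the standard Kafka Java clients: a producer writes keyed records to a partitioned topic (the commit log managed by the brokers), and a consumer in a consumer group reads them back. The broker address, topic, and group ID are illustrative.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaBasics {
    public static void main(String[] args) {
        // Producer: the record key determines the partition, giving per-key ordering.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("orders", "user-42", "order created"));
        }

        // Consumer: part of a group; each partition of the commit log is
        // assigned to exactly one consumer within the group.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "order-processors");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
```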
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_... (StreamNative)
Nowadays, real-time computation is heavily used in cases such as online product recommendation, online payment fraud detection, and so on. In the streaming pipeline, Kafka is normally used to store a day's or a week's worth of data, but not years of data for looking at trends historically. So a batch pipeline is needed for historical data computation, and that is where the Lambda architecture comes in. Lambda has proved to be effective, and a good balance of speed and reliability, and we have been running many systems with a Lambda architecture for many years. But the biggest drawback of the Lambda architecture has been the need to maintain two distinct (and possibly complex) systems to generate both batch and streaming layers. With that, we have to split our business logic into many segments across different places, which is a challenge to maintain as the business grows and also increases communication overhead. Secondly, the data is duplicated in two different systems, and we have to move data among different systems for processing. With those challenges, we have been searching for alternatives and found Apache Pulsar a great fit. In this talk, I will show how we solve those problems with Apache Pulsar by making Pulsar a unified storage backend for both the batch and streaming pipelines, a solution that simplifies the software stack, improves our work efficiency, and lowers cost at the same time.
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with... (confluent)
BY Jun Rao
From the Bay Area Apache Kafka September 2016 Meetup.
Abstract: To manage the ever-increasing volume and velocity of data within your company you have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center powered by Apache Kafka. But what needs to be done if one data center is not enough? In this session we describe building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence. We provide an overview of best practices and common patterns while covering key areas such as architecture guidelines, data replication and mirroring as well as disaster scenarios and failure handling.
Hello, kafka! (an introduction to apache kafka) (Timothy Spann)
Hello Apache Kafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby, Cloudera Principal Engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
Building Scalable and Extendable Data Pipeline for Call of Duty Games (Yarosl... (confluent)
What’s easier than building a data pipeline nowadays? You add a few Apache Kafka clusters and a way to ingest data (probably over HTTP), design a way to route your data streams, add a few stream processors and consumers, integrate with a data warehouse… wait, this looks like a lot of things, doesn’t it? And you probably want to make it highly scalable and available too. Join this session to learn best practices for building a data pipeline, drawn from my experience at Activision/Demonware. I’ll share the lessons learned about scaling pipelines, not only in terms of volume, but also in terms of supporting more games and more use cases. You’ll also hear about message schemas and envelopes, Apache Kafka organization, topic naming conventions, routing, reliable and scalable producers and the ingestion layer, as well as stream processing.
Nozomi from Yahoo! Japan gave a presentation on how Yahoo! Japan uses Apache Pulsar to build their internal messaging platform for processing tens of billions of messages every day. He explains why Yahoo! Japan chose Pulsar, what their Apache Pulsar use cases are, and their best practices.
#PulsarBeijingMeetup
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ... (Lucas Jellema)
Introduction to Apache Kafka - the open-source platform for real-time message queuing and reliable, scalable, distributed event handling and high-volume pub/sub implementation.
See GitHub https://github.com/MaartenSmeets/kafka-workshop for the workshop resources.
Kafka's basic terminology, its architecture, its protocol, and how it works.
Kafka at scale: its caveats, guarantees, and the use cases it supports.
How we use it @ZaprMediaLabs.
At Hootsuite, we've been transitioning from a single monolithic PHP application to a set of scalable Scala-based microservices. To avoid excessive coupling between services, we've implemented an event system using Apache Kafka that allows events to be reliably produced + consumed asynchronously from services as well as data stores.
In this presentation, I talk about:
- Why we chose Kafka
- How we set up our Kafka clusters to be scalable, highly available, and multi-data-center aware.
- How we produce + consume events
- How we ensure that events can be understood by all parts of our system (some of which are implemented in other programming languages like PHP and Python) and how we handle evolving event payload data.
October 2016 HUG: Pulsar, a highly scalable, low latency pub-sub messaging s... (Yahoo Developer Network)
Yahoo recently open-sourced Pulsar, a highly scalable, low latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. Pulsar is used across various Yahoo applications for large scale data pipelines. Learn more about Pulsar architecture and use-cases in this talk.
Speakers:
Matteo Merli from Pulsar team at Yahoo
Unleashing Real-time Power with Kafka.pptx (Knoldus Inc.)
Unlock the potential of real-time data streaming with Kafka in this session. Learn the fundamentals, architecture, and seamless integration with Scala, empowering you to elevate your data processing capabilities. Perfect for developers at all levels, this hands-on experience will equip you to harness the power of real-time data streams effectively.
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth (NETWAYS)
Roland Hochmuth is the Project Tech Lead (PTL) and software architect of Monasca, the open-source Monitoring-as-a-Service (at-scale) OpenStack project (https://wiki.openstack.org/wiki/Monasca). He focuses on developing a powerful, scalable, and reliable turn-key monitoring solution that draws on leading industry trends and innovations in data streaming, analytics, and big data. He is also responsible for the metrics processing pipeline for HP's public cloud. He has experience in several software areas and domains, from 3-D computer graphics to remote desktop visualization, cloud computing, and monitoring.
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth (NETWAYS)
Monasca (monasca.io) is a turn-key, open-source OpenStack Monitoring-as-a-Service platform that supports authentication and multi-tenancy via the OpenStack Keystone identity service. Monasca is a highly scalable, performant, and fault-tolerant Monitoring-as-a-Service solution that supports push-based streaming metrics, health/status, alarming/thresholding, and notifications. Logging-as-a-Service is under development, and the goal is to provide a comprehensive and integrated monitoring solution for OpenStack clouds that also covers metrics, events, and logs.
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum... (HostedbyConfluent)
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk.
Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!?!??
My setup is pretty simple: a Raspberry Pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, jQuery, and Apache Kafka.
https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
https://www.datainmotion.dev/2022/04/flip-py-pi-enviroplus-using-apache.html
https://dzone.com/articles/pulsar-in-python-on-pi
(Current22) Let's Monitor The Conditions at the Conference (Timothy Spann)
(Current22) Let's Monitor The Conditions at the Conference
Let's Monitor The Conditions at the Conference
Session Time: 11:15 am - 12:00 pm | Session Date: Wednesday, 5 October 2022 | Session Type: In-Person | Location: Ballroom G
Session Description:
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk. Let's bring this to the different spots around the conference including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!? My setup is pretty simple, a raspberry pi, a breakout garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, JQuery, and Apache Kafka.
Timothy Spann, StreamNative
Developer Advocate
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC.
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio (Alluxio, Inc.)
Alluxio Bay Area Meetup March 14th
Join the Alluxio Meetup group: https://www.meetup.com/Alluxio
Alluxio Community slack: https://www.alluxio.org/slack
What are the key considerations people should look at to decide on the right technology to meet their messaging and queuing need? This presentation provides an overview of key requirements and introduces Apache Pulsar, the open source messaging and queuing solution.
World of Tanks Experience of Using Kafka (Levon Avakyan)
In this paper I talk about BigWorld technology, the WoT server, Apache Kafka, and how we started to use them together: what difficulties we had and how we solved them.
3. What is Apache Pulsar?
• Ordering: guaranteed ordering
• Multi-tenancy: a single cluster can support many tenants and use cases
• High throughput: can reach 1.8 M messages/s in a single partition
• Durability: data replicated and synced to disk
• Geo-replication: out-of-the-box support for geographically distributed applications
• Unified messaging model: supports both topic and queue semantics in a single model
• Delivery guarantees: at-least-once, at-most-once, and effectively-once
• Low latency: publish latency of 5 ms at the 99th percentile
• Highly scalable: can support millions of topics
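To make the unified messaging model concrete, here is a minimal sketch using the Pulsar Java client: the same topic can be read with an Exclusive subscription (streaming semantics, strict ordering for a single consumer) or a Shared subscription (queue semantics, messages load-balanced across many consumers). The service URL, topic, and subscription names are illustrative.

```java
import org.apache.pulsar.client.api.*;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed local broker
                .build();

        // Streaming style: a single consumer with strict ordering.
        Consumer<byte[]> streamConsumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("stream-sub")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        // Queuing style: many consumers share this subscription and
        // messages are load-balanced across them.
        Consumer<byte[]> queueConsumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("queue-sub")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> msg = streamConsumer.receive();
        streamConsumer.acknowledge(msg);

        streamConsumer.close();
        queueConsumer.close();
        client.close();
    }
}
```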
4. Usage of Pulsar
• In production for 3+ years at Yahoo
• Powering critical products like Yahoo Mail, Yahoo Finance, Gemini Ads, Flickr and Sherpa (NoSQL database)
• 80+ tenants
• 2.3 million topics
• 100 B messages / day
• Full-mesh replication in 8 data centers
5. Why build a new system?
• No existing solution satisfied the requirements: multi-tenancy, 1M topics, low latency, durability, geo-replication
• Kafka doesn't scale well with many topics:
  - Storage model based on an individual directory per topic partition
  - Enabling durability kills performance
  - Many other choking points: getting stats, access to metadata, flow control
• Operations are not very convenient:
  - e.g. replacing a server takes manual commands to copy the data and involves clients
  - client access to ZK clusters is not desirable
• Ability to manage large backlogs:
  - No scalable support for keeping the consumer position
6. Architecture view
Separate layers between brokers and bookies:
• Brokers and bookies can be added independently
• Traffic can be shifted very quickly across brokers
• New bookies will ramp up on traffic quickly
[Diagram: producers and consumers connect to a layer of Pulsar brokers (Apache Pulsar), which persist data on a layer of bookies, Bookie 1-5 (Apache BookKeeper).]
7. Apache BookKeeper
• A replicated log storage
• Low-latency durable writes
• Simple, repeatable read consistency
• Highly available
• Stores many logs per node
• I/O isolation
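As a rough illustration of the replicated-log abstraction, here is a minimal sketch using the classic BookKeeper Java client: create a ledger replicated across several bookies, append entries durably, then re-open the ledger and read them back. The ZooKeeper address, quorum sizes, and password are assumptions, and the exact (builder-style) API differs between BookKeeper versions.

```java
import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerExample {
    public static void main(String[] args) throws Exception {
        // Connect via ZooKeeper (address assumed for a local setup).
        BookKeeper bk = new BookKeeper("localhost:2181");

        // Create a ledger stored on an ensemble of 3 bookies with write/ack quorum of 2.
        LedgerHandle ledger = bk.createLedger(3, 2, 2,
                BookKeeper.DigestType.CRC32, "secret".getBytes());

        // Durable, low-latency append: each entry is synced to the bookies' journals.
        long entryId = ledger.addEntry("hello bookkeeper".getBytes());
        long ledgerId = ledger.getId();
        ledger.close();

        // Re-open the ledger and read its entries back.
        LedgerHandle reader = bk.openLedger(ledgerId,
                BookKeeper.DigestType.CRC32, "secret".getBytes());
        Enumeration<LedgerEntry> entries =
                reader.readEntries(0, reader.getLastAddConfirmed());
        while (entries.hasMoreElements()) {
            System.out.println(new String(entries.nextElement().getEntry()));
        }
        reader.close();
        bk.close();
    }
}
```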
9. BookKeeper - Storage
• A single bookie can serve and store thousands of ledgers
• Write and read paths are separated:
  - Avoids read activity impacting write latency
  - Writes are added to an in-memory write cache and committed to the journal
  - The write cache is flushed in the background to a separate device
  - Entries are sorted to allow for mostly sequential reads
20. Use cases - Message Queue
• Decouple online / background processing
• Provide high availability
• Reliable data transport
[Diagram: online events are published with low latency to Pulsar topic 1; Workers 1-3 consume them for long-running tasks and publish notifications to Pulsar topic 2.]
21. Use cases - Message Queue
• Async processing of time-intensive background tasks
• Publish the task and complete the HTTP request
• Examples: image / video transcoding, bulk operations (large folder deletions)
22. Use cases - Message Queue
• Delegate the processing to multiple compute jobs which interact with micro-services
• Wait until the response is pushed back to the HTTP server
• Examples: breaking monolithic applications into micro-services
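A minimal sketch of this work-queue pattern with the Pulsar Java client: a front end publishes tasks and returns immediately, while several workers consume them through a Shared subscription so the backlog is load-balanced, and a message is only acknowledged once the task succeeds. Topic and subscription names are illustrative.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class WorkQueue {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Front end: publish the task and complete the HTTP request right away.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/transcode-jobs")
                .create();
        producer.send("video-1234.mp4".getBytes());

        // Worker: several instances of this loop share the same subscription,
        // so Pulsar distributes tasks across them.
        Consumer<byte[]> worker = client.newConsumer()
                .topic("persistent://public/default/transcode-jobs")
                .subscriptionName("transcoder-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> task = worker.receive(5, TimeUnit.SECONDS);
        if (task != null) {
            try {
                // ... run the long-running job here ...
                worker.acknowledge(task);          // remove it from the backlog
            } catch (Exception e) {
                worker.negativeAcknowledge(task);  // redeliver to another worker
            }
        }

        producer.close();
        worker.close();
        client.close();
    }
}
```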
23. Use cases - Notifications
• Change data capture
• Listeners are frequently different tenants
• Quotas are needed to ensure the producer is not affected
[Diagram: an event is published to a Pulsar topic and fanned out to listener Components 1-3.]
24. Use cases - Notifications
• Replicating DB transactions across regions
• Notification to multiple interested tenants (example: new user account, …)
25. Use cases - Feedback system
• Coordinate a large number of machines
• Propagate state
[Diagram: external inputs, a controller, and several serving systems exchange updates and feedback through Pulsar topic 1 and Pulsar topic 2.]
30. Geo-Replication
• Scalable asynchronous replication
• Integrated in the broker message flow
• Simple configuration to add/remove regions
[Diagram: topic T1 exists in Data Centers A, B, and C and is replicated between them; producers P1-P3 publish and consumers C1-C2 consume via subscription S1 from their local data centers.]
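To illustrate the "simple configuration to add/remove regions" point, here is a hedged sketch using the Pulsar admin Java API to set the clusters a namespace replicates to. The namespace, cluster names, and admin URL are illustrative, and the exact method signature can vary between Pulsar versions.

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class EnableGeoReplication {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // assumed admin endpoint
                .build();

        // Replicate every topic in this namespace to the listed clusters;
        // removing a cluster from the set stops replication to that region.
        Set<String> clusters = new HashSet<>();
        clusters.add("us-west");
        clusters.add("us-east");
        clusters.add("us-central");
        admin.namespaces().setNamespaceReplicationClusters("my-tenant/my-namespace", clusters);

        admin.close();
    }
}
```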
32. Pulsar - Conclusion
• A fast, durable, distributed pub/sub messaging system
• Flexible traditional messaging: queuing and pub/sub
  - Focus on message dispatch and consumption
  - Remove data as soon as possible if it is not needed
• Backed by a scalable log store
  - Durable message store, zero data loss
  - Allows rewinding / reprocessing messages for backfill, bootstrapping systems, and stream computing
• It carries all the fantastic features from the log store.
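As a final illustration of the rewinding / reprocessing point, here is a minimal sketch using the Pulsar Java Reader API to replay a topic from the earliest retained message; a Reader manages its own position instead of a broker-side subscription cursor. The topic name and service URL are illustrative.

```java
import org.apache.pulsar.client.api.*;

public class ReplayTopic {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A Reader starts exactly where we tell it to,
        // here the earliest message still retained in the backlog.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/events")
                .startMessageId(MessageId.earliest)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            System.out.println("replayed: " + new String(msg.getData()));
        }

        reader.close();
        client.close();
    }
}
```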