Apache Pulsar is a flexible pub-sub messaging system backed by durable log storage. It uses a segment-centric architecture in which messages are stored in independent segments distributed across multiple bookies for redundancy. This allows for strong durability, high throughput, and seamless expansion without data rebalancing. Pulsar brokers serve client requests and acquire ownership of topics, while bookies provide durable storage with replication for fault tolerance.
This talk will cover lessons learned at Community Engine regarding MongoDB, including: why we moved away from a hybrid solution using SQL and MongoDB; an outline of the technologies and what we learned using MongoDB on Amazon Web Services; the MongoDB C# driver; MongoDB with Solr for full-text search; and how we handle migration, deployment and more.
As SeatGeek's traffic continues to grow, so too have its infrastructure needs. Recent expansion of the operations team has allowed us to replace our existing service discovery solution with Consul, improving our ability to scale and manage an elastic cloud environment. At the same time, we took the opportunity to migrate from EC2 Classic to VPC and take advantage of AWS's latest offerings.
In this talk, we will discuss Consul, the problems it has solved, and adoption issues that surfaced along the way. We will also highlight our experiences with VPC, including setup, routing, access control, and migration with the extremely useful EC2 ClassicLink.
PDF with presenter notes and links can be found here:
http://bit.ly/1OH7HC0
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis HostedbyConfluent
"There's little talk about capacity planning Kafka clusters, it's very much learn as you go, every cluster is different. In this talk Kafka DevOps Engineer Jason Bell takes you through the things that will help you, from broker capacity, thinking about topics and how the other Confluent components can affect throughput and performance. With a number of production deployments under his watchful gaze for over six years Jason has plenty of experience, stories and useful information that will help you.
By the end of the talk you'll have a good understanding of how to design the cluster for various scenarios, where the points of latency are that you should watch and monitor, and how to prevent teams from breaking the cluster behind your back.
This talk is designed for everyone, from those who are just starting out to those who operate Kafka on a daily basis."
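The broker-capacity question raised in this abstract can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the talk. It assumes that disk need is driven by ingress rate, retention period and replication factor, that data is spread evenly across brokers, and that a fixed fraction of each disk is kept free. The function name and all input values are hypothetical.

```python
def required_disk_per_broker_gb(
    ingress_mb_per_s: float,
    retention_hours: float,
    replication_factor: int,
    broker_count: int,
    headroom: float = 0.3,
) -> float:
    """Rough per-broker disk estimate in GB (decimal units, 1 GB = 1000 MB).

    Assumptions (illustrative only): even partition distribution,
    time-based retention, and `headroom` as the fraction of each disk
    that should stay free.
    """
    if broker_count <= 0 or replication_factor <= 0:
        raise ValueError("broker_count and replication_factor must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")

    # Data written by producers over the whole retention window.
    retained_mb = ingress_mb_per_s * 3600 * retention_hours
    # Every record is stored once per replica.
    replicated_mb = retained_mb * replication_factor
    # Spread evenly across brokers, then leave the requested free space.
    per_broker_mb = replicated_mb / broker_count
    return per_broker_mb / (1 - headroom) / 1000


if __name__ == "__main__":
    # Hypothetical cluster: 50 MB/s in, 7 days retention, RF 3, 6 brokers.
    estimate = required_disk_per_broker_gb(50, 168, 3, 6)
    print(f"{estimate:,.0f} GB per broker")  # roughly 21,600 GB
```

A real plan would also account for compression, consumer fan-out on the network side, and uneven partition sizes, none of which this estimate models.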
Stateful Applications On the Cloud: A PayPal Journey Tesora
1) PayPal operates a large OpenStack cloud platform with over 10,000 physical servers hosting 100,000 VMs to run over 1000 services for their business.
2) They wanted to move stateful applications like messaging, streaming, caching and databases to the cloud but faced challenges with agility, efficiency, elasticity and onboarding while preserving stateful data.
3) After evaluating options like network block storage, ephemeral disks, and hyperconverged storage, they chose VMs with attached local disks, which preserve data when a VM is lost and have lower network bandwidth needs and costs, though the data is lost if the host fails.
This document discusses several Platform as a Service (PaaS) alternatives for .NET applications. It describes what PaaS is and some of the challenges it presents. It then evaluates several specific PaaS options for .NET including Apprenda, CloudFoundry, Uhuru, Tier3, AppHarbor, and AWS Elastic Beanstalk. It concludes that Windows Azure is still the best public PaaS for .NET, but that CloudFoundry-based private PaaS solutions are worth considering to avoid vendor lock-in.
Matteo Merli, the tech lead for Cloud Messaging Service at Yahoo, walked through the team's design decisions, how they arrived at them, and how they leverage Apache BookKeeper to implement a multi-tenant messaging service.
Watch this talk here: https://www.confluent.io/online-talks/how-apache-kafka-works-on-demand
Pick up best practices for developing applications that use Apache Kafka, beginning with a high level code overview for a basic producer and consumer. From there we’ll cover strategies for building powerful stream processing applications, including high availability through replication, data retention policies, producer design and producer guarantees.
We’ll delve into the details of delivery guarantees, including exactly-once semantics, partition strategies and consumer group rebalances. The talk will finish with a discussion of compacted topics, troubleshooting strategies and a security overview.
This session is part 3 of 4 in our Fundamentals for Apache Kafka series.
Single tenant software to multi-tenant SaaS using K8S CloudLinux
This document discusses how Kubernetes can be used to convert single-tenant software applications into multi-tenant SaaS applications. Key points include:
1) Kubernetes can orchestrate each tenant as a separate pod or set of pods, providing isolation, easy scalability, and the ability to customize deployments for each tenant.
2) This approach simplifies many challenges of traditional SaaS like customer management, billing integration, high availability, upgrades and rollbacks by leveraging Kubernetes features.
3) An initial test project converted an existing PHP/MySQL billing application for 10,000+ companies into a multi-tenant SaaS deployment using Kubernetes, requiring under 40 hours of development.
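The tenant-per-pod idea in point 1 can be sketched by generating one Kubernetes Deployment manifest per tenant. This is an illustration under stated assumptions, not the approach used in the deck: giving each tenant its own namespace, the `billing` app name, and the image name are all hypothetical. The manifest is built as a plain dictionary and printed as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def tenant_deployment(tenant_id: str, image: str, replicas: int = 1) -> dict:
    """Build an apps/v1 Deployment manifest for a single tenant.

    Hypothetical conventions: one namespace per tenant ("tenant-<id>")
    and a "tenant" label used for selection and isolation.
    """
    labels = {"app": "billing", "tenant": tenant_id}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": f"billing-{tenant_id}",
            "namespace": f"tenant-{tenant_id}",
            "labels": dict(labels),
        },
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{"name": "billing", "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical tenants; each could get a different image or replica count.
    for tenant in ("acme", "globex"):
        manifest = tenant_deployment(tenant, "example/billing-app:1.0")
        print(json.dumps(manifest, indent=2))
```

Because each tenant has its own manifest, upgrading or rolling back one tenant means changing only that tenant's image tag.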
If your WordPress site is slow or has low performance scores, watch as I show you some of the tools I use to improve speed and performance, both for your own site and for clients' sites.
In the presentation, I go through some plugins you should be using on your site that are easy to set up as well as a basic setup of W3 Total Cache and using a CDN for your site.
This document discusses using microservices with Kafka. It describes how Kafka can be used to connect microservices for asynchronous communication. It outlines various features of Kafka like high throughput, replication, partitioning, and how it can provide reliability. Examples are given of how microservices could use Kafka for logging, filtering messages, and dispatching to different topics. Performance benefits of Kafka are highlighted like scalability and ability to handle high volumes of messages.
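The filtering and dispatching pattern mentioned above can be sketched without a broker. In the example below, plain Python lists stand in for Kafka topics. The topic names, message fields and routing rules are hypothetical, and no Kafka client library is used.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# In-memory stand-in for topics: topic name -> list of messages.
topics: Dict[str, List[dict]] = defaultdict(list)


def route(message: dict) -> Optional[str]:
    """Return the destination topic for a log message, or None to drop it.

    Hypothetical rules: debug messages are filtered out, errors go to a
    dedicated topic, everything else goes to a general topic.
    """
    level = message.get("level", "info")
    if level == "debug":
        return None
    if level == "error":
        return "logs.errors"
    return "logs.general"


def dispatch(message: dict) -> Optional[str]:
    """Append the message to its destination topic and return the topic name."""
    topic = route(message)
    if topic is not None:
        topics[topic].append(message)
    return topic


if __name__ == "__main__":
    for msg in (
        {"level": "debug", "text": "cache miss"},
        {"level": "error", "text": "payment failed"},
        {"level": "info", "text": "user logged in"},
    ):
        print(msg["text"], "->", dispatch(msg))
```

In a real deployment the `dispatch` step would be a consumer that reads from one topic and produces to others, so each downstream service subscribes only to the topics it needs.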
The document discusses the MaxL scripting language, which can automate repetitive tasks in Essbase, provides benefits like minimal cost and small file size, and offers strategies for effectively building scripts like parameterizing common actions, using variables and error handling, and commenting code thoroughly.
Apache Kafka as Message Queue for your microservices and other occasions Michael Reinsch
This talk provides a quick intro to Apache Kafka, the basic concepts, and why it's good as a message queue.
We'll also explore the benefits and challenges of using a message queue as base of your microservices infrastructure (especially when transitioning from a monolith).
SignalR allows for real-time web functionality by facilitating the broadcasting of messages to connected clients. This presentation discusses scaling out SignalR applications using built-in or custom backplanes, securing endpoints, enabling cross-domain calls, and alternatives to SignalR like Socket.IO. It recommends starting with a built-in backplane for most cases and considering a custom solution for high message volumes. Things to watch out for include browser and OS compatibility.
The document discusses content delivery networks (CDNs), which provide advantages over typical hosting by distributing content storage across multiple servers located closer to users. This results in faster page load times, less bandwidth usage, and no single point of failure compared to traditional hosting with a single server. The document also provides tutorials and information on setting up CDNs with WordPress as well as listing several popular CDN providers.
The document discusses automating server provisioning and configuration using Ansible playbooks. It recommends developing playbooks using Vagrant to easily create, boot, and destroy virtual machines. Playbooks can then be run against real servers on Red Hat to provision them in a repeatable, documented process. Lessons learned are that Ansible playbooks serve as great infrastructure documentation and help refine the setup, and it's important to ensure playbooks are idempotent and prefer using modules over raw commands.
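Idempotency, the property stressed in the lessons learned, means that running the same step twice leaves the system in the same state as running it once. The sketch below illustrates the idea in plain Python rather than in a playbook. The helper name, the file path and the configuration line are hypothetical, and the return value mirrors the "changed" versus "ok" distinction that Ansible reports for each task.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed and False if it was already in
    the desired state, so repeated runs are safe.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # Already in the desired state: do nothing.
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "sshd_config"  # hypothetical target file
        print(ensure_line(config, "PermitRootLogin no"))  # first run changes the file
        print(ensure_line(config, "PermitRootLogin no"))  # second run is a no-op
```

A raw shell command such as appending with `>>` would add the line on every run, which is one reason the deck's advice to prefer modules over raw commands matters.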
Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc. Suneet Grover
The document summarizes 24/7 Customer's experience migrating from Apache Kafka 0.7 and 0.8 to the newer 0.10.0.1 version. It describes the challenges faced with sticky partitions and range-based mirror makers in 0.8. It details 24/7 Customer's upgrade path from 0.8 to 0.8.2.2 to 0.9 to the current 0.10.0.1 version. It also discusses the configurations, monitoring, and design considerations for running Kafka reliably across multiple data centers.
This document discusses content delivery networks (CDNs) and considerations for choosing a CDN. It explains that a CDN can decrease latency, prevent server overload, and increase security by caching and delivering content from edge servers closer to users. Key factors in choosing a CDN include traffic volume, content type (static vs dynamic), and level of user engagement. The document provides examples of how a CDN can optimize dynamic websites using technologies like Edge Side Includes to improve caching. It emphasizes the importance of working with CDN experts and covers technical topics related to deploying and monitoring a CDN.
This document discusses Comcast's use of OpenStack for cloud computing. It notes that Comcast has 34 regions, over 700 tenants, and 20,000 instances running on OpenStack. It details Comcast's history with OpenStack, including starting in 2012 with three regions on Essex and upgrading to newer versions over time. Currently, Comcast runs IceHouse across 34 regions, with over 960,000 cores, 20,000 VMs, and plans to deploy Mitaka this year across multiple regions.
Velocity is a distributed cache that allows sharing of cached data across multiple servers. In version 1, it is best suited for session state caching due to limitations in handling dependencies between cached objects. Future versions will expand its capabilities to support full output caching and read-write operations. Currently, Velocity provides a basic set of cache operations and management functionality through its client and server configuration.
Building High-Throughput, Low-Latency Pipelines in Kafka confluent
William Hill is one of the UK’s largest, most well-established gaming companies, with a global presence across 9 countries and over 16,000 employees. In recent years the gaming industry, and in particular sports betting, has been revolutionised by technology. Customers now demand a wide range of events and markets to bet on, both pre-game and in-play, 24/7. This has driven a business need to process more data, provide more updates and offer more markets and prices in real time.
At William Hill, we have invested in a completely new trading platform using Apache Kafka. We process vast quantities of data from a variety of feeds. This data is fed through a variety of odds compilation models before being piped out to UI apps for use by our trading teams to provide events, markets and pricing data to various end points across the whole of William Hill. We deal with thousands of sporting events, each with sometimes hundreds of betting markets, each market receiving hundreds of updates. This scales up to vast numbers of messages flowing through our system. We have to process, transform and route that data in real time. Using Apache Kafka, we have built a high-throughput, low-latency pipeline based on cloud-hosted microservices. When we started, we were on a steep learning curve with Kafka, microservices and associated technologies. This led to fast learnings and fast failings.
In this session, we will tell the story of what we built, what went well, what didn’t go so well and what we learnt. This is a story of how a team of developers learnt (and are still learning) how to use Kafka. We hope that you will be able to take away lessons and learnings of how to build a data processing pipeline with Apache Kafka.
SUSE provides open source solutions to help customers define their future through digital transformation. Their software-defined infrastructure approach offers application delivery management, operations monitoring and patching, cluster deployment, and orchestration tools. Key products include SUSE Linux Enterprise Server, SUSE OpenStack Cloud, SUSE CaaS Platform, and SUSE Enterprise Storage. SUSE has grown through partnerships, contributions to open source projects, and recent product releases that expand support for technologies like ARM64, Kubernetes, and Cloud Foundry integration.
Using Redis as Distributed Cache for ASP.NET apps - Peter Kellner, 73rd Stre... Redis Labs
In this session I will build from scratch a Microsoft ASP.NET website that caches WebAPI REST calls with MSOpenTech’s Redis implementation, both while developing in Visual Studio and when running on a Windows server running IIS. I will show you how to build a safe, reusable caching library in C# that can be used in any .NET project. I will also demonstrate how to use the Redis cache services that are available on Microsoft’s Azure cloud platform. Further, I’ll demonstrate a real-world website that uses Azure Redis cache and show statistics on how Redis improves performance consistently and reliably.
Kafka is a distributed messaging system that allows for publishing and subscribing to streams of records, which are organized into topics. Producers write data to topics and consumers read from topics. The data is partitioned and replicated across clusters of machines called brokers for reliability and scalability. A common data format like Avro can be used to serialize the data.
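The relationship between topics, partitions, producers and consumers described above can be illustrated with a small in-memory model. This is a teaching sketch, not Kafka's implementation: the `Topic` class is hypothetical, CRC32 is used here only as a stable stand-in for the hash a real client would apply to the key, and replication across brokers is not modeled.

```python
import zlib
from typing import List, Tuple


class Topic:
    """A topic as a fixed set of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: str) -> int:
        # Records with the same key always land in the same partition,
        # which preserves per-key ordering.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def consume(self, partition: int, offset: int, max_records: int = 10):
        """Read records from a partition starting at `offset`."""
        return self.partitions[partition][offset:offset + max_records]


if __name__ == "__main__":
    orders = Topic("orders", num_partitions=3)
    for value in ("created", "paid", "shipped"):
        print(orders.produce("order-42", value))
    partition = orders.partition_for("order-42")
    # A consumer tracks its own offset and can resume from it later.
    print(orders.consume(partition, offset=1))
```

The consumer, not the log, keeps track of the offset, which is why several independent consumers can read the same partition at different positions.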
Building an Event-oriented Data Platform with Kafka, Eric Sammer confluent
While we frequently talk about how to build interesting products on top of machine and event data, the reality is that collecting, organizing, providing access to, and managing this data is where most people get stuck. Many organizations understand the use cases around their data – fraud detection, quality of service and technical operations, user behavior analysis, for example – but are not necessarily data infrastructure experts. In this session, we’ll follow the flow of data through an end to end system built to handle tens of terabytes an hour of event-oriented data, providing real time streaming, in-memory, SQL, and batch access to this data. We’ll go into detail on how open source systems such as Hadoop, Kafka, Solr, and Impala/Hive are actually stitched together; describe how and where to perform data transformation and aggregation; provide a simple and pragmatic way of managing event metadata; and talk about how applications built on top of this platform get access to data and extend its functionality.
Attendees will leave this session knowing not just which open source projects go into a system such as this, but how they work together, what tradeoffs and decisions need to be addressed, and how to present a single general purpose data platform to multiple applications. This session should be attended by data infrastructure engineers and architects planning, building, or maintaining similar systems.
Fundamentals and Architecture of Apache Kafka Angelo Cesaro
This presentation explains Apache Kafka's architecture and internal design giving an overview of Kafka internal functions, including:
Brokers, Replication, Partitions, Producers, Consumers, Commit log, and a comparison with traditional message queues.
AWS to Bare Metal: Motivation, Pitfalls, and Results MongoDB
Like many startups, Wish grew up on AWS. As our cluster grew and the price of SSDs fell, we started exploring bare metal. Fast-forward 2 years and we have hundreds of MongoDB instances on bare metal fully integrated with our AWS infrastructure. It wasn't all smooth sailing, but the performance & cost improvements were worth it! Hear the story of how we did it and gain a framework for thinking about how to make the leap from cloud-centric architecture to a hybrid model.
At Hootsuite, we've been transitioning from a single monolithic PHP application to a set of scalable Scala-based microservices. To avoid excessive coupling between services, we've implemented an event system using Apache Kafka that allows events to be reliably produced + consumed asynchronously from services as well as data stores.
In this presentation, I talk about:
- Why we chose Kafka
- How we set up our Kafka clusters to be scalable, highly available, and multi-data-center aware.
- How we produce + consume events
- How we ensure that events can be understood by all parts of our system (some of which are implemented in other programming languages such as PHP and Python) and how we handle evolving event payload data.
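One common way to cope with evolving event payloads is a tolerant reader: consumers supply defaults for fields added later and ignore fields they do not know. The sketch below illustrates that idea only. The abstract does not say which serialization format or schema tooling Hootsuite uses, so JSON, the `UserSignedUp` event and its fields are all assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class UserSignedUp:
    user_id: str
    # Hypothetical field added in a later version of the event; older
    # producers do not send it, so it needs a default.
    plan: str = "free"


def parse_user_signed_up(raw: str) -> UserSignedUp:
    """Decode an event, tolerating both older and newer payloads."""
    data = json.loads(raw)
    return UserSignedUp(
        user_id=data["user_id"],           # required in every version
        plan=data.get("plan", "free"),     # default when the field is missing
    )                                       # unknown extra fields are ignored


if __name__ == "__main__":
    old_event = '{"user_id": "u1"}'
    new_event = '{"user_id": "u2", "plan": "pro", "referrer": "ad-campaign"}'
    print(parse_user_signed_up(old_event))
    print(parse_user_signed_up(new_event))
```

Because the same rules can be applied in any language that parses the payload, services written in PHP, Python or Scala can read the same events without being upgraded in lockstep.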
Kafka is a distributed, replicated, and partitioned platform for handling real-time data feeds. It allows both publishing and subscribing to streams of records, and is commonly used for applications such as log aggregation, metrics, and streaming analytics. Kafka runs as a cluster of one or more servers that can reliably handle trillions of events daily.
Single tenant software to multi-tenant SaaS using K8SCloudLinux
This document discusses how Kubernetes can be used to convert single-tenant software applications into multi-tenant SaaS applications. Key points include:
1) Kubernetes can orchestrate each tenant as a separate pod or set of pods, providing isolation, easy scalability, and the ability to customize deployments for each tenant.
2) This approach simplifies many challenges of traditional SaaS like customer management, billing integration, high availability, upgrades and rollbacks by leveraging Kubernetes features.
3) An initial test project converted an existing PHP/MySQL billing application for 10,000+ companies into a multi-tenant SaaS deployment using Kubernetes, requiring under 40 hours of development.
If you're WordPress site is slower or has low performance scores, watch as I show you some of the tools I use to improve speed and performance for your own site as well as clients.
In the presentation, I go through some plugins you should be using on your site that are easy to set up as well as a basic setup of W3 Total Cache and using a CDN for your site.
This document discusses using microservices with Kafka. It describes how Kafka can be used to connect microservices for asynchronous communication. It outlines various features of Kafka like high throughput, replication, partitioning, and how it can provide reliability. Examples are given of how microservices could use Kafka for logging, filtering messages, and dispatching to different topics. Performance benefits of Kafka are highlighted like scalability and ability to handle high volumes of messages.
The document discusses the MaxL scripting language, which can automate repetitive tasks in Essbase, provides benefits like minimal cost and small file size, and offers strategies for effectively building scripts like parameterizing common actions, using variables and error handling, and commenting code thoroughly.
Apache Kafka as Message Queue for your microservices and other occasionsMichael Reinsch
This talk provides a quick intro to Apache Kafka, the basic concepts, and why it's good as a message queue.
We'll also explore the benefits and challenges of using a message queue as base of your microservices infrastructure (especially when transitioning from a monolith).
SignalR allows for real-time web functionality by facilitating the broadcasting of messages to connected clients. This presentation discusses scaling out SignalR applications using built-in or custom backplanes, securing endpoints, enabling cross-domain calls, and alternatives to SignalR like Socket.IO. It recommends starting with a built-in backplane for most cases and considering a custom solution for high message volumes. Things to watch out for include browser and OS compatibility.
The document discusses content delivery networks (CDNs), which provide advantages over typical hosting by distributing content storage across multiple servers located closer to users. This results in faster page load times, less bandwidth usage, and no single point of failure compared to traditional hosting with a single server. The document also provides tutorials and information on setting up CDNs with WordPress as well as listing several popular CDN providers.
The document discusses automating server provisioning and configuration using Ansible playbooks. It recommends developing playbooks using Vagrant to easily create, boot, and destroy virtual machines. Playbooks can then be run against real servers on Red Hat to provision them in a repeatable, documented process. Lessons learned are that Ansible playbooks serve as great infrastructure documentation and help refine the setup, and it's important to ensure playbooks are idempotent and prefer using modules over raw commands.
Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc.Suneet Grover
The document summarizes 24/7 Customer's experience migrating from Apache Kafka 0.7 and 0.8 to the newer 0.10.0.1 version. It describes the challenges faced with sticky partitions and range-based mirror makers in 0.8. It details 24/7 Customer's upgrade path from 0.8 to 0.8.2.2 to 0.9 to the current 0.10.0.1 version. It also discusses the configurations, monitoring, and design considerations for running Kafka reliably across multiple data centers.
This document discusses content delivery networks (CDNs) and considerations for choosing a CDN. It explains that a CDN can decrease latency, prevent server overload, and increase security by caching and delivering content from edge servers closer to users. Key factors in choosing a CDN include traffic volume, content type (static vs dynamic), and level of user engagement. The document provides examples of how a CDN can optimize dynamic websites using technologies like Edge Side Includes to improve caching. It emphasizes the importance of working with CDN experts and covers technical topics related to deploying and monitoring a CDN.
This document discusses Comcast's use of OpenStack for cloud computing. It notes that Comcast has 34 regions, over 700 tenants, and 20,000 instances running on OpenStack. It details Comcast's history with OpenStack, including starting in 2012 with three regions on Essex and upgrading to newer versions over time. Currently, Comcast runs IceHouse across 34 regions, with over 960,000 cores, 20,000 VMs, and plans to deploy Mitaka this year across multiple regions.
Velocity is a distributed cache that allows sharing of cached data across multiple servers. In version 1, it is best suited for session state caching due to limitations in handling dependencies between cached objects. Future versions will expand its capabilities to support full output caching and read-write operations. Currently, Velocity provides a basic set of cache operations and management functionality through its client and server configuration.
Building High-Throughput, Low-Latency Pipelines in Kafkaconfluent
William Hill is one of the UK’s largest, most well-established gaming companies with a global presence across 9 countries with over 16,000 employees. In recent years the gaming industry and in particular sports betting, has been revolutionised by technology. Customers now demand a wide range of events and markets to bet on both pre-game and in-play 24/7. This has driven out a business need to process more data, provide more updates and offer more markets and prices in real time.
At William Hill, we have invested in a completely new trading platform using Apache Kafka. We process vast quantities of data from a variety of feeds, this data is fed through a variety of odds compilation models, before being piped out to UI apps for use by our trading teams to provide events, markets and pricing data out to various end points across the whole of William Hill. We deal with thousands of sporting events, each with sometimes hundreds of betting markets, each market receiving hundreds of updates. This scales up to vast numbers of messages flowing through our system. We have to process, transform and route that data in real time. Using Apache Kafka, we have built a high throughput, low latency pipeline, based on Cloud hosted Microservices. When we started, we were on a steep learning curve with Kafka, Microservices and associated technologies. This led to fast learnings and fast failings.
In this session, we will tell the story of what we built, what went well, what didn’t go so well and what we learnt. This is a story of how a team of developers learnt (and are still learning) how to use Kafka. We hope that you will be able to take away lessons and learnings of how to build a data processing pipeline with Apache Kafka.
SUSE provides open source solutions to help customers define their future through digital transformation. Their software-defined infrastructure approach offers application delivery management, operations monitoring and patching, cluster deployment, and orchestration tools. Key products include SUSE Linux Enterprise Server, SUSE OpenStack Cloud, SUSE CaaS Platform, and SUSE Enterprise Storage. SUSE has grown through partnerships, contributions to open source projects, and recent product releases that expand support for technologies like ARM64, Kubernetes, and Cloud Foundry integration.
Using Redis as Distributed Cache for ASP.NET apps - Peter Kellner, 73rd Stre...Redis Labs
I will build from scratch in this session a Microsoft ASP.NET website that caches WebAPI REST calls with both MSOpenTech’s Redis implementation for running while developing in Visual Studio as well as running on a Windows server running IIS. I will show you how to build a safe reusable caching library in c# that can be used in any .net project. I will also demonstrate how to use the Redis cache services that are available on Microsoft’s Azure cloud platform. Further, I’ll demonstrate a real world web site that uses Azure Redis cache and show statistics on how Redis improves performance consistently and reliably.
Kafka is a distributed messaging system that allows for publishing and subscribing to streams of records, known as topics. Producers write data to topics and consumers read from topics. The data is partitioned and replicated across clusters of machines called brokers for reliability and scalability. A common data format like Avro can be used to serialize the data.
Building an Event-oriented Data Platform with Kafka, Eric Sammer confluent
While we frequently talk about how to build interesting products on top of machine and event data, the reality is that collecting, organizing, providing access to, and managing this data is where most people get stuck. Many organizations understand the use cases around their data – fraud detection, quality of service and technical operations, user behavior analysis, for example – but are not necessarily data infrastructure experts. In this session, we’ll follow the flow of data through an end to end system built to handle tens of terabytes an hour of event-oriented data, providing real time streaming, in-memory, SQL, and batch access to this data. We’ll go into detail on how open source systems such as Hadoop, Kafka, Solr, and Impala/Hive are actually stitched together; describe how and where to perform data transformation and aggregation; provide a simple and pragmatic way of managing event metadata; and talk about how applications built on top of this platform get access to data and extend its functionality.
Attendees will leave this session knowing not just which open source projects go into a system such as this, but how they work together, what tradeoffs and decisions need to be addressed, and how to present a single general purpose data platform to multiple applications. This session should be attended by data infrastructure engineers and architects planning, building, or maintaining similar systems.
Fundamentals and Architecture of Apache Kafka | Angelo Cesaro
This presentation explains Apache Kafka's architecture and internal design, giving an overview of Kafka's internal functions, including:
Brokers, replication, partitions, producers, consumers, the commit log, and a comparison with traditional message queues.
AWS to Bare Metal: Motivation, Pitfalls, and Results | MongoDB
Like many startups, Wish grew up on AWS. As our cluster grew and the price of SSDs fell, we started exploring bare metal. Fast-forward 2 years and we have hundreds of MongoDB instances on bare metal fully integrated with our AWS infrastructure. It wasn't all smooth sailing, but the performance & cost improvements were worth it! Hear the story of how we did it and gain a framework for thinking about how to make the leap from cloud-centric architecture to a hybrid model.
At Hootsuite, we've been transitioning from a single monolithic PHP application to a set of scalable Scala-based microservices. To avoid excessive coupling between services, we've implemented an event system using Apache Kafka that allows events to be reliably produced and consumed asynchronously by services as well as data stores.
In this presentation, I talk about:
- Why we chose Kafka
- How we set up our Kafka clusters to be scalable, highly available, and multi-data-center aware.
- How we produce and consume events
- How we ensure that events can be understood by all parts of our system (some of which are implemented in other programming languages such as PHP and Python) and how we handle evolving event payload data.
Kafka is a distributed, replicated, and partitioned platform for handling real-time data feeds. It allows both publishing and subscribing to streams of records, and is commonly used for applications such as log aggregation, metrics, and streaming analytics. Kafka runs as a cluster of one or more servers that can reliably handle trillions of events daily.
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ... | Lucas Jellema
An introduction to Apache Kafka, the open source platform for real-time message queuing, reliable, scalable, distributed event handling, and high-volume pub/sub.
See GitHub https://github.com/MaartenSmeets/kafka-workshop for the workshop resources.
Matteo Merli and Sijie Guo from Streamlio gave a hands-on workshop on Apache Pulsar, a fast, durable pub-sub messaging system and a low-latency alternative to Kafka.
Messaging, storage, or both? The real time story of Pulsar and Apache Distri... | Streamlio
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. This presentation from Strata 2017 in New York provides an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
MyHeritage Kafka use cases - Feb 2014 Meetup | Ran Levy
MyHeritage uses Kafka as a messaging system to handle two main use cases: indexing data to their search system and reporting statistics to their business intelligence system. The document provides an overview of Kafka, describing it as a fast, scalable, durable, distributed messaging system. It then details MyHeritage's implementation, including using Kafka to handle event streaming from producers to consumers that process the data for indexing and reporting. The summary emphasizes that Kafka is very fast, scalable, and extensively used at MyHeritage to handle their high scale systems.
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote | Lucidworks
Evernote stores over 3 billion notes from over 100 million users worldwide. To improve search performance and allow upgrades to newer Lucene versions, Evernote rearchitected their search system. They separated search code from the data storage, allowed multiple Lucene versions to run concurrently on each machine, and automatically migrated each user's index to the default version without downtime. This reduced disk I/O by 81% and allowed compression techniques to further reduce storage needs by terabytes and input/output by petabytes each week.
Kafka's basic terminologies, its architecture, its protocol and how it works.
Kafka at scale, its caveats, guarantees and use cases offered by it.
How we use it @ZaprMediaLabs.
JavaOne 2016
JMS is pretty simple, right? Once you’ve mastered topics and queues, the rest can appear trivial, but that isn’t the case. The queuing system, whether ActiveMQ, OpenMQ, or WebLogic JMS, provides many more features and settings than appear in the Java EE documentation. This session looks at some of the important extended features and configuration settings. What would you need to optimize if your messages are large or you need to minimize prefetching? What is the best way to implement time-delayed messages? The presentation also looks at dangerous bugs that can be introduced via simple misconfigurations with pooled beans. The JMS APIs are deceptively simple, but getting an implementation into production and tuned correctly can be a bit trickier.
October 2016 HUG: Pulsar, a highly scalable, low latency pub-sub messaging s... | Yahoo Developer Network
Yahoo recently open-sourced Pulsar, a highly scalable, low latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. Pulsar is used across various Yahoo applications for large scale data pipelines. Learn more about Pulsar architecture and use-cases in this talk.
Speakers:
Matteo Merli from Pulsar team at Yahoo
Agile Lab is an Italian company that specializes in leveraging innovative technologies like machine learning, big data, and artificial intelligence to satisfy customers' objectives. They have over 50 specialists with deep experience in production environments. The company believes in investing in its team through conferences, R&D projects, and welfare benefits. They also release open source frameworks on GitHub and share knowledge through meetups in Milan and Turin.
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent) | Ontico
HighLoad++ 2017
Hall "Delhi + Kolkata", November 8, 17:00
Abstract:
http://www.highload.ru/2017/abstracts/2978.html
When you are running systems in production, clearly you want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka… what does “up and running” even mean?
...
This document discusses strategies for building large-scale stream infrastructures across multiple data centers using Apache Kafka. It outlines common multi-data center patterns like stretched clusters, active/passive clusters, and active/active clusters. It also covers challenges like maintaining ordering and consumer offsets across data centers and potential solutions.
The document provides an introduction and overview of Apache Kafka presented by Jeff Holoman. It begins with an agenda and background on the presenter. It then covers basic Kafka concepts like topics, partitions, producers, consumers and consumer groups. It discusses efficiency and delivery guarantees. Finally, it presents some use cases for Kafka and positioning around when it may or may not be a good fit compared to other technologies.
Kafka is a distributed publish-subscribe messaging system that handles high volumes of data by passing messages between endpoints. It provides reliability through replication, scalability by expanding without downtime, and durability by persisting data to disk. Key components are topics to publish messages to, brokers to maintain data, producers to publish data, consumers to subscribe to and consume data, and ZooKeeper for coordination. Partitioning topics across multiple brokers in a cluster allows for parallel consumption by consumer groups.
Apache Kafka is a distributed messaging system originally developed by LinkedIn to handle high volumes of log data with low latency. It allows for both online and offline data analysis and is highly scalable and efficient. Kafka uses a "pull model" where consumers pull messages from brokers in a distributed, fault-tolerant way coordinated by ZooKeeper. Producers push messages to topics, which are partitioned across brokers for scalability.
Pulsar is a distributed pub/sub messaging platform developed by Yahoo. It provides scalable messaging with persistence, ordering, and delivery guarantees. Pulsar is used extensively at Yahoo, handling 100 billion messages per day across 80+ applications. It supports common use cases such as message queues, notifications, and feedback systems. Pulsar's architecture uses brokers for client interactions, Apache BookKeeper for durable storage, and ZooKeeper for coordination. Future work includes adding encryption, globally consistent topics, and C++ client support.
Embedded machine learning-based road conditions and driving behavior monitoring | IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines | Christina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... | IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions | Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Literature Review Basics and Understanding Reference Management.pptx | Dr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
2. Using Kafka as Your Primary Data Store
• Compacted Topics Guarantees
• Compacted Topics and Log Cleaner
• Use Cases
• MongoDB?
• KafkaSubscriptionBackingStore
• Multiple Node Support
3. Compacted Topics Guarantees
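1. A consumer that stays caught up with the head of the log sees every message.
2. Message order is always maintained.
3. The offset of a message never changes.
4. A consumer reading from the start of the log will see at least the final state of every key.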
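The guarantees above can be illustrated with a toy model of compaction. This is only a sketch, not Kafka's log cleaner: `Record`, `append_all`, and `compact` are hypothetical names, and the model ignores details such as the uncompacted head of the log and delete markers. It assumes only that compaction keeps the newest record for each key and never renumbers or reorders what remains.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    offset: int  # position assigned at append time; never reassigned
    key: str
    value: str


def append_all(pairs: List[Tuple[str, str]]) -> List[Record]:
    """Build a log whose offsets are sequential in write order."""
    return [Record(i, k, v) for i, (k, v) in enumerate(pairs)]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the newest record per key, preserving offsets and order."""
    latest = {}
    for rec in log:
        latest[rec.key] = rec.offset  # later writes overwrite earlier ones
    return [rec for rec in log if latest[rec.key] == rec.offset]


log = append_all([
    ("user-1", "a"),
    ("user-2", "b"),
    ("user-1", "c"),
    ("user-3", "d"),
    ("user-2", "e"),
])
compacted = compact(log)

for rec in compacted:
    print(rec.offset, rec.key, rec.value)
```

In this model the surviving records keep offsets 2, 3, and 4, so a reader starting from the beginning still ends with the final value of every key, in the order those values were written.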
5. Use Cases
1. Database change subscription
2. Event sourcing
3. Journaling for high-availability
6. Why not just use MongoDB?
• MongoDB doesn’t have triggers, notifications or listeners.
• Would have to develop a custom solution in order to see changes.
• One less point of failure
1. Messages will have sequential offsets
2. Compaction doesn't reorder, just removes duplicates
3. Offsets are permanent identifiers.
4. ...in the order they were written.
1. In Oracle, this is known as Database Change Notification. Use this if you have a cache and want to update that cache when the database changes.
2. A style of application design that co-locates query processing with application design and uses a log of changes as the primary store for the application.
3. Save local state to the log so that another process can reload the changes and carry on when the original process fails.
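The journaling note above can be sketched as replaying a changelog to rebuild local state. This is a minimal in-memory illustration under stated assumptions, not a Kafka client: `apply_change` and `replay` are hypothetical helpers, the changelog is a plain Python list standing in for a compacted topic, and a value of `None` is assumed to mean "delete this key".

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, Optional[int]]  # (key, new value); None is assumed to mean delete


def apply_change(state: Dict[str, int], key: str, value: Optional[int]) -> None:
    """Apply one change record to a local state dictionary."""
    if value is None:
        state.pop(key, None)
    else:
        state[key] = value


def replay(changelog: List[Change]) -> Dict[str, int]:
    """Rebuild state from scratch by applying every change in write order."""
    state: Dict[str, int] = {}
    for key, value in changelog:
        apply_change(state, key, value)
    return state


# The original process journals every change before it fails.
changelog: List[Change] = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
original: Dict[str, int] = {}
for key, value in changelog:
    apply_change(original, key, value)

# A replacement process reloads the same log and reaches the same state.
recovered = replay(changelog)
print(recovered == original, recovered)
```

Because the replacement process applies the same records in the same order, it arrives at the state the original process held, which is the property the journaling use case relies on.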