(Stephen Parente + Jeff Field, Blizzard) Kafka Summit SF 2018
Blizzard’s global data platform has become a driving force in both business and operational analytics. As more internal customers onboard with the system, there is increasing demand for custom applications to access this data in near real time. To avoid many independent teams with varying levels of Kafka expertise all tapping the firehose from our critical production Kafka clusters, we developed our own pub-sub system on top of Kafka that provides specific datasets to customers on their own cloud-deployed Kafka clusters.
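A minimal sketch of that pattern, with hypothetical topic names, cluster addresses and a JSON dataset tag (not Blizzard's actual implementation): a bridge consumes the production firehose, keeps only the records a given internal customer subscribed to, and produces them to that customer's own cloud-hosted cluster.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DatasetBridge {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "prod-kafka:9092");        // source: production cluster
            consumerProps.put("group.id", "dataset-bridge-team-a");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "team-a-cloud-kafka:9092"); // destination: customer-owned cluster
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(List.of("telemetry.firehose"));
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                        // Forward only the slice of the firehose this internal customer subscribed to.
                        if (rec.value().contains("\"dataset\":\"team-a\"")) {
                            producer.send(new ProducerRecord<>("team-a.events", rec.key(), rec.value()));
                        }
                    }
                }
            }
        }
    }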
Real-Time Dynamic Data Export Using the Kafka Ecosystem
(Preston Thompson, Braze) Kafka Summit SF 2018
If you collect billions of data points every day and create billions more sending and tracking messages, then you know you need to get your infrastructure right. Our clients use Braze to engage their users over their lifecycle via push notifications, emails, in-app messages and more. Using our Currents product, clients can enable multiple configurable integrations to export this event data in real time to a variety of third-party systems, allowing them to tightly integrate with the rest of their operations and understand the impacts of their engagement strategy.
We use Kafka and the Kafka ecosystem to power this high volume real-time export. As you’d expect in a big data environment, we take data collected from a variety of sources—our SDKs, email partner APIs, our own systems—and produce it to Kafka, with topics for each type of event (about 30 types). Kafka Streams filters and transforms this data according to the configurations set by our clients. Clients can choose which types of events should be sent to which third-party systems. Kafka Connect helps to export the data to third-party systems in real time using custom developed connectors. We run a connector instance for each integration for each customer that consumes from the integration-specific topic. On top of it all, we built a service to manage the pipeline. The service provides configurations to the Streams application and also creates topics for new integrations and uses the Connect REST API to create and manage connectors.
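As a rough illustration of the filtering-and-routing step, here is a sketch of how such a Streams application could look; the topic names, customer id and event types are hypothetical, and a real implementation would parse the event JSON rather than match substrings.

    import java.util.List;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;

    public class IntegrationRouter {
        public static void main(String[] args) {
            // Hypothetical per-customer configuration: the event types enabled for this integration.
            Set<String> enabledEventTypes = Set.of("push.send", "email.open");

            StreamsBuilder builder = new StreamsBuilder();
            builder.stream(List.of("events.push.send", "events.email.open", "events.in_app.click"),
                           Consumed.with(Serdes.String(), Serdes.String()))
                   // Keep only the event types this customer turned on (a real implementation
                   // would parse the JSON instead of matching substrings).
                   .filter((userId, eventJson) -> enabledEventTypes.stream().anyMatch(eventJson::contains))
                   // One topic per integration per customer; a dedicated Connect instance consumes it.
                   .to("integration.segment.customer-123", Produced.with(Serdes.String(), Serdes.String()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "currents-router-customer-123");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            new KafkaStreams(builder.build(), props).start();
        }
    }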
In this talk, I will discuss:
-How we started our journey in designing this large-scale streaming architecture
-Why streaming technologies were necessary to solve our technology and business issues
-The lessons we learned along the way that can help you with your Kafka-based architecture
Bringing Streaming Data To The Masses: Lowering The “Cost Of Admission” For Y...
(Bob Lehmann, Bayer) Kafka Summit SF 2018
You’ve built your streaming data platform. The early adopters are “all in” and have developed producers, consumers and stream processing apps for a number of use cases. A large percentage of the enterprise, however, has expressed interest but hasn’t made the leap. Why?
In 2014, Bayer Crop Science (formerly Monsanto) adopted a cloud-first strategy and started a multi-year transition to the cloud. A Kafka-based cross-datacenter DataHub was created to facilitate this migration and to drive the shift to real-time stream processing. The DataHub has seen strong enterprise adoption and supports a myriad of use cases. Data is ingested from a wide variety of sources and can move effortlessly between an on-premises datacenter, AWS and Google Cloud. The DataHub has evolved continuously over time to meet the current and anticipated needs of our internal customers. The “cost of admission” for the platform has been lowered dramatically over time via our DataHub Portal and technologies such as Kafka Connect, Kubernetes and Presto. Most operations are now self-service, onboarding of new data sources is relatively painless, and stream processing via KSQL and other technologies is being incorporated into the core DataHub platform.
In this talk, Bob Lehmann will describe the origins and evolution of the Enterprise DataHub with an emphasis on steps that were taken to drive user adoption. Bob will also talk about integrations between the DataHub and other key data platforms at Bayer, lessons learned and the future direction for streaming data and stream processing at Bayer.
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
We often need to build applications that analyze Kafka data to unlock the most value from event streams, so how can organizations build these real-time analytics applications? In this talk, we examine an indexing approach that enables fast SQL analytics on data from Kafka, without data flattening or denormalization. Rockset is the real-time indexing database that builds an inverted index, a columnar index and a row index on all fields of your Kafka messages, including nested fields and arrays. This Converged Index accelerates various types of analytic queries–search, aggregations and joins–without the need to denormalize or transform data for performance reasons. With indexing delivering significant gains in query performance, we also need to index new data in a timely manner. We discuss several strategies used for efficient ingestion and indexing from Kafka, including rollups, write optimizations on the underlying RocksDB storage engine, and the disaggregation of ingest and query compute.
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
(Dmitry Milman + Ankur Kaneria, Express Scripts) Kafka Summit SF 2018
Building cloud-based microservices can be a challenge when the system of record is a relational database residing on an on-premise mainframe. The challenge lies in the ability to efficiently and cost-effectively access the ever-increasing amount of data. Express Scripts is reimagining its data architecture to bring best-in-class user experience and provide the foundation of next-generation applications.
This talk will showcase how Kafka plays a key role within Express Scripts’ transformation from mainframe to a microservice-based ecosystem, ensuring data integrity between two worlds. It will discuss how change data capture (CDC) is leveraged to stream data changes to Kafka, allowing us to build a low-latency data sync pipeline. We will describe how we achieve transactional consistency by collapsing all events that belong together onto a single topic, yet have the ability to scale out to meet the real time SLAs and low-latency requirements through means of partitions. We will share our Kafka Streams configuration to handle the data transformation workload. We will discuss our overall Kafka cluster footprint, configuration and security measures.
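A minimal sketch of the keying idea, assuming a hypothetical topic name and a transaction id as the record key (not Express Scripts' actual schema): because Kafka preserves order within a partition, all changes that share a key are consumed in order, while unrelated transactions spread across partitions for scale.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CdcPublisher {
        private final KafkaProducer<String, String> producer;

        public CdcPublisher(Properties props) {
            this.producer = new KafkaProducer<>(props);
        }

        // Every change event that belongs to the same business transaction uses the same key,
        // so it lands on the same partition and is consumed in order; different transactions
        // hash to different partitions, which is what allows scaling out consumers.
        public void publish(String transactionId, String changeEventJson) {
            producer.send(new ProducerRecord<>("member-profile-changes", transactionId, changeEventJson));
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            new CdcPublisher(props).publish("txn-42", "{\"table\":\"MEMBER\",\"op\":\"UPDATE\"}");
        }
    }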
Express Scripts Holding Company is an American Fortune 100 company. As of 2018, the company is the 25th largest in the U.S. as well as one of the largest pharmacy benefit management organizations in the U.S. Customers rely on 24/7 access to our services and need the ability to interact with our systems in real time via various channels such as web and mobile. We will share our mainframe-to-microservices migration journey, and our experiences and lessons learned should be beneficial to other companies venturing down a similar path.
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...
This talk covers how Priceline uses Kafka Streams to save terabytes of daily license volume in our monitoring systems. Kafka Streams powers a big part of our analytics and monitoring pipelines and delivers operational metrics transformations in real time. All logs and operational metrics from the APIs of Priceline’s products flow into Kafka and are ingested into Splunk, our monitoring system, for alerting and monitoring. We have implemented data transformations, aggregations and summarizations using Kafka Streams to eliminate PCI/PII violations in the log data, and to aggregate metrics so that we avoid ingesting sub-second samples and ingest metrics only at the granularity we need. We will cover the need for custom Serdes and custom partitioners, and why we don’t use the Confluent Schema Registry. You will also learn how Priceline uses a self-service model to configure its streams, topics and consumers through the Data Collection Console, our UI for managing the Kafka streaming pipelines.
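A hedged sketch of the rollup idea in Kafka Streams, with hypothetical topic names (the exact windowing method names vary across Kafka Streams versions): sub-second samples are summed into one value per metric per minute before anything reaches the volume-licensed monitoring system.

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class MetricRollup {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("raw-metrics", Consumed.with(Serdes.String(), Serdes.Double()))
                   .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                   // Collapse sub-second samples into one value per metric name per minute.
                   .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                   .reduce(Double::sum)
                   .toStream()
                   .map((windowedName, total) ->
                           KeyValue.pair(windowedName.key() + "@" + windowedName.window().start(), total))
                   .to("metrics-1m", Produced.with(Serdes.String(), Serdes.Double()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metric-rollup");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            new KafkaStreams(builder.build(), props).start();
        }
    }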
(Krunal Vora, Tinder) Kafka Summit San Francisco 2018
At Tinder, we have been using Kafka for streaming and processing events, data science processes and many other integral jobs. Forming the core of the pipeline at Tinder, Kafka has been accepted as the pragmatic solution to match the ever-increasing scale of users, events and backend jobs. We at Tinder are investing time and effort to optimize the use of Kafka to solve the problems we face in the dating-app context. Kafka forms the backbone of the company’s plans to sustain performance at the envisioned scale as the company starts to grow in unexplored markets. Come learn about the implementation of Kafka at Tinder, how Kafka has helped solve the use cases for dating apps, and the success story behind the business case for Kafka at Tinder.
Kafka for Real-Time Event Processing in Serverless Environments
(Jeff Sharpe + Alex Srisuwan, Capital One) Kafka Summit SF 2018
Using Kafka as a platform messaging bus is common, but bridging communication between real-time and asynchronous components can become complicated, especially when dealing with serverless environments. This has become increasingly common in modern banking where events need to be processed at near-real-time speed. Serverless environments are well-suited to address these needs, and Kafka remains an excellent solution for providing the reliable, resilient communication layer between serverless components and dedicated stream processing services.
In this talk, we will examine some of the strengths and weaknesses of using Kafka for real-time communication, some tips for efficient interactions with Kafka and AWS Lambda, and a number of useful patterns for maximizing the strengths of Kafka and serverless components.
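One simple way to bridge the two worlds (a hand-rolled sketch, not necessarily the speakers' approach; AWS also offers managed Kafka event source mappings for Lambda) is a consumer loop that hands each record to a Lambda function asynchronously, so a slow function cannot stall the poll loop. Topic, group id and function name below are placeholders.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import software.amazon.awssdk.core.SdkBytes;
    import software.amazon.awssdk.services.lambda.LambdaClient;
    import software.amazon.awssdk.services.lambda.model.InvocationType;
    import software.amazon.awssdk.services.lambda.model.InvokeRequest;

    public class KafkaToLambdaBridge {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "lambda-bridge");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 LambdaClient lambda = LambdaClient.create()) {
                consumer.subscribe(List.of("card-transactions"));
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(250))) {
                        // Fire-and-forget: the Event invocation type queues the record for the
                        // function instead of blocking on its response.
                        lambda.invoke(InvokeRequest.builder()
                                .functionName("fraud-scoring")
                                .invocationType(InvocationType.EVENT)
                                .payload(SdkBytes.fromUtf8String(rec.value()))
                                .build());
                    }
                }
            }
        }
    }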
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
In this talk, we'll discuss how VillageMD is able to use Kafka topic compaction for rapidly scaling our reprocessing pipelines to encompass hundreds of feeds. Within healthcare data ecosystems, privacy and data minimalism are key design priorities. Being able to handle data deletion in a reliable, timely manner within event-driven architectures is becoming more and more necessary with key governance frameworks like the GDPR and HIPAA.
We'll be giving an overview of the building and governance of dead-letter queues for streaming data processing.
We'll discuss:
1. How to architect a data sink for failed records.
2. How topic compaction can reduce duplicate data and enable idempotency.
3. Building a tombstoning system for removing successfully reprocessed records from the queues (items 2 and 3 are sketched in code after this list).
4. Considerations for monitoring a reprocessing system in production -- what metrics, dataops, and SLAs are useful?
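A minimal sketch of items 2 and 3, assuming a compacted dead-letter topic and a record id as the key (topic and field names are hypothetical, not VillageMD's implementation):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RetryQueue {
        private static final String DLQ_TOPIC = "claims-feed.dlq";  // created with cleanup.policy=compact
        private final KafkaProducer<String, String> producer;

        public RetryQueue(Properties props) {
            this.producer = new KafkaProducer<>(props);
        }

        // Item 2: keying failures by record id means compaction keeps only the latest failure
        // per record, so re-queuing the same bad record repeatedly stays idempotent.
        public void recordFailure(String recordId, String failedPayloadJson) {
            producer.send(new ProducerRecord<>(DLQ_TOPIC, recordId, failedPayloadJson));
        }

        // Item 3: once reprocessing succeeds, a null value acts as a tombstone and compaction
        // eventually drops the key, which also helps with GDPR/HIPAA-style deletion duties.
        public void markReprocessed(String recordId) {
            producer.send(new ProducerRecord<>(DLQ_TOPIC, recordId, null));
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            RetryQueue dlq = new RetryQueue(props);
            dlq.recordFailure("claim-001", "{\"error\":\"schema mismatch\"}");
            dlq.markReprocessed("claim-001");
        }
    }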
Kafka Summit SF 2017 - Providing Reliability Guarantees in Kafka at One Trill...
In this presentation, I will talk about my firsthand experience dealing with the unique challenges of running Kafka at massive scale. If you ever thought that running Kafka is difficult, this talk may change your mind and provide you with valuable insights into how to configure a Kafka cluster efficiently, how to manage Kafka for enterprise customers, and how to measure, monitor and maintain the quality of the Kafka service. Our production Kafka cluster runs on more than 1,500 VMs and serves over 10 GBps of data spread across hundreds of topics for multiple teams across Microsoft. We built a self-serve Kafka management service to make the process manageable and scalable across many teams. In this talk, I will also share insights about running Kafka in private versus multi-tenant mode, supporting failover and disaster recovery requirements, and how to make Kafka compliant with regulatory certifications such as ISO, SOC and FedRAMP.
Presented by Nitin Kumar, Microsoft
Systems Track
(Joseph deBuzna + Zulfikar Quereshi, HVR) Kafka Summit SF 2018
This presentation is a customer story about France-based regional airline HOP! and their need to make better use of data that was contained in various applications. They also needed this information to be available in real time. As one can imagine, airlines manage a wide variety of information such as weather, customer information, flight plans, sensor data from planes and much more.
In this presentation, Joe will discuss how HOP! was delivering their data before and the limitations associated with delivering this data. Joe will then talk about HOP!’s selection of Kafka and HVR as a solution to enabling data availability and real-time information for analysis and action.
In this session, attendees will learn:
-How Kafka was selected as a solution for HOP!’s complex challenges
-Architecture and capabilities implemented that enabled data feeding from multiple sources to Kafka
-Considerations and challenges with this approach
-Business results and future plans
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
To remain competitive, organizations need to democratize access to fast analytics, not only to gain real-time insights on their business but also to power smart apps that need to react in the moment. In this session, you will learn how Kafka and SingleStore enable a modern yet simple data architecture to analyze both fast-paced incoming data and large historical datasets. In particular, you will understand why SingleStore is well suited to process data streams coming from Kafka.
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers’ data architectures. In the TigerGraph database, the Kafka Connect framework was used to build the native S3 data loader. In TigerGraph Cloud, we will be building native integrations with many data sources such as Azure Blob Storage and Google Cloud Storage, using Kafka as an integrated component of the Cloud Portal.
In this session, we will discuss both architectures: 1. the built-in Kafka Connect framework within the TigerGraph database; 2. using a Kafka cluster for cloud-native integration with other popular data sources. A demo will be provided for both data streaming processes.
Achieving end-to-end visibility into complex event-sourcing transactions usin...
The use of event-sourcing systems like Kafka is growing rapidly among Node.js applications. Building systems around an event-driven architecture simplifies horizontal scalability in distributed computing models and makes them more resilient to failure. With these advantages come new challenges, such as how to get visibility into these complex processes.
Event-driven architecture is async by nature. Tracking the communication between different components is both extremely difficult and important when debugging or figuring out bottlenecks in the system.
In this talk, I will present ways to achieve end-to-end and granular visibility into complex event-sourcing transactions using distributed tracing. I will use open-source tools like OpenTelemetry, Jaeger, and Zipkin to showcase a complex Node.js system using Kafka.
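The talk demonstrates this with Node.js services; as a language-neutral illustration, the sketch below shows the core idea that OpenTelemetry instrumentation automates, carrying a trace identifier in a Kafka record header so producer and consumer spans can be stitched into one trace (topic, key and payload are placeholders).

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;
    import java.util.UUID;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TracedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("orders", "order-17", "{\"status\":\"created\"}");
                // The trace context rides along in a header; every consumer that handles the
                // event attaches its span to the same trace, giving end-to-end visibility.
                record.headers().add("trace-id",
                        UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
                producer.send(record);
            }
        }
    }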
Kafka in the Enterprise—A Two-Year Journey to Build a Data Streaming Platform...
(Benny Lee + Christopher Arthur, Commonwealth Bank of Australia) Kafka Summit SF 2018
Commonwealth Bank of Australia (CBA) is Australia’s largest bank, with over 15 million customers, 50,000 employees and over USD 700 billion in assets. We started the journey two years ago to transform our existing enterprise architecture into an “event driven” architecture. Since then, Kafka has become a mission-critical platform in the Bank and is the core component in our “event driven” architecture strategy.
In this talk, we will walk you through the journey of how we stood up the initial Kafka clusters, the challenges we encountered (both technical and organisational) and how we overcame those challenges.
We will also deep dive into one of the use cases for Kafka (with Kafka Streams and Connectors) in our new real time payment system that was introduced in Australia early this year. We will discuss why we think Kafka was the perfect solution for this use case, and the lessons learned.
Key Takeaways:
-Lessons learned from our experiences (that we think other companies can benefit from)
-Our use cases for Kafka with a particular focus on the new real time payment systems (NPP) initiative in Australia
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
At Wells Fargo, we move 150 TB of log data from our syslogs to Splunk forwarders, where it is indexed and organized for analytic queries. As we modernize and migrate our applications to our hybrid cloud, the performance expectations for this infrastructure, including the resilience of the end-to-end pipeline, increase proportionately. First, we decoupled the applications from their logging interface through a log library that splits the streams of logs from their sources into Kafka, which routes them to two separate destinations, Splunk and ELK. We use Prometheus and Grafana for monitoring the metrics, and we deployed Kafka, Splunk, ELK, Prometheus and Grafana on Kubernetes clusters. Confluent had released a version of Kafka without ZooKeeper, replacing its functionality with the Quorum Controller. The Quorum Controller version exhibited better disposability, one of the twelve factors that matter for cloud-nativeness. We packaged this version into a Kubernetes operator, KEDA, deployed it for auto-scaling, and tested it by simulating the amount of log data we typically generate in production. We have also implemented distributed tracing and made it just as resilient. We will share our lessons learned and the patterns and practices we used to modernize both our underlying runtime platforms and our applications with highly performing, resilient event-driven architectures.
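A rough sketch of what such a log library's write path might look like (hypothetical names; the fan-out to Splunk and ELK would happen downstream via sink connectors or consumers on the same topic):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaLogAppender {
        private final KafkaProducer<String, String> producer;
        private final String topic;

        public KafkaLogAppender(String bootstrapServers, String topic) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("acks", "1");   // favor throughput; logs tolerate rare loss on broker failover
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            this.producer = new KafkaProducer<>(props);
            this.topic = topic;
        }

        // Applications call this instead of writing to a Splunk forwarder directly;
        // Splunk and ELK pipelines both consume the same topic downstream.
        public void log(String host, String jsonLine) {
            producer.send(new ProducerRecord<>(topic, host, jsonLine));
        }

        public static void main(String[] args) {
            new KafkaLogAppender("localhost:9092", "app-logs")
                    .log("web-01", "{\"level\":\"INFO\",\"msg\":\"started\"}");
        }
    }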
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
With 60+ products and over 24% of the US GDP flowing through it, system integration is a tough problem for Intuit. Seasonality, scale, and massive peaks in products like TurboTax, QuickBooks, and Mint.com add extra layers of difficulty when building shared data services around transaction and user graphs, clickstream processing, a/b testing, and personalization. To reduce complexity and latency, we’ve implemented Kafka as the backbone across these data services. This allows us to asynchronously trigger relevant processing, elegantly scaling up and down as needed around peaks, all without the need for point-to-point integrations.
In this talk, we share what we’ve learned about Kafka at Intuit and describe our data services architecture. We found that Kafka is invaluable in achieving a scalable, clean architecture, allowing engineering teams to focus less on integration and more on product development.
Testing Event Driven Architectures: How to Broker the Complexity | Frank Kilc...
Quality matters, and as event-driven architectures (EDA) become increasingly popular in the microservices space, ensuring the delivery and performance of your EDA grows in importance. But while it’s a powerful architecture, it does come with its challenges, especially from a testing perspective. For example, most organizations are not reliant on Kafka alone, but on a multitude of interconnected APIs like REST, GraphQL and gRPC. One of the questions that arises from this challenge: how do you build end-to-end tests when the APIs are completely different technologies, without relying on fragile scripts? In our talk, we’ll tackle this question and many more when it comes to the testing of Apache Kafka endpoints and your services architecture. We’ll cover what makes testing in EDA difficult, technologies that can help you, and how we at SmartBear are thinking about these testing problems and, most importantly, how we are trying to solve them.
User Behavior Analysis with Session Windows and Apache Kafka's Streams API
For many industries, the need to group related events based on a period of activity or inactivity is key. Advertising businesses and content producers are just a few examples of where session windows can be used to better understand user behavior.
While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added that make session windows much easier to implement.
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
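For reference, this is roughly what a session-windowed aggregation looks like in the Kafka Streams DSL (method names have shifted across versions; older releases specify the gap with SessionWindows.with). Topic and key names are hypothetical.

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.SessionWindows;

    public class SessionizedClicks {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
                   .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                   // Events from the same user within 30 minutes of each other merge into
                   // one session; a 30-minute gap of inactivity closes the session.
                   .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(30)))
                   .count()
                   .toStream()
                   .foreach((windowedUser, views) ->
                           System.out.printf("user=%s session=[%d,%d] views=%d%n",
                                   windowedUser.key(),
                                   windowedUser.window().start(),
                                   windowedUser.window().end(),
                                   views));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sessionized-clicks");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            new KafkaStreams(builder.build(), props).start();
        }
    }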
Putting the Micro into Microservices with Stateful Stream Processing
How small can a microservice be? This talk will look at how Stateful Stream Processing is used to build truly autonomous, often minuscule services. With the distributed guarantees of Exactly Once Processing, Event Driven Services supported by Apache Kafka become reliable, fast and nimble, blurring the line between business system and big data pipeline.
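A small illustration of the idea, assuming hypothetical topics: a complete "service" that maintains per-key running totals in a local, fault-tolerant state store, with the processing guarantee set so that the read-process-write cycle is transactional (EXACTLY_ONCE_V2 requires newer Kafka versions; older ones use EXACTLY_ONCE).

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class OrderTotalsService {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            // The whole "service": consume order amounts, keep a running total per customer
            // in a local state store, and emit the updated totals downstream.
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.Long()))
                   .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                   .reduce(Long::sum, Materialized.as("customer-totals-store"))
                   .toStream()
                   .to("customer-totals", Produced.with(Serdes.String(), Serdes.Long()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-totals-service");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // Read-process-write happens in one transaction, so totals are neither lost
            // nor double-counted on failure.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
            new KafkaStreams(builder.build(), props).start();
        }
    }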
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
At Stripe, we operate a general ledger modeled as double-entry bookkeeping for all financial transactions. Warehousing such data is challenging due to its high volume and high cardinality of unique accounts.
Furthermore, it is financially critical to get up-to-date, accurate analytics over all records. Due to the changing nature of real-time transactions, it is impossible to pre-compute the analytics as a fixed time series. We have overcome this challenge by creating a real-time key-value store inside Pinot that can sustain half a million QPS across all the financial transactions.
We will talk about the details of our solution and the interesting technical challenges faced.
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
One of the key metrics to monitor when working with Apache Kafka, whether as a data pipeline or a streaming platform, is consumer group lag.
Lag is the delta between the last produced message and the last committed message of a partition. In other words, lag indicates how far behind your application is in processing up-to-date information.
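To make the definition concrete, here is a minimal sketch that computes per-partition lag with the Kafka AdminClient (group id and bootstrap servers are placeholders; tools like Burrow, discussed later in this abstract, do this continuously and add evaluation logic on top).

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.ListOffsetsResult;
    import org.apache.kafka.clients.admin.OffsetSpec;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class LagCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Last committed offset per partition for the group...
                Map<TopicPartition, OffsetAndMetadata> committed =
                        admin.listConsumerGroupOffsets("my-app").partitionsToOffsetAndMetadata().get();

                // ...and the current end of the log for the same partitions.
                Map<TopicPartition, OffsetSpec> request = new HashMap<>();
                committed.keySet().forEach(tp -> request.put(tp, OffsetSpec.latest()));
                Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                        admin.listOffsets(request).all().get();

                // Lag = log end offset - committed offset, per partition.
                committed.forEach((tp, meta) ->
                        System.out.printf("%s lag=%d%n", tp, ends.get(tp).offset() - meta.offset()));
            }
        }
    }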
For a long time, we used our own service to keep track of these metrics, collect them and visualize them. But this didn’t scale well.
You had to perform many manual operations, redeploy it, and do other tedious manual tasks. Most importantly, the biggest gap for us was that its output was expressed in absolute numbers (e.g., your lag is 30K), which basically tells you nothing as a human being.
We understood that we had to find a more suitable solution that will give us better visibility and will allow us to measure the lag in a time-based format that we all understand.
In this talk, I’m going to go over the core concepts of Kafka offsets and lag, and explain why lag even matters and is an important KPI to measure. I’ll also talk about the research we did to find the right tool, what the options in the market were at the time, and why we eventually chose LinkedIn’s Burrow as the right tool for us. Finally, I’ll take a closer look at Burrow, its building blocks, how we build and deploy it, how we monitor better with it, and the most important improvement: how we transformed its output from numbers into time-based metrics.
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
Flipp is an e-commerce company that promotes weekly shopping opportunities. We began our migration to event-driven microservices in November 2016, and have since moved to nearly 300 Kafka-powered microservices. In this presentation we will explore the major strategies we have used in our migration from distributed monoliths to event-driven microservices. There have been a number of painful learnings and pitfalls along the way that we will share with you. Lastly, we will provide recommendations for each step of your journey from monoliths to effective event-driven microservices.
The first major section of this presentation deals with the liberation of data from monolithic services. In this section we will cover: Kafka Connect vs system production, event schematization, entities and events, the importance of the Single Source of Truth, consumption patterns and event update verbosity.
The second major section discusses the usage of liberated event data in conjunction with other event streams. In this section we will cover common access patterns, handling (lots of) relational data, Stateful Foreign-Key Joins in Kafka Streams (see Kafka KIP-213), high-frequency updates (price, stock) vs static properties, and how to handle too many data streams.
The third major section details how to abstract event complexity away, leverage the single source of truth, and use Core Events across a company. In this section we cover abstracting data streams, Core Events as detailed by the Single Source of Truth, Core Events in relation to bounded contexts, and using Core Events successfully as a business.
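For the Kafka Streams foreign-key join mentioned above (KIP-213, available since Apache Kafka 2.4), here is a minimal sketch with hypothetical topics where each item row references a merchant; whenever either side changes, the joined result is re-emitted automatically.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class ItemEnrichment {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Liberated entity streams, materialized as tables (latest value per key).
            KTable<String, String> items =
                    builder.table("items", Consumed.with(Serdes.String(), Serdes.String()));
            KTable<String, String> merchants =
                    builder.table("merchants", Consumed.with(Serdes.String(), Serdes.String()));

            // KIP-213 foreign-key join: the extractor pulls the merchant id out of each item row.
            KTable<String, String> enriched = items.join(
                    merchants,
                    ItemEnrichment::extractMerchantId,
                    (itemJson, merchantJson) -> "{\"item\":" + itemJson + ",\"merchant\":" + merchantJson + "}");

            enriched.toStream().to("items-enriched", Produced.with(Serdes.String(), Serdes.String()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "item-enrichment");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            new KafkaStreams(builder.build(), props).start();
        }

        // Placeholder foreign-key extractor; a real implementation would parse the JSON.
        private static String extractMerchantId(String itemJson) {
            return itemJson.replaceAll(".*\"merchant_id\":\"([^\"]+)\".*", "$1");
        }
    }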
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
As Kafka deployments grow within your organization, so do the challenges around lifecycle management. For instance, do you really know what streams exist, who is producing and consuming them? What is the effect of upstream changes? How is this information kept up to date, so it is relevant and consistent to others looking to reuse these streams? Ever wish you had a way to view and visualize graphically the relationships between schemas, topics and applications? In this talk we will show you how to do that and get more value from your Kafka Streaming infrastructure using an event portal. It’s like an API portal but specialized for event streams and publish/subscribe patterns. Join us to see how you can automatically discover event streams from your Kafka clusters, import them to a catalog and then leverage code gen capabilities to ease development of new applications.
This session is recommended for anyone interested in understanding how to use AWS big data services to develop real-time analytics applications. In this session, you will get an overview of a number of Amazon's big data and analytics services that enable you to build highly scalable cloud applications that immediately and continuously analyze large sets of distributed data. We'll explain how services like Amazon Kinesis, EMR and Redshift can be used for data ingestion, processing and storage to enable real-time insights and analysis into customer, operational and machine-generated data and log files. We'll explore system requirements and design considerations, and walk through a specific customer use case to illustrate the power of real-time insights on their business.
What to Expect for Big Data and Apache Spark in 2017
Big data remains a rapidly evolving field with new applications and infrastructure appearing every year. In this talk, Matei Zaharia will cover new trends in 2016 / 2017 and how Apache Spark is moving to meet them. In particular, he will talk about work Databricks is doing to make Apache Spark interact better with native code (e.g. deep learning libraries), support heterogeneous hardware, and simplify production data pipelines in both streaming and batch settings through Structured Streaming.
Speaker: Matei Zaharia
Video: http://go.databricks.com/videos/spark-summit-east-2017/what-to-expect-big-data-apache-spark-2017
This talk was originally presented at Spark Summit East 2017.
(MED305) Achieving Consistently High Throughput for Very Large Data Transfers...
A difficult problem for users of Amazon S3 that deal in large-form data is how to consistently transfer ultralarge files and large sets of files at fast speeds over the WAN. Although a number of tools are available for network transfer with S3 that exploit its multipart APIs, most have practical limitations when transferring very large files or large sets of very small files with remote regions. Transfers can be slow, degrade unpredictably, and for the largest sizes fail altogether. Additional complications include resume, encryption at rest, encryption in transit, and efficient updates for synchronization.
Aspera has expertise and experience in tackling these problems and has created a suite of transport, synchronization, monitoring, and collaboration software that can transfer and store both ultralarge files (up to the 5 TB limit of an S3 object) and large numbers of very small files (millions < 100 KB) consistently fast, regardless of region.
In this session, technical leaders from Aspera explain how to achieve very large file WAN transfers and integrate them into mission-critical workflows across multiple industries. EVS, a media service provider to the 2014 FIFA World Cup Brazil, explains how they used Aspera solutions for high-speed, live video transport, moving real-time video data from sports matches in Brazil to Europe for AWS-based transcoding, live streaming, and file delivery. Sponsored by Aspera.
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a major German insurance company onto a Kubernetes cluster within one year. We're now close to the finish line and it worked pretty well so far.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way. We'll provide our answers to questions about life, the universe and a cloud native journey, such as:
- What technical constraints of Kubernetes can be obstacles for applications and how to tackle these?
- How to architect a landscape of hundreds of containerized applications with their surrounding infrastructure like DBs, MQs and IAM, and with heavy requirements on security?
- How to industrialize and govern the migration process?
- How to leverage the possibilities of a cloud native platform like Kubernetes without challenging the tight timeline?
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...QAware GmbH
CloudNativeCon North America 2017, Austin (Texas, USA): Talk by Josef Adersberger (@adersberger, CTO at QAware)
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large scale Flink job, and outline what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into analyzing and tuning checkpointing, selecting and configuring state backends, understanding common bottlenecks, and understanding and configuring network parameters.
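To give a flavor of the tuning knobs discussed, the sketch below (not from the talk) enables periodic checkpointing and switches a job to the RocksDB state backend with incremental checkpoints; the checkpoint path, interval and parallelism are illustrative assumptions.

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LargeScaleJobConfig {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(128);                              // illustrative: size to your cluster
            env.enableCheckpointing(60_000);                      // checkpoint every 60 seconds
            env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
            // RocksDB keeps large keyed state off-heap; incremental checkpoints shrink checkpoint uploads
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
            // ...define sources, operators and sinks here, then call env.execute("large-scale-job");
        }
    }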
The burden of a successful feature: Scaling our real time logging platformFastly
It’s impossible to understand the health of your system without real-time logging. Our logging platform is one of our favorite and most popular features, and currently handles millions of requests and gigabytes per second of traffic. It’s stateless, real-time, and can provide insight and ship data to a multitude of destinations. Fastly has consistently iterated to keep up with the growth of the platform, and we’ve learned many lessons along the way. In this talk, you’ll get a peek into the system, and how we’ve developed it.
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
The traditional lambda architecture has been a popular solution for joining offline batch operations with real time operations. This setup incurs a lot of developer and operational overhead since it involves maintaining code that produces the same result in two, potentially different distributed systems. In order to alleviate these problems, we need a unified framework for processing and building data pipelines across batch and stream data sources.
Based on our experiences running and developing Apache Samza at LinkedIn, we have enhanced the framework to support: a) Pluggable data sources and sinks; b) A deployment model supporting different execution environments such as Yarn or VMs; c) A unified processing API for developers to work seamlessly with batch and stream data. In this talk, we will cover how these design choices in Apache Samza help tackle the overhead of lambda architecture. We will use some real production use-cases to elaborate how LinkedIn leverages Apache Samza to build unified data processing pipelines.
Speaker
Navina Ramesh, Sr. Software Engineer, LinkedIn
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...Amazon Web Services
Learning Objectives:
- Understand key requirements for collecting, preparing, and loading streaming data into data lakes
- Get an overview of transmitting data using Amazon Kinesis Firehose
- Learn how to perform data transformations with Amazon Kinesis Firehose
Data lakes enable your employees across the organization to access and analyze massive amounts of unstructured and structured data from disparate data sources, many of which generate data continuously and rapidly. Making this data available in a timely fashion for analysis requires a streaming solution that can durably and cost-effectively ingest this data into your data lake. Amazon Kinesis Firehose is a fully managed service that makes it easy to prepare and load streaming data into AWS. In this tech talk, we will provide an overview of Amazon Kinesis Firehose and dive deep into how you can use the service to collect, transform, batch, compress, and load real-time streaming data into your Amazon S3 data lakes.
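As a minimal illustration of the producing side, here is a sketch assuming the AWS SDK for Java v2 and a hypothetical delivery stream configured to batch, compress and land records in S3.

    import software.amazon.awssdk.core.SdkBytes;
    import software.amazon.awssdk.services.firehose.FirehoseClient;
    import software.amazon.awssdk.services.firehose.model.PutRecordRequest;
    import software.amazon.awssdk.services.firehose.model.Record;

    public class FirehoseProducerSketch {
        public static void main(String[] args) {
            try (FirehoseClient firehose = FirehoseClient.create()) {
                Record record = Record.builder()
                        .data(SdkBytes.fromUtf8String("{\"event\":\"click\",\"userId\":\"42\"}\n"))
                        .build();
                // "clickstream-to-s3" is a hypothetical delivery stream that loads into an S3 data lake
                firehose.putRecord(PutRecordRequest.builder()
                        .deliveryStreamName("clickstream-to-s3")
                        .record(record)
                        .build());
            }
        }
    }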
Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
This session will explore how to apply GeoSpatial analytics using Apache Spark on high-velocity streaming (data-in-motion) and high-volume batch (data-at-rest). Demonstrations will be performed throughout the session to cement these concepts.
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSpark Summit
There are an ever increasing number of use cases, like online fraud detection, for which the response times of traditional batch processing are too slow. In order to react to such events in close to real time, you need to go beyond classical batch processing and utilize stream processing systems such as Apache Spark Streaming, Apache Flink, or Apache Storm. These systems, however, are not sufficient on their own. For an efficient and fault-tolerant setup, you also need a message queue and storage system. One common example of a fast data pipeline is the SMACK stack: Spark (Streaming), the stream processing system; Mesos, the cluster orchestrator; Akka, the system providing custom actors that react to the analyses; Cassandra, the storage system; and Kafka, the message queue. Setting up this kind of pipeline in a scalable, efficient and fault-tolerant manner is not trivial. First, this workshop will discuss the different components of the SMACK stack. Then, participants will get hands-on experience in setting up and maintaining data pipelines.
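To ground the stream-processing leg of the stack, here is a minimal sketch of Spark reading from the Kafka message queue (using Structured Streaming rather than the older DStream API, with a console sink standing in for Cassandra); the broker address and topic name are assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SmackPipelineSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("smack-sketch").getOrCreate();
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
                    .option("subscribe", "transactions")              // hypothetical topic
                    .load();
            events.selectExpr("CAST(value AS STRING) AS json")
                    .writeStream()
                    .format("console") // in a real SMACK pipeline this would be a Cassandra sink
                    .start()
                    .awaitTermination();
        }
    }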
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
In this session, Netflix provides an overview of Keystone, their new data pipeline. The session covers how Netflix migrated from Suro to Keystone, including the reasons behind the transition and the challenge of zero loss while processing over 400 billion events daily. It details how they deploy, operate, and scale Kafka, Samza, Docker, and Apache Mesos in AWS to manage 8 million events and 17 GB per second during peak.
Similar to You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard (20)
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Hybrid workshop: Stream Processing with Flinkconfluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimized resource utilization and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To stay competitive, today's companies increasingly depend on real-time data analysis, which gives them faster insights and response times. Doing business on real-time data means maintaining situational awareness: detecting and responding to what is happening in the world right now.
Events and Microservices - Santander TechTalkconfluent
In this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-driven patterns allow us to decompose monoliths in a scalable, resilient and decoupled way.
The purpose of this session is to dive into Apache Kafka, data streaming and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluentconfluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka.
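A minimal sketch of the consuming side of such a microservice, using the plain Kafka consumer API; the broker address, consumer group and topic are hypothetical.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OrderEventListener {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");   // hypothetical broker
            props.put("group.id", "order-service");          // hypothetical consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // React to events as they arrive rather than being called over a synchronous API
                        System.out.printf("order event %s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }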
Confluent & GSI Webinars series - Session 3confluent
An in depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre Sales, and also the more technically minded business aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
Transforming applications built with traditional messaging solutions such as TIBCO, MQ and Solace to be scalable, reliable and ready for the move to cloud
How can applications built with traditional messaging technologies like TIBCO, Solace and IBM MQ be modernised and made cloud ready? What are the advantages of event streaming approaches to pub/sub versus traditional message queues? What are the strengths and weaknesses of both approaches, and what use cases and requirements are actually a better fit for messaging than Kafka?
This session will show why the old paradigm does not work and why a new approach to the data strategy needs to be taken. It aims to show how a Data Streaming Platform is integral to the evolution of a company’s data strategy and how Confluent is not just an integration layer but the central nervous system for an organisation.
You will also learn how to:
• Build products and features faster using a complete suite of connectors and stream management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in security, governance and resilience guarantees
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden
Confluent Partner Tech Talk with Synthesisconfluent
A discussion of the arduous planning process and a deep dive into the design and architectural decisions.
Learn more about the networking, RBAC strategies, the automation, and the deployment plan.
2. Who We Are
Stephen Parente
Senior Software Engineer I
Global Data Platform
@webmakersteve
Jeff Field
Senior Systems Engineer I
Global Data Platform
@jfield
6. Telemetry V2 (2017-Now)
[Architecture diagram: pipeline stages Ingest → Enrich → Process → Store → Access. Senders (SDK) feed HTTP Ingest and Enrichment, which fan out to Near Real-Time Data (Elasticsearch Proc), Longterm Data (Mimiron V2) and Metrics (Chromie DB); Server Logs feed RSYSLOG and Log Enrichment, which write to Logs (Elasticsearch Proc).]
7. Pipeline Usage
• BIAPI
  • 645 discrete message types
  • 5.4 TB written / day
  • 6.1 billion messages / day
• Telemetry V2
  • 1200 discrete message types
  • 1.5 TB written / day
  • 14 billion messages / day
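Back-of-the-envelope, and assuming these are daily averages: Telemetry V2's 14 billion messages over 1.5 TB works out to roughly 110 bytes per message at around 160,000 messages per second, while BIAPI's 6.1 billion messages over 5.4 TB averages closer to 900 bytes per message at around 70,000 messages per second.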
8. Emerging Data Problems – “Play Nice, Play Fair”
• Disruptive Player Behavior
  • Analyze chat data and action accounts immediately
• Risk Forwarding
  • Detect abusive and anomalous behavior
11. Core Concepts
• FORWARD: Data needs to be available in a central location.
• ISOLATE: Each customer should read their own stream of data.
• FILTER: Data should be filtered as close to the edge as possible, in all pipes. (See the sketch below.)
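A hedged sketch of the filter-and-isolate idea using Kafka Streams; this is not Blizzard's implementation (their own client library, node-rdkafka, is plugged at the end of the deck), and the topic names and message-type check are assumptions.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class CustomerFilterSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-a-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // FORWARD: read from the central, already-forwarded firehose topic (hypothetical name)
            KStream<String, String> firehose = builder.stream("telemetry-firehose");
            firehose
                .filter((key, value) -> value.contains("\"messageType\":\"chat\"")) // FILTER close to the edge
                .to("customer-a-chat");                                             // ISOLATE: customer-specific topic
            new KafkaStreams(builder.build(), props).start();
        }
    }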
12. Telemetry V2 Architecture (Again!)
[Same architecture diagram as slide 6: Senders (SDK) → HTTP Ingest → Enrichment → Near Real-Time Data (Elasticsearch Proc), Longterm Data (Mimiron V2) and Metrics (Chromie DB); Server Logs → RSYSLOG → Log Enrichment → Logs (Elasticsearch Proc), across the Ingest, Enrich, Process, Store and Access stages.]
21. Takeaways
• Be a carrot, not a stick
• Partition & isolate
• Provide a good user experience
• Integrate with existing services
• Don’t name your platform Telemetry
23. THANK YOU
…AND WE’RE HIRING!
https://careers.blizzard.com/en-us/
Please check out node-rdkafka!
https://github.com/Blizzard/node-rdkafka