Kafka is an event processing platform built on a distributed architecture. It handles millions of messages by spreading brokers, and the partitions of each topic, across many machines. Messages are immutable records written sequentially to topic logs; once appended they cannot be deleted or edited. Producers send messages to Kafka topics, while consumers subscribe to topics to receive them. Common client libraries for Kafka include Java, Python, and Go.
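To make that producer/consumer flow concrete, here is a minimal sketch using the standard Java client (kafka-clients); the broker address, topic name, and consumer group are illustrative assumptions, not values from the talks below.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MinimalKafkaDemo {
    public static void main(String[] args) {
        // Producer: append a record to the "events" topic (assumed name).
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("events", "user-42", "signed-up"));
        }

        // Consumer: subscribe to the same topic and poll for records.
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "demo-group"); // assumed consumer group
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("auto.offset.reset", "earliest");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of("events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}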
Distributed real time stream processing - why and how (Petr Zapletal)
In this talk you will discover various state-of-the-art open-source distributed streaming frameworks: their similarities and differences, implementation trade-offs, their intended use cases, and how to choose between them. Petr will focus on the popular frameworks, including Spark Streaming, Storm, Samza and Flink. You will also get a theoretical introduction, common pitfalls, popular architectures, and much more.
The demand for stream processing is increasing. Immense amounts of data have to be processed fast from a rapidly growing set of disparate data sources. This pushes the limits of traditional data processing infrastructures. Stream-based applications, including trading, social networks, the Internet of Things, and system monitoring, are becoming more and more important. A number of powerful, easy-to-use open source platforms have emerged to address this.
Petr's goal is to provide a comprehensive overview of modern streaming solutions and to help fellow developers pick the best possible solution for their particular use case. Join this talk if you are thinking about, implementing, or have already deployed a streaming solution.
Effective testing for Spark programs - Strata NY 2015 (Holden Karau)
This session explores best practices of creating both unit and integration tests for Spark programs as well as acceptance tests for the data produced by our Spark jobs. We will explore the difficulties with testing streaming programs, options for setting up integration testing with Spark, and also examine best practices for acceptance tests.
Unit testing of Spark programs is deceptively simple. The talk will look at how unit testing of Spark itself is accomplished, as well as factor out a number of best practices into traits we can use. This includes dealing with local mode cluster creation and teardown during test suites, factoring our functions to increase testability, mock data for RDDs, and mock data for Spark SQL.
Testing Spark Streaming programs has a number of interesting problems. These include handling of starting and stopping the Streaming context, and providing mock data and collecting results. As with the unit testing of Spark programs, we will factor out the common components of the tests that are useful into a trait that people can use.
While acceptance tests are not always considered part of testing, they share a number of similarities with the other kinds of tests. We will look at which counters Spark programs generate that we can use for creating acceptance tests, best practices for storing historic values, and some common counters we can easily use to track the success of our job.
Relevant Spark Packages & Code:
https://github.com/holdenk/spark-testing-base / http://spark-packages.org/package/holdenk/spark-testing-base
https://github.com/holdenk/spark-validator
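The local-mode creation and teardown pattern the talk factors into traits can also be sketched in plain JUnit; this is a hedged illustration under assumed names, not the actual spark-testing-base API (see the repos above for that).

import static org.junit.Assert.assertEquals;
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class WordCountTest {
    private JavaSparkContext jsc;

    @Before
    public void setUp() {
        // Local-mode "cluster" with two worker threads, created fresh per test.
        jsc = new JavaSparkContext(new SparkConf().setMaster("local[2]").setAppName("test"));
    }

    @After
    public void tearDown() {
        jsc.stop(); // tear the context down so the next test starts clean
    }

    @Test
    public void countsEvenNumbersInMockData() {
        // Mock data for an RDD instead of reading real input.
        long evens = jsc.parallelize(Arrays.asList(1, 2, 3, 4))
                        .filter(x -> x % 2 == 0)
                        .count();
        assertEquals(2, evens);
    }
}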
A tutorial presentation based on storm.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as a teaching assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.
Developing Java Streaming Applications with Apache Storm (Lester Martin)
Apache Storm, http://storm.apache.org, is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. During this presentation, a simple Java-based streaming application will be built from scratch!
Code examples can be found at https://github.com/lestermartin/streaming-exploration.
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems hacking session & presentation by Tanel Poder.
This presentation is about a complex performance issue where the initial symptoms pointed somewhere other than the root cause. Only by systematically following the troubleshooting drilldown method do we get to the root cause of the problem. This session aims to help you understand (and reason about) Oracle's multi-process, multi-layer system behavior, preparing you for independent troubleshooting of such complex performance issues in the future.
Video recordings of this presentation are on my YouTube channel:
1) Hacking Session: https://www.youtube.com/watch?v=INQewGJMdCI
2) Presentation: https://www.youtube.com/watch?v=aaHZ8A8Ygdg
Tanel's blog and training information: https://blog.tanelpoder.com/seminar
Slides for a presentation on ZooKeeper I gave at the Near Infinity (www.nearinfinity.com) 2012 spring conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kvXlPd
This CloudxLab Introduction to Apache ZooKeeper tutorial helps you to understand ZooKeeper in detail. Below are the topics covered in this tutorial:
1) Data Model
2) Znode Types
3) Persistent Znode
4) Sequential Znode
5) Architecture
6) Election & Majority Demo
7) Why Do We Need Majority?
8) Guarantees - Sequential consistency, Atomicity, Single system image, Durability, Timeliness
9) ZooKeeper APIs
10) Watches & Triggers
11) ACLs - Access Control Lists
12) Use cases
13) When Not to Use ZooKeeper
Oracle Latch and Mutex Contention Troubleshooting (Tanel Poder)
This is an intro to latch & mutex contention troubleshooting which I've delivered at the Hotsos Symposium, UKOUG Conference and elsewhere. It's also the starting point of the Latch & Mutex contention sections in my Advanced Oracle Troubleshooting online seminar - but we go much deeper there :-)
Developing Realtime Data Pipelines With Apache Kafka (Joe Stein)
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Apache Beam (formerly the Google Cloud Dataflow SDK) is a unified model and set of language-specific SDKs for defining and executing data processing workflows. You design pipelines that simplify the mechanics of large-scale batch and streaming data processing, and they can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
This presentation introduces the Beam programming model and how you can use it to design your pipelines, transporting PCollections and applying PTransforms. You will see how the same code is "translated" to a target runtime thanks to a specific runner. You will also get an overview of the current roadmap, with the interesting new features.
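As a rough sketch of that model, assuming the Beam Java SDK and a made-up input path, a pipeline applies PTransforms to PCollections and only learns its runner (Flink, Spark, Dataflow, or the local DirectRunner) from its options at execution time:

import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WordCountPipeline {
    public static void main(String[] args) {
        // The runner (DirectRunner, FlinkRunner, SparkRunner, DataflowRunner...)
        // is picked from the options at execution time, not in the pipeline code.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        PCollection<String> lines = pipeline.apply(TextIO.read().from("input.txt")); // assumed path

        lines.apply(FlatMapElements.into(TypeDescriptors.strings())
                        .via((String line) -> Arrays.asList(line.toLowerCase().split("\\W+"))))
             .apply(Count.perElement()) // PTransform yielding a PCollection<KV<String, Long>>
             .apply(MapElements.into(TypeDescriptors.strings())
                        .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
             .apply(TextIO.write().to("counts")); // assumed output prefix

        pipeline.run().waitUntilFinish();
    }
}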
Writing Continuous Applications with Structured Streaming Python APIs in Apac... (Databricks)
Abstract:
We are amidst the Big Data Zeitgeist era in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created the notion of writing a streaming application that is continuous and reacts and interacts with data in real time. We call this a continuous application.
In this talk we will explore the concepts and motivations behind the continuous application, how Structured Streaming Python APIs in Apache Spark 2.x enables writing continuous applications, examine the programming model behind Structured Streaming, and look at the APIs that support them.
Through a short demo and code examples, I will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using Spark SQL, DataFrames and Datasets APIs.
You'll walk away with an understanding of what a continuous application is, an appreciation of the easy-to-use Structured Streaming APIs, and why Structured Streaming in Apache Spark 2.x is a step forward in developing new kinds of streaming applications.
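The talk demonstrates the Python API; since the examples in this piece are in Java, the hedged sketch below shows the equivalent Structured Streaming calls in Java, with the socket source, host, and port as illustrative assumptions:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ContinuousApp {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("continuous-app")
                .master("local[*]") // assumed local run for the demo
                .getOrCreate();

        // Unbounded input: lines arriving on a socket (host/port are assumptions).
        Dataset<Row> lines = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .load();

        // The same DataFrame API as batch: group and count values as they stream in.
        Dataset<Row> counts = lines.groupBy("value").count();

        StreamingQuery query = counts.writeStream()
                .outputMode("complete") // re-emit the full counts on each trigger
                .format("console")
                .start();
        query.awaitTermination();
    }
}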
Tapad's data pipeline is an elastic combination of technologies (Kafka, Hadoop, Avro, Scalding) that forms a reliable system for analytics, realtime and batch graph-building, and logging. In this talk, I will speak about the creation and evolution of the pipeline, and a concrete example – a day in the life of an event tracking pixel. We'll also talk about common challenges that we've overcome such as integrating different pieces of the system, schema evolution, queuing, and data retention policies.
Big Data Streams Architectures. Why? What? How? (Anton Nazaruk)
With the current zoo of technologies and the different ways they can interact, it is a big challenge to architect a system (or adapt an existing one) that conforms to low-latency Big Data analysis requirements. Apache Kafka, and the Kappa Architecture in particular, are drawing more and more attention away from the classic Hadoop-centric technology stack. The new Consumer API gives a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams are proving a synergy in the Big Data world.
Intro to Apache Kafka I gave at the Big Data Meetup in Geneva in June 2016. Covers the basics and gets into some more advanced topics. Includes a demo and source code for writing clients and unit tests in Java (GitHub repo on the last slides).
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka in the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and... (apidays)
Apidays Paris 2023 - Software and APIs for Smart, Sustainable and Sovereign Societies
December 6, 7 & 8, 2023
Forget TypeScript, Choose Rust to build Robust, Fast and Cheap APIs
Zacaria Chtatar, Backend Software Engineer at HaveSomeCode
Writing Continuous Applications with Structured Streaming in PySpark (Databricks)
We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created a notion of writing a streaming application that reacts and interacts with data in real-time. We call this a continuous application. In this talk we will explore the concepts and motivations behind continuous applications and how Structured Streaming Python APIs in Apache Spark 2.x enables writing them. We also will examine the programming model behind Structured Streaming and the APIs that support them. Through a short demo and code examples, Jules will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using Spark SQL, DataFrames, and Datasets APIs.
Spark (Structured) Streaming vs. Kafka Streams (Guido Schmutz)
Independent of the source of data, the integration and analysis of event streams gets more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. In this session we compare two popular Streaming Analytics solutions: Spark Streaming and Kafka Streams.
Spark is a fast and general engine for large-scale data processing that has been designed to provide a more efficient alternative to Hadoop MapReduce. Spark Streaming brings Spark's language-integrated API to stream processing, letting you write streaming applications the same way you write batch jobs. It supports both Java and Scala.
Kafka Streams is the stream processing solution which is part of Kafka. It is provided as a Java library and by that can be easily integrated with any Java application.
This presentation shows how you can implement stream processing solutions with each of the two frameworks, discusses how they compare and highlights the differences and similarities.
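To give a feel for the Kafka Streams side of the comparison, here is a minimal word-count sketch with the Java library; the application id and topic names are illustrative assumptions:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");    // assumed id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> text = builder.stream("text-input"); // assumed topic
        KTable<String, Long> counts = text
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word)
                .count();
        counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}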
Having used Apache Pulsar in production for a year for our pub-sub use cases, such as stream analytics and event sourcing, this slide deck presents the lessons learned in understanding the architecture, tuning the cluster, keeping it highly available and fault tolerant, and much more.
While the slides are presented in terms of Apache Pulsar, many of the concepts extend easily to other distributed systems.
The views here are my own and do not represent the view of Nutanix Corporation.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
These notes cover primitives used by graph algorithms such as PageRank. Compressed Sparse Row (CSR) is the adjacency-list based graph representation used throughout.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Techniques to optimize the PageRank algorithm usually fall into two categories: one tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
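To make the convergence-skipping idea concrete, here is a hedged power-iteration sketch (not the STICD implementation itself): ranks are pulled from in-neighbours, and a vertex whose rank has stabilised is skipped in later iterations. The damping factor and tolerance are conventional choices; dangling-node handling is omitted for brevity.

import java.util.Arrays;

public class PageRankSkipConverged {
    // Pull-based power iteration: in[v] lists the in-neighbours of v, and
    // outDeg[u] is u's out-degree (assumed > 0, i.e. no dangling vertices).
    // Skipping is a heuristic: a frozen vertex ignores tiny late movements
    // of its in-neighbours.
    static double[] pagerank(int[][] in, int[] outDeg, double damping, double tol, int maxIter) {
        int n = in.length;
        double[] rank = new double[n];
        double[] next = new double[n];
        boolean[] converged = new boolean[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIter; iter++) {
            boolean allDone = true;
            for (int v = 0; v < n; v++) {
                if (converged[v]) { next[v] = rank[v]; continue; } // skip settled vertices
                double sum = 0.0;
                for (int u : in[v]) sum += rank[u] / outDeg[u];
                next[v] = (1.0 - damping) / n + damping * sum;
                if (Math.abs(next[v] - rank[v]) < tol) converged[v] = true;
                else allDone = false;
            }
            double[] tmp = rank; rank = next; next = tmp;
            if (allDone) break; // every vertex converged: stop iterating
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny 3-vertex cycle 0 -> 1 -> 2 -> 0: all ranks converge to 1/3.
        int[][] in = { {2}, {0}, {1} };
        int[] outDeg = { 1, 1, 1 };
        System.out.println(Arrays.toString(pagerank(in, outDeg, 0.85, 1e-10, 100)));
    }
}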
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contains no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time processing, robots and Milvus.
A lively discussion with the NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
15. Kafka is an event processing platform which is based on a distributed architecture.
16. To producers and consumers subscribed to the system it appears as a standalone processing engine, but production systems are built on many machines.
17. It can handle a throughput of millions of messages, dependent on the physical disk and RAM on the machines, and is fault tolerant.
19. Messages are sent to topics, which are written sequentially in an immutable log. Kafka can support many topics, and each topic can be replicated and partitioned.
20. Once the records are appended to the topic log, they can't be deleted or amended.
21. (If the customer wants you to “replay from the start”, well, they can't.)
22. It's a very simple data structure where each message is byte encoded.
23. Each message has:
A key
A value (payload)
A timestamp
A header (for metadata)
24. Producers and consumers can serialise and deserialise data to various formats. (I'll talk about that later.)
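A hedged Java sketch tying slides 23 and 24 together: building a record with an explicit key, value (payload), timestamp and header, with pluggable serializers configured on the producer. The topic name and header key are illustrative assumptions.

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordAnatomyDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        // Serialization is pluggable: String here, but Avro/JSON/etc. work the same way.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // key, value (payload), explicit timestamp; partition left to the
            // default partitioner (null).
            ProducerRecord<String, String> record = new ProducerRecord<>(
                    "events", null, System.currentTimeMillis(), "user-42", "signed-up");
            // Header: arbitrary metadata as key/byte[] pairs.
            record.headers().add("trace-id", "abc123".getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }
    }
}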
43. $ bin/kafka-topics --zookeeper localhost:2181 --delete --topic testtopic
Topic testtopic is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
51. Client Libraries: Java and Go.
Java: full support for all API features.
Go: exactly-once semantics, Kafka Streams, Schema Registry and simplified installation are NOT supported.
94. At present, the main broker Kafka metrics (via JMX):
kafka.server:*
kafka.controller:*
kafka.coordinator.group:*
kafka.coordinator.transaction:*
kafka.log:*
kafka.network:*
kafka.utils:*
95. Topic Level Fetch Metrics
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+)

Key Name                | Description
fetch-size-avg          | The average number of bytes fetched per request for a specific topic.
fetch-size-max          | The maximum number of bytes fetched per request for a specific topic.
bytes-consumed-rate     | The average number of bytes consumed per second for a specific topic.
records-per-request-avg | The average number of records in each request for a specific topic.
records-consumed-rate   | The average number of records consumed per second for a specific topic.