A Lunchtime Introduction to Kafka
Wrote this….
Works with this amazing bunch.
The Rules Haven’t Changed……
You may heckle via Twitter…..
@jasonbelldata
It’s Friday, you might as well.
It’s in print, no one can argue…..
What is Kafka?
It’s an immutable log.
Think of it like this…..
My Kafka Log...
Opinionated thought
approaching.
#grumpy_yorkshireman
IT
IS
NOT
A
DATABASE
Kafka is an event processing platform which is based on a
distributed architecture.
To producers and consumers subscribed to the system it
appears as a standalone processing engine, but
production systems are built on many machines.
It can handle a throughput of millions of messages, depending on
the physical disk and RAM of the machines, and is fault tolerant.
Messages are sent to topics, which are written sequentially to
an immutable log. Kafka can support many topics, and each topic
can be replicated and partitioned.
Once the records are appended to the topic log
they can’t be deleted or amended.
(If the customer wants you to
“replay from the start”, well they can’t)
It’s a very simple data structure
where each message is byte encoded.
Each Message has:
A key
A value (payload)
A timestamp
A set of headers (for metadata)
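A rough sketch (not from the slides) of how those fields map onto a ProducerRecord in the Java client; the key and header names here are made up:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

// topic, partition (null = let the partitioner decide), timestamp, key, value
ProducerRecord<String, String> record =
    new ProducerRecord<>("testtopic", null, System.currentTimeMillis(), "user42", "my payload");
// headers carry metadata alongside the payload
record.headers().add("source", "lunchtime-demo".getBytes(StandardCharsets.UTF_8));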
Producers and consumers can serialise and deserialise data
to various formats.
(I’ll talk about that later)
The Kafka Ecosystem
Image source: Confluent Inc.
The Kafka Cluster
Brokers
Partitions
Topics
Creating a Topic
bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 4 \
--partitions 10 --topic testtopic --config min.insync.replicas=2
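To sanity-check what was created, the same tool can describe the topic and show its partition and replica assignments (not on the original slide):

bin/kafka-topics --zookeeper localhost:2181 --describe --topic testtopic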
Listing Topics
bin/kafka-topics --zookeeper localhost:2181 --list
Deleting a Topic
$ bin/kafka-topics --zookeeper localhost:2181 --delete --topic testtopic
Topic testtopic is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set
to true.
Topic Retention
Retention by Time
log.retention.hours
log.retention.minutes
log.retention.ms
The default is 168 hours (7 days) if not set.
Note: If more than one is set, the setting with the smallest unit (ms beats minutes beats hours) takes priority, be careful!
Retention by Size
log.retention.bytes
Defines the volume of log bytes retained. It is applied per partition, so if it is set to 1GB
and you have 8 partitions, you are storing up to 8GB.
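Those settings are broker-wide defaults. As a sketch (flags vary between Kafka versions), retention can also be overridden per topic with kafka-configs; the values here are just examples:

bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name testtopic \
--add-config retention.ms=86400000,retention.bytes=1073741824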
Client Libraries
C/C++ github.com/edenhill/librdkafka
Go github.com/confluentinc/confluent-kafka-go
Java Kafka Consumer and Kafka Producer
JMS JMS Client
.NET github.com/confluentinc/confluent-kafka-dotnet
Python github.com/confluentinc/confluent-kafka-python
Client Library Language Support
Client Libraries: Java and Go.
Java: Full support for all API features.
Go: Exactly once semantics, Kafka Streams, Schema Registry and
Simplified Installation are NOT supported.
Producers
Producers
(Send messages to the
Kafka Cluster)
import java.util.Properties;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
  public static void main(String[] args) throws Exception {
    String topicName = "testtopic";
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 10);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    Producer<String, String> producer = new KafkaProducer<String, String>(props);
    // send ten string messages, keyed by their index
    for (int i = 0; i < 10; i++)
      producer.send(new ProducerRecord<String, String>(topicName,
          Integer.toString(i), "This is my message: " + Integer.toString(i)));
    System.out.println("Message sent successfully");
    producer.close();
  }
}
props.put("acks", “all");
Value Action
-1/all Message has been received by the leader and followers.
0 Fire and Forget
1 Once the message is received by the leader.
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ProducerWithCallback implements ProducerInterceptor<String, String> {
  private int onSendCount;
  private int onAckCount;
  private final Logger logger = LoggerFactory.getLogger(ProducerWithCallback.class);

  @Override
  public ProducerRecord<String, String> onSend(final ProducerRecord<String, String> record) {
    onSendCount++;
    System.out.println(String.format("onSend topic=%s key=%s value=%s %d%n",
        record.topic(), record.key(), record.value().toString(),
        record.partition()));
    return record;
  }

  @Override
  public void onAcknowledgement(final RecordMetadata metadata, final Exception exception) {
    onAckCount++;
    System.out.println(String.format("onAck topic=%s, part=%d, offset=%d%n",
        metadata.topic(), metadata.partition(), metadata.offset()));
  }

  @Override
  public void close() {
    System.out.println("Total sent: " + onSendCount);
    System.out.println("Total acks: " + onAckCount);
  }

  @Override
  public void configure(Map<String, ?> configs) {
  }
}
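The slides don't show how the interceptor is wired in; as a sketch, it is registered on the producer through the interceptor.classes property:

import org.apache.kafka.clients.producer.ProducerConfig;

// comma-separated list of interceptor class names
props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, ProducerWithCallback.class.getName());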
//Only one in-flight message per Kafka broker connection
// - max.in.flight.requests.per.connection (default 5)
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,1);
//Set the number of retries - retries
props.put(ProducerConfig.RETRIES_CONFIG, 3);
//Request timeout - request.timeout.ms
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 15000);
//Only retry after one second.
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);
import java.util.Properties;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
  public static void main(String[] args) throws Exception {
    String topicName = "testtopic";
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    Producer<String, String> producer = new KafkaProducer<String, String>(props);
    for (int i = 0; i < 10; i++)
      producer.send(new ProducerRecord<String, String>(topicName,
          Integer.toString(i), "This is my message: " + Integer.toString(i)));
    System.out.println("Message sent successfully");
    producer.close();
  }
}
Serialize/Deserialize Types
byte[] Serdes.ByteArray(), Serdes.Bytes()
ByteBuffer Serdes.ByteBuffer()
Double Serdes.Double()
Integer Serdes.Integer()
Long Serdes.Long()
String Serdes.String()
JSON JsonPOJOSerializer()/Deserializer()
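A small assumed usage sketch (not from the slides): the Serdes factories plug straight into the Streams DSL, for example when reading String keys and Long values from a topic:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
// explicit serdes here override the default key/value serde configuration
KStream<String, Long> counts =
    builder.stream("streams-wordcount-output", Consumed.with(Serdes.String(), Serdes.Long()));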
Consumers
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("Enter topic name");
      return;
    }
    //Kafka consumer configuration settings
    String topicName = args[0].toString();
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("session.timeout.ms", "30000");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
    //Kafka Consumer subscribes to a list of topics here.
    consumer.subscribe(Arrays.asList(topicName));
    //print the topic name
    System.out.println("Subscribed to topic " + topicName);
    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(100);
      for (ConsumerRecord<String, String> record : records)
        // print the offset, key and value for the consumer records.
        System.out.printf("offset = %d, key = %s, value = %s%n",
            record.offset(), record.key(), record.value());
    }
  }
}
Consumer Groups
Consumer Group Offsets
The offset is a simple integer number that is used
by Kafka to maintain the current position of a
consumer.
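The SimpleConsumer above commits offsets automatically (enable.auto.commit=true). A hedged variant of its poll loop, with auto-commit turned off and the position committed explicitly after each batch:

// in the consumer configuration:
props.put("enable.auto.commit", "false");

// in the poll loop:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n",
            record.offset(), record.key(), record.value());
    // synchronously commit the offsets returned by the last poll()
    consumer.commitSync();
}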
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group mygroup
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
mygroup t1 0 1 3 2 test-consumer
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
mygroup t1 0 1 12 11 test-consumer
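The same tool can rewind a group, for example back to the earliest offsets (a sketch; the group must be inactive and the exact flags depend on your Kafka version):

kafka-consumer-groups --bootstrap-server localhost:9092 --group mygroup --topic t1 \
--reset-offsets --to-earliest --execute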
Streaming API
import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.test.TestUtils;

public class WordCountLambdaExample {
  static final String inputTopic = "streams-plaintext-input";
  static final String outputTopic = "streams-wordcount-output";

  /**
   * The Streams application as a whole can be launched like any normal Java application that has a `main()` method.
   */
  public static void main(final String[] args) {
    final String bootstrapServers = args.length > 0 ? args[0] : "localhost:9092";
    // Configure the Streams application.
    final Properties streamsConfiguration = getStreamsConfiguration(bootstrapServers);
    // Define the processing topology of the Streams application.
    final StreamsBuilder builder = new StreamsBuilder();
    createWordCountStream(builder);
    final KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfiguration);
    streams.cleanUp();
    streams.start();
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  }

  static Properties getStreamsConfiguration(final String bootstrapServers) {
    final Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "wordcount-lambda-example-client");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
    streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
    streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG, TestUtils.tempDirectory().getAbsolutePath());
    return streamsConfiguration;
  }

  static void createWordCountStream(final StreamsBuilder builder) {
    final KStream<String, String> textLines = builder.stream(inputTopic);
    final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
    final KTable<String, Long> wordCounts = textLines
        .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
        .groupBy((keyIgnored, word) -> word)
        .count();
    wordCounts.toStream().to(outputTopic, Produced.with(Serdes.String(), Serdes.Long()));
  }
}
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
Supports exactly once transactions! Yay!
I know what you’re thinking…...
processing.guarantee=exactly_once
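In the word count example that property would sit alongside the other Streams configuration; a minimal sketch using the StreamsConfig constants:

// equivalent to processing.guarantee=exactly_once in a properties file
streamsConfiguration.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);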
What is Kafka Connect?
Provides a mechanism for
data sources and data
sinks.
Image source: Confluent Inc.
Data Source
Data from a thing (Database,
Twitter stream, Logstash etc)
going to Kafka.
Data Sink
Data going from Kafka to a
thing (Database, file,
ElasticSearch)
There are many
community connector
plugins that will fit most
needs.
A Source Example
{
"name": "twitter_source_json_01",
"config": {
"connector.class":
"com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
"twitter.oauth.accessToken": "xxxxxxxxxxxxx",
"twitter.oauth.consumerSecret": "xxxxxxxxxxxx",
"twitter.oauth.consumerKey": "xxxxxxxxxxxx",
"twitter.oauth.accessTokenSecret": "xxxxxxxxxxx",
"kafka.delete.topic": "twitter_deletes_json_01",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false,
"key.converter.schemas.enable": false,
"kafka.status.topic": "twitter_json_01",
"process.deletes": true,
"filter.keywords": "#instil, #virtualdevbash"
}
}
curl -XPOST -H 'Content-Type: application/json' -d @myconnector.json \
http://localhost:8083/connectors
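Once posted, the Connect REST API on the same host and port reports on the connector; the name here matches the source example above:

curl http://localhost:8083/connectors/twitter_source_json_01/status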
A Sink Example
{
"name": "string-jdbc-sink",
"config": {
"connector.class": “com.domain.test.StringJDBCSink”,
"connector.type": "sink",
"tasks.max": "1",
"topics": “topic_to_read”,
"topic.type": "avro",
"tablename":"testtopic",
"connection.user": "dbuser",
"connection.password": “dbpassword_encoded”,
"connection.url": "jdbc:mysql://localhost:3307/connecttest",
"query.string": "INSERT INTO testtopic (payload) VALUES(?);",
"db.driver": "com.mysql.jdbc.Driver",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081"
}
}
Dead Letter Queues
Sometimes stuff doesn’t
work……
Image source: Confluent Inc.
"errors.tolerance":"all",
"errors.log.enable":true,
"errors.log.include.messages":true
"errors.deadletterqueue.topic.name":"dlq_topic_name",
"errors.deadletterqueue.topic.replication.factor":1,
"errors.deadletterqueue.context.headers.enable":true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
Monitoring
At present the main broker Kafka metrics are exposed via JMX under these domains:
kafka.server:*
kafka.controller:*
kafka.coordinator.group:*
kafka.coordinator.transaction:*
kafka.log:*
kafka.network:*
kafka.utils:*
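To expose those MBeans to a remote JMX client, the broker scripts read a JMX_PORT environment variable at start-up; a minimal sketch, assuming a Confluent-style tarball layout:

JMX_PORT=9999 bin/kafka-server-start etc/kafka/server.properties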
Topic Level Fetch Metrics
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+)

Key Name                  Description
fetch-size-avg            The average number of bytes fetched per request for a specific topic.
fetch-size-max            The maximum number of bytes fetched per request for a specific topic.
bytes-consumed-rate       The average number of bytes consumed per second for a specific topic.
records-per-request-avg   The average number of records in each request for a specific topic.
records-consumed-rate     The average number of records consumed per second for a specific topic.
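The same figures can also be read programmatically: both KafkaProducer and KafkaConsumer expose a metrics() map. A rough sketch for dumping a consumer's fetch metrics (assumes 'consumer' is the KafkaConsumer from SimpleConsumer above, and a client new enough to have metricValue()):

import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

// print every metric in the consumer-fetch-manager-metrics group
for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
    if (entry.getKey().group().equals("consumer-fetch-manager-metrics")) {
        System.out.println(entry.getKey().name() + " = " + entry.getValue().metricValue());
    }
}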
Producer Level Application Metrics (using JConsole)
Consumer Level Application Metrics (using JConsole)
Thank you.
Twitter: @digitalis_io Web: https://digitalis.io
