Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Jesse Anderson, CEO, Smoking Hand


An introduction to what Kafka is, the concepts behind it and its API.

Published in: Technology

  1. Big Data Day LA
  2. Kafka
  3. About Kafka
  4. What Is Kafka?
     Kafka is a distributed publish/subscribe system. It uses a commit log to track changes. Kafka was originally created at LinkedIn, was open sourced in 2011, and graduated to a top-level Apache project in 2012. Many Big Data projects are open source implementations of closed source products. Unlike Hadoop, HBase, or Cassandra, Kafka is not a clone of an existing closed source product.
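
The commit log mentioned on this slide can be pictured as an append-only sequence in which every record gets a position, or offset. The following is a minimal in-memory sketch of that idea, not Kafka's on-disk format; the class `CommitLog` and its methods `append` and `readFrom` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory append-only log; an illustration, not Kafka's storage layer. */
final class CommitLog {
    private final List<String> entries = new ArrayList<>();

    /** Appends a record to the end of the log and returns its offset. */
    long append(String record) {
        entries.add(record);
        return entries.size() - 1;
    }

    /** Returns a read-only copy of every record from the given offset to the end. */
    List<String> readFrom(long offset) {
        if (offset < 0 || offset > entries.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return Collections.unmodifiableList(
                new ArrayList<>(entries.subList((int) offset, entries.size())));
    }
}
```

Appending "a" and then "b" yields offsets 0 and 1, and `readFrom(1)` returns only "b". Existing records are never rewritten, so a reader can resume from any offset it remembers.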
  5. Pub/Sub
     A publish/subscribe system is used to move data. It is also known as a producer/consumer system. The publisher creates data, which can come from any source and can be binary or text. The subscriber consumes the publisher's data and uses it for its own algorithms.
  6. Decoupling
     Decoupling means removing a component's knowledge of how the rest of the system flows. A highly coupled system breaks when a simple change is made, and it needs to know all configurations and destinations. A decoupled system is resilient to change: it does not break during a change and does not need extensive knowledge about the rest of the system.
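
As a rough illustration of decoupling, the sketch below puts an in-process bus between a publisher and its subscribers. `MessageBus` and `OrderPublisher` are hypothetical names, and the bus pushes messages synchronously inside one process, which is a simplification and not how Kafka delivers data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-process bus standing in for a broker. */
final class MessageBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    /** Registers a subscriber; publishers never learn about it. */
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    /** Hands the message to every registered subscriber, in registration order. */
    void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}

/** The publisher depends only on the bus, not on any destination. */
final class OrderPublisher {
    private final MessageBus bus;

    OrderPublisher(MessageBus bus) {
        this.bus = bus;
    }

    void orderPlaced(String orderId) {
        bus.publish("order-placed:" + orderId);
    }
}
```

Adding another subscriber with `bus.subscribe(...)` requires no change to `OrderPublisher`, which is the resilience to change that the slide describes.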
  7. Why Use Kafka?
     Kafka is proven with Big Data. Kafka decouples systems and is becoming common in enterprise data flows. The same codebase has been used for years at LinkedIn, which answers the questions: Does it scale? Is it fast? Is it robust? Is it production ready? Kafka also supports the traditional publish/subscribe features.
  8. DEMO: Kafka With Legos
     We will now demonstrate how Kafka works with Legos. Concepts shown: publish/subscribe, topics, partitioning, commit logs, and log compaction.
  9. Kafka Internals
  10. Publisher
      Producers publish, or create, the data sent to the cluster. All producer data is sent over the network to the Kafka cluster as keys and values, which can be binary or text.
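
Because keys and values travel as bytes, a text key is serialized before it is sent, and one common use of the key is to choose a partition (one of the concepts listed for the demo). The sketch below is an illustration under stated assumptions: `KeyPartitioner` is a hypothetical class, and it uses `Arrays.hashCode` for brevity, whereas Kafka's default partitioner applies a murmur2 hash to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical key-to-partition mapping; simplified, not Kafka's partitioner. */
final class KeyPartitioner {
    private final int numPartitions;

    KeyPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** Serializes the text key to UTF-8 bytes and maps it to a partition index. */
    int partitionFor(String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }
}
```

The property that matters is determinism: the same key always maps to the same partition, so all records for one key stay together and in order.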
  11. Subscriber
      Consumers receive the producer's data. Consumers pull the data from the Kafka cluster rather than having it pushed to them. They receive the keys and values sent by the producer.
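
The pull model can be sketched with a consumer that remembers its own position and asks for the next batch when it is ready. This is a single-process illustration under assumptions: `PullConsumer` is a hypothetical class, and a plain `List` stands in for the data held by the cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pull-based reader that tracks its own offset into a shared log. */
final class PullConsumer {
    private final List<String> log; // stand-in for records stored by the cluster
    private int nextOffset = 0;     // position of the next unread record

    PullConsumer(List<String> log) {
        this.log = log;
    }

    /** Pulls up to maxRecords unread records and advances the offset past them. */
    List<String> poll(int maxRecords) {
        if (maxRecords < 0) {
            throw new IllegalArgumentException("maxRecords must not be negative");
        }
        int end = Math.min(log.size(), nextOffset + maxRecords);
        List<String> batch = new ArrayList<>(log.subList(nextOffset, end));
        nextOffset = end;
        return batch;
    }
}
```

With three records in the log, `poll(2)` returns the first two, the next `poll(2)` returns the third, and a further call returns an empty batch until more data is appended. The consumer sets the pace; nothing is delivered unless it asks.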
  12. Topics
      Topics are a way of grouping data together. Publishers push data to a topic, and consumers receive all of their data from a topic. The topic name must match exactly on both the publisher and the consumer.
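
The exact-match rule for topic names can be illustrated with a map keyed by topic. `TopicStore` is a hypothetical in-memory stand-in, not a Kafka class; the point is only that a lookup under a name that differs in any way, including letter case, finds nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory grouping of values by topic name. */
final class TopicStore {
    private final Map<String, List<String>> topics = new HashMap<>();

    /** Adds a value under the given topic, creating the topic on first use. */
    void publish(String topic, String value) {
        topics.computeIfAbsent(topic, name -> new ArrayList<>()).add(value);
    }

    /** Returns a copy of the values stored under exactly this topic name. */
    List<String> read(String topic) {
        return new ArrayList<>(topics.getOrDefault(topic, Collections.emptyList()));
    }
}
```

A value published to `my_topic` is returned by `read("my_topic")` but not by `read("My_Topic")` or `read("hello_topic")`. In the same way, a consumer subscribed to one topic name will not see records that a producer sends to another.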
  13. Kafka API
  14. Accessing Kafka
      There are various ways to access Kafka. The most common is the Java API, which is the only first-class citizen; other languages have API implementations, but they are not part of the Apache Kafka project. The REST interface allows many languages to use Kafka, but it requires access to the REST server. Kafka Connect allows general-purpose integrations: for example, data can be ingested into Hadoop or added to an RDBMS.
  15. Creating a Publisher

          import java.util.Properties;

          import org.apache.kafka.clients.producer.KafkaProducer;
          import org.apache.kafka.clients.producer.ProducerRecord;

          // Illustrative wrapper class; the name is assumed, mirroring MyConsumer
          public class MyProducer {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  // Configure brokers to connect to
                  props.put("bootstrap.servers", "broker1:9092");
                  // Configure serializer classes
                  props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
                  props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

                  KafkaProducer<String, String> producer =
                      new KafkaProducer<String, String>(props);

                  // Create ProducerRecord and send it
                  String key = "mykey";
                  String value = "myvalue";
                  ProducerRecord<String, String> record =
                      new ProducerRecord<String, String>("my_topic", key, value);
                  producer.send(record);
                  // close() waits for previously sent records to complete
                  producer.close();
              }
          }
  16. Creating a Consumer (1/2)

          import java.util.Arrays;
          import java.util.Properties;

          import org.apache.kafka.clients.consumer.ConsumerRecord;
          import org.apache.kafka.clients.consumer.ConsumerRecords;
          import org.apache.kafka.clients.consumer.KafkaConsumer;

          public class MyConsumer {
              private KafkaConsumer<String, String> consumer;

              public void createConsumer() {
                  String topic = "hello_topic";
                  Properties props = new Properties();
                  // Configure initial location bootstrap servers
                  props.put("bootstrap.servers", "broker1:9092");
                  // Configure consumer group
                  props.put("group.id", "group1");
                  // Configure key and value deserializers
                  props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
                  props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
                  // Create the consumer and subscribe to the topic
                  consumer = new KafkaConsumer<String, String>(props);
                  consumer.subscribe(Arrays.asList(topic));
                  // createConsumer() continues in part 2/2
  17. Creating a Consumer (2/2)

                  while (true) {
                      // Poll for ConsumerRecords, waiting up to 100 ms
                      // (newer client versions take a java.time.Duration instead)
                      ConsumerRecords<String, String> records = consumer.poll(100);
                      // Process the ConsumerRecords, if any, that came back
                      for (ConsumerRecord<String, String> record : records) {
                          String key = record.key();
                          String value = record.value();
                          // Do something with message
                      }
                  }
              }

              public void close() {
                  consumer.close();
              }

              public static void main(String[] args) {
                  MyConsumer consumer = new MyConsumer();
                  consumer.createConsumer();
                  // Not reached while the poll loop above runs forever
                  consumer.close();
              }
          }
  18. About Me
      Current: Instructor, Thought Leader, Monkey Tamer
      Previously: Curriculum Developer and Instructor @ Cloudera; Senior Software Engineer @ Intuit
      Covered, Conferences and Published In: GigaOM, Ars Technica, Pragmatic Programmers, Strata, OSCON, Wall Street Journal, CNN, BBC, NPR
      See Me On: @jessetanderson