Introduction to Apache Kafka
Chris Curtin
Head of Technical Research

Atlanta Java Users Group March 2013
About Me
• 20+ years in technology
• Head of Technical Research at Silverpop (12+ years at Silverpop)
• Built a SaaS platform before the term ‘SaaS’ was being used
• Prior to Silverpop: real-time control systems, factory automation
  and warehouse management
• Always looking for technologies and algorithms to help with our
  challenges
• Car nut


Silverpop Open Positions
•   Senior Software Engineer (Java, Oracle, Spring, Hibernate, MongoDB)
•   Senior Software Engineer – MIS (.NET stack)
•   Software Engineer
•   Software Engineer – Integration Services (PHP, MySQL)
•   Delivery Manager – Engineering
•   Technical Lead – Engineering
•   Technical Project Manager – Integration Services
•   http://www.silverpop.com – Go to Careers under About


Caveats
• We don’t use Kafka in production
• I don’t have any experience with Kafka in operations
• I am not an expert on messaging systems/JMS/MQSeries etc.




Apache Kafka – from Apache
• Apache Kafka is a distributed publish-subscribe messaging system.
  It is designed to support the following:
   – Persistent messaging with O(1) disk structures that provide constant
     time performance even with many TB of stored messages.
   – High-throughput: even with very modest hardware Kafka can support
     hundreds of thousands of messages per second.
   – Explicit support for partitioning messages over Kafka servers and
     distributing consumption over a cluster of consumer machines while
     maintaining per-partition ordering semantics.
   – Support for parallel data load into Hadoop.

Background
• LinkedIn product donated to Apache
• Most core developers are from LinkedIn
• Pretty good pickup outside of LinkedIn: Airbnb and Urban Airship, for
  example
• Fun fact: no logo yet




Why?




       Data Integration

Point to Point integration (thanks to LinkedIn for slide)




(Thanks to http://linkstate.wordpress.com/2011/04/19/recabling-project/)




What we’d really like (thanks to LinkedIn for slide)




Looks Familiar: JMS to the rescue!



Okay: Data warehouse to the rescue!



Okay: CICS to the rescue!



Kafka changes the paradigm




   Kafka doesn’t keep track of who consumed which message




Consumption Management
• Kafka leaves management of what was consumed up to the
  business logic
• Each message has a unique identifier (within the topic and
  partition)
• Consumers can ask for messages by identifier, even if they are days
  old
• Identifiers are sequential within a topic and partition.
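
The offset idea above can be sketched in plain Java. This is not the Kafka API: `PartitionLog` is a hypothetical in-memory stand-in for a single topic partition. Appending hands back the next sequential offset, and a reader can start from any offset it has remembered. The log itself records nothing about who has read what.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model of one topic partition (not a Kafka class).
class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appends a message and returns its offset; offsets are sequential within the partition.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Returns up to maxMessages starting at fromOffset; the caller tracks its own position.
    List<String> read(long fromOffset, int maxMessages) {
        if (fromOffset < 0 || fromOffset >= messages.size() || maxMessages <= 0) {
            return Collections.emptyList();
        }
        int start = (int) fromOffset;
        int end = (int) Math.min((long) messages.size(), fromOffset + maxMessages);
        return new ArrayList<>(messages.subList(start, end));
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("login");
        log.append("click");
        long offset = log.append("purchase");
        System.out.println("last offset = " + offset);
        // A consumer that remembered offset 1 picks up from there, however old the data is.
        System.out.println(log.read(1, 10));
    }
}
```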




Why is Kafka Interesting?




    Horizontally scalable messaging system



Terminology
•   Topics are the main grouping mechanism for messages
•   Brokers store the messages, take care of redundancy issues
•   Producers write messages to a broker for a specific topic
•   Consumers read from Brokers for a specific topic
•   Topics can be further segmented by partitions
•   Consumers can read a specific partition from a Topic




API (1 of 25) - Basics
• Producer: send(String topic, String key, Message message)
• Consumer: Iterator<Message> fetch(…)




API
• Just kidding – that’s pretty much it for the API

• Minor variation on the consumer for ‘Simple’ consumers but that’s
  really it
• ‘under the covers’ functions to get current offsets or implement
  non-trivial consumers




Architecture (thanks to LinkedIn for slide)




Producers
• Pretty Basic API
• Partitioning is a little odd, requires Producers to know about
  partition scheme
• Producers DO NOT know about consumers
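
A rough sketch of what "knowing the partition scheme" can look like on the producer side. `OrgPartitioning.partitionFor` is a hypothetical helper written for this example, not Kafka's partitioner interface. It assumes keys are usually numeric organization ids and falls back to the key's hash otherwise.

```java
// Hypothetical helper (not Kafka's partitioner interface): maps a message key to a partition.
class OrgPartitioning {
    // Numeric keys such as an organization id map by modulo; other keys fall back to their hash.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        long value;
        try {
            value = Long.parseLong(key);
        } catch (NumberFormatException e) {
            value = key.hashCode();
        }
        // floorMod keeps the result in [0, numPartitions) even for negative values.
        return (int) Math.floorMod(value, (long) numPartitions);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"50", "51", "52", "53", "54"}) {
            System.out.println("org " + key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Every message for one organization lands in the same partition, which is what gives a consumer per-organization ordering.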




Consumers: Consumer Groups
• Easiest to get started with
• Kafka makes sure only one thread in the group sees a message for
  a topic (or a message within a Partition)
• Uses Zookeeper to keep track of what messages were consumed in
  which topic/partitions
• No ‘once and only once’ delivery semantics here
• Rebalance may mean a message gets replayed




Consumers: Simple Consumer
• Consumer subscribes to a specific topic and partition
• Consumer has to keep track of what message offset was last
  consumed
• A lot more error handling required if Brokers have issues
• But a lot more control over which messages are read. Does allow
  for ‘exactly once’ messaging
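
One way to picture how tracking your own offset leads to 'exactly once' processing, sketched in plain Java. `VisitCounter` is a hypothetical class, not part of Kafka. It keeps the result and the last applied offset together, so a message delivered a second time is recognized and skipped. A real system would need to persist both values in a single transaction.

```java
// Hypothetical consumer-side state (not a Kafka class): a running total plus the
// offset of the last message applied to it.
class VisitCounter {
    private long lastAppliedOffset = -1;
    private long total = 0;

    // Applies a message only if its offset is beyond the last one applied,
    // so a message that is delivered twice is counted once.
    synchronized boolean apply(long offset, long visits) {
        if (offset <= lastAppliedOffset) {
            return false; // already applied: a replay
        }
        total += visits;
        lastAppliedOffset = offset;
        return true;
    }

    synchronized long total() {
        return total;
    }

    // The offset to ask the broker for on the next fetch.
    synchronized long nextOffsetToFetch() {
        return lastAppliedOffset + 1;
    }

    public static void main(String[] args) {
        VisitCounter counter = new VisitCounter();
        counter.apply(0, 5);
        counter.apply(1, 3);
        counter.apply(1, 3); // same message again after a restart: ignored
        System.out.println("total=" + counter.total()
                + ", next offset=" + counter.nextOffsetToFetch());
    }
}
```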




Consumer Model Design
• Partition design impacts overall throughput
    – Producers know partitioning class
    – Producers write to single Broker ‘leader’ for a partition
• Offsets as the only transaction identifier complicate the consumer
    – ‘Throwing more hardware’ at a backlog is complicated
    – Consumer Groups == 1 thread per partition
        • If operations are expensive, you can’t throw more threads at them
• Not a lot of ‘real world’ examples on balancing # of topics vs. # of
  partitions
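
The 'one thread per partition' limit can be illustrated with a small assignment sketch. `PartitionAssignment.assign` is a hypothetical round-robin written for this example. It is not Kafka's rebalance algorithm, but it shows the same consequence: a partition has a single owner, so threads beyond the partition count have nothing to read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin assignment (not Kafka's rebalance algorithm).
class PartitionAssignment {
    // Returns, for each thread, the partitions it owns; each partition has exactly one owner.
    static List<List<Integer>> assign(int numPartitions, int numThreads) {
        if (numThreads <= 0) {
            throw new IllegalArgumentException("numThreads must be positive");
        }
        List<List<Integer>> owned = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            owned.add(new ArrayList<Integer>());
        }
        for (int p = 0; p < numPartitions; p++) {
            owned.get(p % numThreads).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // 4 partitions, 6 threads: the last two threads stay idle.
        System.out.println(assign(4, 6));
        // 4 partitions, 2 threads: each thread handles two partitions.
        System.out.println(assign(4, 2));
    }
}
```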

Why is Kafka Interesting?



            Memory Mapped Files



           Kernel-space processing
What is a commit log? (thanks to LinkedIn for slide)




Brokers
•   Lightweight, very fast message storing
•   Writes messages to disk in kernel space, not in the JVM
•   Uses OS Pagecache
•   Data is stored in flat files on disk, directory per topic and partition
•   Handles the replication




Brokers continued
• Very low memory utilization – almost nothing is held in memory
• (Remember, Broker doesn’t keep track of who has consumed a
  message)
• Handle TTL operations on data
• Drop a file when the data is too old




Why is Kafka Interesting?

                 Stuff just works

 Producers and Consumers are about business logic


Consumer Use Case: batch loading
•   Consumers don’t have to be online all the time
•   Wake up every hour, ask Kafka for events since last request
•   Load into a database, push to external systems etc.
•   Load into Hadoop (Stream if using MapR)
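
A minimal sketch of the 'wake up, ask for events since last request' pattern, in plain Java. `BatchLoader` is hypothetical and reads from an in-memory list instead of a broker. The part that matters is the checkpoint file, which is how a consumer that is offline most of the time remembers where it stopped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical periodic batch job (not a Kafka class): its position lives in a checkpoint file.
class BatchLoader {
    private final Path checkpoint;
    final List<String> loaded = new ArrayList<>();

    BatchLoader(Path checkpoint) {
        this.checkpoint = checkpoint;
    }

    // Reads the next offset to process; a missing or empty file means "start at 0".
    long nextOffset() {
        try {
            if (!Files.exists(checkpoint)) {
                return 0L;
            }
            String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8).trim();
            return text.isEmpty() ? 0L : Long.parseLong(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One wake-up: process everything appended since the last run, then record the new position.
    int runOnce(List<String> log) {
        long start = nextOffset();
        int processed = 0;
        for (long offset = start; offset < log.size(); offset++) {
            loaded.add(log.get((int) offset)); // stand-in for "load into a database"
            processed++;
        }
        try {
            Files.write(checkpoint, Long.toString(start + processed).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return processed;
    }

    // Three wake-ups; each uses a fresh BatchLoader to simulate a process restart.
    static int[] demo() {
        try {
            Path file = Files.createTempFile("batch-offset", ".txt");
            try {
                List<String> log = new ArrayList<>(Arrays.asList("event-0", "event-1"));
                int first = new BatchLoader(file).runOnce(log);
                log.add("event-2");
                int second = new BatchLoader(file).runOnce(log);
                int third = new BatchLoader(file).runOnce(log);
                return new int[] {first, second, third};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```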




Consumer Use Case: Complex Event Processing
• Feed to Storm or similar CEP
• Partition on user id, subsystem, product etc. independent of
  Kafka’s partition
• Execute rules on the data
• Made a mistake? Replay the events and fix it




Consumer Use Case: Operations Logs
• Load ‘old’ operational messages to debug problems
• Do it without impacting production systems
  (remember, consumers can start at any offset!)
• Have business logic write to different output store than
  production, but drive off production data




Adding New Business Logic (thanks to LinkedIn for slide)




Adding Producers
• Define Topics and # of partitions via Kafka tools
• (possibly tell Kafka to balance leaders across machines)
• Start producing




Adding Consumers
• With Kafka, adding consumers doesn’t impact producers
• Minor impact on Brokers (just keeping track of connections)




Producer Code
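import java.util.Date;
import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class TestProducer {
    public static void main(String[] args) {
        long events = Long.parseLong(args[0]);   // events per block
        long blocks = Long.parseLong(args[1]);   // number of blocks
        Random rnd = new Random();

        Properties props = new Properties();
        // Property name as used in the early 0.8 builds; released 0.8 versions call it "metadata.broker.list".
        props.put("broker.list", "vrd01.atlnp1:9092,vrd02.atlnp1:9092,vrd03.atlnp1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "com.silverpop.kafka.playproducer.OrganizationPartitioner");
        ProducerConfig config = new ProducerConfig(props);

        // Key and message are both Strings, matching StringEncoder.
        Producer<String, String> producer = new Producer<String, String>(config);
        for (long nBlocks = 0; nBlocks < blocks; nBlocks++) {
            // The slide never defines orgId; assumed here to be the (50 + nBlocks) value written into the message.
            long orgId = 50 + nBlocks;
            for (long nEvents = 0; nEvents < events; nEvents++) {
                long runtime = new Date().getTime();
                String msg = runtime + "," + orgId + "," + nEvents + "," + rnd.nextInt(1000);
                String key = String.valueOf(orgId);   // the partitioner routes on this key
                KeyedMessage<String, String> data = new KeyedMessage<String, String>("test1", key, msg);
                producer.send(data);
            }
        }
        producer.close();
    }
}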
Simple Consumer Code
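 // Imports needed: java.nio.ByteBuffer, kafka.api.FetchRequest, kafka.api.FetchRequestBuilder,
 // kafka.javaapi.FetchResponse, kafka.javaapi.consumer.SimpleConsumer, kafka.message.MessageAndOffset
 String topic = "test1";
 int partition = 0;
 // host, port, socket timeout (ms), buffer size (bytes), client id
 SimpleConsumer simpleConsumer = new SimpleConsumer("vrd01.atlnp1", 9092, 100000, 64 * 1024, "test");
 try {
   boolean loop = true;
   long maxOffset = -1;   // offset of the last message read; -1 means start at offset 0
   while (loop) {
     FetchRequest req = new FetchRequestBuilder().clientId("randomClient")
          .addFetch(topic, partition, maxOffset + 1, 100000)   // fetch up to 100000 bytes
          .build();
     FetchResponse fetchResponse = simpleConsumer.fetch(req);
     if (fetchResponse.hasError()) {
       break;   // e.g. this broker no longer leads the partition; real code must handle this
     }
     loop = false;   // stop once a fetch comes back empty
     for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(topic, partition)) {
       loop = true;
       ByteBuffer payload = messageAndOffset.message().payload();
       maxOffset = messageAndOffset.offset();   // remember our position; Kafka will not
       byte[] bytes = new byte[payload.limit()];
       payload.get(bytes);
       System.out.println(maxOffset + ": " + new String(bytes, "UTF-8"));
     }
   }
 } finally {
   simpleConsumer.close();
 }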




Consumer Groups Code
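// Imports needed: java.util.List, java.util.Map, java.util.concurrent.ExecutorService,
// java.util.concurrent.Executors, kafka.consumer.KafkaStream, kafka.message.MessageAndMetadata,
// com.google.common.collect.ImmutableMap (Guava)
// consumerConnector comes from kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props))

// create 4 partitions of the stream for topic "test", to allow 4 threads to consume
// (0.8 streams are typed by key and value; the default decoders hand back raw bytes)
Map<String, List<KafkaStream<byte[], byte[]>>> topicMessageStreams =
   consumerConnector.createMessageStreams(ImmutableMap.of("test", 4));
List<KafkaStream<byte[], byte[]>> streams = topicMessageStreams.get("test");

// create list of 4 threads to consume from each of the partitions
ExecutorService executor = Executors.newFixedThreadPool(4);

// consume the messages in the threads
for (final KafkaStream<byte[], byte[]> stream : streams) {
  executor.submit(new Runnable() {
    public void run() {
      // iterating the stream blocks until the next message arrives
      for (MessageAndMetadata<byte[], byte[]> msgAndMetadata : stream) {
        // process message (msgAndMetadata.message())
      }
    }
  });
}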

Demo
•   4-node Kafka cluster
•   4-node Storm cluster
•   4-node MongoDB cluster
•   Test Producer in IntelliJ creates website events into Kafka
•   Storm-Kafka Spout reads from Kafka into Storm topology
     – Trident groups by organization and counts visits by day
• Trident end point writes to MongoDB
• MongoDB shell query to see counts change


LinkedIn Clusters (2012 presentation)
• 8 nodes per datacenter
   – ~20 GB RAM available for Kafka
   – 6TB storage, RAID 10, basic SATA drives
• 10,000 connections into the cluster for both production and
  consumption




Performance (LinkedIn 2012 presentation)
• 10 billion messages/day
• Sustained peak:
    – 172,000 messages/second written
    – 950,000 messages/second read
•   367 topics
•   40 real-time consumers
•   Many ad hoc consumers
•   10k connections/colo
•   9.5TB log retained
•   End-to-end delivery time: 10 seconds (avg)
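
A quick back-of-the-envelope check on those numbers, assuming the 10 billion/day figure counts messages written (the slide does not say). Spread evenly over a day, that is roughly 115,741 messages per second, so the sustained write peak is about one and a half times the daily average. `ThroughputMath` is just a throwaway helper for the arithmetic.

```java
// Throwaway helper for the arithmetic; the input figures come from the slide above.
class ThroughputMath {
    static final double SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    // Average rate if the daily volume were spread evenly across the day.
    static double averagePerSecond(double messagesPerDay) {
        return messagesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double average = averagePerSecond(10_000_000_000.0);
        System.out.printf("average messages/sec: %.0f%n", average);
        System.out.printf("peak write / average: %.2f%n", 172_000 / average);
        System.out.printf("peak read / peak write: %.2f%n", 950_000.0 / 172_000.0);
    }
}
```

The read-to-write ratio is the more interesting figure: at peak, each message written is being read more than five times, which fits the 'many consumers per topic' picture.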
Questions so far?




Something Completely Different
• Nathan Marz (Twitter, BackType)
• Creator of Storm




Immutable Applications
• No updates to data
• Either insert or delete
• ‘Functional Applications’

• http://manning.com/marz/BD_meap_ch01.pdf




(thanks to LinkedIn for slide)




Information
• Apache Kafka site: http://kafka.apache.org/
• List of presentations:
  https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
• Kafka wiki:
  https://cwiki.apache.org/confluence/display/KAFKA/Index
• Paper: http://sites.computer.org/debull/A12june/pipeline.pdf
• Slides: http://www.slideshare.net/chriscurtin
• Me: ccurtin@silverpop.com, @ChrisCurtin on Twitter

Silverpop Open Positions
•   Senior Software Engineer (Java, Oracle, Spring, Hibernate, MongoDB)
•   Senior Software Engineer – MIS (.NET stack)
•   Software Engineer
•   Software Engineer – Integration Services (PHP, MySQL)
•   Delivery Manager – Engineering
•   Technical Lead – Engineering
•   Technical Project Manager – Integration Services
•   http://www.silverpop.com/marketing-company/careers/open-positions.html

                                                                          48

More Related Content

What's hot

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Rahul Jain
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
PivotalOpenSourceHub
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Dimitris Kontokostas
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
La Cuisine du Web
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
iamtodor
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
Dmitry Tolpeko
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
Otávio Carvalho
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Erik Onnen
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Discover Pinterest
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
datamantra
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
László-Róbert Albert
 

What's hot (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
 

Viewers also liked

Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Guozhang Wang
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
dave_revell
 
Kafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message QueueKafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message Queue
Shafaq Abdullah
 
Kafka Connect by Datio
Kafka Connect by DatioKafka Connect by Datio
Kafka Connect by Datio
Datio Big Data
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
confluent
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
Kaufman Ng
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
Knoldus Inc.
 
Kafka connect
Kafka connectKafka connect
Kafka connect
Andrew Stevenson
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
Joe Stein
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Michael Noll
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
 

Viewers also liked (15)

Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Kafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message QueueKafka Evaluation - High Throughout Message Queue
Kafka Evaluation - High Throughout Message Queue
 
Kafka Connect by Datio
Kafka Connect by DatioKafka Connect by Datio
Kafka Connect by Datio
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
Kafka connect
Kafka connectKafka connect
Kafka connect
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 

Similar to Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013

Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
Paolo Platter
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
OpenEBS
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
Michael Kehoe
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
Steve Wong
 
Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!
All Things Open
 
Docker and the K computer
Docker and the K computerDocker and the K computer
Docker and the K computer
Peter Bryzgalov
 
Delivering big content at NBC News with RavenDB
Delivering big content at NBC News with RavenDBDelivering big content at NBC News with RavenDB
Delivering big content at NBC News with RavenDB
John Bennett
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
Comsysto Reply GmbH
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
Comsysto Reply GmbH
 
Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015
WaveMaker, Inc.
 
Avogadro, Open Chemistry and Semantics
Avogadro, Open Chemistry and SemanticsAvogadro, Open Chemistry and Semantics
Avogadro, Open Chemistry and Semantics
Marcus Hanwell
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
Don Demcsak
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
Roger Xia
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
liujianrong
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache Kafka
Daniel Muñoz Garrido
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
FoundationDB
 

Similar to Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013 (20)

Building High-Throughput, Low-Latency Pipelines in Kafka
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013

  • 1. Introduction to Apache Kafka. Chris Curtin, Head of Technical Research. Atlanta Java Users Group, March 2013
  • 2. About Me
      • 20+ years in technology
      • Head of Technical Research at Silverpop (12+ years at Silverpop)
      • Built a SaaS platform before the term ‘SaaS’ was being used
      • Prior to Silverpop: real-time control systems, factory automation and warehouse management
      • Always looking for technologies and algorithms to help with our challenges
      • Car nut
  • 3. Silverpop Open Positions
      • Senior Software Engineer (Java, Oracle, Spring, Hibernate, MongoDB)
      • Senior Software Engineer – MIS (.NET stack)
      • Software Engineer
      • Software Engineer – Integration Services (PHP, MySQL)
      • Delivery Manager – Engineering
      • Technical Lead – Engineering
      • Technical Project Manager – Integration Services
      • http://www.silverpop.com – Go to Careers under About
  • 4. Caveats
      • We don’t use Kafka in production
      • I don’t have any experience with Kafka in operations
      • I am not an expert on messaging systems/JMS/MQSeries etc.
  • 5. Apache Kafka – from Apache
      • Apache Kafka is a distributed publish-subscribe messaging system. It is designed to support the following:
          – Persistent messaging with O(1) disk structures that provide constant time performance even with many TB of stored messages.
          – High-throughput: even with very modest hardware Kafka can support hundreds of thousands of messages per second.
          – Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
          – Support for parallel data load into Hadoop.
  • 6. Background
      • LinkedIn product donated to Apache
      • Most core developers are from LinkedIn
      • Pretty good pickup outside of LinkedIn: Airbnb and Urban Airship, for example
      • Fun fact: no logo yet
  • 7. Why? Data Integration
  • 8. Point to Point integration (thanks to LinkedIn for slide)
  • 9.
  • 11. What we’d really like (thanks to LinkedIn for slide)
  • 12. Looks Familiar: JMS to the rescue!
  • 13. Okay: Data warehouse to the rescue!
  • 14. Okay: CICS to the rescue!
  • 15. Kafka changes the paradigm: Kafka doesn’t keep track of who consumed which message
  • 16. Consumption Management
      • Kafka leaves management of what was consumed up to the business logic
      • Each message has a unique identifier (within the topic and partition)
      • Consumers can ask for messages by identifier, even if they are days old
      • Identifiers are sequential within a topic and partition
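To make the consumption model on slide 16 concrete, here is a minimal in-memory sketch. It is not Kafka code: `PartitionLog`, `append` and `fetch` are hypothetical names invented for illustration, and a real partition lives on disk on a broker. The point it shows is that the log only hands out sequential identifiers and serves reads by identifier, while the consumer owns the "where was I" state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one topic partition; not part of the Kafka API.
class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appends a message and returns its offset (sequential within this partition).
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Returns up to maxMessages starting at fromOffset. The log never records
    // which consumer read what.
    List<String> fetch(long fromOffset, int maxMessages) {
        int start = (int) Math.max(0, Math.min(fromOffset, messages.size()));
        int end = Math.min(messages.size(), start + maxMessages);
        return new ArrayList<>(messages.subList(start, end));
    }
}

class OffsetLogDemo {
    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        for (int i = 0; i < 5; i++) {
            log.append("event-" + i);
        }

        long nextOffset = 0; // state owned by the consumer, not by the log
        List<String> batch = log.fetch(nextOffset, 3);
        nextOffset += batch.size();
        System.out.println(batch + " next=" + nextOffset);

        // A second, independent consumer can replay from the start at any time.
        System.out.println(log.fetch(0, 2));
    }
}
```

Because the read position is just a number held by the caller, replaying old messages or adding another reader needs no change on the log side.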
  • 17. Why is Kafka Interesting? Horizontally scalable messaging system
  • 18. Terminology
      • Topics are the main grouping mechanism for messages
      • Brokers store the messages and take care of redundancy issues
      • Producers write messages to a broker for a specific topic
      • Consumers read from Brokers for a specific topic
      • Topics can be further segmented by partitions
      • Consumers can read a specific partition from a Topic
  • 19. API (1 of 25) - Basics
      • Producer: send(String topic, String key, Message message)
      • Consumer: Iterator<Message> fetch(…)
  • 20. API
      • Just kidding – that’s pretty much it for the API
      • Minor variation on the consumer for ‘Simple’ consumers but that’s really it
      • ‘Under the covers’ functions to get current offsets or implement non-trivial consumers
  • 21. Architecture (thanks to LinkedIn for slide)
  • 22. Producers
      • Pretty basic API
      • Partitioning is a little odd: it requires Producers to know about the partition scheme
      • Producers DO NOT know about consumers
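As a rough illustration of what "knowing the partition scheme" means for a producer, the sketch below maps a message key to a partition index. It deliberately does not implement Kafka's own partitioner interface; `OrgKeyPartitioning` and `partitionFor` are hypothetical names, and hashing the key is just one possible scheme.

```java
// Hypothetical helper, not the Kafka Partitioner interface: maps a message key
// to a partition index in the range [0, numPartitions).
class OrgKeyPartitioning {
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        for (String orgId : new String[] {"50", "51", "52"}) {
            System.out.println("org " + orgId + " -> partition " + partitionFor(orgId, 4));
        }
    }
}
```

The same key always lands on the same partition, which is what preserves per-key ordering; the cost is that changing the number of partitions changes where keys land.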
  • 23. Consumers: Consumer Groups
      • Easiest to get started with
      • Kafka makes sure only one thread in the group sees a message for a topic (or a message within a Partition)
      • Uses Zookeeper to keep track of what messages were consumed in which topic/partitions
      • No ‘once and only once’ delivery semantics here
      • Rebalance may mean a message gets replayed
  • 24. Consumers: Simple Consumer
      • Consumer subscribes to a specific topic and partition
      • Consumer has to keep track of what message offset was last consumed
      • A lot more error handling required if Brokers have issues
      • But a lot more control over which messages are read; this does allow for ‘exactly once’ messaging
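One way to read the "exactly once" remark on slide 24 is that a consumer which records the last offset it applied, together with the result of applying it, can ignore replays. The sketch below assumes a single partition and keeps everything in memory; `DedupingCounter` and its methods are hypothetical names, and in a real system the count and the offset would need to be written to the same store in one transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side state for ONE partition: visit counts plus the
// last offset that was applied to them.
class DedupingCounter {
    private final Map<String, Integer> counts = new HashMap<>();
    private long lastAppliedOffset = -1;

    // Applies the message only if its offset is new; returns true when applied.
    boolean apply(long offset, String orgId) {
        if (offset <= lastAppliedOffset) {
            return false; // replayed message, already counted
        }
        counts.merge(orgId, 1, Integer::sum);
        lastAppliedOffset = offset;
        return true;
    }

    int countFor(String orgId) {
        return counts.getOrDefault(orgId, 0);
    }

    // Where to resume fetching after a restart.
    long nextOffsetToFetch() {
        return lastAppliedOffset + 1;
    }
}

class ExactlyOnceDemo {
    public static void main(String[] args) {
        DedupingCounter counter = new DedupingCounter();
        counter.apply(0, "org-50");
        counter.apply(1, "org-50");
        counter.apply(1, "org-50"); // redelivery after a retry or rebalance
        System.out.println(counter.countFor("org-50") + " " + counter.nextOffsetToFetch());
    }
}
```

This works only because offsets are sequential within a partition, so "already seen" reduces to a single comparison.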
  • 25. Consumer Model Design
      • Partition design impacts overall throughput
          – Producers know the partitioning class
          – Producers write to a single Broker ‘leader’ for a partition
      • Having offsets as the only transaction identifier complicates the consumer
          – ‘Throwing more hardware’ at a backlog is complicated
          – Consumer Groups == 1 thread per partition
              • If operations are expensive, you can’t just throw more threads at them
      • Not a lot of ‘real world’ examples on balancing # of topics vs. # of partitions
  • 26. Why is Kafka Interesting? Memory-mapped files, kernel-space processing
  • 27. What is a commit log? (thanks to LinkedIn for slide)
  • 28. Brokers
      • Lightweight, very fast message storing
      • Writes messages to disk in kernel space, NOT in the JVM
      • Uses the OS page cache
      • Data is stored in flat files on disk, one directory per topic and partition
      • Handles the replication
  • 29. Brokers continued
      • Very low memory utilization – almost nothing is held in memory
      • (Remember, the Broker doesn’t keep track of who has consumed a message)
      • Handles TTL operations on data
      • Drops a file when the data is too old
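The "drop a file when the data is too old" behaviour on slide 29 can be pictured as time-based retention over whole files. The sketch below is a simplified model, not broker code: `SegmentRetention`, `Segment` and `dropExpired` are hypothetical names, and keeping the newest file no matter how old it is is an assumption made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the flat files that back one partition.
class SegmentRetention {
    static final class Segment {
        final long baseOffset;      // offset of the first message in the file
        final long lastModifiedMs;  // time of the last write to the file

        Segment(long baseOffset, long lastModifiedMs) {
            this.baseOffset = baseOffset;
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    private final Deque<Segment> segments = new ArrayDeque<>(); // oldest first

    void add(Segment segment) {
        segments.addLast(segment);
    }

    // Drops whole files older than retentionMs and returns how many were dropped.
    // The newest file is always kept (an assumption of this sketch).
    int dropExpired(long nowMs, long retentionMs) {
        int dropped = 0;
        while (segments.size() > 1
                && nowMs - segments.peekFirst().lastModifiedMs > retentionMs) {
            segments.removeFirst();
            dropped++;
        }
        return dropped;
    }

    // Earliest offset a consumer can still ask for, or -1 if nothing is stored.
    long earliestOffset() {
        return segments.isEmpty() ? -1 : segments.peekFirst().baseOffset;
    }
}

class RetentionDemo {
    public static void main(String[] args) {
        SegmentRetention partition = new SegmentRetention();
        partition.add(new SegmentRetention.Segment(0, 0));
        partition.add(new SegmentRetention.Segment(1000, 60_000));
        partition.add(new SegmentRetention.Segment(2000, 120_000));
        int dropped = partition.dropExpired(150_000, 60_000);
        System.out.println(dropped + " dropped, earliest offset " + partition.earliestOffset());
    }
}
```

Expiry here never inspects individual messages or consumers, which fits the slide's point that the broker holds almost nothing in memory.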
  • 30. Why is Kafka Interesting? Stuff just works. Producers and Consumers are about business logic
  • 31. Consumer Use Case: batch loading
      • Consumers don’t have to be online all the time
      • Wake up every hour, ask Kafka for events since last request
      • Load into a database, push to external systems etc.
      • Load into Hadoop (Stream if using MapR)
  • 32. Consumer Use Case: Complex Event Processing
      • Feed to Storm or similar CEP
      • Partition on user id, subsystem, product etc. independent of Kafka’s partition
      • Execute rules on the data
      • Made a mistake? Replay the events and fix it
  • 33. Consumer Use Case: Operations Logs
      • Load ‘old’ operational messages to debug problems
      • Do it without impacting production systems (remember, consumers can start at any offset!)
      • Have business logic write to a different output store than production, but drive off production data
  • 34. Adding New Business Logic (thanks to LinkedIn for slide)
  • 35. Adding Producers
      • Define Topics and # of partitions via Kafka tools
      • (possibly tell Kafka to balance leaders across machines)
      • Start producing
  • 36. Adding Consumers
      • With Kafka, adding consumers doesn’t impact producers
      • Minor impact on Brokers (just keeping track of connections)
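  • 37. Producer Code
        import java.util.Date;
        import java.util.Properties;
        import java.util.Random;

        import kafka.javaapi.producer.Producer;
        import kafka.producer.KeyedMessage;
        import kafka.producer.ProducerConfig;

        public class TestProducer {
            public static void main(String[] args) {
                long events = Long.parseLong(args[0]);  // events per block
                long blocks = Long.parseLong(args[1]);  // number of blocks (one organization each)
                Random rnd = new Random();

                Properties props = new Properties();
                // Property name as shown on the slide (pre-release 0.8);
                // the released 0.8.0 producer reads "metadata.broker.list".
                props.put("broker.list", "vrd01.atlnp1:9092,vrd02.atlnp1:9092,vrd03.atlnp1:9092");
                props.put("serializer.class", "kafka.serializer.StringEncoder");
                props.put("partitioner.class", "com.silverpop.kafka.playproducer.OrganizationPartitioner");
                ProducerConfig config = new ProducerConfig(props);

                // Keys are Strings here, so the producer is typed <String, String>.
                Producer<String, String> producer = new Producer<String, String>(config);

                for (long nBlocks = 0; nBlocks < blocks; nBlocks++) {
                    // orgId is not defined on the slide; assumed to be the
                    // organization field (50 + nBlocks) written into the message.
                    long orgId = 50 + nBlocks;
                    for (long nEvents = 0; nEvents < events; nEvents++) {
                        long runtime = new Date().getTime();
                        String msg = runtime + "," + orgId + "," + nEvents + "," + rnd.nextInt(1000);
                        String key = String.valueOf(orgId);
                        KeyedMessage<String, String> data =
                                new KeyedMessage<String, String>("test1", key, msg);
                        producer.send(data);
                    }
                }
                producer.close();
            }
        }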
  • 38. Simple Consumer Code
        String topic = "test1";
        int partition = 0;
        // Arguments: host, port, socket timeout (ms), buffer size (bytes), client id
        SimpleConsumer simpleConsumer =
                new SimpleConsumer("vrd01.atlnp1", 9092, 100000, 64 * 1024, "test");

        boolean loop = true;
        long maxOffset = -1;  // last offset seen; the consumer tracks this itself
        while (loop) {
            // Ask for messages starting just after the last offset we processed
            FetchRequest req = new FetchRequestBuilder()
                    .clientId("randomClient")
                    .addFetch(topic, partition, maxOffset + 1, 100000)
                    .build();
            FetchResponse fetchResponse = simpleConsumer.fetch(req);

            loop = false;  // stop once a fetch returns no messages
            for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(topic, partition)) {
                loop = true;
                ByteBuffer payload = messageAndOffset.message().payload();
                maxOffset = messageAndOffset.offset();
                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(String.valueOf(maxOffset) + ": " + new String(bytes, "UTF-8"));
            }
        }
        simpleConsumer.close();
  • 39. Consumer Groups Code
        // create 4 partitions of the stream for topic "test", to allow 4 threads to consume
        Map<String, List<KafkaStream<Message>>> topicMessageStreams =
                consumerConnector.createMessageStreams(ImmutableMap.of("test", 4));
        List<KafkaStream<Message>> streams = topicMessageStreams.get("test");

        // create list of 4 threads to consume from each of the partitions
        ExecutorService executor = Executors.newFixedThreadPool(4);

        // consume the messages in the threads
        for (final KafkaStream<Message> stream : streams) {
            executor.submit(new Runnable() {
                public void run() {
                    for (MessageAndMetadata msgAndMetadata : stream) {
                        // process message (msgAndMetadata.message())
                    }
                }
            });
        }
  • 40. Demo
      • 4-node Kafka cluster
      • 4-node Storm cluster
      • 4-node MongoDB cluster
      • Test Producer in IntelliJ creates website events into Kafka
      • Storm-Kafka Spout reads from Kafka into Storm topology
          – Trident groups by organization and counts visits by day
      • Trident end point writes to MongoDB
      • MongoDB shell query to see counts change
  • 41. LinkedIn Clusters (2012 presentation)
      • 8 nodes per datacenter
          – ~20 GB RAM available for Kafka
          – 6 TB storage, RAID 10, basic SATA drives
      • 10,000 connections into the cluster for both production and consumption
  • 42. Performance (LinkedIn 2012 presentation)
      • 10 billion messages/day
      • Sustained peak:
          – 172,000 messages/second written
          – 950,000 messages/second read
      • 367 topics
      • 40 real-time consumers
      • Many ad hoc consumers
      • 10k connections/colo
      • 9.5 TB log retained
      • End-to-end delivery time: 10 seconds (avg)
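To put the slide 42 figures side by side, a quick back-of-the-envelope calculation helps. The sketch below assumes that the 10 billion messages/day figure counts messages written and is spread over a full 24-hour day; the slide does not say either way, and `ThroughputMath` is a name made up for the example.

```java
// Back-of-the-envelope check of the slide's figures. Assumes the daily total
// counts messages written and is spread over a full 24-hour day.
class ThroughputMath {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    static double averagePerSecond(long messagesPerDay) {
        return (double) messagesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double avgWrite = averagePerSecond(10_000_000_000L);
        double peakWrite = 172_000; // sustained peak, messages/second written
        double peakRead = 950_000;  // sustained peak, messages/second read

        System.out.printf("average write rate: %.0f msg/s%n", avgWrite);
        System.out.printf("peak/average write: %.2f%n", peakWrite / avgWrite);
        System.out.printf("peak read/write ratio: %.1f%n", peakRead / peakWrite);
    }
}
```

Under those assumptions the average works out to roughly 115,700 messages/second, the sustained write peak is about 1.5 times that average, and at peak each written message is read about 5.5 times, which is consistent with many consumers sharing the same topics.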
  • 44. Something Completely Different
      • Nathan Marz (Twitter, BackType)
      • Creator of Storm
  • 45. Immutable Applications
      • No updates to data
      • Either insert or delete
      • ‘Functional Applications’
      • http://manning.com/marz/BD_meap_ch01.pdf
  • 46. (thanks to LinkedIn for slide)
  • 47. Information
      • Apache Kafka site: http://kafka.apache.org/
      • List of presentations: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
      • Kafka wiki: https://cwiki.apache.org/confluence/display/KAFKA/Index
      • Paper: http://sites.computer.org/debull/A12june/pipeline.pdf
      • Slides: http://www.slideshare.net/chriscurtin
      • Me: ccurtin@silverpop.com, @ChrisCurtin on Twitter
  • 48. Silverpop Open Positions
      • Senior Software Engineer (Java, Oracle, Spring, Hibernate, MongoDB)
      • Senior Software Engineer – MIS (.NET stack)
      • Software Engineer
      • Software Engineer – Integration Services (PHP, MySQL)
      • Delivery Manager – Engineering
      • Technical Lead – Engineering
      • Technical Project Manager – Integration Services
      • http://www.silverpop.com/marketing-company/careers/open-positions.html