
Kinesis vs-kafka-and-kafka-deep-dive

Compare Amazon Kinesis and Apache Kafka. Kafka technical deep dive.

Published in: Software

  1. 1. Kinesis vs. Kafka – Kafka Deep Dive Yifeng Jiang Solutions Engineer, Hortonworks © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  2. 2. Self-introduction: Yifeng Jiang (蒋 逸峰) • Solutions Engineer, Hortonworks • HBase book author • It has been 10 years since I came to Japan… • Hobby: mountain climbing • Twitter: @uprush
  3. 3. About Hortonworks • Founded in 2011 • Original 24 architects, developers, operators of Hadoop from Yahoo! • 740+ employees • 1350+ ecosystem partners. Customer Momentum • 556 customers (as of August 5, 2015) • 119 customers added in Q2 2015 • Publicly traded on NASDAQ: HDP. Hortonworks Data Platform • Completely open multi-tenant platform for any app and any data • Consistent enterprise services for security, operations, and governance. Partner for Customer Success • Leader in open-source community, focused on innovation to meet enterprise needs • Unrivaled Hadoop support subscriptions
  4. 4. Hortonworks Data Platform (HDP): deploy on premises and in the cloud
  5. 5. Kinesis vs. Kafka
  6. 6. Amazon Kinesis – Introduction: Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams.
  7. 7. Apache Kafka – Introduction • Messaging system • Real-time • Scalable to handle large data volumes • Low latency • Fault tolerant • Originated at LinkedIn • Aimed at solving data movement across systems • Written in Scala and Java • Open source (Apache 2.0) • Adopted by many companies
  8. 8. Kinesis vs. Kafka – Features. Similar features • Messaging system for large-scale real-time data processing • High performance, highly scalable, low latency • Fault tolerant. Differences • Fully managed cloud service vs. OSS • Data durability and performance trade-off • Interface • AWS service integration vs. OSS or single-platform (e.g., HDP) integration
  9. 9. Kinesis vs. Kafka – Data Durability. Kinesis • Synchronously replicates data across three facilities • High durability for free. Kafka • Replication across servers in the same DC/AZ, with a configurable minimum number of in-sync replicas and ACKs • Asynchronously mirrors data between clusters in different datacenters / AZs • Durability comes with a performance trade-off
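The interaction between ACKs and the minimum number of in-sync replicas can be sketched with a small model. This is a simplification for illustration only, not broker code: the function name and its parameters are hypothetical, and it assumes the usual three ACK settings (no wait, leader only, all in-sync replicas).

```python
def write_succeeds(acks, isr_size, min_insync_replicas=1):
    """Simplified model: does the producer see a produce request as successful?

    acks: 0 (do not wait), 1 (leader only), or "all" (every in-sync replica).
    isr_size: number of replicas currently in sync, including the leader.
    """
    if acks == 0:
        # Fire and forget: the producer never hears about a failure.
        return True
    if acks == 1:
        # Only the leader has to accept the write.
        return isr_size >= 1
    if acks == "all":
        # The write is rejected when too few replicas are in sync.
        return isr_size >= max(1, min_insync_replicas)
    raise ValueError(f"unsupported acks value: {acks!r}")
```

Under this model, requiring all in-sync replicas with a minimum of 2 trades availability and latency for durability: a partition that has shrunk to a single in-sync replica stops accepting such writes.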
  10. 10. Kinesis vs. Kafka – Interface. Kinesis • REST only • Client library wraps the REST API. Kafka • Low-level API • REST API available (wrapping the low-level API), at a cost in throughput and latency
  11. 11. Kinesis vs. Kafka – Processing. Kafka • Custom consumers: event monitoring and alerting use cases • Storm: fraud detection, simple aggregation • Spark Streaming / Storm Trident: micro-batch, near real-time • Camus: batch Hadoop ingestion. Kinesis • KCL applications on EC2 • Storm • Spark Streaming • EMR for batch ingestion, e.g., writing to S3
  12. 12. Kinesis vs. Kafka – Deployment & Operation. Kafka • HDP: almost one-click deploy with Ambari • Basic monitoring with Ambari • Expand and rebalance: partition assignment and consumer rebalance • ZooKeeper can also be managed by Ambari. Kinesis • Fully managed, one-click deploy • CloudWatch monitoring • Expand and rebalance: resharding a stream • Easy operation
  13. 13. Kafka Deep Dive
  14. 14. Kafka – Concepts. [Diagram: a topic with 3 partitions and replication factor 2, spread over three brokers (L = leader replica). Broker-0: P0.R0 (L), P1.R0. Broker-1: P0.R1, P2.R1 (L). Broker-2: P1.R2 (L), P2.R2. A Producer and a Consumer connect to the brokers.] * ZK (ZooKeeper) is used by Broker and Consumer
  15. 15. Kafka – Concepts. Topics. Partitions • Offset • Ordered. Replication • Prevents data loss • Follower replicas are never read from or written to by clients • Does not increase throughput • Tolerates (replication factor − 1) failures
      [ambari-qa@c6401 bin]$ kafka-topics.sh --zookeeper c6401:2181 --describe --topic page_visits
      Topic:page_visits  PartitionCount:4  ReplicationFactor:2  Configs:
        Topic: page_visits  Partition: 0  Leader: 1  Replicas: 0,1  Isr: 1,0
        Topic: page_visits  Partition: 1  Leader: 0  Replicas: 1,0  Isr: 0,1
        Topic: page_visits  Partition: 2  Leader: 1  Replicas: 0,1  Isr: 1,0
        Topic: page_visits  Partition: 3  Leader: 0  Replicas: 1,0  Isr: 0,1
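To make the describe output on this slide easier to read, here is a minimal sketch that parses it into a per-partition structure and derives how many replica failures each partition can survive. The sample text is the slide's output with whitespace normalized; the regular expression and function names are hypothetical helpers, and the output format of other Kafka versions may differ.

```python
import re

# Output from the slide, with whitespace normalized.
DESCRIBE_OUTPUT = """\
Topic:page_visits  PartitionCount:4  ReplicationFactor:2  Configs:
  Topic: page_visits  Partition: 0  Leader: 1  Replicas: 0,1  Isr: 1,0
  Topic: page_visits  Partition: 1  Leader: 0  Replicas: 1,0  Isr: 0,1
  Topic: page_visits  Partition: 2  Leader: 1  Replicas: 0,1  Isr: 1,0
  Topic: page_visits  Partition: 3  Leader: 0  Replicas: 1,0  Isr: 0,1
"""

PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_partitions(text):
    """Return {partition_id: {"leader", "replicas", "isr"}} from describe output."""
    partitions = {}
    for match in PARTITION_LINE.finditer(text):
        pid, leader, replicas, isr = match.groups()
        partitions[int(pid)] = {
            "leader": int(leader),
            "replicas": [int(b) for b in replicas.split(",")],
            "isr": [int(b) for b in isr.split(",") if b],
        }
    return partitions


def tolerated_failures(partition):
    # With N replicas, up to N - 1 broker failures still leave one copy.
    return len(partition["replicas"]) - 1
```

For the `page_visits` topic, every partition has two replicas on brokers 0 and 1, so each tolerates one broker failure, and leadership alternates between the two brokers.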
  16. 16. Kafka Broker. Stores messages (logs) on local disk • Messages are appended to a log file • Log retention is time- and size-based. Controller • Cluster management • Runs on each broker machine • One leader; the others are followers. Partition leader • The broker that is the leader for certain partitions. Uses ZK for coordination
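Time- and size-based log retention can be illustrated with a simplified model that picks which log segments to drop. This is a sketch under assumptions, not the broker's implementation: the function and its parameter names are hypothetical, segments are given oldest first, and details such as protecting the active segment are ignored.

```python
def segments_to_delete(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return indexes of segments to delete.

    segments: list of (last_modified_ms, size_bytes) tuples, oldest first.
    """
    doomed = set()
    if retention_ms is not None:
        # Time-based: drop segments not modified within the retention window.
        for i, (modified_ms, _size) in enumerate(segments):
            if now_ms - modified_ms > retention_ms:
                doomed.add(i)
    if retention_bytes is not None:
        # Size-based: drop the oldest segments while the log would still
        # hold at least retention_bytes afterwards.
        excess = sum(size for _modified, size in segments) - retention_bytes
        for i, (_modified, size) in enumerate(segments):
            if excess < size:
                break
            doomed.add(i)
            excess -= size
    return sorted(doomed)
```

Whole segments are removed rather than individual messages, which is why retention in this model is only approximate at segment granularity.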
  17. 17. Kafka Producer. New Producer API in 0.8.2 • kafka-clients.jar • New Java API • Asynchronous mode by default. Creates a new message and publishes it to a topic and partition • Takes a topic, a value, and an optional key and partition ID
  18. 18. Kafka Producer API (0.8.2) – Cont. • Original messages are partitioned and then split into batches • Each split batch is sent to the leader broker (and then replicated to the ISR, the in-sync replicas) • Each send is acknowledged by the leader broker and/or all ISR. [Diagram: an App hands messages m1–m5 to the Producer Lib, whose partitioner assigns them to partitions p1–p3 and splits them into batches; the batches go to a topic with 3 partitions and replication factor 2 on Broker-0 (P0.R0 (L), P1.R0), Broker-1 (P0.R1, P2.R1 (L)), and Broker-2 (P1.R2 (L), P2.R2).]
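The partition-then-batch step on this slide can be sketched as follows. This is an illustrative model rather than the producer library's code: the function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash the real partitioner applies to the key.

```python
import zlib
from collections import defaultdict


def choose_partition(key, num_partitions):
    # Stand-in hash for illustration; not the hash function Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def split_into_batches(messages, num_partitions, batch_size):
    """Group (key, value) messages by partition, then cut each group into batches.

    Returns a list of (partition, [messages]) in partition order; the
    relative order of messages within a partition is preserved.
    """
    by_partition = defaultdict(list)
    for key, value in messages:
        by_partition[choose_partition(key, num_partitions)].append((key, value))
    batches = []
    for partition in sorted(by_partition):
        group = by_partition[partition]
        for start in range(0, len(group), batch_size):
            batches.append((partition, group[start:start + batch_size]))
    return batches
```

Each resulting batch targets a single partition, so it can be sent to that partition's leader broker in one request.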
  19. 19. Kafka Consumer. Reads data from Kafka brokers • JVM APIs supported out of the box by the project • Consumers pull data from brokers • Consumer apps have to keep track of the topic-partition offset they have read. Consumer API: Simple API • Greater control over consumption of topics/partitions • Consumer apps are more complex because they must handle things like offset tracking themselves. High-level API • Uses the Simple API internally • Consumer apps are simple to implement because offset tracking works out of the box • But not flexible in terms of which partitions to read
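The pull model with application-side offset tracking can be sketched with an in-memory stand-in for one partition. Both classes below are hypothetical illustrations and do not correspond to any Kafka client class.

```python
class PartitionLog:
    """In-memory stand-in for one partition: an append-only list of messages."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the appended message

    def fetch(self, offset, max_messages):
        return self.messages[offset:offset + max_messages]


class OffsetTrackingConsumer:
    """Pulls from a partition and remembers the next offset to read."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_messages=2):
        batch = self.log.fetch(self.offset, max_messages)
        self.offset += len(batch)  # the application, not the broker, advances it
        return batch
```

Because the consumer owns the offset, it can restart from any earlier position and re-read messages, which is the extra control (and extra responsibility) the Simple API exposes.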
  20. 20. Kafka Consumer – Cont. Consumer Groups • Allow multiple hosts to form a group to access a topic • Consumer hosts join a group by using the same group.id • Guarantee that a message is read by only one consumer in a group • Partitions are assigned to consumers in a group • A consumer node may get one or more partitions • But one partition is assigned to only one consumer host • Message order is guaranteed within a partition • Max parallelism is determined by the number of topic partitions • With more consumers than partitions, some consumers will be idle. [Diagram: partitions P0–P3 across Broker-0 and Broker-1, read by consumers C1–C6 in Consumer Group 1 and Consumer Group 2.]
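The assignment rules above (each partition has exactly one owner per group, and surplus consumers stay idle) can be sketched with a round-robin model. This is an assumption made for illustration and not necessarily the assignment algorithm Kafka uses; the function name and the consumer and partition labels are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group.

    Every partition gets exactly one owner; consumers may own several
    partitions or none at all.
    """
    assignment = {consumer: [] for consumer in consumers}
    ordered = sorted(consumers)
    if not ordered:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
two_consumers = assign_partitions(partitions, ["C1", "C2"])
six_consumers = assign_partitions(partitions, ["C1", "C2", "C3", "C4", "C5", "C6"])
idle = [c for c, owned in six_consumers.items() if not owned]
```

With four partitions, a group of two consumers gets two partitions each, while a group of six leaves two consumers idle, which is why the partition count caps parallelism.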
  21. 21. Kafka – Why Kafka is fast. Fast writes • Writes are appends to the file system • Partitions improve performance and throughput • Uses the OS buffer cache • Lots of memory on the machine helps. Fast reads • Memory-mapped files • Efficient transfer from file descriptor to socket descriptor • Linux sendfile(), JVM transferTo() implementation. Why performance? • Disk flushes are delayed • Durability is guaranteed via replication • When consumers are reading the latest data, they read from the page cache
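The file-to-socket transfer mentioned on this slide can be tried out from Python, whose `socket.sendfile` uses `os.sendfile` where the platform supports it and otherwise falls back to ordinary `send()` calls. This is only an analogy to what the broker does through the JVM's `transferTo()`; the helper names, the temporary file, and the local socket pair are assumptions made for the demonstration.

```python
import os
import socket
import tempfile


def send_file_over_socket(path, sock):
    """Send a whole file over a connected socket; returns the bytes sent."""
    with open(path, "rb") as f:
        # The kernel copies file pages to the socket where sendfile is
        # available, avoiding a round trip through user-space buffers.
        return sock.sendfile(f)


def demo():
    payload = b"m1\nm2\nm3\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_over_socket(path, sender)
        sender.close()  # signal end of stream to the receiver
        received = b""
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
        return sent, received == payload
    finally:
        sender.close()
        receiver.close()
        os.remove(path)
```

The payload is kept small so that the blocking send completes before the receiver starts reading; a real server would stream to a remote peer that reads concurrently.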
  22. 22. Kafka – Cluster Mirroring. Mirror Maker • Mirrors data across clusters, even in different DCs / AZs • Stand-alone tool that uses the Consumer and Producer APIs • Reads from one or more source clusters and writes to a target cluster • Topics are selected with a whitelist/blacklist
  23. 23. Kafka REST Interface. REST interface • Wraps the Producer and Consumer APIs. Performance overhead • Two hops • Extra REST server to maintain • Parsing of JSON payloads
  24. 24. Kinesis vs. Kafka – Terms
      Amazon Kinesis | Apache Kafka
      Streams | Topics
      Data Records | Messages
      Producers | Producers
      Kinesis Producer Library | Producer API
      Consumers | Consumers
      Kinesis Applications | Consumer Applications
      Kinesis Client Library | Consumer – High-level API
      N/A | Consumer – Simple API
      Shards | Partitions
      N/A (built-in MD5 hash on partition keys) | Custom partitioner
      Sequence Numbers | Offset
      Application Name | Consumer Group ID
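The built-in MD5 hash on partition keys listed in the terms above can be sketched as a mapping from a key to a 128-bit integer and then to a shard. The sketch assumes the digest is read as a big-endian integer and that shards split the hash key space evenly, which need not hold after resharding; the function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is 128 bits wide


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5 (big-endian, assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key, num_shards):
    """Pick a shard index, assuming shards evenly split the hash key space."""
    range_size = HASH_SPACE // num_shards
    # Clamp so keys in the leftover tail of the space land in the last shard.
    return min(hash_key(partition_key) // range_size, num_shards - 1)
```

Unlike Kafka, where the application can plug in a custom partitioner, the only way to influence placement in this model is through the choice of partition key.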
  25. 25. Tweet: #hadooproadshow. More about Apache Kafka: http://hortonworks.com/hadoop/kafka/
