Architecture of a Kafka/Camus infrastructure

11,782 views

Presentation about a project done at a customer utilizing Kafka, Camus, and Hive.

Published in: Technology

  1. © 2013 Impetus Technologies - Confidential
     Kafka/Camus Project, Phase I
     Mountain View, CA, March 2013
     (photos courtesy of LinkedIn)
  2. Agenda
     • Objective
     • What tool to use?
     • Kafka & Camus overview
     • Infrastructure
     • Architecture
     • Performance benchmarks
  3. Objective
     • Customer has events (Data, UI) that happen in real time and need to be analyzed
     • Immediate need for a batch-oriented mechanism
     • Events need to be ETL'ed and analyzed in Hadoop
     • Future need for more real-time stream analysis
     • Potential bursts of streaming data
  4. What tool to use?
     • JMS:
       • Just an API
       • Not cross-language
       • Painful
       • Doesn't scale
     • ActiveMQ
       • Didn't work for LinkedIn: http://sites.computer.org/debull/A12june/pipeline.pdf
     • Apache Flume
  5. Kafka overview
     • Distributed, scalable pub/sub system for big data
     • Producer -> Broker -> Consumer of message topics
     • Can have multiple clients consuming at different velocities (synchronous/asynchronous)
     • Notion of a consumer group to parallelize consumption of messages
     • Persists messages, so consumers can rewind
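
The consumer-group and rewind points on the Kafka overview slide can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: `ToyBroker`, its methods, and the topic and group names are hypothetical, there is a single partition per topic, and nothing here uses the real Kafka client API. It shows that each group keeps its own read position in a persisted log, so groups can consume at different speeds and a group can move its position back to replay messages.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> list of messages
        self.offsets = defaultdict(int)  # (group, topic) -> next offset to read

    def produce(self, topic, message):
        """Append a message and return the offset it was stored at."""
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1

    def consume(self, group, topic, max_messages=10):
        """Read up to max_messages for a group and advance that group's offset."""
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch

    def rewind(self, group, topic, offset=0):
        """Move a group's read position back; messages are still in the log."""
        self.offsets[(group, topic)] = offset


broker = ToyBroker()
for i in range(5):
    broker.produce("ui-events", f"event-{i}")

# Two groups read the same topic independently, at different speeds.
fast = broker.consume("hive-loader", "ui-events", max_messages=5)
slow = broker.consume("realtime", "ui-events", max_messages=2)

# Rewinding replays messages that were already consumed.
broker.rewind("hive-loader", "ui-events", offset=3)
replayed = broker.consume("hive-loader", "ui-events")
```

In real Kafka, a consumer group additionally spreads a topic's partitions across the group's members, which the single-partition toy above leaves out.
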
  6. Kafka overview
     • More overview pictures (images not included in this transcript)
  7. Camus overview
     • Pipeline out of Kafka to HDFS
     • Automatic discovery of topics and partitions
     • Finds latest offsets from Kafka nodes
     • Uses Avro by default; option to use your own Decoder
     • Allocates topic pulls among a set # of Hadoop job tasks
     • Moves data files to HDFS directories according to timestamp
     • Remembers last offset / topic
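
Three of the Camus steps listed on this slide (allocating pulls among a fixed number of tasks, filing data by timestamp, and remembering the last offset) can be sketched in a few lines. This is not Camus code: the function names, the round-robin policy, and the `base/topic/YYYY/MM/DD/HH` directory layout are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def allocate(partitions, num_tasks):
    """Spread (topic, partition) pairs over a fixed number of tasks, round-robin."""
    tasks = [[] for _ in range(num_tasks)]
    for i, topic_partition in enumerate(sorted(partitions)):
        tasks[i % num_tasks].append(topic_partition)
    return tasks


def hourly_dir(base, topic, epoch_seconds):
    """Build an hourly output directory from a message timestamp (hypothetical layout)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}/{topic}/{ts:%Y/%m/%d/%H}"


def pull(log, last_offset):
    """Return the messages after last_offset plus the new offset to remember."""
    new_messages = log[last_offset:]
    return new_messages, last_offset + len(new_messages)


partitions = [("ui-events", 0), ("ui-events", 1), ("data-events", 0)]
tasks = allocate(partitions, num_tasks=2)

# 1362837600 is 2013-03-09 14:00:00 UTC.
path = hourly_dir("/data/events", "ui-events", 1362837600)

# A second run starting from the remembered offset only sees new messages.
messages, offset = pull(["a", "b", "c"], last_offset=1)
```

Storing the offset returned by `pull` between runs is what lets an hourly job pick up where the previous run stopped instead of re-reading the whole topic.
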
  8. Infrastructure
     • Kafka 0.7.2
       • 3 nodes
       • Benchmark tool to set message size, # of threads, # of messages, topic name, data encoding
     • CDH 4.2
       • 1 NN, 1 SNN, 3 slaves for Hadoop
     • Camus
       • JSON or Avro decoder
     • Zookeeper
     • Hive
  9. Infrastructure
     • 8 Amazon EC2 large instances
     • Dual core 2.0 GHz
     • 1 7200 rpm SATA drive
     • 8 GB memory
     • 200-byte messages
     • 1 producer, 1 consumer
  10. Customer architecture (diagram labels)
     • Event sources: Gaming, Shopping, Invite friends
     • Kafka topic: Data events (e.g. user profile registrations)
     • Kafka topic: UI events (e.g. game interaction)
     • Consume topics via Camus every hour
     • Use Hive to analyze the data
  11. Performance summary
     • Producer:
       • Avg 20,000 messages / sec
       • 3.81 MB per sec
     • Consumer:
       • 16,600 messages / sec
       • 3.17 MB per sec -> about 190 MB/min (roughly 11 GB/hr)
     • Customer goal: "want to scale to 5000 events per second at peak."
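
The throughput figures on the performance summary slide follow from the 200-byte message size given on the infrastructure slide. The short check below is a sketch that assumes "MB" on the slide means 2^20 bytes, which is the reading under which the slide's 3.81 and 3.17 values come out. At the consumer rate, the volume is about 190 MB per minute, or roughly 11 GB per hour.

```python
MESSAGE_BYTES = 200      # message size from the infrastructure slide
MIB = 1024 * 1024        # assumption: the slide's "MB" is 2**20 bytes


def throughput_mib_per_sec(messages_per_sec, message_bytes=MESSAGE_BYTES):
    """Convert a message rate into MiB per second for fixed-size messages."""
    return messages_per_sec * message_bytes / MIB


producer = throughput_mib_per_sec(20_000)   # about 3.81
consumer = throughput_mib_per_sec(16_600)   # about 3.17

consumer_per_minute = consumer * 60             # MiB per minute, about 190
consumer_per_hour_gib = consumer * 3600 / 1024  # GiB per hour, about 11.1

# Headroom of the measured consumer rate over the customer's stated peak goal.
headroom = 16_600 / 5_000
```

With a measured consumer rate of 16,600 messages per second against a goal of 5,000 events per second at peak, the single producer and single consumer setup leaves a bit more than a threefold margin.
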
  12. Performance benchmark

     | Data size input | Data type | Storage size on HDFS (in bytes) | Hive Count (in sec) | Hive max (in sec) | Camus run time | Kafka |
     |---|---|---|---|---|---|---|
     | 500000 records | JSON text data | 103779151 | 38.3 | 59 | 46 seconds | 34.2 |
     | | JSON Serde | 103779151 | 46.3 | 48.2 | 46 seconds | 34.2 |
     | | Avro data | 60962022 | 25.2 | 29.3 | 54 seconds | 15.9 |
     | 1 Million records | JSON text data | 416556931 | 27.582 | 50.889 | 1 minute | 40.56 |
     | | JSON Serde | 416556931 | 39.428 | 32.305 | | 40.56 |
     | | Avro data | 122041553 | 35.806 | 26.328 | 1 minute | 22.36 |
     | 7 Million records | JSON text data | 1456636071 | 57.895 | 111.598 | 3 minutes 50 seconds | 388 |
     | | JSON Serde | 1456636071 | 83.225 | 83.776 | 3 minutes 50 seconds | 388 |
     | | Avro data | 866962131 | 60.63 | 62.896 | 4 minutes 50 seconds | 181 |
     | 10 Million records | JSON text data | 1919381181 | 78.337 | 144.667 | 5 minutes 1 second | 558 |
     | | JSON Serde | 1919381181 | 103.4 | 110 | 5 minutes 1 second | 558 |
     | | Avro data | 1239446765 | 87.042 | 90.958 | 7 minutes 23 seconds | 230 |
     | 15 Million records | JSON text data | 3157886975 | 107.325 | 201.125 | 6 minutes 24 seconds | 851 |
     | | JSON Serde | 3157886975 | 141.345 | 153.365 | | 851 |
     | | Avro data | 1865267728 | 96.9 | 98.9 | 8 minutes 26 seconds | 377 |
     | 20 Million records | JSON text data | | | | | 1159 |
     | | JSON Serde | | | | | 1159 |
     | | Avro data | 2476833359 | 133.606 | 153.464 | 11 minutes 2 seconds | 234 |
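
One way to read the storage column of the benchmark is as a ratio of Avro size to JSON size for the same record count. The sketch below does that arithmetic using only the byte counts reported on the slide; the dictionary and function names are illustrative, and the 20 Million row is left out because the slide gives no JSON size for it.

```python
# Storage size on HDFS in bytes, copied from the benchmark slide:
# record count -> (JSON bytes, Avro bytes)
STORAGE_BYTES = {
    "500000": (103779151, 60962022),
    "1 Million": (416556931, 122041553),
    "7 Million": (1456636071, 866962131),
    "10 Million": (1919381181, 1239446765),
    "15 Million": (3157886975, 1865267728),
}


def avro_to_json_ratio(sizes):
    """Return Avro size divided by JSON size for each record count."""
    return {label: avro / json for label, (json, avro) in sizes.items()}


ratios = avro_to_json_ratio(STORAGE_BYTES)
for label, ratio in ratios.items():
    print(f"{label:>10} records: Avro is {ratio:.0%} of the JSON size")
```

For most rows the Avro files come out at roughly 59 to 65 percent of the JSON size. The 1 Million row is an outlier at about 29 percent, because its reported JSON size is about four times the 500000-record size rather than about twice.
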
  14. Kafka Speed Performance benchmark

     | Kafka | 500000 records | 1 Million records | 7 Million records | 10 Million records | 15 Million records | 20 Million records |
     |---|---|---|---|---|---|---|
     | JSON text data | 34.2 | 40.56 | 388 | 558 | 851 | 1159 |
     | JSON Serde | 34.2 | 40.56 | 388 | 558 | 851 | 1159 |
     | Avro data | 15.9 | 22.36 | 181 | 230 | 377 | 534 |

     [Chart: Kafka comparison by record count; series: JSON text data, JSON Serde, Avro data]
  15. Camus Speed Performance benchmark

     | Camus | 500000 records | 1 Million records | 7 Million records | 10 Million records | 15 Million records | 20 Million records |
     |---|---|---|---|---|---|---|
     | JSON text data | 46 | 60 | 230 | 301 | 384 | |
     | JSON Serde | 46 | 60 | 230 | 301 | 384 | |
     | Avro data | 54 | 85 | 290 | 443 | 506 | 662 |

     [Chart: Camus comparison by record count; series: JSON text data, JSON Serde, Avro data]
  16. Count Speed Performance

     | Count | 500000 records | 1 Million records | 7 Million records | 10 Million records | 15 Million records | 20 Million records |
     |---|---|---|---|---|---|---|
     | JSON text data | 38.3 | 27.58 | 57.89 | 78.337 | 107.325 | |
     | JSON Serde | 46.3 | 39.42 | 83.2 | 103.4 | 141.345 | |
     | Avro data | 25.2 | 35.8 | 60.6 | 87.042 | 96.9 | 133.606 |

     [Chart: Select Count(*) comparison by record count; series: JSON text data, JSON Serde, Avro data]
  17. Max Speed Performance

     | Max | 500000 records | 1 Million records | 7 Million records | 10 Million records | 15 Million records | 20 Million records |
     |---|---|---|---|---|---|---|
     | JSON text data | 59 | 50.889 | 111.598 | 144.667 | 201.125 | |
     | JSON Serde | 48.2 | 32.305 | 83.776 | 110 | 153.365 | |
     | Avro data | 29.3 | 26.328 | 62.896 | 90.958 | 98.9 | 153.464 |

     [Chart: Max(field) comparison by record count; series: JSON text data, JSON Serde, Avro data]
  18. Q&A
     Thank You
