
Getting Started with Kafka on k8s

Presented by Rohini Rajaram at the FedEx Cloud-Native Conference in Pittsburgh on July 12th, 2019.

Published in: Software

  1. Getting Started w/ Kafka On K8s. Rohini Rajaram, Sr. Platform Architect, Pivotal (rrajaram@pivotal.io). July 2019. © Copyright 2019 Pivotal Software, Inc. All rights Reserved.
  2. Agenda: Kafka Fundamentals - Pub/Sub Done Right; Kafka On K8s; Building Event Driven Systems; Demo - Provision a Kafka Cluster On PKS
  3. Data Infrastructure - Point To Point
  4. Data Infrastructure - Centralized Data Pipeline
  5. Messaging Systems. Why not traditional messaging systems for the centralized pipeline? Points of comparison: transient vs durable messages; delivery to consumers by a push vs pull based mechanism; offset tracking, which allows messages to be replayed on consumer failures; horizontal scalability; distributed operation through partitioning & replication.
  6. Key Ideas. Key Idea 1: Data parallelism leads to scale out - randomly distribute clients across partitions. Key Idea 2: Disks are fast when used sequentially - store messages as a write ahead log. Key Idea 3: Batching makes best use of the network - batched transfer, compression, no JVM caching (low memory footprint) & zero copy.
  7. Architecture Overview - Scale Out Architecture. [Diagram: multiple producers and multiple consumers connected to a Kafka broker cluster holding topic partitions]
  8. Architecture Overview - Distributed Commit Log. [Diagram: a producer and a consumer connected to Broker 0 through Broker n, backed by storage]
  9. Message format:
       Field        Size
       Offset       8 bytes
       Msg. Length  4 bytes
       CRC          4 bytes
       Magic        1 byte
       Attr         1 byte
       Timestamp    8 bytes
       Key Len      4 bytes
       Key          Varying
       Value Len    4 bytes
       Value        Varying
     Attr bits 0-2 select the compression codec: 0 = No Compression, 1 = gzip, 2 = Snappy.
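
The field layout on slide 9 can be sketched with Python's `struct` module. This is an illustration only, not Kafka's own code: the function names are hypothetical, the fields are assumed to be big-endian, the CRC is assumed to cover everything from the magic byte to the end of the value, and null keys are not handled.

```python
import struct
import zlib

# Offset (8 bytes), Msg. Length (4 bytes), CRC (4 bytes).
_LOG_HEAD = struct.Struct(">qiI")
# Magic (1 byte), Attr (1 byte), Timestamp (8 bytes).
_BODY_HEAD = struct.Struct(">bbq")


def encode_message(offset, timestamp_ms, key, value, magic=1, attr=0):
    """Pack one message using the field order shown on the slide."""
    body = _BODY_HEAD.pack(magic, attr, timestamp_ms)
    body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(value)) + value
    crc = zlib.crc32(body) & 0xFFFFFFFF  # assumption: CRC covers magic..value
    # Msg. Length counts the bytes after the length field: CRC plus body.
    return _LOG_HEAD.pack(offset, 4 + len(body), crc) + body


def decode_message(buf):
    """Unpack one message and verify its CRC."""
    offset, length, crc = _LOG_HEAD.unpack_from(buf, 0)
    body = buf[_LOG_HEAD.size:_LOG_HEAD.size + length - 4]
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    magic, attr, timestamp_ms = _BODY_HEAD.unpack_from(body, 0)
    pos = _BODY_HEAD.size
    (key_len,) = struct.unpack_from(">i", body, pos)
    pos += 4
    key = body[pos:pos + key_len]
    pos += key_len
    (value_len,) = struct.unpack_from(">i", body, pos)
    pos += 4
    value = body[pos:pos + value_len]
    return {
        "offset": offset,
        "magic": magic,
        "codec": attr & 0x07,  # bits 0-2: 0 none, 1 gzip, 2 Snappy
        "timestamp_ms": timestamp_ms,
        "key": key,
        "value": value,
    }
```

Round-tripping a message through `encode_message` and `decode_message` returns the same key, value, and offset, and flipping any byte in the body makes the CRC check fail.
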
  10. Topics. [Diagram: a topic with three partition logs, P0 (offsets 0-9), P1 (offsets 0-7) and P2 (offsets 0-7), each receiving writes at its end; copies of P0, P1 and P2 are spread across Broker 1, Broker 2 and Broker 3 on Node 1, Node 2 and Node 3]
  11. Distributed Commit Log. [Diagram: a producer writes to partitions P0, P1 and P2 while Consumer A and Consumer B read]
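
A minimal in-memory sketch of the commit log idea, assuming one partition and two independent readers. `PartitionLog` and `Consumer` are hypothetical names for illustration, not Kafka client classes. It shows why pull-based reads with tracked offsets make replay after a consumer failure straightforward: the log is never changed by a read, so a consumer that did not commit sees the same records again.

```python
class PartitionLog:
    """Append-only list standing in for one partition's log."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record at the end and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; the log is unchanged."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Pull-based reader that tracks its own committed offset."""

    def __init__(self, log):
        self.log = log
        self.committed = 0

    def poll(self, max_records=10):
        # Always read from the last committed offset.
        return self.log.read(self.committed, max_records)

    def commit(self, count):
        # Advance only after the records were processed successfully.
        self.committed += count


log = PartitionLog()
for text in ("m0", "m1", "m2"):
    log.append(text)

consumer_a = Consumer(log)
consumer_b = Consumer(log)

batch = consumer_a.poll(max_records=2)
consumer_a.commit(len(batch))   # A has processed m0 and m1

consumer_b.poll()               # B reads but fails before committing
replayed = consumer_b.poll()    # the same records are delivered again
```
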
  12. Partitions & Segments. A partition directory (Kafka | partition-0) holds segment files in pairs: segment0.log, segment0.index, segment5.log, segment5.index.
      Segment3065011416.log (starting offset: 3065011416):
        offset: 3065011416 position: 0 isvalid: true payloadsize: 2020 magic: 1 compresscodec: NoCompressionCodec crc: 811055132 payload: {"name": "Smith", msg: "Hello world"}
        offset: 3065011417 position: 1779 isvalid: true payloadsize: 2244 magic: 1 compresscodec: NoCompressionCodec crc: 151590202 payload: {"name": "James", msg: "Hello to all of you!"}
      Segment3065011416.index:
        Offset (rel. to base)   Position (on the log)
        0                       0
        1                       1779
      [Diagram: offsets 0-9 in a partition, with writes going to the active segment]
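
The segment and index layout on slide 12 suggests a two-step lookup: pick the segment whose starting offset is the largest one not above the target, then use the index to find a byte position to start reading from. The sketch below assumes the index is a sorted list of (relative offset, position) pairs held in memory; the function names are hypothetical, and the numbers come from the slide.

```python
from bisect import bisect_right


def find_segment(base_offsets, target):
    """Return the base offset of the segment that holds target.

    base_offsets must be sorted ascending.
    """
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is before the first segment")
    return base_offsets[i]


def index_lookup(base_offset, index, target):
    """Return the log position of the nearest index entry at or below target.

    index is a sorted list of (offset relative to base, position on the log).
    A reader would seek to this position and scan forward to the exact offset.
    """
    relative = target - base_offset
    relative_offsets = [rel for rel, _ in index]
    i = bisect_right(relative_offsets, relative) - 1
    if i < 0:
        raise KeyError(f"offset {target} is before the first index entry")
    return index[i][1]


segments = [0, 5, 3065011416]   # starting offsets taken from the file names
index = [(0, 0), (1, 1779)]     # contents of Segment3065011416.index

base = find_segment(segments, 3065011417)
position = index_lookup(base, index, 3065011417)
```
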
  13. Why File System & Not Memory? With sequential access, the difference between file system and memory speeds is small. Kafka runs on the JVM, where data stored in memory carries heavy object overheads and increases GC time.
  14. Zero Copy. [Diagram: the OS send-file path through the Page Cache, Socket Buffer and NIC Buffer in the kernel context, shown against a User Space Buffer in the application context]
  15. Brokers. Cluster aware. A broker receives messages from producers, assigns offsets and writes to disk. It fetches messages for consumers reading partitions and responds with committed messages. One broker is elected as Controller: administration, assigning partitions to brokers, and monitoring. Topic retention is time or size based. [Diagram: a Kafka cluster with Broker 0 (Controller) and Broker 1, each holding Topic A Partition 0 and Topic A Partition 1, one copy as leader and one as replica; messages for A/0 and A/1 flow from the producer, and messages from A/0 and A/1 flow to the consumer]
  16. Producers. A producer accepts a ProducerRecord. The record's key and value are serialized into byte arrays by a Serializer. The Partitioner chooses a partition from the key if no partition is specified and adds the record to a batch for that partition. A separate thread handles sending batches to the brokers. Three send methods: 1. Fire & Forget 2. Synchronous 3. Asynchronous
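
The partitioner and batching steps on slide 16 can be sketched in a few lines. This is a simplified model, not the Kafka client: `choose_partition` and `BatchingProducer` are hypothetical names, CRC32 stands in for the client's real key hash, keys are assumed to be already serialized to bytes, and batches are "sent" synchronously into a list instead of by a separate thread.

```python
import zlib
from collections import defaultdict


def choose_partition(key, num_partitions, partition=None):
    """Use the explicit partition if given, otherwise hash the key bytes."""
    if partition is not None:
        return partition
    # Stand-in hash; the same key always maps to the same partition.
    return (zlib.crc32(key) & 0xFFFFFFFF) % num_partitions


class BatchingProducer:
    """Collects records into per-partition batches before sending."""

    def __init__(self, num_partitions, batch_size=3):
        self.num_partitions = num_partitions
        self.batch_size = batch_size
        self.batches = defaultdict(list)
        self.sent = []  # (partition, records) pairs handed to the "broker"

    def send(self, key, value, partition=None):
        p = choose_partition(key, self.num_partitions, partition)
        self.batches[p].append((key, value))
        if len(self.batches[p]) >= self.batch_size:
            self.flush(p)
        return p

    def flush(self, partition):
        """Hand the pending batch for one partition over in a single call."""
        batch = self.batches.pop(partition, [])
        if batch:
            self.sent.append((partition, batch))
```

Because the partition is derived from the key, all records for one key land in the same partition and keep their relative order there.
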
  17. Consumers. Consumer groups scale consumption: a topic's partitions are distributed among the consumers in a group. Partitions are rebalanced when consumers are added or crash (a rebalance brings consumer unavailability & loss of consumer cache).
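
A minimal sketch of how partitions might be spread over a consumer group and reassigned when membership changes. It assumes a simple round-robin strategy; `assign` is a hypothetical helper, and real clients offer several assignment strategies.

```python
def assign(partitions, consumers):
    """Distribute partitions over the group members in round-robin order."""
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

before = assign(partitions, ["consumer-a", "consumer-b"])
# consumer-b crashes: the group rebalances and consumer-a takes over.
after_crash = assign(partitions, ["consumer-a"])
# Adding members beyond the partition count leaves the extras idle.
oversized = assign([0, 1], ["consumer-a", "consumer-b", "consumer-c"])
```

Each partition is owned by exactly one group member at a time, which is why the partition count caps how far a group can scale out.
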
  18. Replication. [Diagram: partitions P1, P2 and P3 of Topic A and Topic B spread across Broker 1, Broker 2 and Broker 3; the producer writes to the P1 leader, with copies held by the P1 followers (Followers/ISR)]
  19. Basics Kubernetes for Data
  20. StorageClass. Dynamic provisioning of persistent volumes. Allows admins to define different classes of storage to offer: aws-ebs, azure-disk, gce-pd, vsphere-volume, portworx-volume. [Diagram: a container in a Pod uses a Persistent Volume Claim, which names a StorageClass (StorageClassName, provisioner=...)]
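
The relationship between a StorageClass and a Persistent Volume Claim can be sketched by building the two manifests as Python dictionaries and printing them as JSON, a format Kubernetes accepts alongside YAML. The object names (`kafka-disk`, `data-kafka-0`), the 10Gi size, and the choice of the vSphere provisioner are assumptions for illustration, not values from the talk.

```python
import json


def storage_class(name, provisioner):
    """Manifest for a StorageClass backed by the given provisioner."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
    }


def volume_claim(name, storage_class_name, size):
    """Manifest for a claim that asks the named class for a volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class_name,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and size, chosen only for this example.
sc = storage_class("kafka-disk", "kubernetes.io/vsphere-volume")
pvc = volume_claim("data-kafka-0", sc["metadata"]["name"], "10Gi")

for manifest in (sc, pvc):
    print(json.dumps(manifest, indent=2))
```

The claim refers to the class only by name, so an admin can swap the provisioner behind a class without changing the workloads that claim storage from it.
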
  21. StatefulSet. Stable, unique network identifiers. Stable, persistent storage. Ordered, graceful deployment and scaling. Ordered, automated rolling update. [Diagram: a StatefulSet managing Pod-0 ... Pod-N]
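
The "stable, unique network identifiers" and "stable, persistent storage" points follow from how a StatefulSet names things: pods get an ordinal suffix, each pod gets a DNS name under the governing headless service, and each volume claim is named after its template and pod. The sketch below computes those names. The StatefulSet name `kafka`, service `kafka-headless`, namespace `default`, and claim template `data` are hypothetical, and the default `cluster.local` cluster domain is assumed.

```python
def pod_identities(statefulset, service, namespace, replicas,
                   claim_template="data", cluster_domain="cluster.local"):
    """Return the stable pod name, DNS name, and claim name per ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            "claim": f"{claim_template}-{pod}",
        })
    return identities


def startup_order(identities):
    """Pods are created one at a time, lowest ordinal first."""
    return [identity["pod"] for identity in identities]


def shutdown_order(identities):
    """Pods are removed in reverse ordinal order when scaling down."""
    return [identity["pod"] for identity in reversed(identities)]


brokers = pod_identities("kafka", "kafka-headless", "default", replicas=3)
```

A rescheduled pod keeps its name and claim, so a Kafka broker comes back with the same address and the same log directory.
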
  22. Operator = Custom Controller + Custom Resource. [Diagram: built-in pairs such as Deployment and Deployment Controller, ReplicaSet and ReplicaSet Controller, StatefulSet and StatefulSet Controller, alongside a Custom Resource and its Custom Controller]
  23. Operator Pattern. Kubernetes native: Custom Resource + Custom Controller. Embedded with operational knowledge of both the data software and Kubernetes: backup/restore, scale up/down, rebalance data. Control loop: Observe, Analyze, Act.
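
The Observe, Analyze, Act loop can be sketched as a reconcile function that compares the desired state in a custom resource with what is running and closes the gap. Everything here is an in-memory stand-in: the `spec` dictionary plays the custom resource, the `pods` list plays the cluster, and the function names are hypothetical. A real operator would call the Kubernetes API and would also trigger data rebalancing after scaling.

```python
def observe(pods):
    """Observe: count the brokers that currently exist."""
    return len(pods)


def analyze(desired_brokers, observed_brokers):
    """Analyze: turn the gap between desired and observed into actions."""
    diff = desired_brokers - observed_brokers
    if diff > 0:
        return ["scale_up"] * diff
    if diff < 0:
        return ["scale_down"] * -diff
    return []


def act(pods, actions):
    """Act: apply each action to the (simulated) cluster."""
    for action in actions:
        if action == "scale_up":
            pods.append(f"kafka-{len(pods)}")
        elif action == "scale_down":
            pods.pop()  # highest ordinal goes first


def reconcile(spec, pods):
    """One pass of the control loop; returns the actions it took."""
    actions = analyze(spec["brokers"], observe(pods))
    act(pods, actions)
    return actions


pods = ["kafka-0"]
first_pass = reconcile({"brokers": 3}, pods)
second_pass = reconcile({"brokers": 3}, pods)   # already converged
```

Running the loop repeatedly is safe because a pass that finds no difference takes no action.
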
  24. Basics Demo
  25. Transforming How The World Builds Software
