
AIA Tech Talk #1: Dive into Apache Kafka


Watch the webcast here: https://videos.confluent.io/watch/SeL3Ffs1ozDcZCRwbrbM9G?

Speaker: Kenneth Cheung



  1. Tech Talk #1: Dive into Apache Kafka. Kenneth Cheung, Sr. Solutions Engineer, Greater China. Kenneth@confluent.io
  2. Schedule of Tech Talks:
     • TT#1 Dive into Apache Kafka® (11th June 2020, Thursday)
     • TT#2 Introduction to Streaming Data and Stream Processing with Apache Kafka (24th June 2020, Wednesday)
     • TT#3 Journey to Event Driven Architecture (2nd July 2020, Thursday)
     • TT#4 KSQL: Streaming SQL for Apache Kafka (9th July 2020, Thursday)
     • TT#5 Capturing Continuous Data Streams with Kafka Connect (23rd July 2020, Thursday)
     • TT#6 Avoiding Pitfalls with Large-Scale Kafka Deployments (6th August 2020, Thursday)
     • TT#7 The Rise of Real Time – Designed for your Executive Leadership Team (20th August 2020, Thursday)
  3. Disclaimer… Some of you may know what Kafka is or have used it already. If that’s the case, sit back, take a refresher on Kafka, and learn about Confluent.
  4. Business Digitization Trends are Revolutionizing your Data Flow
  5. Legacy Data Infrastructure Solutions Have Architectural Flaws [diagram: apps wired point-to-point to transactional and analytics databases through MOM, ETL, and ESB layers] These solutions can be:
     ● Batch-oriented, instead of event-oriented in real time
     ● Complex to scale at high throughput
     ● Connected point-to-point, instead of publish/subscribe
     ● Lacking data persistence and retention
     ● Incapable of in-flight message processing
  6. Modern Architectures are Adapting to New Data Requirements [diagram: the same legacy estate, now extended with NoSQL DBs and Big Data Analytics] But how do we revolutionize data flow in a world of exploding, distributed, and ever-changing data?
  7. The Solution is a Streaming Platform for Real-Time Data Processing. A Streaming Platform provides a single source of truth about your data to everyone in your organization. [diagram: apps, transactional and analytics databases, DWH, NoSQL DBs, and Big Data Analytics all connected through one Streaming Platform]
  8. Apache Kafka®: Open Source Streaming Platform, Battle-Tested at Scale. At LinkedIn, the birthplace of Apache Kafka: more than 1 petabyte of data in Kafka; over 4.5 trillion messages per day; 60,000+ data streams; source of all data warehouse & Hadoop data; over 300 billion user-related events per day.
  9. Architectural Patterns with Apache Kafka
  10. Analytics – Database Offload [diagram: an RDBMS feeding data lakes through CDC]
  11. Stream Processing with Apache Kafka and ksqlDB [diagram: payment events and customer data captured from an RDBMS via CDC, joined by stream processing into a customer-payments stream]
  12. Transform Once, Use Many [diagram: the enriched customer-payments stream reused by multiple downstream consumers]
  13. Evolve processing from old systems to new [diagram: the RDBMS still feeds the existing app, while CDC and stream processing feed new app <x>]
  14. Evolve processing from old systems to new [diagram: the same pipeline now also feeds new app <y>]
  15. [diagram: Writers → Kafka cluster → Readers]
  16. KAFKA: A MODERN, DISTRIBUTED PLATFORM FOR DATA STREAMS
  17. Scalability of a Filesystem • Hundreds of MB/s throughput • Many TB per server • Commodity hardware
  18. Guarantees of a Database • Strict ordering • Persistence
  19. Rewind & Replay: Reset to any point in the shared narrative
  20. Distributed by Design • Replication • Fault Tolerance • Partitioning • Elastic Scaling
  21. Kafka Topics [diagram: my-topic split into my-topic-partition-0/1/2 across broker-1, broker-2, and broker-3]
  22. Kafka Topics • Topics are streams of “related” messages in Kafka • A topic is a logical representation that categorizes messages into groups • Developers define topics • Producers and topics have an N:N relationship (a producer can write to many topics; a topic can be written to by many producers) • The number of topics is unlimited
  23. Creating a Topic: $ kafka-topics --zookeeper zk:2181 --create --topic my-topic --replication-factor 3 --partitions 3. Or use the new AdminClient API, as sketched below!
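     A minimal sketch of the AdminClient route, assuming a reachable broker at broker1:9092; the topic spec mirrors the CLI flags above:

         import java.util.Collections;
         import java.util.Properties;
         import org.apache.kafka.clients.admin.AdminClient;
         import org.apache.kafka.clients.admin.AdminClientConfig;
         import org.apache.kafka.clients.admin.NewTopic;

         public class CreateTopic {
             public static void main(String[] args) throws Exception {
                 Properties props = new Properties();
                 props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
                 try (AdminClient admin = AdminClient.create(props)) {
                     // 3 partitions, replication factor 3, matching the CLI example
                     NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
                     admin.createTopics(Collections.singletonList(topic)).all().get();
                 }
             }
         }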
  24. Producing to Kafka [diagram: records appended to the log over time]
  25. Producing to Kafka [diagram: records appended over time while consumers (C) read at their own offsets]
  26. Partition Leadership and Replication [diagram: Topic1 partitions 1–4 spread across brokers 1–4, each partition with one leader replica and two follower replicas]
  27. Partition Leadership and Replication – Node Failure [diagram: the same layout after a broker fails; leadership for its partitions moves to surviving follower replicas]
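     To see where leadership actually sits, the CLI tool from slide 23 can describe the topic; the output is along these lines (broker IDs illustrative):

         $ kafka-topics --zookeeper zk:2181 --describe --topic my-topic
         Topic: my-topic  Partition: 0  Leader: 2  Replicas: 1,2,3  Isr: 2,3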
  28. Data Elements
  29. Producing to Kafka
  30. Development: A Basic Producer in Java
  31. Clients – Producer Design [diagram: a ProducerRecord (topic, optional partition, optional timestamp, optional headers, optional key, value) passes through the serializer and partitioner into per-partition batches; Send() returns metadata on success, retries on retriable failures, and throws when it can’t retry]
  32. The Serializer: Kafka doesn’t care about what you send to it as long as it’s been converted to a byte stream beforehand. Serializers exist for JSON, CSV, Avro, Protobuf, and XML (if you must).
  33. The Serializer:
         private Properties kafkaProps = new Properties();
         kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
         kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         kafkaProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
         producer = new KafkaProducer<String, SpecificRecord>(kafkaProps);
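     With that configuration, a send follows the Send()/retry flow from slide 31. A minimal sketch; the topic name and the Avro-generated Payment class are illustrative assumptions, not from the deck:

         // Payment is a hypothetical Avro-generated class implementing SpecificRecord.
         Payment payment = new Payment("AAAA", 42.0);
         ProducerRecord<String, SpecificRecord> record =
             new ProducerRecord<>("payments", payment.getId(), payment); // the key drives partitioning
         producer.send(record, (metadata, exception) -> {
             if (exception != null) {
                 exception.printStackTrace(); // retriable errors were already retried internally
             } else {
                 System.out.printf("wrote to %s-%d@%d%n",
                     metadata.topic(), metadata.partition(), metadata.offset());
             }
         });
         producer.flush(); // block until buffered records actually reach the broker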
  34. Record Keys and why they’re important – Ordering: Record keys determine the partition with the default Kafka partitioner. If a key isn’t provided, messages are produced in a round-robin fashion across partitions.
  35.–38. Record Keys and why they’re important – Ordering (the slides animate this for keys AAAA, BBBB, CCCC, and DDDD): Record keys determine the partition with the default Kafka partitioner, and therefore guarantee order for a key. Keys are used in the default partitioning algorithm: partition = hash(key) % numPartitions.
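     A sketch of roughly what that formula looks like in code; the real default partitioner also handles null keys and considers only available partitions, so this is simplified:

         import org.apache.kafka.common.utils.Utils;

         // murmur2 hash of the serialized key, masked to non-negative, modulo the partition count
         static int partitionFor(byte[] serializedKey, int numPartitions) {
             return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
         }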
  39. Record Keys and why they’re important – Key Cardinality: Key cardinality affects the amount of work done by the consumers in a group; a poor key choice can lead to uneven workloads. Keys in Kafka don’t have to be primitives like strings or ints; like values, they can be anything: JSON, Avro, etc. So create a key that will evenly distribute groups of records around the partitions. (Car·di·nal·i·ty /ˌkärdəˈnalədē/: the number of elements in a set or other grouping, as a property of that grouping.)
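     A hypothetical illustration of that advice; the topic and variable names are not from the deck:

         // High-cardinality key: customer IDs spread load across partitions while
         // still preserving per-customer ordering.
         producer.send(new ProducerRecord<>("payments", customerId, paymentEvent));
         // A low-cardinality key such as a country code would funnel most records
         // into a handful of partitions and leave other consumers in the group idle.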
  40. Consuming from Kafka
  41. A Basic Java Consumer:
         final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
         consumer.subscribe(Arrays.asList(topic));
         try {
             while (true) {
                 ConsumerRecords<String, String> records = consumer.poll(100);
                 for (ConsumerRecord<String, String> record : records) {
                     // do some work with each record
                 }
             }
         } finally {
             consumer.close();
         }
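     The props object isn’t shown on the slide; a minimal configuration sketch that would make the snippet runnable, with broker addresses and group name assumed:

         Properties props = new Properties();
         props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // assumed addresses
         props.put("group.id", "my-consumer-group"); // consumers sharing this ID split the partitions
         props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         props.put("auto.offset.reset", "earliest"); // where to start when the group has no committed offset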
  42. Consuming From Kafka – Single Consumer: One consumer will consume from all partitions, maintaining partition offsets.
  43. Consuming From Kafka – Grouped Consumers: Consumer groups C1 and C2 are separate, operating independently.
  44. Consuming From Kafka – Grouped Consumers: Consumers in a consumer group share the workload.
  45. Consuming From Kafka – Grouped Consumers: They organize themselves by ID (0, 1, 2, 3).
  46. Consuming From Kafka – Grouped Consumers: Failures will occur.
  47. Consuming From Kafka – Grouped Consumers: Another consumer in the group picks up for the failed consumer. This is a rebalance.
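     Rebalances can be observed from the client; a sketch using the ConsumerRebalanceListener that subscribe() optionally accepts (imports assumed: java.util.Collection, org.apache.kafka.common.TopicPartition):

         consumer.subscribe(Arrays.asList(topic), new ConsumerRebalanceListener() {
             @Override
             public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                 // called before a rebalance takes partitions away: a good place to commit offsets
                 System.out.println("revoked: " + partitions);
             }
             @Override
             public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                 // called once the rebalance hands this consumer its new partitions
                 System.out.println("assigned: " + partitions);
             }
         });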
  48. Use a Good Kafka Client!
     ● Java/Scala – default clients, come with Kafka
     ● C/C++ – https://github.com/edenhill/librdkafka
     ● C#/.Net – https://github.com/confluentinc/confluent-kafka-dotnet
     ● Python – https://github.com/confluentinc/confluent-kafka-python
     ● Golang – https://github.com/confluentinc/confluent-kafka-go
     ● Node/JavaScript – https://github.com/Blizzard/node-rdkafka (not supported by Confluent!)
     New Kafka features will only be available to modern, updated clients!
  49. Without Confluent and Kafka [diagram: lines of business and public cloud joined by rigid point-to-point links] Data architecture is rigid, complicated, and expensive, making it too hard and cost-prohibitive to get mission-critical apps to market quickly.
  50. Confluent & Kafka reimagine this as the central nervous system of your business [diagram: a universal event pipeline connecting data stores, logs, mainframes, 3rd-party apps, and custom apps/microservices to contextual event-driven applications such as real-time inventory, real-time fraud detection, real-time customer 360, machine-learning models, and real-time data transformation]
  51. Apache Kafka is one of the most popular open source projects in the world. Confluent are the Kafka Experts: founded by the creators of Apache Kafka, Confluent continues to be the major contributor. Confluent invests in open source: the 2020 re-architecture removes the scalability-limiting use of ZooKeeper in Apache Kafka.
  52. Event Streaming Vision • Modernize apps with next-gen infra: unlock the value of event-driven apps across cloud, on-prem, and hybrid environments • Future-proof event streaming: partner with the foremost Kafka innovators and continually build state-of-the-art applications
  53. Future-proof event streaming: Kafka re-engineered as a fully managed, cloud-native service by its original creators and major contributors.
     ● Global: automated disaster recovery; global applications with geo-awareness
     ● Infinite: efficient and infinite data with tiered storage; unlimited horizontal scalability for clusters
     ● Elastic: easy multi-cloud orchestration; persistent bridge to cloud from on-prem
  54. Make your applications more valuable with real-time insights enabled by next-gen architecture. Data integration: database changes, log events, IoT events, web events, connected car. Apps driven by real-time data: fraud detection, customer 360, personalized promotions, quality assurance, SIEM/SOC, inventory management, proactive patient care, sentiment analysis, capital management. Modernize your apps.
  55. Build a bridge to the cloud for your data: ensure availability and connectivity regardless of where your data lives.
     ● Private Cloud: deploy on premises with Confluent Platform
     ● Public/Multi-Cloud: leverage a fully managed service with Confluent Cloud
     ● Hybrid Cloud: build a persistent bridge from datacenter to cloud
  56. Confluent Platform: self-managed software and a fully managed cloud service built around Apache Kafka (open source / community licensed), plus training, partners, enterprise support, and professional services.
     ● Developer (unrestricted developer productivity): event streaming database (ksqlDB); rich pre-built ecosystem (connectors, Hub, Schema Registry); multi-language development (non-Java clients, REST Proxy)
     ● Operator (efficient operations at scale): GUI-driven management & monitoring (Control Center); flexible DevOps automation (Operator, Ansible); dynamic performance & elasticity (Auto Data Balancer, Tiered Storage)
     ● Architect (production-stage prerequisites): global resilience (multi-region clusters, Replicator); data compatibility (Schema Registry, Schema Validation); enterprise-grade security (RBAC, secrets, audit logs)
     ● Freedom of choice, backed by committer-driven expertise
  57. https://www.confluent.io/download/
  58. Project Metamorphosis: Unveiling the next-gen event streaming platform. Listen to the replay and sign up for updates at cnfl.io/pm. Jay Kreps, Co-founder and CEO, Confluent
  59. Download your Apache Kafka and Stream Processing O'Reilly Book Bundle at: https://www.confluent.io/apache-kafka-stream-processing-book-
  60. Kenneth Cheung, Sr. Solutions Engineer, Greater China, Kenneth@confluent.io
