Confluent kafka meetupseattle jan2017

David Tucker, Director, Partner Engineering and Alliances, Confluent

Published in: Engineering

  1. State of the Streaming Platform 2017: What’s new in Apache Kafka and the Confluent Platform. David Tucker, Confluent. (Confidential)
  2. The shift to streams
      • “By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.” (Gartner: Harness Streaming Data for Real-Time Analytics, Nov 2016)
      • “Streaming ingestion and analytics will become a must-have for digital winners.” (Forrester’s 2016 Predictions: Turn Data Into Insight And Action, Nov 2015)
  3. Vision of a Streaming Enterprise. Diagram labels: Streaming Platform; Search, NewSQL / NoSQL, RDBMS, Monitoring, Document Store, Real-time Analytics, Data Warehouse, Mobile Apps, Legacy Apps, Hadoop.
  4. What Can You Do with a Streaming Platform?
      • Publish and subscribe to streams of data: analogous to traditional messaging systems
      • Store streams of data: consumers can look back in time
      • Process streams of data: analyze and correlate events in real time
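The "store" and "look back in time" capabilities on slide 4 can be pictured with a toy append-only log. This is a minimal in-memory sketch, not Kafka itself; the `StreamLog` class and its method names are hypothetical and exist only for illustration.

```python
class StreamLog:
    """Toy append-only log standing in for a single topic partition (hypothetical)."""

    def __init__(self):
        self._records = []

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset=0):
        """Return every record from `offset` onward; reading never deletes data."""
        return self._records[offset:]


log = StreamLog()
for event in ("login", "click", "purchase"):
    log.publish(event)

live_tail = log.read(offset=2)  # a consumer positioned at the newest record
replay = log.read(offset=0)     # a consumer that rewinds to the beginning
```

Because reads do not remove records, any number of consumers can each keep their own offset, which is what separates this model from a traditional queue.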
  5. The typical integration architecture. Diagram labels: Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, MySQL, Cassandra, Oracle, Monitoring; each system is shown with its own App, Databases, Storage, and Interfaces.
  6. Challenges abound. Diagram labels: Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, Espresso, Cassandra, Oracle, Monitoring; App, Databases, Storage, Interfaces.
      • Difficult to handle massive amounts of data
      • Diverse data sets, arriving at an increasing rate
      • Many complex data pipelines
      • Require a separate cluster for real-time
      • Difficult & time consuming to change
      • Require mission critical availability into most recent/relevant data
  7. Modernized architecture using Apache Kafka. Diagram labels: Apache Kafka; Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Hadoop, Monitoring, Data Warehouse; two Apps built on the Streams API.
  8. Challenges addressed by a streaming platform. Diagram labels as on the previous slide (Apache Kafka; Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Hadoop, Monitoring, Data Warehouse; two Apps built on the Streams API).
      • Rewind data stream to re-load into any target system
      • Scale to meet demands of diverse streams
      • Pub/sub to data streams
      • Lightweight, easy to modify with minimal disruption
      • Decoupled from upstream apps creating agility
      • Real-time, context specific data in the moment
  9. From Big Data to Stream Data
      • Big Data was The More the Better (chart: Value of Data vs. Volume of Data)
      • Stream Data is The Faster the Better (chart: Value of Data vs. Age of Data)
      • Stream Data can be Big or Fast (Lambda). Diagram labels: Job 1, Job 2, Streams, Table 1, Table 2, DB, Speed Table, Batch Table, Hadoop
      • Stream Data will be Big AND Fast (Kappa). Diagram labels: DB, Streams
      • Apache Kafka is the Enabling Technology of this Transition
  10. Ingest, Process, Load, and Serve Data at a Global Scale. Diagram labels: Data System A …, Data System B …, FIX, Raw data / Events, Applications, Other data stores; two Kafka clusters, linked by Confluent Replicator or Custom Replication.
      • Kafka Streams (Data Enrichment and Transformation)
      • Kafka Connect (Connectors to Extract and Load data)
  11. Confluent: Enterprise Streaming Platform based on Apache Kafka™
      • Taglines: Complete, Open, Trusted, Enterprise Grade
      • Diagram labels around Confluent Platform: Database Changes, Log Events, IoT Data, Web Events, …; CRM, Data Warehouse, Database, Hadoop, Data Integration, …; Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
      • Components: Apache Kafka™, Data Compatibility, Monitoring & Administration, Operations, Clients, Connectors
      • Legend: Apache Open Source, Confluent Open Source, Confluent Commercial; Confluent Enterprise
  12. How do I get streams of data into and out of my apps? Connect, Clients, REST.
  13. Apache Kafka™ Connect: Streaming Data Capture. Diagram labels: Sources and Sinks (JDBC, IRC / Twitter, MySQL, Elastic, NoSQL, HDFS), each attached through a Connector; Kafka Connect API; Kafka Pipeline.
      • Fault tolerant
      • Manage hundreds of data sources and sinks
      • Preserves data schema
      • Part of Apache Kafka project
      • Integrated within Confluent Platform’s Control Center
  14. Apache Kafka™ Connect: Let the framework do the hard work
      • Serialization / de-serialization
      • Schema Registry integration
      • Fault tolerance, automatic fail-over
      • Partitioning and scale-out
      • … and let the developer focus on the domain-specific details of copying data
  15. Kafka Connect Architecture: Logical Model
      • Connect has three main components: Connectors, Tasks, and Workers.
      • Data flowing into or out of the connectors is a stream; each stream consists of one or more partitions.
      • In practice, a stream partition could be a database table, a log file, etc.
      • There may or may not be an exact alignment of streams to Kafka topics.
  16. Kafka Connect Architecture: Execution Model. Diagram labels: Host 1, Host 2; Worker 1, Worker 2, Worker 3; Task 1, Task 2, Task 3, Task 4.
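The logical and execution models on slides 15 and 16 describe stream partitions being handled by tasks, and tasks running on workers. The sketch below illustrates that layering with a simple round-robin split. It is an assumption for illustration only: the helper `assign_round_robin` and the table names are hypothetical, and Kafka Connect's real assignment and rebalancing logic is not shown in the slides.

```python
def assign_round_robin(items, owners):
    """Spread `items` across `owners` in round-robin order (illustrative only)."""
    assignment = {owner: [] for owner in owners}
    for index, item in enumerate(items):
        assignment[owners[index % len(owners)]].append(item)
    return assignment


# Hypothetical stream partitions, e.g. one per database table.
stream_partitions = ["table_a", "table_b", "table_c", "table_d", "table_e"]
tasks = ["task-1", "task-2", "task-3", "task-4"]
workers = ["worker-1", "worker-2", "worker-3"]

partitions_by_task = assign_round_robin(stream_partitions, tasks)
tasks_by_worker = assign_round_robin(tasks, workers)
```

With four tasks and three workers, as in the slide's diagram, one worker ends up running two tasks.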
  17. Kafka Connect API: Library of Connectors. Categories: Databases *, Datastore/File Store *, Analytics *, Applications / Other.
      * Denotes Connectors developed at Confluent and distributed by Confluent. Extensive validation and testing has been performed.
  18. Kafka Clients. Diagram labels: Apache Kafka Native Clients, Confluent Native Clients, Community Supported Clients; Ruby; Proxy (http/REST, stdin/stdout).
  19. REST Proxy: Talking to Legacy Apps and Across Restricted Networks. Diagram labels: Legacy Applications, REST / HTTP, REST Proxy, Schema Registry, Native Kafka Applications.
      • Provides a RESTful interface to a Kafka cluster
      • Simplifies message creation and consumption
      • Simplifies administrative actions
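To make "message creation" over REST concrete, here is a sketch that builds (but does not send) a produce request for the Confluent REST Proxy. It assumes the v2 JSON content type from the current REST Proxy documentation and the default port 8082; the topic name `wikipedia.parsed` and the helper `build_produce_request` are hypothetical.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build a POST that asks the REST Proxy to publish `values` to `topic`."""
    body = json.dumps({"records": [{"value": value} for value in values]})
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            # Assumed: v2 embedded-JSON format of the REST Proxy API.
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


request = build_produce_request(
    "http://localhost:8082",   # assumed default REST Proxy address
    "wikipedia.parsed",        # hypothetical topic name
    [{"wikipage": "In the Bleak Midwinter", "bytechange": 422}],
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a live proxy.
```

Any client that can issue an HTTP POST can publish this way, which is the point the slide makes about legacy applications.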
  20. How do I maintain my data formats and ensure compatibility?
  21. The Challenge of Data Compatibility at Scale. Diagram labels: App 1, App 2, App 3; Incompatibly formatted message.
      • Many sources without a policy cause mayhem in a centralized data pipeline
      • Ensuring downstream systems can use the data is key to an operational stream pipeline
      • Example: date formats. Even within a single application, different formats can be presented
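The date-format example on slide 21 can be illustrated with a small normalizer that converts several textual formats into epoch milliseconds, the representation used by the `createdat` fields later in the deck. This is a sketch under assumptions: the list of accepted formats is hypothetical, and inputs without a time zone are treated as UTC.

```python
from datetime import datetime, timezone

# Hypothetical set of formats that different producers might emit.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%d.%m.%Y")


def to_epoch_millis(value):
    """Normalize a timestamp to epoch milliseconds (naive inputs assumed UTC)."""
    if isinstance(value, (int, float)):
        return int(value)  # already numeric: assumed to be epoch milliseconds
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next known format
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unrecognized date format: {value!r}")
```

Rejecting unknown formats at the producer side, rather than guessing, is one way to keep incompatible messages out of the shared pipeline.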
  22. Schema Registry. Diagram labels: App 1 and App 2, each with a Serializer; Kafka Topic; Schema Registry; example consumers: Elastic, NoSQL, HDFS.
      • Define the expected fields for each Kafka topic
      • Automatically handle schema changes (e.g. new fields)
      • Prevent backwards incompatible changes
      • Supports multi-datacenter environments
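"Prevent backwards incompatible changes" can be sketched as a check that a new schema can still read data written with the old one. The version below is deliberately simplified and is not the Schema Registry's actual algorithm: it represents a schema as a plain dict of field specs (a hypothetical format), requires defaults for added fields, and treats any type change as incompatible, which is stricter than Avro's promotion rules.

```python
def backward_incompatibilities(old_fields, new_fields):
    """List reasons the new schema cannot read records written with the old one.

    Each argument maps a field name to a spec such as
    {"type": "string"} or {"type": "boolean", "default": False}.
    """
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so the reader needs a default.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    return problems


v1 = {"createdat": {"type": "long"}, "wikipage": {"type": "string"}}
v2 = dict(v1, isbot={"type": "boolean", "default": False})
v3 = dict(v1, username={"type": "string"})

compatible_change = backward_incompatibilities(v1, v2)
breaking_change = backward_incompatibilities(v1, v3)
```

Here `v2` adds a field with a default and passes, while `v3` adds a required field and would be rejected.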
  23. How do I build stream processing apps?
  24. Architecture of Kafka Streams API, a Part of Apache Kafka. Diagram labels: Producer, Kafka Cluster, Topic (×3), Consumer (×2), Kafka Streams API.
      Key benefits
      • No additional cluster
      • Easy to run as a service
      • Supports large aggregations and joins
      • Security and permissions fully integrated from Kafka
      Example use cases
      • Microservices
      • Continuous queries
      • Continuous transformations
      • Event-triggered processes
  25. Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™. Diagram labels: Your App, Kafka Streams API.
      Example use cases
      • Microservices
      • Large-scale continuous queries and transformations
      • Event-triggered processes
      • Reactive applications
      • Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …
      Key benefits of Apache Kafka’s Streams API
      • Build Apps, Not Clusters: no additional cluster required
      • Elastic, highly-performant, distributed, fault-tolerant, secure
      • Equally viable for small, medium, and large-scale use cases
      • “Run Everywhere”: integrates with your existing deployment strategies such as containers, automation, cloud
  26. Architecture Example. Before: complexity for development and operations, heavy footprint. Diagram label: Your Processing Job.
      1. Capture business events in Kafka
      2. Must process events with separate, special-purpose clusters
      3. Write results back to Kafka
  27. Architecture Example. With Kafka Streams: app-centric architecture that blends well into your existing infrastructure. Diagram labels: Your App, Kafka Streams API, App, App.
      1. Capture business events in Kafka
      2. Process events fast, reliably, securely with standard Java applications
      3a. Write results back to Kafka
      3b. External apps can directly query the latest results
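Steps 2, 3a, and 3b on slide 27 describe an application that keeps state as it processes events, emits results, and lets other apps query the latest values. The sketch below is a plain-Python analogy of that pattern, not the Kafka Streams API (which is a Java library); the class `RunningCount` and the sample channel names are hypothetical.

```python
from collections import Counter


class RunningCount:
    """Toy stateful processor: counts events per key (illustrative analogy)."""

    def __init__(self):
        self.state = Counter()

    def process(self, key):
        """Step 2: update local state; return the update to publish (step 3a)."""
        self.state[key] += 1
        return key, self.state[key]

    def query(self, key):
        """Step 3b: let an external caller read the latest result directly."""
        return self.state[key]


app = RunningCount()
events = ["#en.wikipedia", "#fr.wikipedia", "#en.wikipedia"]  # sample keys
changelog = [app.process(event) for event in events]
latest = app.query("#en.wikipedia")
```

The list of `(key, count)` updates plays the role of the results topic, while `query` stands in for reading the application's current state without a separate serving database.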
  28. How do I manage and monitor my streaming platform at scale?
  29. Confluent Control Center: End-to-end Monitoring. See exactly where your messages are going in your Kafka cluster.
  30. Confluent Control Center: Connector Management
  31. Confluent Control Center: Alerting
      Alerts
      • Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
      • Manage alerts for different users and applications from a web UI
      User authentication
      • Control access to Confluent Control Center
      • Integrates with existing enterprise authentication systems
  32. Data Pipeline Demo: real-time data firehose archived to searchable stores
  33. Demo Scenario: Multiple Streaming Data Pipelines
      • IRC feed of Wikipedia updates
          • IRC Source connector publishes real-time stream of Wikipedia updates to Kafka topic
          • Kafka Streams application parses records and re-writes to new topic
          • Elasticsearch Sink connector indexes parsed data
          • Kibana dashboards visualize Wikipedia updates in real time
      • Twitter feed augmented with sentiment data
          • Twitter Source connector configured to publish data to Kafka topic
          • Kafka Streams application strips extraneous Twitter fields and adds sentiment score
          • Sink connector saves Kafka Streams output to key-value store (e.g. Couchbase or DynamoDB)
          • Key-value queries can track sentiment trends
  34. Wikipedia-to-Elastic Data Pipeline
  35. Wikipedia Transformation
      • Raw input records
        {"createdat":1485386068652,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[List of Iranian Americans]] https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313 * 01:445:4080:1510:F1A4:7C08:B276:FA8B * (+0) /* Media/Journalism */"}
        {"createdat":1485386069199,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[In the Bleak Midwinter]] https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970 * Grover cleveland * (+422) /* Settings */"}
      • Parsed records
        {"createdat":1485386068652,"wikipage":"List of Iranian Americans","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313","username":"01:445:4080:1510:F1A4:7C08:B276:FA8B","bytechange":0,"commitmessage":"/* Media/Journalism */"}
        {"createdat":1485386069199,"wikipage":"In the Bleak Midwinter","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970","username":"Grover cleveland","bytechange":422,"commitmessage":"/* Settings */"}
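The raw-to-parsed step on slide 35 can be sketched as a regular-expression parse of the `message` field. The demo itself used a Kafka Streams (Java) application whose code is not in the slides, so this is only an illustration in Python. The message layout and the flag letters (`N` new, `M` minor, `B` bot, `!` unpatrolled) are assumptions; neither sample record carries any flags.

```python
import json
import re

# Assumed layout: [[page]] flags url * user * (delta) comment
MESSAGE_RE = re.compile(
    r"\[\[(?P<page>.*?)\]\]\s+(?P<flags>[!NMB]*)\s*(?P<url>https?://\S+)"
    r"\s+\*\s+(?P<user>.*?)\s+\*\s+\((?P<delta>[+-]\d+)\)\s*(?P<comment>.*)$"
)


def parse_edit(raw_json):
    """Turn one raw IRC record into the flat 'parsed' shape, or None if no match."""
    raw = json.loads(raw_json)
    match = MESSAGE_RE.match(raw["message"])
    if match is None:
        return None
    flags = match.group("flags")
    return {
        "createdat": raw["createdat"],
        "wikipage": match.group("page"),
        "isnew": "N" in flags,          # flag letters are assumed
        "isminor": "M" in flags,
        "isunpatrolled": "!" in flags,
        "isbot": "B" in flags,
        "diffurl": match.group("url"),
        "username": match.group("user"),
        "bytechange": int(match.group("delta")),
        "commitmessage": match.group("comment"),
    }


sample = json.dumps({
    "createdat": 1485386069199,
    "channel": "#en.wikipedia",
    "message": "[[In the Bleak Midwinter]] "
               "https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970"
               " * Grover cleveland * (+422) /* Settings */",
})
parsed = parse_edit(sample)
```

Returning `None` for unmatched messages lets the caller decide whether to drop such records or route them elsewhere.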
  36. Twitter Transformation
      • Raw input records (128 separate fields)
        "CreatedAt": 1479252348000, "Id": 798668350956126200, "Text": "Iago Aspas pays tribute to #Spain players for making his international debut “easy” vs #England… https://t.co/G13NUaZj8W", "Source": "<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>", "User": { }
      • Filtered records
        {"sentiment":"Negative","sentimentScore":1,"UserName":"tits","CreatedAt":1485387765000,"Text":"RT @STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo https://t.co/QfCLayqRsH https://t.co/53GWANPDXj","id":"824402156707049475","UserScreenName":"titusanghongwen"}
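The filtering step on slide 36 keeps a handful of fields and attaches a sentiment label and score. The sketch below shows that projection in Python. Several parts are assumptions: the slide elides the contents of `User`, so the sub-field names `Name` and `ScreenName` are hypothetical, and `toy_sentiment` is a stand-in keyword scorer with its own scale, not the model or scale used in the demo.

```python
NEGATIVE_WORDS = {"eliminated", "lost"}   # hypothetical word lists
POSITIVE_WORDS = {"tribute", "won"}


def toy_sentiment(text):
    """Stand-in scorer: positive minus negative keyword hits (not the demo's model)."""
    words = set(text.lower().split())
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    label = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    return label, score


def filter_tweet(tweet, score_sentiment=toy_sentiment):
    """Project a raw tweet onto the few fields kept downstream, plus sentiment."""
    label, score = score_sentiment(tweet["Text"])
    user = tweet.get("User", {})
    return {
        "sentiment": label,
        "sentimentScore": score,
        "UserName": user.get("Name"),              # assumed sub-field name
        "CreatedAt": tweet["CreatedAt"],
        "Text": tweet["Text"],
        "id": str(tweet["Id"]),
        "UserScreenName": user.get("ScreenName"),  # assumed sub-field name
    }


raw_tweet = {
    "CreatedAt": 1485387765000,
    "Id": 824402156707049475,
    "Text": "Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo",
    "Source": "Twitter Web Client",
    "User": {"Name": "example", "ScreenName": "example_user"},  # placeholder user
}
filtered = filter_tweet(raw_tweet)
```

Passing the scorer in as a parameter keeps the field projection independent of whichever sentiment model is actually used.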
  37. Kafka Connect Demonstration. Diagram labels: Kafka Connect, Apache Kafka Brokers, K-Streams app(s); numbered steps 1 to 5.
  38. Thank You