
Confluent: Streaming operational data with Kafka – Couchbase Connect 2016



Since being open sourced, Apache Kafka has been widely adopted by organizations ranging from web companies like Uber, Netflix, and LinkedIn to more traditional enterprises like Cerner, Goldman Sachs, and Cisco. These companies use Kafka in a variety of ways: 1) as a pipeline for collecting high-volume log data to load into Hadoop, 2) as a means of collecting operational metrics to feed monitoring/alerting applications, 3) for low-latency messaging use cases, and 4) to power near real-time stream processing. In this talk you will hear how companies are using Apache Kafka, learn how its unique architecture enables it to be used for both real-time processing and as a bus for feeding batch systems like Hadoop, and explore where it fits in the Big Data ecosystem.

Published in: Software


  1. State of the Streaming Platform 2016: What's New in Apache Kafka and the Confluent Platform. David Tucker, Confluent; David Ostrovsky, Couchbase
  2. Who are we?
     David Tucker, Director of Partner Engineering, Confluent. Background: architect and designer; HP Alliances (4 CEOs, 3 enterprise hardware platforms); saw the Hadoop light and led partner engineering at MapR; better living through data (bigger, faster, better). Expertise: data management solutions; cloud services and orchestration.
     David Ostrovsky, Senior Solutions Architect, Couchbase. Background: consultant and author; Hadoop and data processing; wrote a couple of books about Couchbase; big data nerd. Expertise: databases and administration; streaming data processing.
  3. What does Kafka do?
  4. Kafka is much more than a pub-sub messaging system.
  5. Before: many ad hoc pipelines. [Diagram: point-to-point connections between sources (search, security, fraud detection, applications, user tracking, operational logs, operational metrics) and destinations (Hadoop, data warehouse, Espresso, Cassandra, Oracle, monitoring, app databases, storage interfaces).]
  6. After: a streaming platform with Kafka: distributed, fault tolerant, stores messages, processes streams (Kafka Streams). [Diagram: the same sources and destinations, now connected through a central Kafka cluster.]
  7. Apache Kafka: a distributed streaming platform.
  8. From big data to stream data. Big data was "the more the better"; stream data is "the faster the better". Stream data can be big or fast (Lambda) and will be big AND fast (Kappa). Apache Kafka is the enabling technology of this transition. [Diagram: value-of-data curves over volume and age of data, plus Lambda (speed table and batch table) and Kappa (streams and DB) layouts.]
  9. Confluent Platform, the enterprise streaming platform. [Diagram: commercial, open-source, and external components, including Auto Data Balancing.]
  10. How do I get streams of data into and out of my apps? (Connect, clients, REST)
  11. Apache Kafka™ Connect: streaming data capture
      • Fault tolerant
      • Manages hundreds of data sources and sinks
      • Preserves data schemas
      • Part of the Apache Kafka project
      • Integrated with Confluent Platform's Control Center
      [Diagram: source connectors (MySQL, Couchbase, JDBC) feeding Kafka brokers; sink connectors writing to HDFS, Couchbase, and Elasticsearch.]
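To make the Connect slide concrete, here is a minimal sketch of a source-connector definition in the JSON shape that Connect's REST API accepts (a `name` plus a flat `config` map). The connector name, database URL, and column are hypothetical placeholders, and the JDBC connector class shown is Confluent's.

```python
import json

# Sketch of a Kafka Connect source-connector definition, as it would be
# POSTed to the Connect REST API. Name, URL, and column are hypothetical.
connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:mysql://localhost:3306/shop",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "mysql-",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

Submitting this payload to a running Connect worker would start tasks that stream new rows into Kafka topics prefixed with `mysql-`.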
  12. Kafka Connect library of connectors, grouped into databases, datastore/file store, analytics, and applications/other: JDBC*, Couchbase, DataStax/Cassandra, GoldenGate, JustOne, DynamoDB, MongoDB, HBase, InfluxDB, Kudu, RethinkDB, HDFS*, Apache Ignite, FTP, Syslog, Hazelcast, Elasticsearch*, Vertica, Mixpanel, Attunity, AWS/S3, Bloomberg Ticker, Striim, Solr, Syncsort, Twitter. (* Connectors developed at Confluent and distributed with the Confluent Platform, with extensive validation and testing.)
  13. Kafka clients. [Diagram: Apache Kafka native clients, Confluent native clients, and community-supported clients (e.g. Ruby), plus an HTTP/REST proxy and stdin/stdout tools.]
  14. REST Proxy: enable any application to access Kafka data
      • Provides a RESTful interface to a Kafka cluster
      • Simplifies message creation and consumption
      • Simplifies administrative actions
      [Diagram: legacy applications reach Kafka over REST/HTTP through the REST Proxy and Schema Registry; Java applications use native Kafka clients.]
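As an illustration of the REST Proxy slide, the sketch below builds the request a non-Java application would send to produce messages: a POST to `/topics/<topic>` using the proxy's JSON embedded format. The topic name, record contents, and proxy address are hypothetical; only the payload is constructed here, no network call is made.

```python
import json

# Sketch of producing through the Confluent REST Proxy (v2 API):
# POST a "records" envelope to /topics/<topic>. Host/topic are placeholders.
topic = "clickstream"
url = f"http://localhost:8082/topics/{topic}"
headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}
body = {"records": [{"key": "user-42", "value": {"page": "/home", "ms": 87}}]}

payload = json.dumps(body)
print(url)
print(payload)
```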
  15. How do I maintain my data formats and ensure compatibility?
  16. The challenge of data compatibility at scale
      • Many sources without a policy cause mayhem in a centralized data pipeline
      • Ensuring downstream systems can use the data is key to an operational stream pipeline
      • Example: date formats; even within a single application, different formats can be presented
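The date-format example on this slide can be made concrete with a small sketch: two hypothetical upstream apps write the same field in different formats, and a downstream consumer that expects ISO 8601 breaks on the second one. This is exactly the drift a shared schema policy prevents.

```python
from datetime import datetime

# Two hypothetical producers writing the same event date differently.
app1_record = {"event_date": "2016-11-03"}   # ISO 8601
app2_record = {"event_date": "11/03/2016"}   # US-style

def parse_iso(record):
    # A downstream consumer that assumes ISO 8601 dates.
    return datetime.strptime(record["event_date"], "%Y-%m-%d")

parse_iso(app1_record)  # parses fine
try:
    parse_iso(app2_record)  # the consumer breaks on the other format
except ValueError as e:
    print("incompatible record:", e)
```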
  17. Confluent Schema Registry
      • Define the expected fields for each Kafka topic
      • Automatically handle schema changes (e.g. new fields)
      • Prevent backwards-incompatible changes
      • Support multi-datacenter environments
      [Diagram: producing apps serialize records against the Schema Registry before writing to a Kafka topic; example consumers include HDFS, Couchbase, and Elasticsearch.]
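A sketch of the compatibility rule the Schema Registry enforces: adding a field is a backward-compatible change when the new field carries a default, so readers of the new schema can still consume old records. The two Avro-style schemas and the toy checker below are illustrative only, not the registry's actual implementation.

```python
# Toy illustration of backward-compatible schema evolution (Avro-style).
schema_v1 = {
    "type": "record", "name": "PageView",
    "fields": [{"name": "user", "type": "string"}],
}
schema_v2 = {
    "type": "record", "name": "PageView",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "referrer", "type": "string", "default": ""},  # new field + default
    ],
}

def added_fields_have_defaults(old, new):
    # Toy check: every field added in `new` must carry a default value.
    old_names = {f["name"] for f in old["fields"]}
    return all("default" in f for f in new["fields"] if f["name"] not in old_names)

print(added_fields_have_defaults(schema_v1, schema_v2))  # → True
```

A registry configured for backward compatibility would accept v2 but reject a version that added a required field without a default.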
  18. How do I build stream processing apps?
  19. Architecture of Kafka Streams, a part of Apache Kafka™
      Key benefits:
      • Available as a high-level DSL and a low-level API, delivering maximum flexibility for application design
      • No additional cluster required
      • Easy to run as a service
      • Security and permissions fully integrated from Kafka
      Example use cases: microservices, continuous queries, continuous transformations, event-triggered processes.
      [Diagram: producers, Kafka Connect, and consumers around a Kafka cluster, with Kafka Streams reading from and writing to topics.]
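Conceptually, a Kafka Streams topology reads from input topics, transforms each record, and writes to output topics. Kafka Streams itself is a Java library; the plain-Python sketch below only illustrates that consume-transform-produce shape, with in-memory lists standing in for topics and none of the partitioning, state, or fault tolerance the real library provides.

```python
# Plain-Python sketch of what a Streams topology does conceptually.
# Lists stand in for Kafka topics; records are plain dicts.
input_topic = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]
output_topic = []

def transform(record):
    # The "processor": a stateless per-record transformation.
    return {**record, "clicks_doubled": record["clicks"] * 2}

# Kafka Streams manages this loop, plus partitioning, state, and failover.
for record in input_topic:
    output_topic.append(transform(record))

print(output_topic)
```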
  20. Kafka Streams simplifies your architecture and decouples your teams.
      Before (undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities): (1) capture business events in Kafka; (2) process events with a separate cluster (e.g. Spark); (3) share the latest results through separate systems (e.g. MySQL); (4) other apps access the latest results by querying those databases.
      With Kafka Streams (a simplified, app-centric architecture that puts app owners in control): (1) capture business events in Kafka; (2) process events with standard Java apps that use Kafka Streams; (3) other apps can directly query the latest results.
  21. How do I manage and monitor my streaming platform at scale?
  22. Confluent Control Center: end-to-end monitoring. See exactly where your messages are going in your Kafka cluster.
  23. Confluent Control Center: connector management.
  24. Control Center: multi-datacenter management and replication
      Manage multi-cluster deployments:
      • Centralized configuration and monitoring
      • Replicate clusters or selected topics
      • Replication of topic configuration
      • Configurable topic renames
      The Kafka advantage: reliable, highly available, scalable, cloud ready.
  25. Confluent Control Center: alerting
      Alerts:
      • Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
      • Manage alerts for different users and applications from a web UI
      User authentication:
      • Control access to Confluent Control Center
      • Integrates with existing enterprise authentication systems
  26. Demo
  27. Demo scenario: streaming data pipeline
      • Twitter feed with sentiment data
      • Twitter source connector configured to publish data to a Kafka topic
      • Kafka Streams application augments Twitter records with sentiment analysis
      • Kafka Streams output saved to Couchbase
      • Couchbase source connector configured to pull data from a Couchbase bucket back to a Kafka topic
      • Second-stage Kafka Streams app saves data to another Couchbase bucket and then on to Elasticsearch
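The first Streams stage of the demo pipeline, augmenting a tweet with a sentiment label, might look like the sketch below. The word lists and scoring rule are hypothetical stand-ins for whatever sentiment model the demo actually used; the point is the shape of the per-record enrichment.

```python
# Hypothetical stand-in for the demo's sentiment-augmentation step.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "slow", "broken"}

def augment_with_sentiment(tweet):
    # Score each word, then attach a sentiment label to the record.
    words = tweet["text"].lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {**tweet, "sentiment": label}

record = {"user": "demo", "text": "Kafka Streams is awesome"}
print(augment_with_sentiment(record))
```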
  28. Couchbase Connect demonstration. [Diagram: numbered data flow (steps 1 through 8) through Kafka Connect, Apache Kafka brokers, and Kafka Streams apps.]
  29. Thank You