Couchbase and Kafka: scalable data streaming – Couchbase Connect New York 2017

Couchbase and Kafka are both technologies that address high-throughput, distributed data management challenges. In fact, they are often deployed together, each one solving a particular need. In this session we’ll explore how Couchbase and Kafka complement each other and examine some real-world use case architectures.
Additionally, you’ll hear directly from Confluent, the folks who built Kafka. Since being open sourced, Apache Kafka has been widely adopted by organizations ranging from web companies like Uber, Netflix, and LinkedIn to more traditional enterprises like Cerner, Goldman Sachs, and Cisco. These companies use Kafka in a variety of ways: 1) as a pipeline for collecting high-volume log data to load into Hadoop, 2) as a means of collecting operational metrics to feed monitoring/alerting applications, 3) for low-latency messaging use cases, and 4) to power near real-time stream processing. In this talk you will hear how companies are using Apache Kafka, learn how its unique architecture enables it to be used both for real-time processing and as a bus for feeding batch systems like Hadoop, and explore where it fits in the Big Data ecosystem.

Published in: Software

  1. Couchbase Connect 2017. Couchbase and Confluent: Scalable Data Streaming for the Modern Enterprise. David Tucker, Confluent
  2. Couchbase & Big Data (©2017 Couchbase Inc.)
      • Beyond big data ingestion
      • v1 – ingest and archive
  3. Couchbase & Big Data
      • Beyond big data ingestion
      • v1 – ingest and archive
      • v2 – collate and analyze
  4. Couchbase & Big Data
      • Beyond big data ingestion
      • v1 – ingest and archive
      • v2 – collate and analyze
      • v3 – stream & remix
  5. Couchbase & Big Data
      • Beyond big data ingestion
      • v1 – ingest and archive
      • v2 – collate and analyze
      • v3 – stream & remix
      • … engagement
  6. Couchbase & Big Data
      • Beyond big data ingestion
      • v1 – ingest and archive
      • v2 – collate and analyze
      • v3 – stream & remix
      • … engagement
  7. Couchbase & Big Data
      • Critical connectivity
      • Analysis
      • Streaming
      Diagram labels: HDFS, DBMS, Mobile, Other Platforms, Data Processing Platform
  8. Couchbase & Big Data
      • Critical connectivity
      • Analysis
      • Streaming
      • Distributed
      • High throughput
      Diagram labels: HDFS, DBMS, Mobile, Other Platforms, Stream Data Platform
  9. Couchbase & Kafka
      • Source
      • Sink
      • Custom Filter
      • Apache Kafka
      • Confluent Platform
      Diagram labels: Kafka, ?
  10. Vision of a Streaming Enterprise
      Diagram labels: Search, NewSQL / NoSQL, RDBMS, Monitoring, Document Store, Real-time Analytics, Data Warehouse, Mobile Apps, Legacy Apps, Hadoop, Streaming Platform
  11. What Can You Do with a Streaming Platform?
      • Publish and Subscribe to streams of data
          • Analogous to traditional messaging systems
      • Store streams of data
          • Consumers can look back in time
      • Process streams of data
          • Analyze and correlate events in real time
  12. Ingest, Process, Load, and Serve Data at a Global Scale
      Diagram labels: Data System A …, Data System B …, Kafka cluster (two), Applications, Other data stores, Raw data / Events, Kafka Streams (Data Enrichment and Transformation), Kafka Connect (Connectors to Extract and Load data), Confluent Replicator (two), Custom Replication (two)
  13. Confluent Platform: Enterprise Streaming based on Apache Kafka™
      Diagram labels: Database Changes, Log Events, IoT Data, Web Events, …; CRM, Data Warehouse, Database, Hadoop, Data Integration, …; Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …; Apache Open Source, Confluent Open Source, Confluent Enterprise; Confluent Platform, Apache Kafka™, Data Compatibility, Monitoring & Administration, Operations, Clients, Connectors; Complete, Open, Trusted, Enterprise Grade
  14. How do I get streams of data into and out of my apps?
      Diagram labels: Connect, Clients, REST
  15. Kafka Connect – Streaming Data Capture
      Diagram labels: JDBC, IRC / Twitter, MySQL, Elastic, NoSQL, HDFS, Kafka Connect API, Kafka Pipeline, Connector (six), Sources, Sinks
      • Fault tolerant
      • Manage hundreds of data sources and sinks
      • Preserves data schema
      • Part of Apache Kafka project
      • Integrated within Confluent Platform’s Control Center
  16. Kafka Connect – Let the framework do the hard work
      • Serialization / de-serialization
      • Schema Registry integration
      • Fault tolerance, automatic fail-over
      • Partitioning and scale-out
      • … and let the developer focus on the domain-specific details of copying data
  17. Kafka Connect API Library of Connectors
      Categories: Databases, Analytics, Applications / Other, Datastore/File Store
      * Denotes connectors developed and distributed by Confluent. Extensive validation and testing have been performed.
  18. New in Kafka 0.10.2: Single Message Transforms for Kafka Connect
      Modify events before storing in Kafka:
      • Mask sensitive information
      • Add identifiers
      • Tag events
      • Store lineage
      • Remove unnecessary columns
      Modify events going out of Kafka:
      • Route high priority events to faster data stores
      • Direct events to different ElasticSearch indexes
      • Cast data types to match destination
      • Remove unnecessary columns
  19. Kafka Clients
      Diagram labels: Ruby, Proxy http/REST, Stdin/stdout, Apache Kafka Native Clients, Confluent Native Clients, Community Supported Clients
  20. REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
      Diagram labels: REST Proxy, Non-Java Applications, Native Kafka Java Applications, Schema Registry, REST / HTTP
      • Simplifies administrative actions
      • Simplifies message creation and consumption
      • Provides a RESTful interface to a Kafka cluster
  21. How do I build stream processing apps?
  22. Kafka Streams API: A Part of Apache Kafka
      Diagram labels: Kafka Streams API, Producer, Kafka Cluster, Topic (three), Consumer (two)
      Key benefits
      • No additional cluster
      • Easy to run as a service
      • Supports large aggregations and joins
      • Security and permissions fully integrated from Kafka
      Example Use Cases
      • Microservices
      • Continuous queries
      • Continuous transformations
      • Event-triggered processes
  23. Architecture Example. Before: Complexity for development and operations, heavy footprint
      1. Capture business events in Kafka
      2. Must process events with separate, special-purpose clusters
      3. Write results back to Kafka
      Diagram label: Your Processing Job
  24. Architecture Example. With Kafka Streams: App-centric architecture that blends well into your existing infrastructure
      1. Capture business events in Kafka
      2. Process events fast, reliably, securely with standard Java applications
      3a. Write results back to Kafka
      3b. External apps can directly query the latest results
      Diagram labels: Your App, App, App, Kafka Streams API
  25. Data Pipeline Demo: Real-time data firehose archived to searchable stores
  26. Demo Scenario: Powerful Streaming Data Pipeline
      • Twitter feed augmented with sentiment data
      • Twitter Source connector configured to publish data to Kafka topic
      • Kafka Streams application adds sentiment score
      • Sink connector saves K-Streams output to Couchbase (with optional filtering)
      • Key-value queries can track sentiment trends
  27. Twitter Transformation
      • Raw input records (128 separate fields):
        "CreatedAt": 1479252348000, "Id": 798668350956126200, "Text": "Iago Aspas pays tribute to #Spain players for making his international debut “easy” vs #England… https://t.co/G13NUaZj8W", "Source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "User": { }
      • Filtered records:
        {"sentiment":"Negative","sentimentScore":1,"UserName":"titus","CreatedAt":1485387765000,"Text":"RT @STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo https://t.co/QfCLayqRsH https://t.co/53GWANPDXj","id":"824402156707049475","UserScreenName":"titusanghongwen"}
  28. Kafka Connect Demonstration
      Diagram labels: Kafka Connect, Apache Kafka Brokers, K-Streams app(s); step markers 1, 2, 3, 4, 5
  29. The Couchbase Connect mobile app: Take our in-app survey!
  30. Share your opinion on Couchbase
      1. Go here: http://gtnr.it/2eRxYWn
      2. Create a profile
      3. Provide feedback (~15 minutes)
  31. Thank You
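
As a rough illustration of the Twitter transformation on slide 27, the sketch below reduces a raw tweet record to the seven fields shown in the filtered example. It is plain Python, not the Kafka Streams application used in the demo. The `keyword_scorer` function, its labels and score scale, and the `User.Name` / `User.ScreenName` source fields are assumptions made for illustration, since the slides do not show how the sentiment score is computed or what the `User` object contains.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A scorer maps tweet text to a (label, score) pair.
Scorer = Callable[[str], Tuple[str, int]]

# Hypothetical word list; a real pipeline would call a sentiment model.
NEGATIVE_WORDS = {"eliminated", "lost", "defeat"}


def keyword_scorer(text: str) -> Tuple[str, int]:
    """Toy stand-in for a sentiment model (labels and scale are assumed)."""
    words = {w.strip(".,:;!?#@").lower() for w in text.split()}
    if words & NEGATIVE_WORDS:
        return "Negative", 1
    return "Neutral", 2


def to_filtered_record(
    raw: Dict[str, Any], scorer: Scorer = keyword_scorer
) -> Dict[str, Optional[Any]]:
    """Project a raw tweet onto the fields of the filtered record."""
    # Assumption: user details live under raw["User"] as Name / ScreenName.
    user = raw.get("User") or {}
    text = raw.get("Text", "")
    label, score = scorer(text)
    return {
        "sentiment": label,
        "sentimentScore": score,
        "UserName": user.get("Name"),
        "CreatedAt": raw.get("CreatedAt"),
        "Text": raw.get("Text"),
        # The filtered example stores the id as a string.
        "id": str(raw["Id"]) if "Id" in raw else None,
        "UserScreenName": user.get("ScreenName"),
    }
```

Calling `to_filtered_record(raw)` on a dictionary parsed from a raw tweet returns a small document that a sink connector could write to Couchbase. Keeping the scorer as a parameter makes it easy to swap the toy word list for a real model without touching the projection logic.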
