
Streaming Data and Stream Processing with Apache Kafka


stream data and stream processing

Published in: Data & Analytics


  1. Streaming Data and Stream Processing with Apache Kafka™
     David Tucker, Director of Partner Engineering, Confluent
     Sid Goel, Partner and Solution Architect, KPI Partners
  2. The opportunity: the shift to streams and digital transformation
     • "By 2020, 70% of organizations will adopt data streaming to enable real-time analytics." (Gartner, Nov. 2016)
     • "Streaming ingestion and analytics will become a must-have for digital winners." (Forrester, Nov. 2015)
  3. More Facts & Figures
     • 90% of CEOs believe the digital economy will have a major impact on their industry. (MIT Sloan / Capgemini, 2013)
     • The most important capability executives hope to improve via digital transformation: the ability to support real-time transactions. (The Economist, 2015)
     • Digital disruptors will displace 40% of incumbent companies over the next 5 years. (Center for Digital Transformation, 2015)
  4. Vision of a Streaming Enterprise
     Diagram: a Streaming Platform alongside Search, NewSQL / NoSQL, RDBMS, Monitoring, Document Store, Real-time Analytics, Data Warehouse, Mobile Apps, Legacy Apps, and Hadoop.
  5. What Can You Do with a Streaming Platform?
     • Publish and subscribe to streams of data, analogous to traditional messaging systems
     • Store streams of data, so consumers can look back in time
     • Process streams of data, analyzing and correlating events in real time
  6. The typical architecture
     Diagram labels: Search, Security, Fraud Detection Application, Monitoring App, Data Warehouse, User Tracking, Operational Logs, Operational Metrics, App, Databases, Storage, Interfaces.
  7. Challenges abound
     Diagram labels: the same systems as slide 6, plus Hadoop.
     • Diverse data sets, arriving at an increasing rate
     • Many complex data pipelines
     • A separate cluster is required for real-time processing
     • Difficult and time-consuming to change
     • Mission-critical availability of the most recent, relevant data is required
     • Difficult to handle massive amounts of data
  8. Modernized architecture using Apache Kafka
     Diagram labels: Search, Security, Fraud Detection Application, apps built with the Streams API, Monitoring App, Data Warehouse, User Tracking, Operational Logs, Operational Metrics.
  9. Modernized architecture using Apache Kafka (continued)
     Diagram labels as in slide 8, annotated with:
     • Pub/sub to data streams, alleviating back pressure
     • Lightweight, easy to modify with minimal disruption
     • Decoupled from upstream apps, creating agility
     • Real-time, context-specific data in the moment
     • Handles any volume of data with ease
     • Scales to meet the demands of diverse streams
  10. Our vision: from big data to stream data
      Apache Kafka is the enabling technology of this transition.
      • Big Data was "the more the better" (chart: value of data vs. volume of data)
      • Stream Data is "the faster the better" (chart: value of data vs. age of data)
      • Stream Data can be Big or Fast (Lambda) (diagram: Streams and Hadoop, a Speed Table and a Batch Table, DB)
      • Stream Data will be Big AND Fast (Kappa) (diagram: Streams, Job 1 and Job 2, Table 1 and Table 2, DB)
  11. Kafka Adoption in Large Enterprises Growing Rapidly
      • Travel: 6 of the top 10
      • Global banks: 7 of the top 10
      • Insurance: 8 of the top 10
      • Telecom: 9 of the top 10
      Over 35% of the Fortune 500 are using Apache Kafka™.
  12. Industries & Use Cases
      Universal use cases: IoT, Data Pipelines, Microservices, Monitoring
      Industry use cases:
      • Financial Services: Fraud Detection, Trade Data Capture, Customer 360
      • Retail: Inventory Management, Product Catalog, A/B Testing, Proactive Alerts
      • Automotive: Connected Car, Manufacturing Data Processing
      • Enterprise Tech: Analytics, Security Operations, Performance Data Collection
      • Telecom: Personalized Ad Placement, Customer 360, Network Integrity Systems
      • Entertainment/Media: Log Delivery, Increased Ad Delivery Operations, Cross-Device Insights
      • Travel/Leisure: Visitor Segmentation, Fraud Detection
      • Consumer Tech: Streaming Video, Personalized Customer Experience, Device Telemetry and Analytics
      • Healthcare: Patient Monitoring, Pharma Substance Control, Patient Relapse, Lab Results Alerts
  13. Kafka Adoption Across Key Companies
      Sectors shown: Financial Services, Enterprise Tech, Consumer Tech, Entertainment & Media, Telecom, Retail, Travel & Leisure
  14. Confluent Enterprise: the only enterprise streaming platform based entirely on Apache Kafka™
  15. Confluent Platform: Enterprise Streaming based on Apache Kafka™
      Diagram groups:
      • Database Changes, Log Events, IoT Data, Web Events, …
      • CRM, Data Warehouse, Database, Hadoop, Data Integration, …
      • Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
      Editions: Apache Open Source, Confluent Open Source, Confluent Enterprise
      Components: Apache Kafka™, Clients, Connectors, Data Compatibility, Monitoring & Administration, Operations
      Complete. Open. Trusted. Enterprise Grade.
  16. Confluent Completes Kafka
      Features and benefits across Apache Kafka, Confluent Open Source, and Confluent Enterprise:
      • Apache Kafka: high-throughput, low-latency, highly available, secure distributed streaming platform
      • Kafka Connect API: advanced API for connecting external sources and destinations to Kafka
      • Kafka Streams API: simple library that enables streaming application development within the Kafka framework
      • Additional Clients: support for non-Java clients (C, C++, Python, etc.)
      • REST Proxy: universal access to Kafka from any network-connected device via HTTP
      • Schema Registry: central registry for the format of Kafka data; guarantees all data is always consumable
      • Pre-Built Connectors: HDFS, JDBC, Elasticsearch, and other connectors, fully certified and supported by Confluent
      • Confluent Control Center: easy connector management and stream monitoring
      • Auto Data Balancing: rebalances data across the cluster to remove bottlenecks
      • Replication: multi-datacenter replication simplifies and automates multi-datacenter (MDC) Kafka clusters
      • Support: community support for Apache Kafka and Confluent Open Source; 24x7x365 enterprise-class support for Confluent Enterprise, to keep your Kafka environment running at top performance
  17. How do I get streams of data into and out of my apps?
      Connect • Clients • REST
  18. Apache Kafka™ Connect: Streaming Data Capture
      Diagram: connectors for JDBC, IRC / Twitter, CDC, Elastic, NoSQL, and HDFS attached as sources and sinks to a Kafka pipeline through the Kafka Connect API.
      • Fault tolerant
      • Manages hundreds of data sources and sinks
      • Preserves data schema
      • Part of the Apache Kafka project
      • Integrated within Confluent Platform's Control Center
  19. Kafka Connect API, Part of the Apache Kafka™ Project
      Connect any source to any target system with Apache Kafka.
      Integrated
      • 100% compatible with Kafka v0.9 and higher
      • Integrated with Confluent's Schema Registry
      • Easy to manage with Confluent Control Center
      Flexible
      • 40+ open source connectors available
      • Easy to develop additional connectors
      • Flexible support for data types and formats
      Compatible
      • Maintains critical metadata
      • Preserves schema information
      • Supports schema evolution
      Reliable
      • Automated failover
      • At-least-once delivery guaranteed
      • Balances workload between nodes
  20. Kafka Connect API Library of Connectors
      Categories: Databases, Analytics, Applications / Other, Datastore / File Store
      * Denotes connectors developed and distributed by Confluent. Extensive validation and testing have been performed.
  21. New in Kafka 0.10.2: Single Message Transforms for Kafka Connect
      Modify events before storing them in Kafka:
      • Mask sensitive information
      • Add identifiers
      • Tag events
      • Store lineage
      • Remove unnecessary columns
      Modify events going out of Kafka:
      • Route high-priority events to faster data stores
      • Direct events to different Elasticsearch indexes
      • Cast data types to match the destination
      • Remove unnecessary columns
  22. Kafka Clients
      • Apache Kafka native clients
      • Confluent native clients
      • Community-supported clients
      Diagram labels: Ruby, Proxy (HTTP/REST), stdin/stdout
  23. REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
      Diagram labels: REST Proxy, Non-Java Applications, Native Kafka Java Applications, Schema Registry, REST / HTTP
      • Simplifies administrative actions
      • Simplifies message creation and consumption
      • Provides a RESTful interface to a Kafka cluster
  24. How do I maintain my data formats and ensure compatibility?
  25. The Challenge of Data Compatibility at Scale
      Diagram: App 1, App 2, and App 3 feeding a shared pipeline, with one incompatibly formatted message.
      • Many sources without a policy cause mayhem in a centralized data pipeline
      • Ensuring downstream systems can use the data is key to an operational stream pipeline
      • Example: date formats. Even within a single application, different formats can be presented.
  26. Schema Registry
      Diagram: App 1 and App 2, each with a serializer, writing to a Kafka topic alongside the Schema Registry; example consumers: Elastic, Cassandra, HDFS.
      • Define the expected fields for each Kafka topic
      • Automatically handle schema changes (e.g., new fields)
      • Prevent backward-incompatible changes
      • Support multi-datacenter environments
  27. How do I build stream processing apps?
  28. Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™
      Example use cases:
      • Microservices
      • Large-scale continuous queries and transformations
      • Event-triggered processes
      • Reactive applications
      • Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …
      Key benefits of Apache Kafka's Streams API:
      • Build apps, not clusters: no additional cluster required
      • Elastic, highly performant, distributed, fault-tolerant, secure
      • Equally viable for small, medium, and large-scale use cases
      • "Run everywhere": integrates with your existing deployment strategies such as containers, automation, and cloud
      Diagram labels: Your App, Kafka Streams API
  29. Architecture Example
      Before: complexity for development and operations, heavy footprint
      Step 1: Capture business events in Kafka
      Step 2: Process events with separate, special-purpose clusters (Your Processing Job)
      Step 3: Write results back to Kafka
  30. Architecture Example
      With Kafka Streams: an app-centric architecture that blends well into your existing infrastructure
      Step 1: Capture business events in Kafka
      Step 2: Process events fast, reliably, and securely with standard Java applications (Your App + Kafka Streams API)
      Step 3a: Write results back to Kafka
      Step 3b: Query the latest results directly from external apps
  31. New in Kafka 0.10.2: Session windows in Kafka Streams API
      Group events in a stream based on session windows:
      • Sessions are periods of activity terminated by a gap of inactivity
      • Purely time-based windows are incorrect for session-based data analysis
      Figure: input events from different users (Alice, Bob, Dave; colors represent users) shown by processing time and event time; the results are user sessions, grouped by event-time session windows.
  32. How do I synchronize and migrate data to and from the cloud?
  33. Before: Hybrid Cloud Environments Today
      Diagram labels: DC1 and AWS, with DB1, DB2, DB3, KV, KV2, KV3, DWH (in both), App1, App1-v2, App2, App2-v2, App3, App4, App5, App7, App8.
      Challenges:
      • Each team or department must execute its own cloud migration
      • The same data may be moved multiple times
      • Each box represented here requires development, testing, deployment, monitoring, and maintenance
  34. After: Cloud Synchronization and Migrations with Confluent Platform
      Diagram labels: the same DC1 and AWS systems as slide 33, plus Kafka in both DC1 and AWS.
      Benefits:
      • Continuous low-latency synchronization
      • Centralized manageability and monitoring: track, at the event level, data produced in all data centers
      • Security and governance: track and control where data comes from and who is accessing it
      • Cost savings: move data once
  35. How do I manage and monitor my streaming platform at scale?
  36. What Does End-to-End Mean?
      • "Clocks and cables" monitoring: How fast is the throughput? How many CPU cycles are we using?
      • End-to-end monitoring: Did you leave? Did you arrive?
  37. Confluent Control Center: Cluster Health & Administration
      Cluster health dashboard
      • Monitor the health of your Kafka clusters and get alerts if any problems occur
      • Measure system load, performance, and operations
      • View aggregate statistics or drill down by broker or topic
      Cluster administration
      • Monitor topic configurations
  38. Confluent Control Center: End-to-end Monitoring
      See exactly where your messages are going in your Kafka cluster.
  39. Confluent Control Center: Connector Management
  40. Confluent Control Center: Alerting
      Alerts
      • Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
      • Manage alerts for different users and applications from a web UI
      User authentication
      • Control access to Confluent Control Center
      • Integrates with existing enterprise authentication systems
  41. Auto Data Balancing
      Dynamically move partitions to optimize resource utilization and reliability
      • Easily add and remove nodes from your Kafka cluster
      • Rack-aware algorithm rebalances partitions across a cluster
      • Traffic from the balancer is throttled when data transfer occurs
      Figure: partition layout before and after rebalance.
  42. Multi-Datacenter Replication
      An easy, reliable way to run Kafka across datacenters
      Improve reliability
      • Easily configure and maintain cross-cluster replication
      Simplify management
      • Centralized configuration and monitoring
      • Replicate an entire cluster or a subset of topics
      • Automatic replication of topic configuration
      • Use Kafka's SASL for Kerberos, Active Directory
      • SSL encryption between datacenters
  43. Get Started with Apache Kafka Today!
      https://www.confluent.io/downloads/
      THE place to start with Apache Kafka!
      • Thoroughly tested and quality assured
      • More extensible developer experience
      • Easy upgrade path to Confluent Enterprise
  44. Thank You
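Slide 5 lists three things a streaming platform does: publish and subscribe, store, and let consumers look back in time. A minimal sketch of that idea, assuming a single-partition log held in memory; the `Topic` and `Subscriber` classes are hypothetical illustrations, not Kafka's client API. Records are appended and never removed on read, and each subscriber keeps its own offset, so moving the offset back replays history.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Topic:
    """Hypothetical single-partition, append-only log held in memory."""
    records: List[Tuple[str, Any]] = field(default_factory=list)

    def publish(self, key: str, value: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append((key, value))
        return len(self.records) - 1


@dataclass
class Subscriber:
    """Hypothetical reader; each subscriber tracks its own offset."""
    topic: Topic
    offset: int = 0

    def poll(self, max_records: int = 10) -> List[Tuple[int, str, Any]]:
        """Return up to max_records records from the current offset onward."""
        batch: List[Tuple[int, str, Any]] = []
        while self.offset < len(self.topic.records) and len(batch) < max_records:
            key, value = self.topic.records[self.offset]
            batch.append((self.offset, key, value))
            self.offset += 1
        return batch

    def seek(self, offset: int) -> None:
        """Look back in time: move this subscriber's position to an earlier offset."""
        self.offset = offset


clicks = Topic()
for user in ["alice", "bob", "alice"]:
    clicks.publish(user, {"page": "/home"})

analytics = Subscriber(clicks)   # two independent subscribers
audit = Subscriber(clicks)
print(len(analytics.poll()))     # 3
print(len(audit.poll(1)))        # 1: audit's position is unaffected by analytics
analytics.seek(0)                # replay from the beginning
print([key for _, key, _ in analytics.poll()])  # ['alice', 'bob', 'alice']
```

Unlike a traditional queue, reading deletes nothing. Kafka itself removes old records according to a retention policy and splits topics into partitions; this sketch omits both.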
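Slide 19 lists "at-least-once" delivery under Reliable. One common way to get that guarantee is to write a record to the target first and commit the offset only afterwards, so a crash between the two steps causes a re-read rather than a loss. The sketch below uses a hypothetical `run_sink_task` function and an in-memory list standing in for the target system to show the resulting duplicate. It illustrates the guarantee, not Kafka Connect's implementation; Connect commits offsets periodically rather than after every record, so more than one record can be replayed after a failure.

```python
from typing import List


def run_sink_task(source: List[str], sink: List[str], committed: int,
                  crash_after_write_at: int = -1) -> int:
    """Hypothetical sink task: copy source[committed:] into sink.

    The offset is committed only after each write succeeds. If
    crash_after_write_at equals an offset, the task stops after writing that
    record but before committing it. Returns the committed offset, i.e. the
    position a restarted task resumes from.
    """
    offset = committed
    while offset < len(source):
        sink.append(source[offset])      # 1. write to the target system
        if offset == crash_after_write_at:
            return committed             # simulated crash: no commit
        offset += 1
        committed = offset               # 2. commit progress after the write
    return committed


events = ["e0", "e1", "e2"]
target: List[str] = []
pos = run_sink_task(events, target, committed=0, crash_after_write_at=1)
pos = run_sink_task(events, target, committed=pos)  # restart from last commit
print(target)  # ['e0', 'e1', 'e1', 'e2']
```

Every event reaches the target, and `e1` arrives twice. Targets that must not see duplicates typically need idempotent writes, for example an upsert keyed by the record's key.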
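Slide 21's single message transforms are per-record functions applied in sequence as records pass through a connector. Kafka ships built-in Java transforms for several of the listed cases (for example `MaskField`, `InsertField`, and `ReplaceField`), which are configured on the connector rather than coded. The Python sketch below only models the idea: each hypothetical helper returns a function from one record (a dict) to a new record, and `chain` applies them in order.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Transform = Callable[[Record], Record]


def mask_field(name: str) -> Transform:
    """Hypothetical: replace a sensitive field's value with a fixed mask."""
    def apply(record: Record) -> Record:
        return {**record, name: "****"} if name in record else record
    return apply


def insert_field(name: str, value: Any) -> Transform:
    """Hypothetical: add an identifier or tag, such as the source system."""
    def apply(record: Record) -> Record:
        return {**record, name: value}
    return apply


def drop_fields(*names: str) -> Transform:
    """Hypothetical: remove unnecessary columns."""
    def apply(record: Record) -> Record:
        return {k: v for k, v in record.items() if k not in names}
    return apply


def chain(*transforms: Transform) -> Transform:
    """Apply the transforms in order to a single message."""
    def apply(record: Record) -> Record:
        for transform in transforms:
            record = transform(record)
        return record
    return apply


pipeline = chain(mask_field("ssn"), insert_field("source", "crm-db"), drop_fields("internal_notes"))
print(pipeline({"id": 7, "ssn": "123-45-6789", "internal_notes": "call back"}))
# {'id': 7, 'ssn': '****', 'source': 'crm-db'}
```

Each helper builds a new dict instead of mutating its input, so one record can pass through several pipelines without surprises.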
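Slide 23's REST Proxy lets any HTTP client produce to Kafka. Here is a sketch using only Python's standard library, assuming the Confluent REST Proxy v2 API on its default port 8082 and a topic named `clicks` (both are assumptions to check against your deployment). The request is a `POST` to `/topics/<topic>` with a JSON body of `records`. The helper name `build_produce_request` is hypothetical. It only builds the request, so nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Any, List


def build_produce_request(base_url: str, topic: str, values: List[Any]) -> urllib.request.Request:
    """Build (but do not send) a produce request for the REST Proxy JSON API.

    Hypothetical helper; the endpoint and content type assume REST Proxy v2.
    """
    body = {"records": [{"value": v} for v in values]}
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


req = build_produce_request("http://localhost:8082", "clicks", [{"page": "/home"}])
print(req.get_method(), req.full_url)  # POST http://localhost:8082/topics/clicks
# urllib.request.urlopen(req) would send it to a running REST Proxy.
```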
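Slide 26's "prevent backward-incompatible changes" is easier to see with a concrete rule. The sketch below is a deliberately simplified version of Avro-style backward compatibility, where readers using the new schema must be able to read data written with the old one. A field added in the new schema needs a default, a removed field is fine, and a changed type is rejected. Schema Registry performs the real check, and the real Avro rules are richer (type promotions, unions, aliases). The `Schema` shape and the `backward_compatible` function are hypothetical. The usage lines echo slide 25's date-format example: switching a `ts` field from string to long is flagged.

```python
from typing import Any, Dict, List

# Hypothetical simplified record schema: a list of fields, each with a name,
# a type, and optionally a default (loosely modelled on Avro record fields).
Schema = List[Dict[str, Any]]


def backward_compatible(old: Schema, new: Schema) -> List[str]:
    """Return the problems that stop a reader using `new` from reading data
    written with `old`; an empty list means the change is backward compatible."""
    old_fields = {f["name"]: f for f in old}
    problems: List[str] = []
    for f in new:
        prev = old_fields.get(f["name"])
        if prev is None:
            # Old data lacks this field, so the reader needs a default for it.
            if "default" not in f:
                problems.append(f"new field '{f['name']}' has no default")
        elif prev["type"] != f["type"]:
            # Simplification: any type change is rejected (Avro allows some promotions).
            problems.append(f"field '{f['name']}' changed type {prev['type']} -> {f['type']}")
    # Fields removed in `new` are fine: the reader simply ignores them.
    return problems


v1 = [{"name": "id", "type": "long"}, {"name": "ts", "type": "string"}]
v2_ok = v1 + [{"name": "country", "type": "string", "default": "unknown"}]
v2_bad = [{"name": "id", "type": "long"}, {"name": "ts", "type": "long"}]
print(backward_compatible(v1, v2_ok))   # []
print(backward_compatible(v1, v2_bad))  # ["field 'ts' changed type string -> long"]
```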
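Slides 28 and 30 describe stream processing as ordinary application code. Events are read from Kafka and turned into continuously updated results, which are written back (step 3a) and queried directly (step 3b). Below is a minimal sketch of one such continuous query, a running count per key, in plain Python. `CountByKey` is hypothetical and keeps its table in memory. In Kafka Streams itself, a Java `groupByKey()` followed by `count()` produces a table backed by a fault-tolerant state store.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


class CountByKey:
    """Hypothetical continuous query: a running count per key, i.e. a table
    derived from a stream. State lives in memory only in this sketch."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = defaultdict(int)

    def process(self, events: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
        """Consume (key, value) events and yield an updated (key, count) after
        each one: the changelog that would be written back to Kafka (step 3a)."""
        for key, _value in events:
            self.table[key] += 1
            yield key, self.table[key]

    def query(self, key: str) -> int:
        """Read the latest result for one key directly (step 3b)."""
        return self.table.get(key, 0)


counts = CountByKey()
changelog = list(counts.process([("alice", "login"), ("bob", "login"), ("alice", "buy")]))
print(changelog)              # [('alice', 1), ('bob', 1), ('alice', 2)]
print(counts.query("alice"))  # 2
```

The changelog and the table carry the same information in two forms: replaying the changelog from the start rebuilds the table, which is the stream/table view behind slide 10's Kappa picture.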
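Slide 31 defines a session as a period of activity closed by a gap of inactivity, grouped by event time rather than processing time. Below is a batch sketch of that definition in Python. It assumes the gap is inclusive, so events exactly `gap` apart share a session. `session_windows` is hypothetical. Kafka Streams does this incrementally with its `SessionWindows` class, merging sessions as late records arrive. This sketch instead sorts each user's timestamps, which should yield the same grouping for a finite input under the stated assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = Tuple[int, int, int]  # (start, end, event_count)


def session_windows(events: List[Tuple[str, int]], gap: int) -> Dict[str, List[Session]]:
    """Hypothetical: group (user, event_time) pairs into per-user sessions.

    Consecutive events (in event-time order) at most `gap` apart belong to the
    same session; a longer period of inactivity starts a new one. Sorting first
    means out-of-order events still land in the correct session.
    """
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions: Dict[str, List[Session]] = {}
    for user, times in by_user.items():
        times.sort()
        result: List[Session] = []
        start = end = times[0]
        count = 1
        for ts in times[1:]:
            if ts - end <= gap:
                end, count = ts, count + 1        # extend the current session
            else:
                result.append((start, end, count))
                start = end = ts                  # inactivity gap: new session
                count = 1
        result.append((start, end, count))
        sessions[user] = result
    return sessions


# Bob's event at t=12 arrives after his t=40 event but is grouped by event time.
clicks = [("alice", 1), ("bob", 10), ("alice", 3), ("bob", 40), ("bob", 12), ("alice", 30)]
print(session_windows(clicks, gap=5))
# {'alice': [(1, 3, 2), (30, 30, 1)], 'bob': [(10, 12, 2), (40, 40, 1)]}
```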
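Slide 36 separates "clocks and cables" metrics from end-to-end questions: did a message leave, and did it arrive? Below is a sketch of that reconciliation. It assumes per-message ids and timestamps are available on both sides, which is a simplification. The hypothetical `reconcile` function is not how Control Center collects or aggregates its data. It reports messages that never arrived and messages that arrived later than a latency target, the two conditions slide 40 lists as alert triggers.

```python
from typing import Dict, List, Tuple


def reconcile(produced: Dict[str, float], consumed: Dict[str, float],
              latency_slo: float) -> Tuple[List[str], List[str]]:
    """Hypothetical end-to-end check.

    produced / consumed map a message id to the time (seconds) it left the
    producer / reached the consumer. Returns (ids that never arrived, ids that
    took longer than latency_slo to arrive).
    """
    missing = sorted(mid for mid in produced if mid not in consumed)
    slow = sorted(
        mid for mid, sent in produced.items()
        if mid in consumed and consumed[mid] - sent > latency_slo
    )
    return missing, slow


sent = {"m1": 0.0, "m2": 0.1, "m3": 0.2}
received = {"m1": 0.05, "m3": 2.7}
print(reconcile(sent, received, latency_slo=1.0))  # (['m2'], ['m3'])
```

Per-message bookkeeping like this does not scale to high-volume topics. A practical system would compare counts per time window and per topic instead, at the cost of not naming the individual missing messages.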
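Slide 41 mentions a rack-aware algorithm. Its core constraint is simple: copies of the same partition should sit on different racks, so losing one rack does not lose every replica. The sketch below places replicas by walking a broker list ordered to alternate racks. It is a hypothetical illustration of that constraint, similar in spirit to Apache Kafka's own rack-aware assignment for new topics. It is not Confluent Auto Data Balancer's algorithm, which also moves existing partitions based on load and throttles the transfer. The sketch assumes each rack holds the same number of brokers.

```python
from typing import Dict, List


def rack_alternating_order(broker_racks: Dict[int, str]) -> List[int]:
    """Interleave brokers rack by rack so neighbours in the list sit on
    different racks (when racks hold equal numbers of brokers)."""
    by_rack: Dict[str, List[int]] = {}
    for broker, rack in sorted(broker_racks.items()):
        by_rack.setdefault(rack, []).append(broker)
    racks = sorted(by_rack)
    order: List[int] = []
    while any(by_rack[r] for r in racks):
        for r in racks:
            if by_rack[r]:
                order.append(by_rack[r].pop(0))
    return order


def assign_replicas(broker_racks: Dict[int, str], partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Hypothetical placement: partition p takes replication_factor consecutive
    brokers starting at position p in the rack-alternating order, so replicas
    land on different racks and the first (preferred leader) replica rotates."""
    order = rack_alternating_order(broker_racks)
    n = len(order)
    return {
        p: [order[(p + i) % n] for i in range(replication_factor)]
        for p in range(partitions)
    }


brokers = {1: "rack-a", 2: "rack-a", 3: "rack-b", 4: "rack-b", 5: "rack-c", 6: "rack-c"}
print(assign_replicas(brokers, partitions=3, replication_factor=3))
# {0: [1, 3, 5], 1: [3, 5, 2], 2: [5, 2, 4]}
```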
