Confluent kafka meetupseattle jan2017

11Confidential
State of the Streaming
Platform 2017
What’s new in Apache Kafka and the Confluent Platform
David Tucker, Confluent

The shift to streams
“By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.”1
“Streaming ingestion and analytics will become a must-have for digital winners.”2
1: Gartner: Harness Streaming Data for Real-Time Analytics - Nov 2016
2: Forrester’s 2016 Predictions: Turn Data Into Insight And Action - Nov 2015

Vision of a Streaming Enterprise
[Diagram: a central Streaming Platform surrounded by Search, NewSQL / NoSQL, RDBMS, Monitoring, Document Store, Real-time Analytics, Data Warehouse, Mobile Apps, Legacy Apps, and Hadoop]

What Can You Do with a Streaming Platform?
• Publish and Subscribe to streams of data
    • Analogous to traditional messaging systems
• Store streams of data
    • Consumers can look back in time
• Process streams of data
    • Analyze and correlate events in real time
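
The publish, store, and look-back capabilities listed above can be pictured with a small in-memory model. This is only a sketch in plain Python, not Kafka client code: the `StreamLog` and `Subscriber` names are hypothetical, and a real broker adds partitioning, replication, and retention on top of the same idea.

```python
from dataclasses import dataclass, field


@dataclass
class StreamLog:
    """Toy append-only log: events stay stored after they are read."""
    records: list = field(default_factory=list)

    def publish(self, event):
        """Append an event and return its offset (its position in the log)."""
        self.records.append(event)
        return len(self.records) - 1

    def read(self, offset):
        """Return every stored event from `offset` onward."""
        return self.records[offset:]


@dataclass
class Subscriber:
    """Tracks its own read position, independently of other subscribers."""
    log: StreamLog
    position: int = 0

    def poll(self):
        """Fetch all events after the current position and advance past them."""
        events = self.log.read(self.position)
        self.position += len(events)
        return events

    def rewind(self, offset=0):
        """Look back in time by moving the read position to an earlier offset."""
        self.position = offset


log = StreamLog()
for event in ("login", "click", "purchase"):
    log.publish(event)

reader = Subscriber(log)
first_pass = reader.poll()    # reads all three events
reader.rewind(1)              # move back to the second event
second_pass = reader.poll()   # re-reads from offset 1 onward
```

Because the log keeps its records, rewinding is just a change of read position; nothing has to be re-sent by the publisher.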

The typical integration architecture
[Diagram labels: Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, MySQL, Cassandra, Oracle, Monitoring, and two App boxes, each with Databases, Storage, and Interfaces]

Challenges abound
[Diagram labels: Search, Security, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, Espresso, Cassandra, Oracle, Monitoring, and two App boxes, each with Databases, Storage, and Interfaces]
• Difficult to handle massive amounts of data
• Diverse data sets, arriving at an increasing rate
• Many complex data pipelines
• Require a separate cluster for real-time
• Difficult & time consuming to change
• Require mission critical availability into most recent/relevant data

Modernized architecture using Apache Kafka
[Diagram labels: Apache Kafka at the center; Search, Security, User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Hadoop, Monitoring, Data Warehouse, and two Apps, each using the Streams API]

Challenges addressed by a streaming platform
[Diagram labels: Apache Kafka at the center; Search, Security, User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Hadoop, Monitoring, Data Warehouse, and two Apps, each using the Streams API]
• Rewind data stream to re-load into any target system
• Scale to meet demands of diverse streams
• Pub/sub to data streams
• Lightweight, easy to modify with minimal disruption
• Decoupled from upstream apps creating agility
• Real-time, context specific data in the moment

From Big Data to Stream Data
Apache Kafka is the Enabling Technology of this Transition
• Big Data was: The More the Better
• Stream Data is: The Faster the Better
• Stream Data can be: Big or Fast (Lambda)
• Stream Data will be: Big AND Fast (Kappa)
[Charts: Value of Data vs. Volume of Data; Value of Data vs. Age of Data]
[Diagram labels: Streams, Hadoop, Speed Table, Batch Table, DB; Streams, Job 1, Job 2, Table 1, Table 2, DB]

Ingest, Process, Load, and Serve Data at a Global Scale
[Diagram labels: Data System A …, Data System B …, two Kafka clusters, Applications, Other data stores, FIX, Raw data / Events]
• Kafka Streams (Data Enrichment and Transformation)
• Kafka Connect (Connectors to Extract and Load data)
• Confluent Replicator (shown twice)
• Custom Replication (shown twice)

Confluent: Enterprise Streaming Platform based on Apache Kafka™
[Diagram: Confluent Platform]
• Data flowing through the platform: Database Changes, Log Events, IoT Data, Web Events, …
• Connected systems: CRM, Data Warehouse, Database, Hadoop, Data Integration, …
• Uses: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
• Confluent Enterprise components: Apache Kafka™, Data Compatibility, Monitoring & Administration, Operations, Clients, Connectors
• Legend: Apache Open Source, Confluent Open Source, Confluent Commercial
• Complete, Open, Trusted, Enterprise Grade

How do I get streams of data
into and out of my apps?
Connect Clients REST

Apache Kafka™ Connect – Streaming Data Capture
[Diagram: Sources (JDBC, IRC / Twitter, MySQL) feed a Kafka Pipeline through Connectors built on the Kafka Connect API; further Connectors deliver to Sinks (Elastic, NoSQL, HDFS)]
• Fault tolerant
• Manage hundreds of data sources and sinks
• Preserves data schema
• Part of Apache Kafka project
• Integrated within Confluent Platform’s Control Center

Apache Kafka™ Connect – Let the framework do the hard work
• Serialization / de-serialization
• Schema Registry integration
• Fault tolerance, automatic fail-over
• Partitioning and scale-out
• … and let the developer focus on the domain-specific details of copying data

Kafka Connect Architecture: Logical Model
Connect has three main components: Connectors, Tasks, and Workers.
Data flowing into / out of the connectors is a stream; each stream is one or more partitions. In practice, a stream partition could be a database table, a log file, etc.
There may or may not be an exact alignment of streams to Kafka topics.

Kafka Connect Architecture: Execution Model
[Diagram: Task 1 to Task 4 run inside Worker 1 to Worker 3, which are spread across Host 1 and Host 2]
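
To make the logical and execution models concrete, the sketch below spreads a connector's stream partitions across tasks and the tasks across workers. It is a plain-Python illustration under stated assumptions: the round-robin policy, the `assign_round_robin` helper, and the table names are hypothetical and are not Kafka Connect's actual assignment algorithm.

```python
def assign_round_robin(items, owners):
    """Spread items across owners in round-robin order."""
    assignment = {owner: [] for owner in owners}
    for index, item in enumerate(items):
        assignment[owners[index % len(owners)]].append(item)
    return assignment


# Hypothetical stream partitions (for example, database tables) of one connector.
partitions = ["orders", "customers", "payments", "shipments", "returns"]
tasks = ["task-1", "task-2", "task-3", "task-4"]
workers = ["worker-1", "worker-2", "worker-3"]

# Logical model: each task copies a subset of the stream partitions.
partitions_by_task = assign_round_robin(partitions, tasks)

# Execution model: each worker process runs a subset of the tasks.
tasks_by_worker = assign_round_robin(tasks, workers)

# Fail-over: if worker-2 disappears, its tasks move to the surviving workers.
tasks_after_failure = assign_round_robin(tasks, ["worker-1", "worker-3"])
```

The point of the sketch is the separation of concerns: partitions define the unit of work, tasks do the copying, and workers are the processes that can come and go without the connector author handling it.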

Kafka Connect API Library of Connectors
* Denotes Connectors developed and distributed by Confluent. Extensive validation and testing have been performed.
[Diagram: connector logos grouped as Databases, Datastore/File Store, Analytics, and Applications / Other]

Kafka Clients
[Diagram: client logos grouped as Apache Kafka Native Clients, Confluent Native Clients, and Community Supported Clients; surviving labels: Ruby, Proxy http/REST, Stdin/stdout]

REST Proxy: Talking to Legacy Apps and Across Restricted Networks
[Diagram: Legacy Applications reach the REST Proxy over REST / HTTP; Native Kafka Applications connect directly; Schema Registry sits alongside]
• Simplifies administrative actions
• Simplifies message creation and consumption
• Provides a RESTful interface to a Kafka cluster
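
As an illustration of what "message creation" over REST / HTTP can look like, the sketch below builds (but does not send) a produce request for the REST Proxy. The host and port `http://localhost:8082`, the topic name `legacy-events`, and the `build_produce_request` helper are assumptions for the example; the `v2` JSON content type is also an assumption about the proxy version in use, since older releases expect `v1`.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build (but do not send) a REST Proxy request that produces JSON records."""
    body = json.dumps({"records": [{"value": value} for value in values]})
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        # Assumed content type; it must match the proxy's API version.
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical proxy address, topic, and payload.
request = build_produce_request(
    "http://localhost:8082", "legacy-events", [{"orderId": 42}]
)
# Against a running proxy, the request would be sent with:
# urllib.request.urlopen(request)
```

A legacy application only needs an HTTP client and JSON support to publish this way, which is why the proxy suits restricted networks and languages without a native Kafka client.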

How do I maintain my data
formats and ensure compatibility?

The Challenge of Data Compatibility at Scale
[Diagram: App 1, App 2, and App 3 write to a shared pipeline; one message is flagged as an incompatibly formatted message]
• Many sources without a policy cause mayhem in a centralized data pipeline
• Ensuring downstream systems can use the data is key to an operational stream pipeline
• Example: Date formats. Even within a single application, different formats can be presented

Schema Registry
[Diagram: App 1 and App 2 each use a Serializer to write to a Kafka Topic; example consumers are Elastic, NoSQL, and HDFS; the Schema Registry sits alongside the topic]
• Define the expected fields for each Kafka topic
• Automatically handle schema changes (e.g. new fields)
• Prevent backwards incompatible changes
• Supports multi-datacenter environments
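
A rough way to picture "prevent backwards incompatible changes" is a check that runs before a new schema version is accepted. The sketch below is a simplified stand-in, not the Schema Registry's implementation: schemas are plain dictionaries, the `is_backward_compatible` function is hypothetical, and the rule (added fields need a default, existing fields keep their type) is a reduced version of Avro-style schema resolution that ignores type promotion.

```python
def is_backward_compatible(old_fields, new_fields):
    """Return True if a reader using new_fields can read data written with old_fields."""
    for name, spec in new_fields.items():
        if name not in old_fields:
            # A field the old data never contained needs a default value.
            if "default" not in spec:
                return False
        elif old_fields[name]["type"] != spec["type"]:
            # Simplification: any type change is rejected.
            return False
    return True


# Hypothetical schema versions for one topic.
v1 = {"user": {"type": "string"}, "ts": {"type": "long"}}
v2_ok = {**v1, "country": {"type": "string", "default": "unknown"}}
v2_no_default = {**v1, "country": {"type": "string"}}
v2_type_change = {"user": {"type": "string"}, "ts": {"type": "string"}}

accepted = is_backward_compatible(v1, v2_ok)
rejected_missing_default = is_backward_compatible(v1, v2_no_default)
rejected_type_change = is_backward_compatible(v1, v2_type_change)
```

Adding an optional field is the "new fields" case handled automatically; the two rejected versions are the kind of change a registry would refuse so that downstream consumers keep working.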

How do I build stream
processing apps?

Architecture of Kafka Streams API, a Part of Apache Kafka
[Diagram labels: Kafka Streams API, Producer, two Consumers, and a Kafka Cluster holding three Topics]
Key benefits
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated from Kafka
Example Use Cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes

Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™
Example Use Cases
• Microservices
• Large-scale continuous queries and transformations
• Event-triggered processes
• Reactive applications
• Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …
Key Benefits of Apache Kafka’s Streams API
• Build Apps, Not Clusters: no additional cluster required
• Elastic, highly performant, distributed, fault-tolerant, secure
• Equally viable for small, medium, and large-scale use cases
• “Run Everywhere”: integrates with your existing deployment strategies such as containers, automation, cloud
[Diagram: Your App embedding the Kafka Streams API]

Architecture Example
Before: Complexity for development and operations, heavy footprint
1. Capture business events in Kafka
2. Must process events with separate, special-purpose clusters (Your Processing Job)
3. Write results back to Kafka

Architecture Example
With Kafka Streams: App-centric architecture that blends well into your existing infrastructure
1. Capture business events in Kafka
2. Process events fast, reliably, securely with standard Java applications (Your App, embedding the Kafka Streams API)
3a. Write results back to Kafka
3b. External apps can directly query the latest results
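
The numbered steps on this slide (capture, process, write back, query the latest results) can be mimicked in a few lines. This is a plain-Python analogy rather than Kafka Streams code, which is a Java library: the `process` function, the `page` field, and the use of a list as the output topic are all hypothetical stand-ins.

```python
from collections import Counter


def process(events, state, output_topic):
    """Count events per key, emitting each updated count to the output topic."""
    for event in events:
        key = event["page"]
        state[key] += 1                          # step 2: stateful processing
        output_topic.append((key, state[key]))   # step 3a: write results back


# Step 1: captured business events (hypothetical page views).
captured = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]

state = Counter()     # local state kept by the application
output_topic = []     # stand-in for a Kafka topic of results
process(captured, state, output_topic)

# Step 3b: another app asks the processing app for the latest result.
latest_home_count = state["home"]
```

The design point is that the processing logic and its state live inside an ordinary application, so no separate processing cluster is involved.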

How do I manage and monitor
my streaming platform at scale?

Confluent Control Center: End-to-end Monitoring
See exactly where your messages are going in your Kafka cluster

Confluent Control Center: Connector Management

Confluent Control Center: Alerting
Alerts
• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
• Manage alerts for different users and applications from a web UI
User authentication
• Control access to Confluent Control Center
• Integrates with existing enterprise authentication systems

Data Pipeline Demo
Real-time data firehose archived to searchable stores

Demo Scenario: Multiple Streaming Data Pipelines
• IRC feed of Wikipedia updates
    • IRC Source connector publishes real-time stream of Wikipedia updates to Kafka topic
    • Kafka Streams application parses records and re-writes to new topic
    • Elasticsearch Sink connector indexes parsed data
    • Kibana dashboards visualize Wikipedia updates in real time
• Twitter feed augmented with sentiment data
    • Twitter Source connector configured to publish data to Kafka topic
    • Kafka Streams application strips extraneous Twitter fields and adds sentiment score
    • Sink connector saves Kafka Streams output to key-value store (e.g. Couchbase or DynamoDB)
    • Key-value queries can track sentiment trends

Wikipedia-to-Elastic Data Pipeline

Wikipedia Transformation
• Raw input records
{"createdat":1485386068652,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[List of Iranian Americans]] https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313 * 01:445:4080:1510:F1A4:7C08:B276:FA8B * (+0) /* Media/Journalism */"}
{"createdat":1485386069199,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[In the Bleak Midwinter]] https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970 * Grover cleveland * (+422) /* Settings */"}
• Parsed records
{"createdat":1485386068652,"wikipage":"List of Iranian Americans","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313","username":"01:445:4080:1510:F1A4:7C08:B276:FA8B","bytechange":0,"commitmessage":"/* Media/Journalism */"}
{"createdat":1485386069199,"wikipage":"In the Bleak Midwinter","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970","username":"Grover cleveland","bytechange":422,"commitmessage":"/* Settings */"}
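
The raw-to-parsed step can be approximated with a regular expression over the `message` field. This is a sketch in plain Python, not the demo's Kafka Streams application: the `parse_record` function and the pattern are hypothetical, and the flag letters between the page title and the URL (`N` new, `M` minor, `!` unpatrolled, `B` bot) are an assumption, since both sample records carry no flags.

```python
import json
import re

# Assumed layout: [[page]] flags url * user * (+/-bytes) comment
MESSAGE_PATTERN = re.compile(
    r"\[\[(?P<page>.*?)\]\]\s+"
    r"(?P<flags>[!NMB]*)\s*"
    r"(?P<url>\S+)\s+\*\s+"
    r"(?P<user>.*?)\s+\*\s+"
    r"\((?P<delta>[+-]\d+)\)\s*"
    r"(?P<comment>.*)"
)


def parse_record(raw_json):
    """Turn one raw IRC record into the flat parsed form, or None if it does not match."""
    raw = json.loads(raw_json)
    match = MESSAGE_PATTERN.match(raw["message"])
    if match is None:
        return None
    flags = match["flags"]
    return {
        "createdat": raw["createdat"],
        "wikipage": match["page"],
        "isnew": "N" in flags,
        "isminor": "M" in flags,
        "isunpatrolled": "!" in flags,
        "isbot": "B" in flags,
        "diffurl": match["url"],
        "username": match["user"],
        "bytechange": int(match["delta"]),
        "commitmessage": match["comment"],
    }


# First raw input record from the slide.
RAW = (
    '{"createdat":1485386068652,"channel":"#en.wikipedia",'
    '"sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},'
    '"message":"[[List of Iranian Americans]] '
    'https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313 * '
    '01:445:4080:1510:F1A4:7C08:B276:FA8B * (+0) /* Media/Journalism */"}'
)

parsed = parse_record(RAW)
```

Applied to the first raw record, `parsed` carries the same fields and values as the first parsed record shown on the slide.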

Twitter Transformation
• Raw input records
"CreatedAt": 1479252348000,
"Id": 798668350956126200,
"Text": "Iago Aspas pays tribute to #Spain players for making his international debut “easy” vs #England… https://t.co/G13NUaZj8W",
"Source": "<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>",
"User": { }
128 separate fields
• Filtered records
{"sentiment":"Negative","sentimentScore":1,"UserName":"tits","CreatedAt":1485387765000,"Text":"RT @STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo https://t.co/QfCLayqRsH https://t.co/53GWANPDXj","id":"824402156707049475","UserScreenName":"titusanghongwen"}
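
The filtering step keeps a handful of fields and attaches a sentiment label and score. The sketch below shows the shape of that projection in plain Python. Several parts are assumptions: the `Name` and `ScreenName` keys inside `User` (the slide elides that object), the `filter_tweet` helper, and above all `toy_sentiment`, a word-list placeholder that does not reproduce the demo's sentiment model or its score scale.

```python
import re

# Placeholder word lists; the demo's actual sentiment model is not shown.
NEGATIVE_WORDS = {"eliminated", "lost", "defeat"}
POSITIVE_WORDS = {"tribute", "win", "easy"}


def toy_sentiment(text):
    """Return (label, score) from a naive word count; illustration only."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    label = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    return label, score


def filter_tweet(tweet, scorer=toy_sentiment):
    """Keep only the fields of interest and add sentiment fields."""
    label, score = scorer(tweet["Text"])
    return {
        "sentiment": label,
        "sentimentScore": score,
        "UserName": tweet["User"]["Name"],              # assumed field name
        "CreatedAt": tweet["CreatedAt"],
        "Text": tweet["Text"],
        "id": str(tweet["Id"]),
        "UserScreenName": tweet["User"]["ScreenName"],  # assumed field name
    }


# Hypothetical input record with a few of the many raw fields.
tweet = {
    "CreatedAt": 1485387765000,
    "Id": 824402156707049475,
    "Text": "Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo",
    "Source": "Twitter Web Client",
    "User": {"Name": "Example User", "ScreenName": "example_user"},
}

filtered = filter_tweet(tweet)
```

Passing the scorer as a parameter keeps the field projection separate from whichever sentiment model is actually used.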

Kafka Connect Demonstration
[Diagram: Kafka Connect, Apache Kafka Brokers, and K-Streams app(s), with the data-flow steps of the two pipelines numbered 1 to 5]
