Inside Kafka Streams—Monitoring Comcast’s Outside Plant

INSIDE KAFKA STREAMS -
MONITORING COMCAST’S OUTSIDE PLANT
October 17, 2018

2
MONITORING
”OUTSIDE PLANT”
EXTERNAL FACTORS
• Weather
• Power
• Construction
REQUIREMENTS
• Fast
• Resilient
• Evolvable

3
NODENODENODE
MONITORING ”OUTSIDE PLANT”
CMTS
ANALYZE
NODES
GENERATE
EVENTS
SCORE
NODES
PRIORITIZE
WORK
POLL
POLL
THRESHOLDS
DEVICES
USE KAFKA
STREAMS TO
MONITOR
AND
RESPOND TO
OUTSIDE
PLANT
CONDITIONS

4
TOPOLOGY
Copy
XX%
STREAMING APPLICATION APIS
KAFKA STREAMING TECH STACK
CONCEPTS
INFORMATION SYSTEM
CLIENTPROCESSOR
API
STREAMS
API
KSQL
STREAMING APPLICATIONS
KAFKA SERVER PLATFORM
ANALYSIS EVENTS NODE SCORE
BROKERS ZOOKEEPER CONNECT
PRODUCER TOPIC CONSUMER
CLIENTJAR
ADMIN CLIENT
STREAMSJARKSQLJAR
SOURCE PROCESSOR SINK
STATE STORE
BUILDER
KSTREAM OPERATIONS
KTABLE
STREAM KSQL
TABLE

5
PARTITION MECHANICS
KAFKA CONSUMERS & PRODUCERS
CONCEPTS
PRODUCERS
TOPIC CONSUMER GROUP
BROKERBROKERBROKERBROKER
P 0
P 0
P 0
P 1
P 2
P 1
P 1
P 2
P 2
KEY
VALUE
KEY
VALUE
KEY
VALUE
KEY
VALUE
PRODUCER
PRODUCER
PRODUCER
PRODUCER
H
A
S
H
CONSUMER
CONSUMER
CONSUMER
CONSUMER
CONSUMER GROUP
CONSUMER

6
PARTITION MECHANICS IN THE PROCESSOR API
KAFKA STREAMS - PROCESSOR API - PARTITIONING
CONCEPTS
SOURCE-TOPICS
P 0
P 1
P 2
SINK-TOPICS
P 0
P 1
P 2
STATE-STORE-CHANGELOG-TOPICS
P 0 P 1 P 2
TOPOLOGY
SOURCES SINKS
STATE STORES
INSTANCE 0
SOURCES PROCESSORS SINKS
STATE STORES
INSTANCE 1
SOURCES PROCESSOR SINKS
STATE STORES
INSTANCE 2
SOURCES PROCESSORS SINKS
STATE STORES
PROCESSORS

7
PROCESSOR
STATE STORE DURABILITY
KAFKA STREAMS - PROCESSOR API – STATE STORE
CONCEPTS
STATE STORE (P 0)
STATE-STORE-CHANGELOG-TOPIC (P 0)
K:1
V:a
K:1
V:a
K:2
V:b
K:2
V:b
K:3
V:c
K:3
V:c
K:1
V:a
K:2
V:b
K:3
V:c
K:1
V:d
K:1
V:d
K:1
V:d
K:2
V:e
K:1
V:f
K:2
V:e
K:2
V:e
K:1
V:f
K:1
V:f
LOG COMPACTION
K:1
V:a
K:2
V:b
K:1
V:d
SOURCE TOPIC (P 0)
K:1
V:a
K:2
V:b
K:3
V:c
K:1
V:d
K:2
V:e
K:1
V:f

8
TODAY’S DISCUSSION – PROCESSOR API PATTERNS
Monitoring Outside Plant
Streaming Concepts
Processor API Patterns
Tech Notes

9
STATE STORE USE
STATE STORES ARE USED FOR
• Deduplication of requests
• Materialized view of table data
• Rolling aggregates

1 0
STATE STORE
DE-DEPLICATION OF NODE REQUESTS
• Use Case: Do not have duplicate node polls running concurrently
• This architecture easily prevents duplicate node requests and reduces
resource stress and avoids database locking
• Works in conjunction with sockets to return results to multiple requestors
(UI), or will publish results to a state store

1 1
STATE STORE
DE-DUPLICATION OF NODE REQUESTS
NA
Publish
processor
Node
State
store
request
Set to “running”
Node LK1 “at rest”
Node AnalysisNA
Request
processor
Set to “at rest”
Node Poll Results
DE-DUPLICATION OF NODE REQUESTS
UI (or
other)

1 2
STATE STORE
MATERIALIZED VIEW OF TABLE DATA
• Use Case: Need to have a current list of device data (location & device type),
also ref data, poll data and other needs. We are reshaping the raw data for
use, this is faster that retrieving and reshaping constantly.
• We use the UI, a timer or an external system “push” to load data to a topic
and then a state store. In the future, we would like to get all changes in data
streamed to our application
• We query the app not the DB

1 3
STATE STORE
MATERIALIZED VIEW OF TABLE DATA
Topic with
raw data
processor
State store
Device &
location
State
store
stats
State store
performance
Timer
UI App
Rest
Endpoint
Rest of
application

1 4
STATE STORE
ROLLING AGGREGATES
• Use Case: Keeping track of the steps of a node request
• Can pull stats from this data, and is a powerful tool for checking health of our
application
• Gives us a window into the node analysis steps

1 5
STATE STORE
ROLLING AGGREGATES
Our UI displays the results
from each stage of a node
request. This is a useful tool
for operations.

1 6
TIMERS AND STREAM LOGIC
TIMERS AND STREAMING SOLUTION
• Data Pump
• Plant Validator

1 7
DATA PUMP
• Use Case: Load data from other systems/ sources on a schedule
• Most other systems don’t push change messages to us, and generally make
the data available through rest APIs
• We use a timer to kick off getting the data
• We stream the data to a topic and a processor takes the data, shapes it to the
needed use, and populates a state store or topic.

1 8
Copy
XX%
Copy
XX%
Copy
XX%
Copy
XX%
CODE SAMPLE FOR A TIMER
TIMER DATA LOAD
Spring application
@EnableScheduling
Use @Scheduled
Runs the data load
this one uses web service call

1 9
EVENT VALIDATOR FUNCTION
• Use Case: Some issues found in the outside plant need a “soaking” period to
confirm the issue to determine the correct action (if any)
• We need to move “soaking” events to “confirmed”, “closed” or “outbound”
• We use a one minute timer, check the status of an event, check if event is still
valid and update the status
• Possible through the use of a state store for the soaking events. No need to
query a DB

2 0
EVENT VALIDATOR PROCESS
Validation
Result
Processor
Soaking
Event
store
request
Get soaking events
Node AnalysisValidation
Processor
Event Actions
Timer
Set to “outbound”

2 1
ASYNC REST SERVICE CALLS
USED IN COMMUNICATION WITH LEGACY SYSTEM
• Use Case: Remove latency when calling rest web services
• One thread available per stream topology, so we use Flux (from Reactor Web)
to make async rest service calls to prevent blocking
• To poll a node, we call a webservice that retrieves all the SNMP data for the
devices. The call takes 10+ seconds and the streaming
• When the async calls returns, we are no longer in streams processing, so we
use a producer to create the next topic message.

2 2
NODENODENODE
CMTS
ANALYZE
NODES
GENERATE
EVENTS
SCORE
NODES
PRIORITIZE
WORK
POLL
POLL
THRESHOLDS
ACCOUNTS

2 3
Copy
XX%
CODE SAMPLE CREATING THE ASYNC CALL
WebService call
callClient is a
WebClient
Returns a Mono

2 4
Copy
XX%
Copy
XX%
Copy
XX%
Copy
XX%
CODE SAMPLE PROCESSING THE ASYNC RETURN
Code runs when
call completes
receives the
response object
runs business
logic

2 5
Copy
XX%
Copy
XX%
Copy
XX%
Copy
XX%
CODE SAMPLE PROCESSING THE ASYNC RETURN, GETTING BACK INTO STREAMS USING A PRODUCER
Response returns
We are out of
streaming framework
Need to use a producer

2 6
ADVANTAGES OF STREAMING
ADVANTAGES INCLUDE:
• Looked for solutions to refactor our current application code base
• Faster than a traditional database for this use case
• Streams solution is not monolithic. We’ll be able to update the app quicker
• Modular, once we had the basic framework for Node Analysis developed.
• Programming model is very straightforward
• We only need a few code patterns to solve many of our needs
• Kafka streams is fundamentally different from the other solutions we had
available

2 7
TODAY’S DISCUSSION – TECH NOTES
Monitoring Outside Plant
Streaming Concepts
Processor API Patterns
Tech Notes

2 8
INSTANCE2
REST API ßà STATE STORE
INTERACTIVE QUERY
TECH NOTES
STATE-STORE-CHANGELOG-TOPIC
P 0 P 1 P 2
INSTANCE0
REST API
STATE STORE
INSTANCE1
REST API
STATE STORE STATE STORE
PROPERTY: SERVER & PORT
INJECT TOPOLOGY BUILDER
GET STREAMS FROM TOPOLOGY
STREAMS.STORE (LOOP / TIMER)
META: SERVER MAP WITH PARTITIONS
REST API

2 9
PROCESSOR API - DIAGRAMING & NAMING
DIAGRAMING
TECH NOTES

3 0
MICROSERVICES & DEPLOYMENT
TECH NOTES
APPLICATION INTEGRATION
STREAMING APPLICATIONS
KAFKA SERVER PLATFORM
APP 1 APP 2
TOPICS

3 1
SCRIPT TO MANAGE TOPICS
TOPICS SCRIPT
TECH NOTES
ENVIRONMENT PREFIX
TOPOLOGIES, CONSUMER GROUPS, & TOPICS
KAFKA TOPICS
MIKE- DAN- DEV-
MIKE- DAN- DEV- TEST-
TEST-
Topic Prefix
Partition Count
Target Kafka Cluster
Delete Existing Topics?
Create New Topics?

3 2
GRAFANA & PROMETHEUS
MONITORING
TECH NOTES
KAFKA SERVER
PLATFORM
BROKER
JMX
EXPORTER
GRAFANA DASHBOARDS
YAML
PROMETHEUS PLATFORM
JSON
JNDI(:1234)

3 3
OPEN SOURCE AT COMCAST
APACHE TRAFFIC
CONTROL
Graduated to TLP May 16, 2018
Build a large scale content
delivery network using open
source.
MIRROR-TOOL-FOR-KAFKA-
CONNECT
Comcast’s Rhys McCaig
Mirroring Kafka topics between
clusters using the Kafka
Connect framework.
145 OPEN SOURCE
REPOSITORIES
Comcast is committed to open
source software.
We use OSS to build products,
attract talent and evolve the
technology we use to improve
the customer experience.
comcast.github.io
github.com/comcast
labs.comcast.com

comcast.github.io
github.com/comcast
Mike Graham
twitter datumgeek
github datumgeek
charles_graham@comcast.com
Dan Carroll
twitter dcarroll
github dcarroll
daniel_carroll@comcast.com
Thank You !!!!
J J

Inside Kafka Streams—Monitoring Comcast’s Outside Plant

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Inside Kafka Streams—Monitoring Comcast’s Outside Plant

Similar to Inside Kafka Streams—Monitoring Comcast’s Outside Plant (20)

More from confluent

More from confluent (20)

Recently uploaded

Recently uploaded (20)

Inside Kafka Streams—Monitoring Comcast’s Outside Plant