5. Streaming Analytics Reference Architecture
[Diagram: US West Fleet, US Central Fleet, and US East Fleet, each with Truck Sensors and a C++ Agent; Data Flow Apps Powered by NiFi; Kafka Producers; Kafka Topics; Kafka Consumers & Producers; Kafka Topics; Kafka Consumers; Analytics Apps 1-5]
6. Kafka is Everywhere. It Has Become a Critical Component in Streaming Architectures
[Diagram: US West Fleet, US Central Fleet, and US East Fleet, each with Truck Sensors and a C++ Agent; Data Flow Apps Powered by NiFi; Kafka Producers; Kafka Topics; Kafka Consumers & Producers; Kafka Topics; Kafka Consumers; Analytics Apps 1-5]
7. Kafka’s Omnipresence Has Led to the Onset of “Kafka Blindness”
• What is “Kafka Blindness”?
• Who is Affected?
• What are the Symptoms?
8. What is “Kafka Blindness”?
Customers who use Kafka today struggle with monitoring, “seeing”, and troubleshooting what is happening in their clusters
9. Who is Affected?
• Platform Operation Teams
• Developers / DevOps Teams
• Security / Governance Teams
10. What are the Symptoms?
• Difficulty seeing who is producing & consuming data
• Difficulty understanding the flow of data from producers > topics > consumers
• Difficulty troubleshooting/monitoring
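To make the "producers > topics > consumers" view concrete, here is a minimal sketch of how such a flow could be assembled once client activity is known. The client names, topic names, and the `(client_id, role, topic)` record shape are all hypothetical; Kafka does not hand out records in this form, so a real tool would have to derive them from broker metrics or client-side instrumentation.

```python
from collections import defaultdict

# Hypothetical observations: (client_id, role, topic).
# All names are invented for illustration only.
CLIENTS = [
    ("us-west-fleet-agent", "producer", "truck-events"),
    ("us-east-fleet-agent", "producer", "truck-events"),
    ("analytics-app-1", "consumer", "truck-events"),
    ("analytics-app-1", "producer", "truck-alerts"),
    ("analytics-app-2", "consumer", "truck-alerts"),
]


def build_flow(clients):
    """Group client ids by topic and by role (producer or consumer)."""
    flow = defaultdict(lambda: {"producer": set(), "consumer": set()})
    for client_id, role, topic in clients:
        flow[topic][role].add(client_id)
    return flow


def describe(flow):
    """Render one 'producers > topic > consumers' line per topic."""
    lines = []
    for topic in sorted(flow):
        producers = ", ".join(sorted(flow[topic]["producer"])) or "(none)"
        consumers = ", ".join(sorted(flow[topic]["consumer"])) or "(none)"
        lines.append(f"{producers} > {topic} > {consumers}")
    return lines


if __name__ == "__main__":
    for line in describe(build_flow(CLIENTS)):
        print(line)
```

Note that the same client can appear on both sides (here `analytics-app-1` consumes one topic and produces to another), which is why the flow is keyed by topic rather than by client.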
11. Current Treatments for “Kafka Blindness” Have Been Ineffective
• Kafka bin CLI utilities
• Various Open Source utilities
• Proprietary Solutions (Confluent Control Center)
• Homegrown solutions
14. What is SMM?
New Open Source project led by Hortonworks to Cure “Kafka Blindness”
Single Monitoring Dashboard for all your Kafka Clusters across 4 entities
–Broker
–Producer
–Topic
–Consumer
Designed for the Enterprise
–Support for Secure/Kerberized Kafka clusters
–Rich Access Control Policies (ACLs)
REST as a First Class Citizen
15. Kafka Monitoring Use Cases: SMM Addresses a Comprehensive Set of Kafka Monitoring Use Cases
• Kafka Platform Operations Team Concerns
• Kafka App Dev / DevOps Team Concerns
16. Kafka Platform Operations Team Concerns
• What is my aggregate throughput into and out of my cluster (bytes in and out)?
• Which partitions are located on each broker?
• Which partitions on which brokers are the hottest?
• How many total topics does my Kafka cluster have?
• Are all the replicas in my topic in sync?
• Which of my topics has produced/consumed the most messages over the last N minutes/hours?
• Are any of my brokers down?
• How much of the cluster is being used, and how much capacity do I have available per broker / cluster?
• How many total active producers and consumers exist now?
• Are any of my brokers skewed with respect to throughput?
• Do I have any offline topic partitions?
• What hosts are my brokers located on?
• Are any of my brokers running out of disk space?
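Several of these questions (replicas in sync, offline partitions, partitions per broker) come down to simple checks over partition metadata. The sketch below assumes a hypothetical snapshot in which each partition lists its leader, its assigned replicas, and its in-sync replicas (ISR). The dictionary layout, the topic names, and the use of `None` for "no leader" are illustrative assumptions, not the output format of any particular tool.

```python
from collections import defaultdict

# Hypothetical metadata snapshot; broker ids and topic names are invented.
PARTITIONS = [
    {"topic": "truck-events", "partition": 0, "leader": 1,
     "replicas": [1, 2, 3], "isr": [1, 2, 3]},
    {"topic": "truck-events", "partition": 1, "leader": 2,
     "replicas": [2, 3, 1], "isr": [2]},
    {"topic": "truck-alerts", "partition": 0, "leader": None,
     "replicas": [3, 1], "isr": []},
]


def under_replicated(partitions):
    """Partitions whose in-sync replica set is smaller than the assigned set."""
    return [(p["topic"], p["partition"])
            for p in partitions if len(p["isr"]) < len(p["replicas"])]


def offline(partitions):
    """Partitions with no active leader (modeled here as leader=None)."""
    return [(p["topic"], p["partition"])
            for p in partitions if p["leader"] is None]


def partitions_by_broker(partitions):
    """Map each broker id to the partitions it holds a replica of."""
    placement = defaultdict(list)
    for p in partitions:
        for broker in p["replicas"]:
            placement[broker].append((p["topic"], p["partition"]))
    return dict(placement)


if __name__ == "__main__":
    print("under-replicated:", under_replicated(PARTITIONS))
    print("offline:", offline(PARTITIONS))
    print("placement:", partitions_by_broker(PARTITIONS))
```

An offline partition also shows up as under-replicated in this model, since its ISR is empty; a dashboard would likely report the more severe condition first.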
17. Kafka App Dev / DevOps Team Concerns
• What is the replication factor for a topic?
• What is the retention period for a topic?
• What are all the producers and consumers connected to a given topic right now?
• Which brokers hold the partitions for a given topic?
• Which broker is the active leader for a given topic partition?
• What is in a given topic (topic exploration and search)?
• What is the total number of messages into my topic over the last N minutes/hours?
• Did a consumer rebalance occur for a given topic?
• Are any consumers in a consumer group for a given topic slow or falling behind?
• Are any of my consumers/consumer groups over-consuming?
• Are any of my consumers/consumer groups under-consuming?
• How many active consumer instances are in a given consumer group?
• What topic(s) is the consumer group consuming messages from?
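The "slow/falling behind" question is usually answered with consumer lag: for each partition, the log end offset minus the offset the consumer group has committed. The sketch below works on hypothetical offset snapshots. The topic name, the offsets, and the `threshold` parameter are invented for illustration, and treating a missing commit as offset 0 is a simplification (the log start offset may be higher, and real behavior depends on the consumer's reset policy).

```python
# Hypothetical snapshots keyed by (topic, partition); all values are invented.
END_OFFSETS = {("truck-events", 0): 1500, ("truck-events", 1): 1200}
COMMITTED_OFFSETS = {("truck-events", 0): 1490, ("truck-events", 1): 700}


def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    lag = {}
    for tp, end in end_offsets.items():
        # Simplification: a partition with no commit is treated as offset 0.
        committed = committed_offsets.get(tp, 0)
        lag[tp] = max(end - committed, 0)
    return lag


def lagging_partitions(lag, threshold):
    """Partitions whose lag exceeds a caller-chosen (hypothetical) threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


if __name__ == "__main__":
    lag = consumer_lag(END_OFFSETS, COMMITTED_OFFSETS)
    print("lag per partition:", lag)
    print("total lag:", sum(lag.values()))
    print("falling behind:", lagging_partitions(lag, threshold=100))
```

A single snapshot only shows how far behind a group is. Deciding whether it is falling further behind requires comparing lag across successive snapshots.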
18. Let's Use SMM to “X-Ray” What's Happening in This Kafka-Based Application
[Diagram: US West Fleet, US Central Fleet, and US East Fleet, each with Truck Sensors and a C++ Agent; Data Flow Apps Powered by NiFi; Analytics Apps 1-5]