Why Loggly Loves Apache Kafka, and How We Use Its
Unbreakable Messaging for Better Log Management
Infrastructure Engineering Team 
June 2014 
| Log management as a service Simplify Log Management
What Loggly Does 
World’s most popular cloud-based 
log management service 
§ More than 5,000 customers 
§ Near real-time indexing of events 
Distributed architecture, built on AWS 
Initial production services in 2011 
§ Loggly Generation 2 released in Sept 2013 
Loggly: Addressing the first big data 
problem every company faces 
§ Centralized logging 
and archival 
§ Real-time processing, 
analysis and 
visualization 
§ Monitoring, alerting 
and troubleshooting 
Agenda for this Presentation 
§ The challenges of Log 
Management at scale 
§ Overview of Loggly’s 
processing pipeline 
§ Alternative technologies 
considered 
§ Why we love Apache Kafka 
§ How Kafka has added 
flexibility to our pipeline 
The Challenges of Log Management at Scale 
§ Big data 
– >750 billion events logged to 
date 
– Sustained bursts of 100,000+ 
events per second 
– Data space measured in 
petabytes 
§ Need for high fault tolerance 
§ Near real-time indexing 
requirements 
§ Time-series index 
management 
Log Management Processing Pipeline: 
Overview 
[Diagram: processing pipeline with stages labeled Load Balancing, Kafka, Stage 2, and Loggly Custom Module]
Collectors Can Easily Outpace 
Downstream Processes 
§ Written in C++ 
§ Designed to ingest 
massive data volumes 
§ Need to collect 
regardless of what’s 
happening 
downstream 
Solution: 
Queue That’s External to Collector 
§ Based on Apache 
Kafka 
§ Highly performant 
and reliable 
Alternative/Supplementary Approaches Considered
§ Internal buffering in collectors 
– Added complexity 
§ Cassandra 
– Not as good a queue as Kafka 
§ Apache Storm 
– In initial Gen2 architecture, removed after launch 
The Secret to Log Management at Scale: 
Keep It Simple, Stupid 
Results: 
§ Can process sustained rates of 
100,000+ events per second per cluster 
§ Average message 300 bytes 
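The two figures above imply a rough data rate. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes) and treating the 100,000 events per second as a constant rate over a full day, which is an assumption and not something the slide states:

```python
# Back-of-the-envelope throughput from the two figures on this slide.
EVENTS_PER_SECOND = 100_000   # sustained rate per cluster (from the slide)
AVG_MESSAGE_BYTES = 300       # average message size (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = EVENTS_PER_SECOND * AVG_MESSAGE_BYTES
# Assumption: the sustained rate holds for a whole day.
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

mb_per_second = bytes_per_second / 1_000_000
tb_per_day = bytes_per_day / 1_000_000_000_000

print(f"{mb_per_second:.0f} MB/s, about {tb_per_day:.2f} TB/day")
```

Under those assumptions a single cluster moves about 30 MB per second, or roughly 2.6 TB per day.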
Why We Love 
Kafka 
What Attracted Us in the First Place
No single point of failure
• Terabytes of data move through our Kafka cluster every day without losing a single event
• We use age-based retention to purge old data on disks
Low latency
• 99.99999% of the time our data is coming from disk cache and RAM; only very rarely do we hit disk
Performance
• Crazy good!
• We currently have a bunch of Kafka brokers running on m2.xlarge instances backed by provisioned IOPS
• One of our consumer groups (eight threads), which maps logs to customers, can process about 200,000 events per second, draining from 192 partitions spread across three brokers
Scalability
• The ability to increase partition count per topic and downstream consumer threads provides flexibility to increase throughput when desired
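To make the partition and thread figures concrete, here is a small sketch of how 192 partitions might be divided among eight consumer threads. The round-robin helper `assign_round_robin` is hypothetical and only illustrates the proportions; the slides do not say which assignment strategy Loggly's consumers use, and the per-thread rate assumes load is spread evenly.

```python
from collections import defaultdict

PARTITIONS = 192             # from the slide
CONSUMER_THREADS = 8         # from the slide
EVENTS_PER_SECOND = 200_000  # approximate consumer-group throughput, from the slide


def assign_round_robin(num_partitions, num_threads):
    """Hypothetical helper: spread partition ids evenly over thread ids."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        assignment[partition % num_threads].append(partition)
    return dict(assignment)


assignment = assign_round_robin(PARTITIONS, CONSUMER_THREADS)
partitions_per_thread = {t: len(p) for t, p in assignment.items()}

# Assumption: events are spread evenly across threads.
events_per_thread = EVENTS_PER_SECOND / CONSUMER_THREADS

print(partitions_per_thread[0], "partitions per thread")
print(f"{events_per_thread:.0f} events/s per thread")
```

With these numbers each thread owns 24 partitions and handles about 25,000 events per second. Because a partition is read by at most one consumer in a group, the partition count sets the ceiling on how many threads can usefully be added.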
How Our Kafka Crush Has Deepened
Distributed log collection
• Local pods and collectors spread all over the Internet, with local Kafka deployments, collect data from customers located all over the world
• Can collect logs even when we lose connectivity
• When the network comes back, Kafka sends the logs downstream to the rest of the pipeline
More efficient, effective DevOps
• Deploying Kafka throughout the pipeline makes it easy to disable certain parts of the system (for troubleshooting or upgrades)
• No worrying that we will lose customer data
• Example: Add support for a new log type to our automatic parsing capabilities by turning off the existing parser, deploying the new one, and processing the logs that Kafka has queued up
Controlling resource utilization
• Keep collectors as simple as possible for resilience and reliability reasons
• Add intelligence into our pipelines using Kafka
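The parser-swap example can be sketched with a toy queue. This is an illustration of the decoupling idea only: `DurableQueue`, `old_parser`, and `new_parser` are hypothetical names, the queue is an in-memory stand-in for a Kafka topic (an append-only log plus a consumer offset), and none of it uses a real Kafka client.

```python
class DurableQueue:
    """Toy stand-in for a Kafka topic: append-only log plus a consumer offset."""

    def __init__(self):
        self.log = []
        self.offset = 0  # next position the consumer will read

    def append(self, event):
        self.log.append(event)

    def poll(self, max_events=100):
        batch = self.log[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch

    def backlog(self):
        return len(self.log) - self.offset


def old_parser(event):
    return {"raw": event, "parser": "v1"}


def new_parser(event):
    return {"raw": event, "parser": "v2"}


queue = DurableQueue()
parsed = []

# Normal operation: the existing parser keeps up with the collectors.
queue.append("event-1")
parsed += [old_parser(e) for e in queue.poll()]

# The parser is turned off for an upgrade; collectors keep appending.
for i in range(2, 5):
    queue.append(f"event-{i}")
backlog_during_upgrade = queue.backlog()

# The new parser is deployed and drains what was queued in the meantime.
parsed += [new_parser(e) for e in queue.poll()]

print(backlog_during_upgrade, "events waited;", queue.backlog(), "left in backlog")
```

The same pattern covers the lost-connectivity case: the producer side keeps appending while the consumer side is unavailable, and the consumer resumes from its saved offset once it is back.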
Resource Utilization Example: 
“Noisy Neighbors” 
“Noisy Neighbors” are 
Inherent to SaaS 
§ Sending many times their “normal” level of 
logging volume, inadvertently or because their 
application is in big trouble 
§ Routing logs to separate queue minimizes 
impact on other customers 
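One way to picture the routing rule is a threshold on a customer's current rate relative to their normal rate. This is a sketch under stated assumptions: the function `choose_queue`, the queue names `shared` and `noisy`, and the 5x multiplier are all hypothetical, since the slides do not describe how Loggly detects a noisy neighbor or how its queues are named.

```python
NOISY_MULTIPLIER = 5  # assumed threshold; the slide only says "many times" normal


def choose_queue(current_rate, normal_rate, multiplier=NOISY_MULTIPLIER):
    """Hypothetical rule: divert a customer to a separate queue when their
    current event rate exceeds a multiple of their normal rate."""
    if normal_rate > 0 and current_rate > multiplier * normal_rate:
        return "noisy"
    return "shared"


# A customer at their usual volume stays on the shared queue.
print(choose_queue(current_rate=100, normal_rate=100))
# A customer sending ten times their usual volume is diverted.
print(choose_queue(current_rate=1000, normal_rate=100))
```

Consumers of the shared queue are then unaffected by the burst, while the separate queue can be drained at whatever pace resources allow.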
Kafka Queues Add Flexibility to Loggly 
Pipeline 
§ Because Kafka topics are very cheap from a 
performance and overhead standpoint, we 
can create as many queues as we want 
§ Scaled to the performance we want 
§ Optimizing resource utilization across the system 
§ Because they can be created dynamically, we 
can make business rules very flexible 
§ Makes us confident that pipeline will scale as 
customer data volumes do 
Conclusion: 
Kafka Frees Our Development Team 
to Build Differentiating Features 
§ Kafka deployment working without us thinking 
about it 
§ Plenty of other things to do to keep our 
position as the world’s most popular cloud-based 
log management service! 
Does Log Management 
Sound Hard? It Should! 
Let us do the heavy lifting for you! 
Try Loggly FREE for 30 days 
About Us: 
Loggly is the world’s most popular cloud-based log management solution, used by 
more than 5,000 happy customers to effortlessly spot problems in real-time, easily 
pinpoint root causes and resolve issues faster to ensure application success. 
Visit us at loggly.com or follow @loggly on Twitter. 
Did you like this presentation? 
Head over to our blog for 
more great content! 
Take me to the Loggly Blog 