ELK at LinkedIn - Kafka, scaling, lessons learned

20,300 views

How LinkedIn uses and scales ELK clusters using Kafka, and the lessons learned.

There are useful notes in the PowerPoint version.

Published in: Internet
  • 10M EPS: not bad, congrats
  • Great KT ppt

ELK at LinkedIn - Kafka, scaling, lessons learned

  1. ELK @ LinkedIn: Scaling ELK with Kafka
  2. Introduction
     Tin Le (tinle@linkedin.com)
     Senior Site Reliability Engineer
     Formerly part of the Mobile SRE team, responsible for servers handling mobile app (iOS, Android, Windows, RIM, etc.) traffic. Now responsible for guiding ELK @ LinkedIn as a whole.
  3. Problems
     ● Multiple data centers, tens of thousands of servers, hundreds of billions of log records
     ● Logging, indexing, searching, storing, visualizing and analyzing all of those logs all day every day
     ● Security (access control, storage, transport)
     ● Scaling to more DCs, more servers, and even more logs…
     ● ARRGGGGHHH!!!!!
  4. Solutions
     ● Commercial
       o Splunk, Sumo Logic, HP ArcSight Logger, Tibco, XpoLog, Loggly, etc.
     ● Open Source
       o Syslog + Grep
       o Graylog
       o Elasticsearch
       o etc.
  5. Criteria
     ● Scalable - horizontally, by adding more nodes
     ● Fast - as close to real time as possible
     ● Inexpensive
     ● Flexible
     ● Large user community (support)
     ● Open source
  6. The winner is... Splunk ??? ELK!
  7. ELK at LinkedIn
     ● 100+ ELK clusters across 20+ teams and 6 data centers
     ● Some of our larger clusters have:
       o More than 32 billion docs (30+ TB)
       o Daily indices averaging 3.0 billion docs (~3 TB)
  8. ELK + Kafka
     Summary: ELK is a popular open source application stack for visualizing and analyzing logs. ELK is currently being used across many teams within LinkedIn. The architecture we use is made up of four components: Elasticsearch, Logstash, Kibana and Kafka.
     ● Elasticsearch: Distributed real-time search and analytics engine
     ● Logstash: Collects and parses all data sources into an easy-to-read JSON format
     ● Kibana: Elasticsearch data visualization engine
     ● Kafka: Data transport, queue, buffer and short-term storage
  9. What is Kafka?
     ● Apache Kafka is a high-throughput distributed messaging system
       o Invented at LinkedIn and open sourced in 2011
       o Fast, scalable, durable, and distributed by design
       o Links for more:
         ▪ http://kafka.apache.org
         ▪ http://data.linkedin.com/opensource/kafka
  10. Kafka at LinkedIn
      ● Common data transport
      ● Available and supported by dedicated team
        o 875 billion messages per day
        o 200 TB/day in
        o 700 TB/day out
        o Peak load
          ▪ 10.5 million messages/s
          ▪ 18.5 gigabits/s inbound
          ▪ 70.5 gigabits/s outbound
  11. Logging using Kafka at LinkedIn
      ● Dedicated cluster for logs in each data center
      ● Individual topics per application
      ● Defaults to 4 days of transport-level retention
      ● Not currently replicating between data centers
      ● Common logging transport for all services, languages and frameworks
  12. ELK Architectural Concerns
      ● Network Concerns
        o Bandwidth
        o Network partitioning
        o Latency
      ● Security Concerns
        o Firewalls and ACLs
        o Encrypting data in transit
      ● Resource Concerns
        o A misbehaving application can swamp production resources
  13. Multi-colo ELK Architecture
      [Diagram. Labels: LinkedIn Services; Log Transport (Kafka); ELK Search Clusters, shown for each of DC1, DC2 and DC3; Tribes; ELK Dashboard; Corp Data Centers.]
  14. ELK Search Architecture
      [Diagram. Labels: Users; Kibana; Elasticsearch (tribe); Elasticsearch (master); Kafka; Logstash and Elasticsearch (data node), shown twice.]
  15. Operational Challenges
      ● Data, lots of it.
        o Transporting, queueing, storing, securing, reliability…
        o Ingesting & indexing fast enough
        o Scaling infrastructure
        o Which data? (right data needed?)
        o Formats, mapping, transformation
          ▪ Data from many sources: Java, Scala, Python, Node.js, Go
  16. Operational Challenges...
      ● Centralized vs siloed cluster management
      ● Aggregated views of data across the entire infrastructure
      ● Consistent view (trace up/down app stack)
      ● Scaling - horizontally or vertically?
      ● Monitoring, alerting, auto-remediating
  17. The future of ELK at LinkedIn
      ● More ELK clusters being used by even more teams
      ● Clusters with 300+ billion docs (300+ TB)
      ● Daily indices average 10+ billion docs, 10 TB - move to hourly indices
      ● ~5,000 shards per cluster
  18. Extra slides
      The next two slides contain example Logstash configs showing how we use the pipe input plugin with Kafka Console Consumer, and how to monitor Logstash using the metrics filter.
  19. KCC pipe input config
      input {
        # Run Kafka Console Consumer as a child process and read its output as JSON events
        pipe {
          type => "mobile"
          command => "/opt/bin/kafka-console-consumer/kafka-console-consumer.sh --formatter com.linkedin.avro.KafkaMessageJsonWithHexFormatter --property schema.registry.url=http://schema-server.example.com:12250/schemaRegistry/schemas --autocommit.interval.ms=60000 --zookeeper zk.example.com:12913/kafka-metrics --topic log_stash_event --group logstash1"
          codec => "json"
        }
      }
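  20. Monitoring Logstash metrics
      filter {
        # Meter every event; the periodic metric events this generates are tagged "metric"
        metrics {
          meter => "events"
          add_tag => "metric"
        }
      }
      output {
        # Print only the metric events, not the log events themselves
        if "metric" in [tags] {
          stdout {
            codec => line { format => "Rate: %{events.rate_1m}" }
          }
        }
      }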
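The Kafka figures on slide 10 can be cross-checked with a little arithmetic. The sketch below converts the daily totals into per-second averages, assuming decimal units (1 TB = 10^12 bytes), which the slides do not state. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_per_second(per_day: float) -> float:
    """Average per-second rate for a quantity given per day."""
    return per_day / SECONDS_PER_DAY


def tb_per_day_to_gbit_per_s(tb_per_day: float) -> float:
    """Convert TB/day to Gbit/s, assuming 1 TB = 1e12 bytes (an assumption)."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


avg_msgs = avg_per_second(875e9)        # messages per second
avg_in = tb_per_day_to_gbit_per_s(200)  # Gbit/s inbound
avg_out = tb_per_day_to_gbit_per_s(700)  # Gbit/s outbound
fan_out = 700 / 200                     # bytes read per byte written

print(f"average messages/s: {avg_msgs:,.0f}")
print(f"average inbound:    {avg_in:.1f} Gbit/s")
print(f"average outbound:   {avg_out:.1f} Gbit/s")
print(f"read/write ratio:   {fan_out:.1f}x")
```

Under these assumptions the daily averages land close to the peak figures quoted on the same slide, and the 3.5x read/write ratio reflects each message being consumed by several readers on average.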
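Slide 11 mentions a default of 4 days of transport-level retention, which is what lets Kafka act as a buffer when an ELK cluster falls behind. A minimal sketch of that reasoning follows. `retention.ms` is Kafka's topic-level retention setting; the `can_replay` helper is hypothetical and only illustrates the idea.

```python
from datetime import timedelta

# Default retention quoted on slide 11.
RETENTION = timedelta(days=4)

# Value one would give to Kafka's retention.ms topic setting for 4 days.
retention_ms = int(RETENTION.total_seconds() * 1000)


def can_replay(outage: timedelta, retention: timedelta = RETENTION) -> bool:
    """True if a consumer that was down for `outage` can still read
    everything it missed, i.e. nothing has aged out of the topic yet.
    Hypothetical helper, not part of any Kafka client."""
    return outage < retention


print(retention_ms)
print(can_replay(timedelta(hours=36)))  # weekend-length indexing outage
print(can_replay(timedelta(days=5)))    # longer than retention
```

An outage shorter than the retention window costs only indexing delay; anything longer means log data is lost before Logstash can consume it.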
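Slide 17 proposes moving from daily to hourly indices once daily indices reach 10+ billion docs. The sketch below shows what that might look like in index naming and per-index size. The daily pattern mirrors Logstash's default `logstash-YYYY.MM.dd` naming; the hourly suffix is an assumption, since the slides do not give the pattern LinkedIn used.

```python
from datetime import datetime, timezone


def index_name(ts: datetime, hourly: bool = False, prefix: str = "logstash") -> str:
    """Index name for an event timestamp, bucketed by UTC day or hour.
    The hourly ".HH" suffix is an assumed convention."""
    fmt = "%Y.%m.%d.%H" if hourly else "%Y.%m.%d"
    return f"{prefix}-{ts.astimezone(timezone.utc).strftime(fmt)}"


def docs_per_index(docs_per_day: float, hourly: bool) -> float:
    """Average documents per index for a given daily volume."""
    return docs_per_day / 24 if hourly else docs_per_day


ts = datetime(2015, 6, 1, 13, 45, tzinfo=timezone.utc)
print(index_name(ts))
print(index_name(ts, hourly=True))
print(f"{docs_per_index(10e9, hourly=True):,.0f} docs per hourly index")
```

At 10 billion docs and 10 TB per day, hourly indices average roughly 417 million docs and 417 GB each, at the cost of 24 times as many indices to manage.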
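The `rate_1m` field printed by the metrics config on slide 20 is a one-minute moving rate. The sketch below shows how such a meter can be computed, assuming an exponentially weighted moving average updated every 5 seconds, in the style of the Unix load average. That is an assumption about the plugin's internals rather than something the slides state, and `OneMinuteMeter` is a hypothetical class.

```python
import math


class OneMinuteMeter:
    """Event-rate meter using a one-minute exponentially weighted average."""

    TICK_SECONDS = 5.0  # assumed update interval
    ALPHA = 1.0 - math.exp(-TICK_SECONDS / 60.0)  # smoothing factor

    def __init__(self) -> None:
        self._uncounted = 0
        self._rate = None  # events/s; None until the first tick

    def mark(self, n: int = 1) -> None:
        """Record n events since the last tick."""
        self._uncounted += n

    def tick(self) -> None:
        """Fold the events seen in the last interval into the average."""
        instant = self._uncounted / self.TICK_SECONDS
        self._uncounted = 0
        if self._rate is None:
            self._rate = instant
        else:
            self._rate += self.ALPHA * (instant - self._rate)

    @property
    def rate_1m(self) -> float:
        return 0.0 if self._rate is None else self._rate


meter = OneMinuteMeter()
for _ in range(60):      # five minutes at a steady 1000 events/s
    meter.mark(5000)
    meter.tick()
steady = meter.rate_1m

for _ in range(12):      # one minute with no events
    meter.tick()
decayed = meter.rate_1m

print(f"steady: {steady:.1f} events/s, after 1 idle minute: {decayed:.1f} events/s")
```

The averaging smooths out bursts, so a stalled pipeline shows up as a gradual decay in the reported rate rather than an immediate drop to zero.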
