SlideShare a Scribd company logo
ELK
Log processing at Scale
#DevOpsDays 2015, Singapore
@DevOpsDaysSG
Angad Singh
About me
DevOps at Viki, Inc - A global
video streaming site with
subtitles.
Previously a Twitter SRE,
National University of Singapore
Twitter @angadsg,
Github @angad
Elasticsearch - Log Indexing and Searching
Logstash - Log Ingestion plumbing
Kibana - Frontend
{
Metrics vs Logging
Metrics
● Numeric timeseries data
● Actionable
● Counts, Statistical (p90, p99 etc.)
● Scalable cost-effective solutions
already available
Logging
● Useful for debugging
● Catch-all
● Full text searching
● Computationally intensive, harder
to scale
Metrics vs Logging
Metrics
● Numeric timeseries data
● Actionable
● Counts, Statistical (p90, p99 etc.)
● Scalable cost-effective solutions
already available
Alerting and Monitoring at Viki
Deeper level
debugging with
application logs
Success Rate
Alert for
service X
Logs
● Application logs - Stack Traces, Handled Exceptions
● Access Logs - Status codes, URI, HTTP Method at all levels of the stack
● Client Logs - Direct HTTP requests containing log events from client-side
Javascript or Mobile application (android/ios)
● Standardized log format to JSON - easy to add / remove fields.
● Request tracing through various services using Unique-ID at Load Balancer
● Log aggregator
● Log preprocessing
(Filtering etc.)
● 3 stage pipeline
● Input > Filter > Output
Logstash
● Log aggregator
● Log preprocessing
(Filtering etc.)
● 3 stage pipeline
● Input > Filter > Output
Logstash Elasticsearch
● Full text searching and
indexing
● on top of Apache
Lucene
● RESTful web interface
● Horizontally scalable
● Log aggregator
● Log preprocessing
(Filtering etc.)
● 3 stage pipeline
● Input > Filter > Output
Logstash Elasticsearch
● Full text searching and
indexing
● on top of Apache
Lucene
● RESTful web interface
● Horizontally scalable
Kibana
● Frontend
● Visualizations,
Dashboards
● Supports Geo
visualizations
● Uses ES REST API
Input
Any Stream
● local file
● queue
● tcp, udp
● twitter
● etc..
Logstash
Filter
Mutation
● add/remove field
● parse as json
● ruby code
● parse geoip
● etc..
Output
● elasticsearch
● redis
● queue
● file
● pagerduty
● etc..
● Golang program that sits next to log files, lumberjack protocol
● Forwards logs from a file to a logstash server
● Removes the need for a buffer (such as redis, or a queue) for
logs pending ingestion to logstash.
● Docker container with volume mounted /var/log.
Configuration stored in Consul.
● Application containers with volume mounted /var/log to
/var/log/docker/<container>/application.log
Logstash Forwarder
Logstash pool with HAProxy
4 x logstash machines, 8 cores, 16 GB
RAM
7 x logstash processes per machine, 5 for
application logs, 2 for HTTP client logs.
Fronted by HAProxy for both lumberjack
protocol as well as HTTP protocol.
Easily scalable by adding more machines
and spinning up more logstash processes.
Application
Service
Container 1
Application
Service
Container 2
Logstash-Forwarder
Container
Mounted /var/log
to
/var/log/docker/
on host
Elasticsearch Hardware
12 core, 64GB RAM with RAID 0 - 2 x 3TB 7200rpm disks.
20 nodes, 20 shards, 3 replicas (with 1 primary).
Each day ~300GB x 4 copies (3 + 1) ~ 3 months of data on 120TB.
Average 6k-8k logs per second, peak 25k logs per second.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
Elasticsearch Hardware
● < 30.5 GB Heap - JAVA compressed pointers below 30.5GB heap
● Sweet spot - 64GB of RAM with half available for Lucene file buffers.
● SSD or RAID 0 (or multiple path directories similar to RAID 0).
● If SSD then set I/O scheduler to deadline instead of cfq.
● RAID0 - no need to worry about disks failing as machines can easily be
replaced due to multiple copies of data.
● Disable swap.
Hardware Tuning
● 20 days of indexes open based on available memory, rest closed - open on
demand
● Field data - cache used while sorting and aggregating data.
● Circuit breaker - cancels requests which require large memory, prevent OOM,
http://elasticsearch:9200/_cache/clear if field data is very close to memory
limit.
● Shards >= Number of nodes
● Lucene forceMerge - minor performance improvements for older indexes
(https://www.elastic.co/guide/en/elasticsearch/client/curator/current/optimize.
html)
Elasticsearch Configuration
Prevent split brain situation to avoid losing data - set minimum number of master
eligible nodes to (n/2 + 1)
Set higher ulimit for elasticsearch process
Daily cronjob which deletes data older than 90 days, closes indices older than 20
days, optimizes (forceMerge) indices older than 2 days
And also...
Marvel - Official plugin from Elasticsearch
KOPF - Index management plugin
CAT APIs - REST APIs to view cluster information
Curator - Data management
Monitoring
Thanks
email: angad@viki.com
twitter: @angadsg

More Related Content

What's hot

Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Matt Fuller
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
Lynn Langit
 
Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020
Piotr Findeisen
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
Taro L. Saito
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stack
Simon Hanmer
 
An Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs AnalysisAn Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs Analysis
José Manuel Ciges Regueiro
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Presto
kbajda
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
_mdev_
 
ELK in Security Analytics
ELK in Security Analytics ELK in Security Analytics
ELK in Security Analytics
nullowaspmumbai
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
HostedbyConfluent
 
Rolling With Riak
Rolling With RiakRolling With Riak
Rolling With Riak
John Lynch
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMinsk MongoDB User Group
 
The Elastic Stack as a SIEM
The Elastic Stack as a SIEMThe Elastic Stack as a SIEM
The Elastic Stack as a SIEM
John Hubbard
 
Lightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at CogentaLightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at Cogenta
Yann Cluchey
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
HostedbyConfluent
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
Bill Graham
 
ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較
Yoshiyasu SAEKI
 
Scaling with Riak at Showyou
Scaling with Riak at ShowyouScaling with Riak at Showyou
Scaling with Riak at Showyou
John Muellerleile
 
Security Analytics using ELK stack
Security Analytics using ELK stack	Security Analytics using ELK stack
Security Analytics using ELK stack
Cysinfo Cyber Security Community
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Jen Aman
 

What's hot (20)

Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stack
 
An Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs AnalysisAn Open Source NoSQL solution for Internet Access Logs Analysis
An Open Source NoSQL solution for Internet Access Logs Analysis
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Presto
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
 
ELK in Security Analytics
ELK in Security Analytics ELK in Security Analytics
ELK in Security Analytics
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Rolling With Riak
Rolling With RiakRolling With Riak
Rolling With Riak
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
The Elastic Stack as a SIEM
The Elastic Stack as a SIEMThe Elastic Stack as a SIEM
The Elastic Stack as a SIEM
 
Lightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at CogentaLightning talk: elasticsearch at Cogenta
Lightning talk: elasticsearch at Cogenta
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較ストリーミングデータのアドホック分析エンジンの比較
ストリーミングデータのアドホック分析エンジンの比較
 
Scaling with Riak at Showyou
Scaling with Riak at ShowyouScaling with Riak at Showyou
Scaling with Riak at Showyou
 
Security Analytics using ELK stack
Security Analytics using ELK stack	Security Analytics using ELK stack
Security Analytics using ELK stack
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 

Similar to Scaling ELK Stack - DevOpsDays Singapore

Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
javier ramirez
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
OVHcloud
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Silverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applicationsSilverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applications
BrettTasker
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
Ceph Community
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
Red_Hat_Storage
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
Divante
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
confluent
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
Dmytro Semenov
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
Ceph Community
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
Luciano Mammino
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
Alluxio, Inc.
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
Red_Hat_Storage
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
Colleen Corrice
 
Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013
Syed Shaaf
 

Similar to Scaling ELK Stack - DevOpsDays Singapore (20)

Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Silverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applicationsSilverstripe at scale - design & architecture for silverstripe applications
Silverstripe at scale - design & architecture for silverstripe applications
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013Getting to know the Grid - Goto Aarhus 2013
Getting to know the Grid - Goto Aarhus 2013
 

Recently uploaded

原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
3ipehhoa
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
JungkooksNonexistent
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
laozhuseo02
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
 
Living-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptx
Living-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptxLiving-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptx
Living-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptx
TristanJasperRamos
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
Rogerio Filho
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
JeyaPerumal1
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
3ipehhoa
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
Gal Baras
 
ER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAEER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAE
Himani415946
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
nirahealhty
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
natyesu
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Arif0071
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
3ipehhoa
 
Output determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CCOutput determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CC
ShahulHameed54211
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
laozhuseo02
 

Recently uploaded (16)

原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
Living-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptx
Living-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptxLiving-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptx
Living-in-IT-era-Module-7-Imaging-and-Design-for-Social-Impact.pptx
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
ER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAEER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAE
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
Output determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CCOutput determination SAP S4 HANA SAP SD CC
Output determination SAP S4 HANA SAP SD CC
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 

Scaling ELK Stack - DevOpsDays Singapore

  • 1. ELK Log processing at Scale #DevOpsDays 2015, Singapore @DevOpsDaysSG Angad Singh
  • 2. About me DevOps at Viki, Inc - A global video streaming site with subtitles. Previously a Twitter SRE, National University of Singapore Twitter @angadsg, Github @angad
  • 3. Elasticsearch - Log Indexing and Searching Logstash - Log Ingestion plumbing Kibana - Frontend {
  • 4. Metrics vs Logging Metrics ● Numeric timeseries data ● Actionable ● Counts, Statistical (p90, p99 etc.) ● Scalable cost-effective solutions already available
  • 5. Logging ● Useful for debugging ● Catch-all ● Full text searching ● Computationally intensive, harder to scale Metrics vs Logging Metrics ● Numeric timeseries data ● Actionable ● Counts, Statistical (p90, p99 etc.) ● Scalable cost-effective solutions already available
  • 6. Alerting and Monitoring at Viki Deeper level debugging with application logs Success Rate Alert for service X
  • 7. Logs ● Application logs - Stack Traces, Handled Exceptions ● Access Logs - Status codes, URI, HTTP Method at all levels of the stack ● Client Logs - Direct HTTP requests containing log events from client-side Javascript or Mobile application (android/ios) ● Standardized log format to JSON - easy to add / remove fields. ● Request tracing through various services using Unique-ID at Load Balancer
  • 8. ● Log aggregator ● Log preprocessing (Filtering etc.) ● 3 stage pipeline ● Input > Filter > Output Logstash
  • 9. ● Log aggregator ● Log preprocessing (Filtering etc.) ● 3 stage pipeline ● Input > Filter > Output Logstash Elasticsearch ● Full text searching and indexing ● on top of Apache Lucene ● RESTful web interface ● Horizontally scalable
  • 10. ● Log aggregator ● Log preprocessing (Filtering etc.) ● 3 stage pipeline ● Input > Filter > Output Logstash Elasticsearch ● Full text searching and indexing ● on top of Apache Lucene ● RESTful web interface ● Horizontally scalable Kibana ● Frontend ● Visualizations, Dashboards ● Supports Geo visualizations ● Uses ES REST API
  • 11.
  • 12. Input Any Stream ● local file ● queue ● tcp, udp ● twitter ● etc.. Logstash Filter Mutation ● add/remove field ● parse as json ● ruby code ● parse geoip ● etc.. Output ● elasticsearch ● redis ● queue ● file ● pagerduty ● etc..
  • 13. ● Golang program that sits next to log files, lumberjack protocol ● Forwards logs from a file to a logstash server ● Removes the need for a buffer (such as redis, or a queue) for logs pending ingestion to logstash. ● Docker container with volume mounted /var/log. Configuration stored in Consul. ● Application containers with volume mounted /var/log to /var/log/docker/<container>/application.log Logstash Forwarder
  • 14. Logstash pool with HAProxy 4 x logstash machines, 8 cores, 16 GB RAM 7 x logstash processes per machine, 5 for application logs, 2 for HTTP client logs. Fronted by HAProxy for both lumberjack protocol as well as HTTP protocol. Easily scalable by adding more machines and spinning up more logstash processes.
  • 16. Elasticsearch Hardware 12 core, 64GB RAM with RAID 0 - 2 x 3TB 7200rpm disks. 20 nodes, 20 shards, 3 replicas (with 1 primary). Each day ~300GB x 4 copies (3 + 1) ~ 3 months of data on 120TB. Average 6k-8k logs per second, peak 25k logs per second. https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
  • 18. ● < 30.5 GB Heap - JAVA compressed pointers below 30.5GB heap ● Sweet spot - 64GB of RAM with half available for Lucene file buffers. ● SSD or RAID 0 (or multiple path directories similar to RAID 0). ● If SSD then set I/O scheduler to deadline instead of cfq. ● RAID0 - no need to worry about disks failing as machines can easily be replaced due to multiple copies of data. ● Disable swap. Hardware Tuning
  • 19. ● 20 days of indexes open based on available memory, rest closed - open on demand ● Field data - cache used while sorting and aggregating data. ● Circuit breaker - cancels requests which require large memory, prevent OOM, http://elasticsearch:9200/_cache/clear if field data is very close to memory limit. ● Shards >= Number of nodes ● Lucene forceMerge - minor performance improvements for older indexes (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/optimize. html) Elasticsearch Configuration
  • 20. Prevent split brain situation to avoid losing data - set minimum number of master eligible nodes to (n/2 + 1) Set higher ulimit for elasticsearch process Daily cronjob which deletes data older than 90 days, closes indices older than 20 days, optimizes (forceMerge) indices older than 2 days And also...
  • 21.
  • 22. Marvel - Official plugin from Elasticsearch KOPF - Index management plugin CAT APIs - REST APIs to view cluster information Curator - Data management Monitoring