Scaling ELK Stack - DevOpsDays Singapore
ELK
Log processing at Scale
#DevOpsDays 2015, Singapore
@DevOpsDaysSG
Angad Singh
About me
DevOps at Viki, Inc., a global video streaming site with subtitles.
Previously a Twitter SRE.
National University of Singapore.
Twitter @angadsg, GitHub @angad
Elasticsearch - Log Indexing and Searching
Logstash - Log Ingestion plumbing
Kibana - Frontend
Metrics vs Logging
Metrics
● Numeric timeseries data
● Actionable
● Counts, statistics (p90, p99, etc.)
● Scalable, cost-effective solutions already available
Logging
● Useful for debugging
● Catch-all
● Full-text searching
● Computationally intensive, harder to scale
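The p90/p99 figures listed for metrics are percentiles. A minimal sketch of the nearest-rank method, which is only one of several percentile definitions (metrics systems often interpolate or use approximate sketches instead); the latency values are made up for illustration:

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest sample with at least p% of all samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies (ms) collected during one metrics interval.
latencies_ms = list(range(1, 101))
p90 = nearest_rank_percentile(latencies_ms, 90)
p99 = nearest_rank_percentile(latencies_ms, 99)
print(p90, p99)
```

Unlike a full-text log search, this only needs the numbers themselves, which is part of why metrics are cheaper to store and scale.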
Alerting and Monitoring at Viki
[Diagram: a Success Rate alert for service X, followed by deeper-level debugging with application logs.]
Logs
● Application logs - stack traces, handled exceptions
● Access logs - status codes, URI, HTTP method at all levels of the stack
● Client logs - direct HTTP requests carrying log events from client-side JavaScript or mobile applications (Android/iOS)
● Log format standardized on JSON, which makes it easy to add or remove fields.
● Request tracing across services using a unique ID assigned at the load balancer
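A minimal sketch of what a standardized JSON log line with a propagated request ID could look like. The header name `X-Request-Id` and the field names are hypothetical (a load balancer such as HAProxy can generate such an ID with its `unique-id-format` and `unique-id-header` directives); this is not Viki's actual schema:

```python
import json
import time
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # hypothetical header set by the load balancer


def build_log_event(service, message, headers, **fields):
    """Serialize one application log event as a single JSON line."""
    event = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "service": service,
        "message": message,
        # Reuse the load balancer's ID so one request can be followed across services;
        # fall back to a fresh random ID if the header is missing.
        "request_id": headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex,
    }
    event.update(fields)  # extra fields can be added without any schema change
    return json.dumps(event, sort_keys=True)


line = build_log_event("api", "video not found", {"X-Request-Id": "abc123"},
                       status=404, method="GET", uri="/videos/1")
print(line)
```

Searching Kibana for one `request_id` value would then surface every log line that request produced, across all services.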
Logstash
● Log aggregator
● Log preprocessing (filtering etc.)
● 3-stage pipeline: Input > Filter > Output
Elasticsearch
● Full-text searching and indexing
● Built on top of Apache Lucene
● RESTful web interface
● Horizontally scalable
Kibana
● Frontend
● Visualizations, dashboards
● Supports geo visualizations
● Uses the Elasticsearch REST API
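Kibana talks to Elasticsearch over its REST API. As a rough illustration, this sketch builds (but does not send) a `_search` request for recent matching log events. It assumes the host name `elasticsearch:9200` used elsewhere on these slides, Logstash's default `logstash-*` daily index naming, and the `bool`/`filter` query syntax of Elasticsearch 2.x and later (1.x clusters from the time of this talk used the `filtered` query instead):

```python
import json
import urllib.request

ES_URL = "http://elasticsearch:9200"  # host name as used on the slides
INDEX_PATTERN = "logstash-*"          # assumed: Logstash's default daily index names


def build_search_request(query_string, since="now-15m", size=50):
    """Build (but do not send) a _search request for recent events matching a Lucene query string."""
    body = {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": {"query_string": {"query": query_string}},
                # Filter clauses restrict the time range without affecting relevance scoring.
                "filter": {"range": {"@timestamp": {"gte": since}}},
            }
        },
    }
    return urllib.request.Request(
        f"{ES_URL}/{INDEX_PATTERN}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("service:api AND status:500")
print(req.get_method(), req.full_url)
```

Sending it is one `urllib.request.urlopen(req)` call against a reachable cluster.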
Logstash
Input: any stream
● local file
● queue
● tcp, udp
● twitter
● etc.
Filter: mutation
● add/remove field
● parse as JSON
● Ruby code
● parse GeoIP
● etc.
Output
● elasticsearch
● redis
● queue
● file
● pagerduty
● etc.
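To make the Input > Filter > Output flow concrete, here is a toy model of the three stages as chained Python generators. It is not Logstash and uses no Logstash API; the added `pipeline` field and the dropped `password` field are hypothetical, and only the `_jsonparsefailure` tag mirrors what Logstash's json filter adds when parsing fails:

```python
import json


def input_stage(lines):
    """Input: wrap each raw line from any stream (here, a list) in an event dict."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def filter_stage(events, drop_fields=("password",)):
    """Filter: parse the message as JSON, then add and remove fields."""
    for event in events:
        raw = event.pop("message")
        try:
            parsed = json.loads(raw)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            parsed = None
        if isinstance(parsed, dict):
            event.update(parsed)
        else:
            event["message"] = raw
            event["tags"] = ["_jsonparsefailure"]
        event["pipeline"] = "toy"  # add a field
        for field in drop_fields:  # remove fields
            event.pop(field, None)
        yield event


def output_stage(events, sink):
    """Output: deliver events to a sink (a list here; elasticsearch, redis, a file... in Logstash)."""
    for event in events:
        sink.append(event)


sink = []
raw_lines = ['{"level": "error", "password": "hunter2"}\n', "plain text line\n"]
output_stage(filter_stage(input_stage(raw_lines)), sink)
print(sink)
```

Because each stage is a generator, events stream through one at a time rather than being buffered in full between stages.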
Logstash Forwarder
● Go program that sits next to the log files and ships them using the lumberjack protocol
● Forwards logs from a file to a Logstash server
● Removes the need for a buffer (such as Redis or a queue) for logs pending ingestion into Logstash
● Runs as a Docker container with /var/log mounted as a volume; configuration is stored in Consul
● Application containers mount their /var/log volume to /var/log/docker/<container>/ on the host, so logs land in /var/log/docker/<container>/application.log
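A file shipper like this must remember how far it has read in each file so it only sends new lines. A minimal sketch of that offset-tracking idea, in Python rather than Go and without the lumberjack protocol; `FileFollower` is a hypothetical name, and a real shipper would persist the offset and handle log rotation more carefully:

```python
import os
import tempfile


class FileFollower:
    """Returns complete lines appended to a file since the last call, tracking a byte offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # a real shipper persists this so a restart resumes where it left off

    def read_new_lines(self):
        lines = []
        with open(self.path, "rb") as f:
            if os.fstat(f.fileno()).st_size < self.offset:
                self.offset = 0  # file was truncated in place; start from the beginning
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line or not line.endswith(b"\n"):
                    break  # EOF, or a partially written last line to retry next time
                self.offset += len(line)
                lines.append(line.decode("utf-8").rstrip("\n"))
        return lines


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "application.log")
    follower = FileFollower(path)
    with open(path, "wb") as f:
        f.write(b"first\nsecond\n")
    batch1 = follower.read_new_lines()
    with open(path, "ab") as f:
        f.write(b"third\npart")
    batch2 = follower.read_new_lines()
    with open(path, "ab") as f:
        f.write(b"ial\n")
    batch3 = follower.read_new_lines()
print(batch1, batch2, batch3)
```

Reading straight from the file at a known offset is what lets the forwarder skip an intermediate buffer: the log file itself holds anything not yet shipped.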
Logstash pool with HAProxy
4 x Logstash machines, each with 8 cores and 16 GB RAM.
7 x Logstash processes per machine: 5 for application logs, 2 for HTTP client logs.
Fronted by HAProxy for both the lumberjack and HTTP protocols.
Easily scaled by adding more machines and spinning up more Logstash processes.
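The pool arithmetic works out to 4 x 7 = 28 processes: 20 for application logs and 8 for HTTP client logs. A minimal sketch of that layout with round-robin selection, loosely modeling HAProxy's `balance roundrobin`; the host names and ports are hypothetical, and since lumberjack connections are long-lived, real balancing happens per connection rather than per log line:

```python
from itertools import cycle

MACHINES = 4
PROCESSES_PER_MACHINE = 7
APP_PROCESSES_PER_MACHINE = 5  # the remaining 2 per machine take HTTP client logs


def build_pools(machines, per_machine, app_per_machine):
    """Split every Logstash process into an application-log pool and a client-log pool."""
    app, client = [], []
    for m in range(1, machines + 1):
        for p in range(per_machine):
            backend = f"logstash{m}:{5000 + p}"  # hypothetical host:port naming
            (app if p < app_per_machine else client).append(backend)
    return app, client


app_pool, client_pool = build_pools(MACHINES, PROCESSES_PER_MACHINE, APP_PROCESSES_PER_MACHINE)
app_rr = cycle(app_pool)  # each new connection goes to the next backend in turn
picks = [next(app_rr) for _ in range(len(app_pool) + 1)]
print(len(app_pool), len(client_pool), picks[0], picks[-1])
```

Scaling out means appending backends to the pool; HAProxy's backend list plays that role in the real setup.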
[Diagram: on each host, Application Service Container 1, Application Service Container 2 and a Logstash-Forwarder container; each container's /var/log is mounted to /var/log/docker/ on the host.]
Elasticsearch Hardware
12 cores, 64 GB RAM and RAID 0 over 2 x 3 TB 7200 rpm disks per node.
20 nodes, 20 shards, 3 replicas per shard (plus 1 primary).
Each day ~300 GB x 4 copies (1 primary + 3 replicas); ~3 months of data fits on 120 TB.
Average 6k-8k logs per second, peak 25k logs per second.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
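The capacity figures above can be checked with a little arithmetic. This sketch uses decimal units (1 TB = 1000 GB), takes the 90-day retention from the cleanup job described later, and ignores filesystem overhead and disk watermarks:

```python
NODES = 20
DISKS_PER_NODE = 2
DISK_GB = 3000            # 3 TB disks
DAILY_PRIMARY_GB = 300    # ~300 GB of new log data per day (one copy)
COPIES = 1 + 3            # one primary plus three replicas
RETENTION_DAYS = 90       # deleted after 90 days by the daily cron job

raw_capacity_gb = NODES * DISKS_PER_NODE * DISK_GB  # RAID 0 exposes both disks in full
daily_written_gb = DAILY_PRIMARY_GB * COPIES        # every copy occupies disk space
retained_gb = daily_written_gb * RETENTION_DAYS
days_that_fit = raw_capacity_gb // daily_written_gb

print(raw_capacity_gb, daily_written_gb, retained_gb, days_that_fit)
```

So 90 days need about 108 TB of the 120 TB raw capacity, and at most about 100 days fit, which matches the "~3 months" figure.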
Hardware Tuning
● Heap < 30.5 GB: the JVM uses compressed object pointers only below ~30.5 GB of heap.
● Sweet spot: 64 GB of RAM, with half left available for Lucene file buffers (the OS file-system cache).
● SSD or RAID 0 (or multiple data path directories, which behave similarly to RAID 0).
● With SSDs, set the I/O scheduler to deadline instead of cfq.
● RAID 0: no need to worry about disks failing, because multiple copies of the data exist and a failed machine can easily be replaced.
● Disable swap.
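The heap rule of thumb above combines two limits: half of RAM, but never above the compressed-pointer threshold. A minimal sketch, using the 30.5 GB figure from the slide; on Elasticsearch 1.x/2.x the result was typically passed via the `ES_HEAP_SIZE` environment variable, while later versions set `-Xms`/`-Xmx` in `jvm.options`:

```python
COMPRESSED_OOPS_LIMIT_GB = 30.5  # threshold quoted on the slide


def recommended_heap_gb(ram_gb):
    """Half of RAM for the JVM heap, capped below the compressed-pointer limit."""
    return min(ram_gb / 2, COMPRESSED_OOPS_LIMIT_GB)


for ram in (16, 32, 64):
    heap = recommended_heap_gb(ram)
    # Whatever the heap does not take is left to the OS cache that Lucene relies on.
    print(f"{ram} GB RAM -> {heap} GB heap, ~{ram - heap} GB for the file-system cache")
```

On the 64 GB nodes this gives a 30.5 GB heap and leaves about 33.5 GB for Lucene's file buffers.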
Elasticsearch Configuration
● Keep 20 days of indexes open (based on available memory); close the rest and open them on demand.
● Field data: cache used while sorting and aggregating data.
● Circuit breaker: cancels requests that would require too much memory, preventing OOM errors. If field data is very close to the memory limit, clear the cache with a POST to http://elasticsearch:9200/_cache/clear.
● Shards >= number of nodes.
● Lucene forceMerge: minor performance improvements for older indexes (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/optimize.html).
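A minimal sketch of the "clear the cache when field data nears its limit" check. It assumes the response shape of `GET /_nodes/stats/breaker` in recent Elasticsearch versions (`breakers.fielddata.estimated_size_in_bytes` and `limit_size_in_bytes`; verify against your version), uses a hypothetical 90% threshold, and only builds the clear-cache request without sending it:

```python
import urllib.request

ES_URL = "http://elasticsearch:9200"  # host name as used on the slide
NEAR_LIMIT_RATIO = 0.9  # hypothetical definition of "very close to the memory limit"


def nodes_near_fielddata_limit(breaker_stats):
    """Return ids of nodes whose fielddata breaker estimate is close to its limit.

    `breaker_stats` is the parsed JSON body of GET /_nodes/stats/breaker.
    """
    near = []
    for node_id, node in breaker_stats["nodes"].items():
        fd = node["breakers"]["fielddata"]
        if fd["estimated_size_in_bytes"] >= NEAR_LIMIT_RATIO * fd["limit_size_in_bytes"]:
            near.append(node_id)
    return sorted(near)


def clear_cache_request():
    """Build (but do not send) the POST /_cache/clear request."""
    return urllib.request.Request(f"{ES_URL}/_cache/clear", data=b"", method="POST")


# Hand-made sample shaped like the stats response; real values come from the cluster.
sample = {"nodes": {
    "n1": {"breakers": {"fielddata": {"estimated_size_in_bytes": 950, "limit_size_in_bytes": 1000}}},
    "n2": {"breakers": {"fielddata": {"estimated_size_in_bytes": 100, "limit_size_in_bytes": 1000}}},
}}
near = nodes_near_fielddata_limit(sample)
req = clear_cache_request() if near else None
print(near, req.get_method() if req else None)
```

Clearing the cache is a stopgap: the circuit breaker is what actually prevents the OOM, and clearing only frees memory until the next heavy aggregation refills it.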
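Prevent a split-brain situation (and the resulting data loss) by setting the minimum number of master-eligible nodes (discovery.zen.minimum_master_nodes) to floor(n/2) + 1, where n is the number of master-eligible nodes.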
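The quorum formula uses integer division. A short sketch; the slides do not say how many of the 20 nodes are master-eligible, so the value for 20 assumes all of them are. (`discovery.zen.minimum_master_nodes` applies up to Elasticsearch 6.x; 7.x manages the voting quorum itself.)

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 20):
    print(n, minimum_master_nodes(n))
```

Note that with 2 master-eligible nodes the quorum is 2, so losing either node stops master election; this is why 3 master-eligible nodes is a common minimum.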
Set a higher ulimit for the Elasticsearch process (e.g. the maximum number of open files).
A daily cron job deletes data older than 90 days, closes indices older than 20 days, and optimizes (forceMerge) indices older than 2 days.
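In practice Curator performs these actions; the sketch below only models the decision the daily job makes for each index. It assumes Logstash's default daily index names (`logstash-YYYY.MM.DD`) and uses a hypothetical fixed "today" so the result is reproducible; a real job would also skip indices that are already merged or closed:

```python
from datetime import date, datetime

DELETE_AFTER_DAYS = 90
CLOSE_AFTER_DAYS = 20
FORCE_MERGE_AFTER_DAYS = 2


def action_for(index_name, today):
    """Pick the lifecycle action for a daily index named like logstash-YYYY.MM.DD."""
    index_day = datetime.strptime(index_name, "logstash-%Y.%m.%d").date()
    age = (today - index_day).days
    if age > DELETE_AFTER_DAYS:
        return "delete"
    if age > CLOSE_AFTER_DAYS:
        return "close"
    if age > FORCE_MERGE_AFTER_DAYS:
        return "forcemerge"
    return "keep"


today = date(2015, 10, 1)  # hypothetical run date
indices = ["logstash-2015.10.01", "logstash-2015.09.29", "logstash-2015.09.28",
           "logstash-2015.09.01", "logstash-2015.06.01"]
plan = {name: action_for(name, today) for name in indices}
print(plan)
```

Checking the oldest threshold first ensures each index gets exactly one action per run.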
And also...
Monitoring
Marvel - Official plugin from Elasticsearch
KOPF - Index management plugin
CAT APIs - REST APIs to view cluster information
Curator - Data management
Thanks
email: angad@viki.com
twitter: @angadsg
