Technology behind Real Time Log Analytics
ELK - Elasticsearch, Logstash and Kibana
By Supaket Wongkampoo @ Predictive Analytics and Data Science Conference
28 May 2016
SUPAKET WONGKAMPOO
Software Engineer @ Agoda
*DevOps in passion*
- Full Stack Developer
- Virtualisation and Infrastructure as code (Puppet/Ansible)
- Release Management and continuous development
- Real time Log Analytics
State of the Art, Logging Terminology
in Large Scale Data processing
Common use cases
•*Issue debugging
•*Performance analysis
•Security analysis
•*Predictive analysis
•Internet of things (IoT) and logging
Challenges in log analysis
•*Non-consistent log format
•*Decentralized logs
•Expert knowledge requirement
Non-consistent log format
TOMCAT LOGS
A typical tomcat server startup log entry will look like this:
May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive softapache-tomcat-7.0.62webappssample.war has
finished in 253 ms
APACHE ACCESS LOGS – COMBINED LOG FORMAT
A typical Apache access log entry will look like this:
127.0.0.1 - - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630
IIS LOGS
A typical IIS log entry will look like this:
2012-05-02 17:42:15 172.24.255.255 - 172.20.255.255 80 GET /images/favicon.ico - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)
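The three samples above can be brought into one common shape before analysis. The following is a minimal Python sketch of that idea, not the way Logstash does it internally. The regular expressions are written only against the sample lines on this slide, the output field names (`source`, `timestamp`, `address`, `logger`, `function`) are hypothetical, and the IIS field layout is assumed from the sample because IIS field order is configurable. Month-name parsing assumes an English locale.

```python
import re
from datetime import datetime

# Patterns derived from the sample lines on this slide (assumptions, not full grammars).
APACHE_RE = re.compile(
    r'(?P<address>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IIS_RE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<address>\S+) \S+ \S+ \d+ '
    r'(?P<method>\S+) (?P<path>\S+) \S+ (?P<status>\d{3})\b'
)
# Only the first line of a Tomcat entry (timestamp, class, method) is parsed here.
TOMCAT_RE = re.compile(
    r'(?P<ts>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) '
    r'(?P<logger>\S+) (?P<function>\S+)$'
)


def _http_event(source, m, ts):
    """Build the common event shape for the two HTTP access log formats."""
    return {
        "source": source,
        "timestamp": ts.isoformat(),
        "address": m["address"],
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
    }


def normalize(line):
    """Return a dict in a common shape, or None if no pattern matches."""
    m = APACHE_RE.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        return _http_event("apache", m, ts)
    m = IIS_RE.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        return _http_event("iis", m, ts)
    m = TOMCAT_RE.match(line)
    if m:
        ts = datetime.strptime(m["ts"], "%b %d, %Y %I:%M:%S %p")
        return {
            "source": "tomcat",
            "timestamp": ts.isoformat(),
            "logger": m["logger"],
            "function": m["function"],
        }
    return None
```

Calling `normalize` on the Apache sample line yields a dict whose `timestamp` is `2015-05-24T15:54:59+05:30` and whose `status` is the integer `200`. Once every format is reduced to the same keys, one query can span all three sources.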
DECENTRALIZED LOGS
With only one or two servers, finding information in the logs means running cat or tail on each machine and piping the output to grep. This manual approach does not scale once logs are spread across many servers.
Elasticsearch
Elasticsearch - Key features
• Schema-free, REST & JSON based document store
• Distributed and horizontally scalable
• Open Source: Apache License 2.0
• Zero configuration
• Written in Java, extensible
Elasticsearch - Terms
• Index - Logical collection of data; might be time based. Analogous to a database
• Replication - Read scalability, removing SPOF (single point of failure)
• Sharding - Splits logical data over several machines. Write scalability, control of data flows
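How sharding and replication fit together can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: a document ID is hashed and taken modulo the number of primary shards, and each shard gets a primary copy plus replicas on different nodes. The function names, the node names, the round-robin placement, and the use of `zlib.crc32` as the hash are all assumptions made for illustration; Elasticsearch uses its own hash function and allocation logic.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a shard for a document: hash(id) modulo the number of primary shards.

    crc32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(num_primary_shards, num_replicas, nodes):
    """Place each primary and its replicas on distinct nodes (simple round-robin)."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes to keep copies apart")
    placement = {}
    for shard in range(num_primary_shards):
        copies = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
        placement[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return placement
```

With three shards, one replica, and three nodes, every node holds one primary and one replica of a different shard, so writes are spread over all nodes and the loss of any single node leaves a copy of every shard available.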
Elasticsearch - Distributed and scalable
Elasticsearch - use cases
• Product search engine: products grouped, with filtering
• Scoring
✴ Possible influential factors: age of the product, ordered in the last 24h, in stock?, no shipping costs, special offer, rating
• Analytics
✴ Aggregation, multidimensional (Average revenue per category id per day)
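To make the "average revenue per category id per day" example concrete, here is a small Python sketch of what such a two-dimensional aggregation computes. It runs in memory over a list of dicts and does not call Elasticsearch; the record keys `category_id`, `day`, and `revenue` are hypothetical names chosen for this illustration.

```python
from collections import defaultdict


def average_revenue(orders):
    """Group orders by (category_id, day) and return the mean revenue per group."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [revenue sum, order count]
    for order in orders:
        key = (order["category_id"], order["day"])
        totals[key][0] += order["revenue"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

In a search cluster the same grouping would be evaluated on each shard and the partial results merged, which is why bucketed aggregations of this kind scale with the number of shards.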
Logstash
Logstash
• Managing events and logs
• Collect, parse, enrich, store data
• Modular: many, many inputs and outputs
• Open Source: Apache License 2.0
• Ruby app
• Part of Elasticsearch family
Why collect & centralize logs?
•Access log files without system access
•Shell scripting: Too limited or slow
•Use unique IDs for errors and aggregate them across your stack
•Reporting (everyone can create his/her own report)
•Bonus points: Unify your data to make it easily searchable
Logstash-Architecture
Input -> Filter -> Output
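The three-stage flow of input, filter, and output can be sketched with Python generators. This is a conceptual model only: Logstash itself is configured through its own pipeline configuration file, not Python, and every function name below is hypothetical. The key=value splitting is loosely in the spirit of a kv filter, and the `level` field used for dropping events is an assumption for the example.

```python
import json


def input_lines(lines):
    """Input stage: wrap each raw line in an event dict."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def filter_kv(events):
    """Filter stage: copy key=value tokens from the message into event fields."""
    for event in events:
        for token in event["message"].split():
            if "=" in token:
                key, value = token.split("=", 1)
                event[key] = value
        yield event


def filter_drop_debug(events):
    """Filter stage: drop events whose (assumed) level field is DEBUG."""
    for event in events:
        if event.get("level") != "DEBUG":
            yield event


def output_json(events, sink):
    """Output stage: serialize each event as JSON and append it to the sink."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))


def run_pipeline(lines):
    """Chain input -> filters -> output and return the collected output."""
    sink = []
    output_json(filter_drop_debug(filter_kv(input_lines(lines))), sink)
    return sink
```

Because each stage only consumes and yields events, stages can be added, removed, or reordered without touching the others, which mirrors the modular input and output design described on the surrounding slides.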
Logstash-Inputs
• Monitoring: collectd, graphite, ganglia, snmptrap, zenoss
• Datastores: elasticsearch, redis, sqlite, s3
• Queues: rabbitmq, zeromq, kafka
• Logging: eventlog, lumberjack, gelf, log4j, relp, syslog, varnish log
Logstash-Filters
•alter, anonymize, checksum, csv, drop, multiline
•dns, date, extractnumbers, geoip, i18n, kv, noop, ruby, range
•json, urldecode, useragent
Logstash-Outputs
• Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq
• Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabb
• Notification: email, hipchat, irc, pagerduty, sns
• Protocol: http, lumberjack, metriccatcher, stomp,
Kibana
•Flexible analytics and data visualization platform
Kibana
Combine - ELK
Hands on - ELK
(Diagram: six Web nodes and Kafka)
Q&A
