Search and Analyze Data
in Real Time
Prashant Shewale and Rohit Kalsarpe
Agenda
1 Problem in validating logs
2 How Logstash can help
3 ELK Stack (Elasticsearch, Logstash, Kibana)
4 Some hands-on
5 How we used ELK stack in our automation framework
6 World beyond
Problem in validating logs
• Active log files have to be followed as they are written.
• Logs keep growing and are rotated.
• Collating multiline logs into a single event is a difficult task.
• Different kinds of applications produce different kinds of logs,
each with its own format.
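The multiline problem above can be sketched in a few lines of Python. This is only an illustration, not how any particular tool does it. It assumes that every new event starts with a syslog-style timestamp such as "Feb 4 06:10:09" and that anything else is a continuation line; the names EVENT_START and collate_events are made up for this sketch.

```python
import re
from typing import Iterable, Iterator

# Assumption: a new event begins with a syslog-style timestamp ("Feb 4 06:10:09 ").
EVENT_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")


def collate_events(lines: Iterable[str]) -> Iterator[str]:
    """Join continuation lines onto the event line that precedes them."""
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        # A line that looks like an event start closes the event collected so far.
        if EVENT_START.match(line) and current:
            yield " ".join(current)
            current = []
        current.append(line.strip())
    # Flush the last event once the input is exhausted.
    if current:
        yield " ".join(current)
```

Note that the last event is only emitted when the input ends; on a live file a real tool would need a timeout to decide that an event is complete.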
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 www.yahoo.com "-"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com
"http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500
www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com
"http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com
"http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ...)" "-"
Sample Apache Log
Feb 4 06:10:09 techy sendmail[5392]: o140e90B005392: from=, size=2434, class=0, nrcpts=1,
msgid=<201002040040.o140e9Mi005380@techy.bounceme.net>, proto=ESMTP, daemon=MTA,
relay=localhost [127.0.0.1]
Feb 4 06:10:09 techy sendmail[5380]: o140e9Mi005380: to=root, ctladdr=root (0/0), delay=00:00:00,
xdelay=00:00:00, mailer=relay, pri=32168, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent
(o140e90B005392 Message accepted for delivery)
Sample SendMail Log
Oct 20 03:45:50 hostname kernel: iptables denied: IN=eth0 OUT=
MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=1059 TOS=0x00 PREC=0x00
TTL=115 ID=31368 DF PROTO=TCP SPT=17992 DPT=80 WINDOW=16477 RES=0x00 ACK PSH URGP=0
Oct 20 03:46:02 hostname kernel: iptables denied: IN=eth0 OUT=
MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=52 TOS=0x00 PREC=0x00
TTL=52 ID=763 DF PROTO=TCP SPT=20229 DPT=22 WINDOW=15588 RES=0x00 ACK URGP=0
Oct 20 03:46:14 hostname kernel: iptables denied: IN=eth0 OUT=
MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=324 TOS=0x00 PREC=0x00
TTL=49 ID=64245 PROTO=TCP SPT=47237 DPT=80 WINDOW=470 RES=0x00 ACK PSH URGP=0
Oct 20 03:46:26 hostname kernel: iptables denied: IN=eth0 OUT=
MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx SRC=x.x.x.x DST=x.x.x.x LEN=52 TOS=0x00 PREC=0x00
TTL=45 ID=2010 PROTO=TCP SPT=48322 DPT=80 WINDOW=380 RES=0x00 ACK URGP=0
Sample IPTable Log
Use RegEx to parse data
Source: xkcd.com
Actual RegEx to parse Apache log
Source: xkcd.com
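To give a feel for what such a regex involves, here is a hedged Python sketch that parses the leading fields of the sample Apache lines shown earlier. It is not the expression from the slide. The field names (clientip, verb, response and so on) are choices made for this example, and the trailing virtual host, referrer and user agent fields are deliberately left unparsed.

```python
import re

# Leading fields of an Apache access log line; everything after the byte
# count (virtual host, referrer, user agent) is ignored in this sketch.
APACHE_LINE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_apache_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = APACHE_LINE.match(line)
    return match.groupdict() if match else None
```

Even this simplified version is hard to read at a glance, which is the point of the slide: hand-written regexes per log format do not scale well.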
How Logstash can help
• Logstash is a data pipeline that helps you process logs from
a variety of systems.
• Logstash allows you to parse data and converge on a
common format.
• Logstash provides a fast and convenient way to write custom logic
for parsing these logs.
• Support for multiple plugins.
Logstash
Input Section
• File
• Stdin
• Syslog
• SNMP Traps
• TCP/UDP
• and many more
Filter Section
• Grok
• Mutate
• Geoip
• Drop
• and many more
Output Section
• Elasticsearch
• File
• Email
• and many more
Logstash Config File
input {
...
}
filter {
...
}
output {
...
}
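The input, filter and output sections form a pipeline: each event enters through an input, passes through the filters in order, and is handed to an output unless a filter drops it. The Python sketch below mimics that flow. It is not Logstash code; every function name in it is hypothetical, and the regex stands in for what a grok filter would extract from the iptables sample shown earlier.

```python
import re


def line_input(lines):
    """Input stage: wrap each raw line in an event dict."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def grok_like(event):
    """Filter stage: pull named fields out of the raw message."""
    match = re.search(
        r"SRC=(?P<src>\S+) DST=(?P<dst>\S+).* DPT=(?P<dpt>\d+)",
        event["message"],
    )
    if match:
        event.update(match.groupdict())
    return event


def drop_ssh(event):
    """Filter stage: returning None drops the event (here: port 22 traffic)."""
    return None if event.get("dpt") == "22" else event


def mutate_dpt(event):
    """Filter stage: convert the destination port from string to int."""
    if "dpt" in event:
        event["dpt"] = int(event["dpt"])
    return event


def run_pipeline(source, filters, output):
    """Run every event through the filters in order, then hand it to the output."""
    for event in source:
        for apply_filter in filters:
            event = apply_filter(event)
            if event is None:
                break  # dropped by a filter, skip the output
        else:
            output(event)
```

A call such as run_pipeline(line_input(open_file), [grok_like, drop_ssh, mutate_dpt], print) would print one structured event per surviving line.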
Logstash-forwarder
• A tool to collect logs locally
for processing elsewhere.
• Secure, low latency, low
resource usage, and
reliable.
• Another option: Log-courier
[Diagram: Logstash-forwarder -> Logstash]
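The "collect logs locally" half of a shipper can be sketched as follows. This is a simplified illustration in Python and not how Logstash-forwarder itself is implemented. It assumes rotation can be detected by a changed inode or a file that has shrunk, and it leaves out the part that sends the lines to Logstash. The function name read_new_lines and the state dict are inventions for this sketch.

```python
import os


def read_new_lines(path, state):
    """Return the complete lines appended to `path` since the previous call.

    `state` is a dict that remembers the inode and byte offset already read.
    A changed inode or a shrunken file is treated as a rotation, and reading
    restarts from the beginning of the new file.
    """
    stat = os.stat(path)
    if state.get("inode") != stat.st_ino or stat.st_size < state.get("offset", 0):
        state["inode"] = stat.st_ino
        state["offset"] = 0
    with open(path, "rb") as handle:
        handle.seek(state["offset"])
        data = handle.read()
    # Consume only complete lines; a partial last line is picked up next time.
    end = data.rfind(b"\n") + 1
    state["offset"] += end
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

A shipper would call this in a loop with a short sleep and forward each returned line over the network.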
ELK Stack
• Elasticsearch, Logstash and Kibana
• End-to-end stack that delivers actionable insights in real time
from almost any type of structured and unstructured data
source
I. Logstash is used for cooking data
II. Elasticsearch is used for storing this cooked data
III. Kibana gives shape to your data
• Each one is packaged to be fully self-contained and is easy
to use
What is ELK?
[Diagram: three shippers]
Elasticsearch
• Real-time search and indexing tool
• Easy to set up; RESTful API
• Easy to cluster and scale
• High availability
• Schema-free
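Because the API is RESTful, a search is just an HTTP request with a JSON body. The Python sketch below builds such a request with the standard library and does not send it. The function name is hypothetical, and the base URL, index name and field used in the usage note are assumptions for illustration.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text, size=10):
    """Build (but do not send) a match query against an index's _search endpoint."""
    body = {"query": {"match": {field: text}}, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, build_search_request("http://localhost:9200", "logstash-2015.01.01", "response", "404") describes a search for events whose response field matches 404; passing the result to urllib.request.urlopen would run it against a node listening at that address.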
Kibana
• Seamless integration with Elasticsearch
• Give shape to your data
• Sophisticated analytics
• Easy setup
• Simple data export
Demo
How we used ELK stack in
our automation framework
[Architecture diagram]
• Automation Box 1, Automation Box 2, ... Automation Box n, each with a Mail Server
• Mail logs -> Logstash (cook, correlate)
• Structured data -> Elasticsearch (index, store)
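What "cook" and "correlate" mean in detail is not spelled out on the slide. One plausible reading, sketched below in Python, is to turn each sendmail line into key/value fields and then merge all lines that share a queue ID into one structured record. The regexes and the function names cook and correlate are assumptions made for this sketch, based only on the sample SendMail log shown earlier.

```python
import re
from collections import defaultdict

# Assumption: lines look like "... sendmail[PID]: QUEUEID: key=value, key=value, ..."
LINE = re.compile(r"sendmail\[\d+\]: (?P<qid>\w+): (?P<rest>.*)")
PAIR = re.compile(r"(\w+)=([^,]*)")


def cook(line):
    """Turn one sendmail log line into a dict of fields, or None if it does not match."""
    match = LINE.search(line)
    if not match:
        return None
    fields = {key: value.strip() for key, value in PAIR.findall(match.group("rest"))}
    fields["qid"] = match.group("qid")
    return fields


def correlate(lines):
    """Merge the fields of all lines that share a queue ID into one record."""
    by_qid = defaultdict(dict)
    for line in lines:
        event = cook(line)
        if event:
            by_qid[event["qid"]].update(event)
    return dict(by_qid)
```

Each merged record is the kind of structured data that could then be indexed and stored in Elasticsearch.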
World Beyond
• Analytics: count things and summarize your data.
• Crawling and document processing
1. For crawling, people use both Scrapy and Nutch together
with Elasticsearch.
• A variety of companies use the ELK stack to power their
search infrastructure.
1. Wikimedia
2. GitHub: powers search across 8 million+ code repositories
for its 4 million members.
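The "count things and summarize" idea from the Analytics bullet can be illustrated with a tiny Python sketch. It works on already-parsed events held in memory, so it only shows the kind of answer an analytics query gives, not how Elasticsearch computes it. The function name count_by is made up for this example.

```python
from collections import Counter


def count_by(events, field):
    """Count how often each value of `field` occurs; events lacking the field are skipped."""
    return Counter(event[field] for event in events if field in event)
```

Applied to the parsed Apache events from earlier with field "response", this would give the number of requests per HTTP status code.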
Thank You
