2. Why use Logstash?
• We already have Splunk, syslog-ng, Chukwa,
Graylog2, Scribe, Flume and so on.
• But we want a free, lightweight, high-integrity
framework for our logs:
• non-free --> Splunk
• heavy Java --> Scribe, Flume
• loses data --> syslog
• not flexible --> nxlog
3. How does Logstash work?
• Ah, just like the others, Logstash has
input/filter/output plugins.
• Attention: Logstash processes events, not (only)
log lines!
• "Inputs generate events, filters modify them,
outputs ship them elsewhere." -- [the life of an
event in logstash]
• "events are passed from each phase using
internal queues......Logstash sets each queue
size to 20." -- [the life of an event in logstash]
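• A minimal sketch of the three phases wired together (the grep pattern and the debug stdout are just illustrative):

```conf
input {
  stdin { type => "example" }                 # inputs generate events
}
filter {
  grep { match => [ "@message", "error" ] }   # filters modify (or drop) them
}
output {
  stdout { debug => true }                    # outputs ship them elsewhere
}
```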
8. Usage in cluster - agent install
• There is only an 'all in one' jar download at
http://logstash.net/
• All the source, including the Ruby and JRuby code, is at
http://github.com/logstash/
• But we want a lightweight agent in the cluster.
11. Usage in cluster - server install
• The server is just another agent that runs the
filters and storage outputs.
• Message queue (RabbitMQ is too heavy; Redis is
just enough):
– yum install redis-server
– service redis-server start
• Storage: MongoDB/ElasticSearch/Riak
• Visualization: Kibana/statsd/Riemann/OpenTSDB
• Run:
– java -jar logstash-1.1.0-monolithic.jar agent -f logstash/etc/server.conf
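• A hypothetical server.conf matching the run line above, assuming the agents ship events into Redis under the list key "logstash" (host and key are illustrative):

```conf
input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    type => "redis-input"
  }
}
output {
  elasticsearch {
    embedded => true    # use the ElasticSearch embedded in Logstash
  }
}
```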
13. Usage in cluster - grok
• jls-grok is a pattern tool written in JRuby.
• Lots of examples can be found at:
https://github.com/logstash/logstash/tree/master/patterns
• Here are my "nginx" patterns:
– NGINXURI %{URIPATH}(?:%{URIPARAM})*
– NGINXACCESS [%{HTTPDATE}] %{NUMBER:code:int} %{IP:client} %{HOSTNAME} %{WORD:method} %{NGINXURI:req} %{URIPROTO}/%{NUMBER:version} %{IP:upstream}(:%{POSINT:port})? %{NUMBER:upstime:float} %{NUMBER:reqtime:float} %{NUMBER:size:int} "(%{URIPROTO}://%{HOST:referer}%{NGINXURI:referer}|-)" %{QS:useragent} "(%{IP:x_forwarder_for}|-)"
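• The patterns above can then be referenced from a grok filter, assuming they are saved under a patterns_dir of your own (the path is illustrative):

```conf
filter {
  grok {
    type => "nginx"
    patterns_dir => "/usr/local/logstash/etc/patterns"  # where NGINXACCESS is defined
    pattern => "%{NGINXACCESS}"
  }
}
```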
14. Usage in cluster - elasticsearch
• ElasticSearch is a production-grade search engine
built on Lucene for cloud computing.
• More information at:
– http://www.elasticsearch.cn/
• Logstash already has an embedded ElasticSearch!
• Attention: If you want to build your own
distributed ElasticSearch cluster, make sure the
server version matches the client version used by
Logstash!
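• To talk to such an external cluster instead of the embedded node, something like this sketch (host and cluster name are illustrative; the version caveat above still applies):

```conf
output {
  elasticsearch {
    embedded => false
    host => "es.domain.com"   # a node of your own ES cluster
    cluster => "logstash"
  }
}
```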
16. Usage in cluster - elasticsearch
• The embedded web front end for ES is too simple,
sometimes naïve~ Try Kibana and EShead:
• https://github.com/rashidkpc/Kibana
• https://github.com/mobz/elasticsearch-head.git
• Attention: there is a bug around ES ---- ifdown
your external network interface before starting ES
and ifup it afterwards. Otherwise your Ruby client
cannot connect to the ES server!
17. Try it please!
• Ah, don't want to install, install, install and install?
• Here is a killer application:
– sudo zypper install virtualbox rubygems
– gem install vagrant
– git clone https://github.com/mediatemple/log_wrangler.git
– cd log_wrangler
– PROVISION=1 vagrant up
18. Other output example
• For monitoring (example):
– filter {
–   grep {
–     type => "linux-syslog"
–     match => [ "@message", "(error|ERROR|CRITICAL)" ]
–     add_tag => [ "nagios-update" ]
–     add_field => [ "nagios_host", "%{@source_host}", "nagios_service", "the name of your nagios service check" ]
–   }
– }
– output {
–   nagios {
–     commandfile => "/usr/local/nagios/var/rw/nagios.cmd"
–     tags => "nagios-update"
–     type => "linux-syslog"
–   }
– }
19. Other output example
• For metrics (example):
– output {
–   statsd {
–     increment => "apache.response.%{response}"
–     count => [ "apache.bytes", "%{bytes}" ]
–   }
– }
20. Advanced Questions
• Is Ruby 1.8.7 stable enough?
• Try the Message::Passing module on CPAN; I love Perl~
• Is ElasticSearch fast enough?
• Try Sphinx; see the report from the ELSA project:
– In designing ELSA, I tried the following components but found them too slow. Here they are ordered from fastest to
slowest for indexing speeds (non-scientifically tested):
1. Tokyo Cabinet
2. MongoDB
3. TokuDB MySQL plugin
4. Elastic Search (Lucene)
5. Splunk
6. HBase
7. CouchDB
8. MySQL Fulltext
• http://code.google.com/p/enterprise-log-search-and-archive/wiki/Documentation#Why_ELSA?
21. Advanced Testing
• How many events/sec can ElasticSearch hold?
• - Logstash::Output::Elasticsearch (HTTP) can only index 200+ msg/sec on
one thread.
• - Tried the _bulk API myself using the Perl ElasticSearch::Transport::HTTPLite
module.
• -- speed testing result: 2500+ msg/sec
• -- testing record:
http://chenlinux.com/2012/09/16/elasticsearch-bulk-index-speed-testing/
WHY?!
22. Maybe…
• Logstash uses an experimental module here: we can
see that Logstash::Output::ElasticsearchHTTP
uses ftw as its HTTP client, and it cannot handle bulk
sizes larger than 200!!
• So we suggest using a multi-output block in
agent.conf.
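• A hedged sketch of one workaround, assuming your elasticsearch_http version supports the flush_size option (host and value are illustrative):

```conf
output {
  elasticsearch_http {
    host => "es.domain.com"
    flush_size => 100   # keep each _bulk request under the ~200-event limit
  }
}
```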
23. Advanced ES Settings(1)--problems
• Kibana can search data using the facets APIs.
But when you index URLs, they get auto-split
on '/'~~
• Faceting on the ip field over 10,000,000 messages
takes 0.1s, but on urls... ah, timeout!
• When you check your indices size, you will find
that the per-message index size is about 10x the
raw message length!!
24. Advanced ES Settings(2)--solution
• Set a default ElasticSearch _mapping
template!
• In fact, ES "stores" the index data, and then also
"stores" the source data... Yes! If you don't set
"store": "no", all the data is stored twice.
• And ES has many analyzer plugins. They
automatically split words by whitespace, path
hierarchy, keyword, etc.
• So set "index": "not_analyzed", and faceting on 100k+
URLs can finish within 1s.
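• A sketch of such a default mapping template, PUT to /_template/logstash (the index pattern and field name are illustrative; API shape as in 0.19-era ES):

```json
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "properties": {
        "url": { "type": "string", "index": "not_analyzed", "store": "no" }
      }
    }
  }
}
```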
25. Advanced ES Settings(2)--solution
• Optimize:
• Calling the _optimize API every day may decrease the
index size a bit~
• You can find those solutions at:
• https://github.com/logstash/logstash/wiki/Elasticsearch-Storage-Optimization
• https://github.com/logstash/logstash/wiki/Elasticsearch----Using-index-templates-&-dynamic-
26. Advanced Input -- question
• Now we know how to disable the _all field, but there
are still duplicated fields: @fields and
@message!
• Logstash searches ES in the @message field by
default, but Logstash::Filter::Grok by default
captures variables into @fields from that same
@message!
• How to solve this?
27. Advanced Input -- solution
• We know some other systems, like
Message::Passing, have encode/decode steps in
addition to input/filter/output.
• In fact Logstash has them too~ but renamed
to 'format'.
• So we can define the message format ourselves,
just by using log_format in nginx.conf.
• (example as follows)
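• For instance, nginx can emit json_event-shaped lines directly; a sketch (field names are illustrative, mirroring the grok captures earlier):

```nginx
log_format json '{"@timestamp":"$time_iso8601",'
                '"@fields":{'
                  '"client":"$remote_addr",'
                  '"size":$body_bytes_sent,'
                  '"upstreamtime":$upstream_response_time,'
                  '"url":"$uri"}}';
access_log /data/nginx/logs/access.json json;
```

• Note that $upstream_response_time can be "-", which is why the sed fix on slide 29 is needed.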
29. Advanced Input -- json_event
• Now define the input block with a format:
– input {
–   stdin {
–     type => "nginx"
–     format => "json_event"
–   }
– }
• And start it from the command line:
– tail -F /data/nginx/logs/access.json \
–   | sed 's/upstreamtime":-/upstreamtime":0/' \
–   | /usr/local/logstash/bin/logstash -f /usr/local/logstash/etc/agent.conf &
• Attention: upstreamtime may be "-" if the status is 400, hence the sed.
30. Advanced Web GUI
• Write your own website using the ElasticSearch
RESTful API to search, as follows:
– curl -XPOST http://es.domain.com:9200/logstash-2012.09.18/nginx/_search?pretty=1 -d '
{
  "query": {
    "range": {
      "@timestamp": {
        "from": "now-1h",
        "to": "now"
      }
    }
  },
  "facets": {
    "curl_test": {
      "date_histogram": {
        "key_field": "@timestamp",
        "value_field": "url",
        "interval": "5m"
      }
    }
  },
  "size": 0
}
'
31. Additional Message::Passing demo
• I wrote a demo using the Perl modules
Message::Passing, Regexp::Log, ElasticSearch and
others, working similarly to the Logstash usage
shown here.
• See:
– http://chenlinux.com/2012/09/16/message-passing-agent/
– http://chenlinux.com/2012/09/16/regexp-log-demo-for-nginx/
– http://chenlinux.com/2012/09/16/message-passing-filter-demo/