Logstash

How to collect nginx's access logs into ElasticSearch with Logstash or Message::Passing.

Published in: Technology


1. Logstash::Intro @ARGV

2. Why use Logstash?
   • We already have Splunk, syslog-ng, Chukwa, Graylog2, Scribe, Flume, and so on.
   • But we want a free, lightweight, and well-integrated framework for our logs:
     – not free --> Splunk
     – heavy Java --> Scribe, Flume
     – loses data --> syslog
     – not flexible --> nxlog

3. How does Logstash work?
   • Like the others, Logstash has input/filter/output plugins.
   • Attention: Logstash processes events, not (only) log lines!
   • "Inputs generate events, filters modify them, outputs ship them elsewhere." -- [the life of an event in logstash]
   • "Events are passed from each phase using internal queues... Logstash sets each queue size to 20." -- [the life of an event in logstash]

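A minimal end-to-end sketch of that pipeline (the plugin choices here are illustrative, not from the slides): every event read by the input passes through the filter's queue and on to the output.

```
input {
  stdin { type => "example" }
}
filter {
  grok {
    type => "example"
    # capture the first word of each line into a field
    pattern => "%{WORD:first}"
  }
}
output {
  stdout { debug => true }
}
```

Each line typed on stdin becomes one event, gains a "first" field in the filter phase, and is printed by the stdout output.
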
4. Existing plugins

5. Most popular plugins (inputs)
   • amqp
   • eventlog
   • file
   • redis
   • stdin
   • syslog
   • ganglia

6. Most popular plugins (filters)
   • date
   • grep
   • grok
   • multiline

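As a hedged example of the last one: the multiline filter merges continuation lines into the previous event, which is how multi-line stack traces stay together as a single event. The type and pattern below are illustrative, not from the slides.

```
filter {
  multiline {
    type => "java-app"
    # any line starting with whitespace belongs to the previous event
    pattern => "^\s"
    what => "previous"
  }
}
```
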
7. Most popular plugins (outputs)
   • amqp
   • elasticsearch
   • email
   • file
   • ganglia
   • graphite
   • mongodb
   • nagios
   • redis
   • stdout
   • zabbix
   • websocket

8. Usage in cluster - agent install
   • Only an all-in-one jar download at http://logstash.net/
   • Full source, in Ruby and JRuby, at http://github.com/logstash/
   • But we want a lightweight agent in the cluster.

9. Usage in cluster - agent install
   • Edit the Gemfile like:
       source "http://ruby.taobao.org/"
       gem "cabin", "0.4.1"
       gem "bunny"
       gem "uuidtools"
       gem "filewatch", "0.3.3"
   • Clone logstash/[bin|lib]:
       git clone https://github.com/chenryn/logstash.git
       git checkout pure-ruby
   • Install the gems:
       gem install bundler
       bundle
   • Run:
       ruby logstash/bin/logstash -f logstash/etc/logstash-agent.conf

10. Usage in cluster - agent configuration

    input {
      file {
        type => "nginx"
        path => [ "/data/nginx/logs/access.log" ]
      }
    }
    output {
      redis {
        type => "nginx"
        host => "5.5.5.5"
        key => "nginx"
        data_type => "channel"
      }
    }

11. Usage in cluster - server install
   • The server is just another agent that runs some filters and writes to storage.
   • Message queue (RabbitMQ is too heavy; Redis is just enough):
       yum install redis-server
       service redis-server start
   • Storage: MongoDB / ElasticSearch / Riak
   • Visualization: Kibana / statsd / Riemann / OpenTSDB
   • Run:
       java -jar logstash-1.1.0-monolithic.jar agent -f logstash/etc/server.conf

12. Usage in cluster - server configuration

    input {
      redis {
        type => "nginx"
        host => "5.5.5.5"
        data_type => "channel"
        key => "nginx"
      }
    }
    filter {
      grok {
        type => "nginx"
        pattern => "%{NGINXACCESS}"
        patterns_dir => ["/usr/local/logstash/etc/patterns"]
      }
    }
    output {
      elasticsearch {
        cluster => "logstash"
        host => "10.5.16.109"
        port => 9300
      }
    }

13. Usage in cluster - grok
   • jls-grok is a pattern tool written in JRuby.
   • Lots of examples can be found at: https://github.com/logstash/logstash/tree/master/patterns
   • Here are my "nginx" patterns:

       NGINXURI %{URIPATH}(?:%{URIPARAM})*
       NGINXACCESS [%{HTTPDATE}] %{NUMBER:code:int} %{IP:client} %{HOSTNAME} %{WORD:method} %{NGINXURI:req} %{URIPROTO}/%{NUMBER:version} %{IP:upstream}(:%{POSINT:port})? %{NUMBER:upstime:float} %{NUMBER:reqtime:float} %{NUMBER:size:int} "(%{URIPROTO}://%{HOST:referer}%{NGINXURI:referer}|-)" %{QS:useragent} "(%{IP:x_forwarder_for}|-)"

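Grok patterns like these compile down to regular expressions with named captures. As a plain-Ruby illustration of that idea (this is not the jls-grok API; the sample line and the simplified regex are invented):

```ruby
# What a grok pattern boils down to: a regex with named captures,
# plus type coercion such as the :int in %{NUMBER:code:int}.
line    = '404 10.2.3.4 GET'
pattern = /(?<code>\d+) (?<client>\d+(?:\.\d+){3}) (?<method>\w+)/

if (m = line.match(pattern))
  event = {
    'code'   => m[:code].to_i,  # :int coercion
    'client' => m[:client],
    'method' => m[:method],
  }
  puts event.inspect
end
```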
14. Usage in cluster - elasticsearch
   • ElasticSearch is a production-grade search engine built on Lucene for cloud computing.
   • More information at: http://www.elasticsearch.cn/
   • Logstash ships with an embedded ElasticSearch already!
   • Attention: if you want to build your own distributed ElasticSearch cluster, make sure the server version matches the client version used by Logstash!

15. Usage in cluster - elasticsearch
   • elasticsearch/config/elasticsearch.yml:

       cluster.name: logstash
       node.name: "ES109"
       node.master: true
       node.data: false
       index.number_of_replicas: 0
       index.number_of_shards: 1
       path.data: /data1/ES/data
       path.logs: /data1/ES/logs
       network.host: 10.5.16.109
       transport.tcp.port: 9300
       transport.tcp.compress: true
       gateway.type: local
       discovery.zen.minimum_master_nodes: 1

16. Usage in cluster - elasticsearch
   • The embedded web front end for ES is too simple, sometimes naive~ Try Kibana and es-head:
   • https://github.com/rashidkpc/Kibana
   • https://github.com/mobz/elasticsearch-head.git
   • Attention: there is a bug in ES: ifdown your external network interface before ES starts and ifup it afterwards, otherwise your Ruby client cannot connect to the ES server!

17. Try it please!
   • Ah, you don't want to install, install, install and install?
   • Here is a killer application:
       sudo zypper install virtualbox rubygems
       gem install vagrant
       git clone https://github.com/mediatemple/log_wrangler.git
       cd log_wrangler
       PROVISION=1 vagrant up

18. Other output example
   • For monitoring (example):

       filter {
         grep {
           type => "linux-syslog"
           match => [ "@message", "(error|ERROR|CRITICAL)" ]
           add_tag => [ "nagios-update" ]
           add_field => [ "nagios_host", "%{@source_host}",
                          "nagios_service", "the name of your nagios service check" ]
         }
       }
       output {
         nagios {
           commandfile => "/usr/local/nagios/var/rw/nagios.cmd"
           tags => "nagios-update"
           type => "linux-syslog"
         }
       }

19. Other output example
   • For metrics:

       output {
         statsd {
           increment => "apache.response.%{response}"
           count => [ "apache.bytes", "%{bytes}" ]
         }
       }

20. Advanced Questions
   • Is Ruby 1.8.7 stable enough?
   • Try the Message::Passing module on CPAN; I love Perl~
   • Is ElasticSearch fast enough?
   • Try Sphinx; see the report from the ELSA project:
     – "In designing ELSA, I tried the following components but found them too slow. Here they are ordered from fastest to slowest for indexing speeds (non-scientifically tested): 1. Tokyo Cabinet 2. MongoDB 3. TokuDB MySQL plugin 4. Elastic Search (Lucene) 5. Splunk 6. HBase 7. CouchDB 8. MySQL Fulltext"
   • http://code.google.com/p/enterprise-log-search-and-archive/wiki/Documentation#Why_ELSA?

21. Advanced Testing
   • How many events/sec can ElasticSearch hold?
   • Logstash::Output::Elasticsearch (HTTP) only indexes 200+ msg/sec per thread.
   • I tried the _bulk API myself, using the Perl ElasticSearch::Transport::HTTPLite module:
     – speed testing result: 2500+ msg/sec
     – testing record: http://chenlinux.com/2012/09/16/elasticsearch-bulk-index-speed-testing/
   • WHY?!

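The speedup comes from the _bulk request format: one action line plus one source line per document, newline-delimited, so hundreds of documents travel in a single HTTP request. A minimal Ruby sketch of building such a payload (the index and type names are illustrative):

```ruby
require 'json'

# Build an ElasticSearch _bulk body: for every document, an action
# line followed by the document source, each newline-terminated.
def bulk_body(docs, index, type)
  docs.map { |doc|
    action = { 'index' => { '_index' => index, '_type' => type } }
    "#{action.to_json}\n#{doc.to_json}\n"
  }.join
end

docs = [
  { 'client' => '1.2.3.4', 'code' => 200 },
  { 'client' => '5.6.7.8', 'code' => 404 },
]
body = bulk_body(docs, 'logstash-2012.09.18', 'nginx')
puts body  # POST this to http://127.0.0.1:9200/_bulk in one request
```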
22. Maybe…
   • Logstash uses an experimental module: Logstash::Output::ElasticsearchHTTP uses ftw as its HTTP client, and it cannot handle a bulk size larger than 200!!
   • So we suggest using multiple output blocks in agent.conf.

23. Advanced ES Settings (1) -- problems
   • Kibana can search data using the facets APIs. But when you index URLs, they get auto-split on '/'~~
   • And a facet search on the IP field over 10,000,000 messages takes 0.1 s, but on URLs... ah, timeout!
   • When you check your index size, you will find that the ratio of (index size / document count) to raw message length is about 10:1!!

24. Advanced ES Settings (2) -- solution
   • Set an ElasticSearch default _mapping template!
   • In fact, ES "stores" the indexed data and then "stores" the stored data... Yes! If you don't set "store": "no", all the data is stored in duplicate.
   • ES also has many analyzer plugins. They automatically split words by whitespace, path hierarchy, keyword, etc.
   • So set "index": "not_analyzed", and faceting over 100k+ URLs can finish in 1 s.

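A sketch of such a default template, to be PUT to /_template/ (the field names follow the nginx mapping used earlier; the exact keys vary by ES version, so treat this as an assumption to verify against your cluster):

```
{
  "template": "logstash-*",
  "mappings": {
    "nginx": {
      "properties": {
        "@fields": {
          "properties": {
            "url":    { "type": "string", "index": "not_analyzed" },
            "client": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
```

With "index": "not_analyzed", the url field is stored as a single term instead of being split on '/', which is what makes the facet fast.
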
25. Advanced ES Settings (2) -- solution
   • Optimize:
   • Calling the _optimize API every day may decrease the index size somewhat~
   • You can find these solutions at:
   • https://github.com/logstash/logstash/wiki/Elasticsearch-Storage-Optimization
   • https://github.com/logstash/logstash/wiki/Elasticsearch----Using-index-templates-&-dynamic-

26. Advanced Input -- question
   • Now we know how to disable the _all field, but there are still duplicated fields: @fields and @message!
   • Logstash searches ES in the @message field by default, but Logstash::Filter::Grok by default captures variables into @fields from @message!
   • How to solve this?

27. Advanced Input -- solution
   • We know some other systems, like Message::Passing, have encode/decode in addition to input/filter/output.
   • In fact Logstash has them too~ but renames them to 'format'.
   • So we can define the message format ourselves, just using log_format in nginx.conf.
   • (example follows)

28. Advanced Input -- nginx.conf

    log_format json '{"@timestamp":"$time_iso8601",'
                    '"@source":"$server_addr",'
                    '"@fields":{'
                    '"client":"$remote_addr",'
                    '"size":$body_bytes_sent,'
                    '"responsetime":$request_time,'
                    '"upstreamtime":$upstream_response_time,'
                    '"oh":"$upstream_addr",'
                    '"domain":"$host",'
                    '"url":"$uri",'
                    '"status":"$status"}}';
    access_log /data/nginx/logs/access.json json;

   • See http://cookbook.logstash.net/recipes/apache-json-logs/

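Each access-log line emitted by this log_format is itself a JSON document, so no grok is needed downstream. A quick Ruby check with an invented sample line:

```ruby
require 'json'

# One line as the log_format above would emit it (values invented).
line = '{"@timestamp":"2012-09-18T10:00:00+08:00","@source":"10.5.16.109",' \
       '"@fields":{"client":"1.2.3.4","size":512,"responsetime":0.010,' \
       '"upstreamtime":0.008,"oh":"10.5.16.110:80","domain":"example.com",' \
       '"url":"/index.html","status":"200"}}'

event = JSON.parse(line)
puts event['@fields']['url']  # fields are directly addressable, no grok
```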
29. Advanced Input -- json_event
   • Now define the input block with a format:

       input {
         stdin {
           type => "nginx"
           format => "json_event"
         }
       }

   • And start it on the command line:

       tail -F /data/nginx/logs/access.json \
         | sed 's/upstreamtime":-/upstreamtime":0/' \
         | /usr/local/logstash/bin/logstash -f /usr/local/logstash/etc/agent.conf &

   • Attention: upstreamtime may be "-" if the status is 400, which would break the JSON; hence the sed.

30. Advanced Web GUI
   • Write your own website using the ElasticSearch RESTful API to search, as follows:

       curl -XPOST http://es.domain.com:9200/logstash-2012.09.18/nginx/_search?pretty=1 -d '
       {
         "query": {
           "range": {
             "@timestamp": { "from": "now-1h", "to": "now" }
           }
         },
         "facets": {
           "curl_test": {
             "date_histogram": {
               "key_field": "@timestamp",
               "value_field": "url",
               "interval": "5m"
             }
           }
         },
         "size": 0
       }'

31. Additional Message::Passing demo
   • I wrote a demo using the Message::Passing, Regexp::Log, ElasticSearch, and other Perl modules that works similarly to the Logstash usage shown here.
   • See:
     – http://chenlinux.com/2012/09/16/message-passing-agent/
     – http://chenlinux.com/2012/09/16/regexp-log-demo-for-nginx/
     – http://chenlinux.com/2012/09/16/message-passing-filter-demo/

32. Reference
   • http://logstash.net/docs/1.1.1/tutorials/metrics-from-logs
   • http://logwrangler.mtcode.com/
   • https://www.virtualbox.org/wiki/Linux_Downloads
   • http://vagrantup.com/v1/docs/getting-started/index.html
   • http://www.elasticsearch.cn
   • http://search.cpan.org/~bobtfish/Message-Passing-0.010/lib/Message/Passing.pm
