Solr for Indexing and Searching Logs

How to index logs into Solr from Logstash, rsyslog, Flume (via Morphlines), Fluentd, and others, and make them searchable.



  1. Using Solr to Search and Analyze Logs. Radu Gheorghe, @sematext, @radu0gheorghe
  2. Logsene: Kibana, Elasticsearch API, Logstash, syslog receiver, syslogd
  3. What about Solr?
  4. Agenda: defining and handling logs in general; 4 sets of tools to send logs to Solr; performance tuning and SolrCloud
  5. Defining and Handling Logs (story time!) (diagram: syslog -> syslog -> ?)
  6. Requirements: 1) What's wrong? (for debugging)
  7. Problem: looooots of messages coming in
  8. Solved with no indexing, BUT...
  9. Elasticsearch
  10. Requirements: 1) What's wrong? ✓ 2) What will go wrong? (stats)
  11. Parsing Raw Logs: still slow, BUT format changes (e.g. user: mickey, item: mouse, time: 10)
  12. Parsing Raw Logs: still slow, BUT format changes, e.g. adding an error code (user: mickey, item: mouse, error code: 0, time: 10)
  13. Facets. Logging in JSON:
      2013-11-06… mickey mouse
      -> { "date": "2013-11-06", "message": "mickey mouse" }
  14. Facets. Logging in JSON:
      2013-11-06… mickey mouse
      -> { "date": "2013-11-06", "message": "mickey mouse" }
      2013-11-06… @cee:{"user": "mickey"}
      -> { "date": "2013-11-06", "user": "mickey" }
  15. Requirements: 1) What's wrong? ✓ 2) What will go wrong? ✓ 3) Handle logs like production data ✓
  16. Requirements: 1) What's wrong? ✓ 2) What will go wrong? ✓ 3) Handle logs like production data ✓ (i.e., what is a log, and how to handle logs?)
  17. 4 Ways of Sending Logs to Solr: rsyslog (logger), Flume, Fluentd, Logstash (files)
  18. Schemaless
      % cd solr-4.5.1/example/
      % mv solr solr.bak
      % cp -R example-schemaless/solr/ .
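      A quick way to confirm the schemaless setup is to start the example server
      and index a document with fields Solr has never seen; it should guess the
      field types on the fly (a minimal sketch; collection1 is the stock example
      core and the field names are illustrative):
      % java -jar start.jar
      % curl "http://localhost:8983/solr/collection1/update?commit=true" \
             -H 'Content-Type: application/json' \
             -d '[{"user": "mickey", "item": "mouse", "time": 10}]'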
  19. Automatic ID generation (solrconfig.xml):
      <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
        ...
        <processor class="solr.UUIDUpdateProcessorFactory">
          <str name="fieldName">id</str>
        </processor>
        <processor class="solr.LogUpdateProcessorFactory"/>
        <processor class="solr.RunUpdateProcessorFactory"/>
      </updateRequestProcessorChain>
      http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
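      With the UUID processor in the chain, log events can omit the id field
      entirely; querying the documents back shows the generated identifiers
      (a sketch, reusing the schemaless core above):
      % curl "http://localhost:8983/solr/collection1/update?commit=true" \
             -H 'Content-Type: application/json' \
             -d '[{"message": "no id supplied"}]'
      % curl "http://localhost:8983/solr/collection1/select?q=*:*&wt=json"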
  20. rsyslog: /dev/log (logger) -> mmjsonparse -> omprog + script
  21. /dev/log -> parse -> format -> send to Solr
      % logger '@cee: {"hello": "world"}'
      rsyslog.conf:
      module(load="imuxsock")  # version 7+
  22. /dev/log -> parse -> format -> send to Solr
      ...
      module(load="mmjsonparse")
      action(type="mmjsonparse")
  23. /dev/log -> parse -> format -> send to Solr
      ...
      template(name="CEE" type="list") {
        property(name="$!all-json")
        constant(value="\n")
      }
  24. /dev/log -> parse -> format -> send to Solr
      ...
      action(type="mmjsonparse")
      template(name="CEE" ...
      module(load="omprog")
      if $parsesuccess == "OK" then
        action(type="omprog" binary="/opt/json-to-solr.py" template="CEE")
  25. /dev/log -> parse -> format -> send to Solr
      import json, pysolr, sys

      solr = pysolr.Solr('http://localhost:8983/solr/')
      # rsyslog's omprog feeds one JSON event per line on stdin
      for line in sys.stdin:
          solr.add([json.loads(line)])
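      To check the whole rsyslog chain, send a CEE-structured message through
      /dev/log and query Solr for one of its fields (a sketch; the collection1
      core and the field name are assumptions):
      % logger '@cee: {"user": "mickey"}'
      % curl "http://localhost:8983/solr/collection1/select?q=user:mickey&wt=json"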
  26. Flume: Avro source -> Morphline Solr Sink
  27. Avro -> buffer -> parse -> send to Solr
      https://github.com/mpercy/flume-log4j-example
      flume.conf:
      agent.sources = avroSrc
      agent.sources.avroSrc.type = avro
      agent.sources.avroSrc.bind = 0.0.0.0
      agent.sources.avroSrc.port = 41414
  28. Avro -> buffer -> parse -> send to Solr
      flume.conf:
      agent.channels = solrMemoryChannel
      agent.channels.solrMemoryChannel.type = memory
      agent.sources.avroSrc.channels = solrMemoryChannel
  29. Avro -> buffer -> parse -> send to Solr
      flume.conf:
      agent.sinks = solrSink
      agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
      agent.sinks.solrSink.morphlineFile = conf/morphline.conf
      agent.sinks.solrSink.channel = solrMemoryChannel
  30. Avro -> buffer -> parse -> send to Solr
      morphline.conf:
      ...
      commands : [
        { readLine { charset : UTF-8 }}
        { grok {
            dictionaryFiles : [conf/grok-patterns]
            expressions : {
              message : """%{INT:pid} %{DATA:message}"""
        ...
      https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries
  31. Avro -> buffer -> parse -> send to Solr
      morphline.conf:
      SOLR_LOCATOR : {
        collection : collection1
        # zkHost : "127.0.0.1:2181"
        solrUrl : "http://localhost:8983/solr/"
      }
      ...
      commands : [
        ...
        { loadSolr { solrLocator : ${SOLR_LOCATOR} ...
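      With source, channel, and sink wired together, the agent can be started and
      fed a test event using Flume's bundled avro-client (a sketch; the -n value
      must match the "agent." prefix used in flume.conf, and the file path is
      illustrative):
      % bin/flume-ng agent -n agent -c conf -f conf/flume.conf
      % echo '42 hello from flume' > /tmp/test-event
      % bin/flume-ng avro-client -H localhost -p 41414 -F /tmp/test-event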
  32. Fluentd: fluent-logger -> fluentd -> fluent-plugin-solr
  33. fluent-logger -> fluentd -> fluent-plugin-solr
      % pip install fluent-logger

      from fluent import sender, event
      sender.setup('solr.test')
      event.Event('forward', {'hello': 'world'})
  34. fluent-logger -> fluentd -> fluent-plugin-solr
      fluentd config:
      <source>
        type forward
      </source>
      <match solr.**>
        type solr
        host localhost
        port 8983
        core collection1
      </match>
  35. fluent-logger -> fluentd -> fluent-plugin-solr
      % gem install fluent-plugin-solr
      https://github.com/btigit/fluent-plugin-solr
      out_solr.rb:
      doc = Solr::Document.new(:hello => record["hello"])
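      For a quick end-to-end test without writing any code, fluentd's bundled
      fluent-cat tool can inject a tagged JSON event that matches the solr.**
      pattern (a sketch; assumes fluentd is already running with the config above):
      % echo '{"hello": "world"}' | fluent-cat solr.test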
  36. Logstash: file input -> grok filter -> solr_http output
  37. file input -> grok filter -> solr_http output
      % echo '2 world' >> /tmp/testlog
      logstash.conf:
      input {
        file {
          path => "/tmp/testlog"
        }
      }
  38. file input -> grok filter -> solr_http output
      logstash.conf:
      filter {
        grok {
          match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"]
        }
      }
      result: {"pid": "2", "hello": "world"}
  39. file input -> grok filter -> solr_http output
      logstash.conf:
      output {
        solr_http {  # master or v1.2.3+
          solr_url => "http://localhost:8983/solr"
        }
      }
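      Putting the three snippets into a single logstash.conf, a test run might
      look like this (a sketch; the agent subcommand matches the Logstash
      1.2-era CLI, and the query assumes the default collection1 core):
      % bin/logstash agent -f logstash.conf
      % echo '2 world' >> /tmp/testlog
      % curl "http://localhost:8983/solr/collection1/select?q=hello:world&wt=json"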
  40. Fast and Cloud
  41. “It Depends”: load test; monitor with SPM (20% off: LR2013SPM20)
  42. Single Core: # of docs per update
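      Batching several documents per update request is usually the first lever
      for indexing throughput; the right batch size comes out of load testing,
      but the mechanics look like this (an illustrative curl, not a recommendation):
      % curl "http://localhost:8983/solr/collection1/update" \
             -H 'Content-Type: application/json' \
             -d '[{"message": "log line 1"},
                  {"message": "log line 2"},
                  {"message": "log line 3"}]'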
  43. Single Core: Commits (solrconfig.xml)
      <autoSoftCommit>
        <maxTime>...
      <autoCommit>
        <openSearcher>false
        <maxTime>???
      <ramBufferSizeMB>???
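      If you would rather steer commit frequency from the client than from
      solrconfig.xml, Solr's commitWithin parameter offers a similar knob per
      request (the 60-second window here is illustrative):
      % curl "http://localhost:8983/solr/collection1/update?commitWithin=60000" \
             -H 'Content-Type: application/json' \
             -d '[{"message": "buffered log line"}]'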
  44. Single Core: Size and Merges
      omitNorms="true"
      omitTermFreqAndPositions="true"
      <mergeFactor>??
  45. Single Core: Caches
      facets: <fieldValueCache ... size="???" autowarmCount="0"
      changing data to sort & facet on: docValues="true"
  46. SolrCloud: ZooKeeper
      bin/zkServer.sh start
      OR
      java -DzkRun ... -jar start.jar
  47. SolrCloud: ZooKeeper
      zkcli.sh -cmd upconfig -zkhost SERVER:2181
               -confdir solr/collection1/conf/ -confname start
      OR
      -Dbootstrap_confdir=solr/collection1/conf
      -Dcollection.configName=start
  48. SolrCloud: Start Nodes
      java -DzkHost=SERVER:2181 -jar start.jar
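      For a local test cluster, each additional node needs its own copy of the
      example directory and its own port (a sketch; jetty.port is the standard
      Solr 4.x example property, and the node directories are illustrative):
      % (cd node1 && java -DzkHost=SERVER:2181 -Djetty.port=8983 -jar start.jar &)
      % (cd node2 && java -DzkHost=SERVER:2181 -Djetty.port=8984 -jar start.jar &)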
  49. Timed Collections: one collection per day (04 Nov, 05 Nov, 06 Nov, 07 Nov); index into the newest, search the latest or all, optimize older collections
  50. Collections API (rotating 05 Nov ... 08 Nov)
      action=CREATE&name=08Nov&numShards=4
      action=DELETE&name=05Nov
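      Spelled out as full requests against the Collections API endpoint, the
      nightly rotation becomes (a sketch; host and collection names follow the slide):
      % curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=08Nov&numShards=4"
      % curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=05Nov"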
  51. Aliases. Optimize
      07Nov/update?optimize=true
      action=CREATEALIAS&name=LATEST&collections=08Nov
      action=CREATEALIAS&name=ALL&collections=06Nov,07Nov,08Nov
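      The alias and optimize calls, as full requests (a sketch; re-issuing
      CREATEALIAS repoints an existing alias, so LATEST and ALL can be rotated
      nightly along with the collections themselves):
      % curl "http://localhost:8983/solr/07Nov/update?optimize=true"
      % curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=LATEST&collections=08Nov"
      % curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=ALL&collections=06Nov,07Nov,08Nov"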
  52. logs = production data
  53. logs = production data, shipped with Logstash
  54. logs = production data, shipped with Logstash; tune: commits, docs/update, mergeFactor, docValues, caches, omit*
  56. logs = production data, shipped with Logstash; tune: commits, docs/update, mergeFactor, docValues, omit*, caches; manage over time: Collections API, aliases, optimize
  57. We’re hiring! sematext.com/about/jobs
  58. Thank you! radu.gheorghe@sematext.com, @radu0gheorghe, @sematext, and at our booth :)
