Solr for Indexing and Searching Logs

How to index logs from Logstash, rsyslog, Flume (via Morphlines), Fluentd, etc. into Solr and make them searchable.

Solr for Indexing and Searching Logs Presentation Transcript

  • 1. Using Solr to Search and Analyze Logs Radu Gheorghe @sematext @radu0gheorghe
  • 2. Logsene Kibana Elasticsearch API Logstash syslog receiver syslogd
  • 3. What about ?
  • 4. Defining and handling logs in general; 4 sets of tools to send logs to Solr; performance tuning and SolrCloud
  • 5. Defining and Handling Logs (story time!) syslog syslog ? syslog syslog
  • 6. Requirements 1) What’s wrong? (for debugging) http://eddysuaib.com/wp-content/uploads/2012/12/Keyword-icon.png
  • 7. Problem looooots of messages coming in http://www.sciencesurvivalblog.com/getting-published/unfinished-manuscripts_2346
  • 8. Solved with no indexing BUT
  • 9. Elasticsearch
  • 10. Requirements 1) What’s wrong? ✓ 2) What will go wrong? (stats)
  • 11. Parsing Raw Logs still slow BUT user format changes item time mickey mouse 10
  • 12. Parsing Raw Logs still slow BUT format changes add error code mickey mouse 0 10
  • 13. Facets. Logging in JSON 2013-11-06… mickey mouse { "date": "2013-11-06", "message": "mickey mouse" }
  • 14. Facets. Logging in JSON 2013-11-06… mickey mouse { "date": "2013-11-06", "message": "mickey mouse" } 2013-11-06… @cee:{"user": "mickey"} { "date": "2013-11-06", "user": "mickey" }
  • 15. Requirements 1) What’s wrong? ✓ 2) What will go wrong? ✓ 3) Handle logs like production data ✓
  • 16. Requirements 1) What’s wrong? ✓ 2) What will go wrong? ✓ What is a log? 3) Handle logs like production data ✓ How to handle logs?
  • 17. 4 Ways of Sending Logs to Solr logger Logstash files
  • 18. Schemaless % cd solr-4.5.1/example/ % mv solr solr.bak % cp -R example-schemaless/solr/ .
  • 19. Automatic ID generation solrconfig.xml <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"> …….. <processor class="solr.UUIDUpdateProcessorFactory"> <str name="fieldName">id</str> </processor> <processor class="solr.LogUpdateProcessorFactory"/> <processor class="solr.RunUpdateProcessorFactory"/> </updateRequestProcessorChain> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
  • 20. mmjsonparse /dev/log logger omprog + script
  • 21. /dev/log -> parse -> format -> send to Solr % logger '@cee: {"hello": "world"}' rsyslog.conf module(load="imuxsock") # version 7+
  • 22. /dev/log -> parse -> format -> send to Solr ... module(load="mmjsonparse") action(type="mmjsonparse")
  • 23. /dev/log -> parse -> format -> send to Solr ... template(name="CEE" type="list") { property(name="$!all-json") constant(value="\n") }
  • 24. /dev/log -> parse -> format -> send to Solr ... action(type="mmjsonparse") template(name="CEE" … module(load="omprog") if $parsesuccess == "OK" then action(type="omprog" binary="/opt/json-to-solr.py" template="CEE")
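  • 25. /dev/log -> parse -> format -> send to Solr
        import json, pysolr, sys
        solr = pysolr.Solr('http://localhost:8983/solr/')
        # rsyslog's omprog pipes one JSON document per line to stdin
        while True:
            line = sys.stdin.readline()
            if not line:  # EOF: rsyslog closed the pipe
                break
            doc = json.loads(line)
            solr.add([doc])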
  • 26. Morphline Solr Sink Avro
  • 27. Avro -> buffer -> parse -> send to Solr https://github.com/mpercy/flume-log4j-example flume.conf agent.sources = avroSrc agent.sources.avroSrc.type = avro agent.sources.avroSrc.bind = 0.0.0.0 agent.sources.avroSrc.port = 41414
  • 28. Avro -> buffer -> parse -> send to Solr flume.conf agent.channels = solrMemoryChannel agent.channels.solrMemoryChannel.type = memory agent.sources.avroSrc.channels = solrMemoryChannel
  • 29. Avro -> buffer -> parse -> send to Solr flume.conf agent.sinks = solrSink agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink agent.sinks.solrSink.morphlineFile = conf/morphline.conf agent.sinks.solrSink.channel = solrMemoryChannel
  • 30. Avro -> buffer -> parse -> send to Solr morphline.conf ... commands : [ { readLine { charset : UTF-8 }} { grok { dictionaryFiles : [conf/grok-patterns] expressions : { message : """%{INT:pid} %{DATA:message}""" ... https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries
  • 31. Avro -> buffer -> parse -> send to Solr morphline.conf SOLR_LOCATOR : { collection : collection1 #zkHost : "127.0.0.1:2181" solrUrl : "http://localhost:8983/solr/" } ... commands : [ ... { loadSolr { solrLocator : ${SOLR_LOCATOR} ...
  • 32. fluent-logger fluent-plugin-solr
  • 33. fluent-logger -> fluentd -> fluent-plugin-solr % pip install fluent-logger from fluent import sender,event sender.setup('solr.test') event.Event('forward', {'hello': 'world'})
  • 34. fluent-logger -> fluentd -> fluent-plugin-solr <source> type forward </source> <match solr.**> type solr host localhost port 8983 core collection1 </match>
  • 35. fluent-logger -> fluentd -> fluent-plugin-solr % gem install fluent-plugin-solr https://github.com/btigit/fluent-plugin-solr out_solr.rb doc = Solr::Document.new(:hello => record["hello"])
  • 36. grok filter file input file solr_http output Logstash
  • 37. file input -> grok filter -> solr_http output % echo '2 world' >> /tmp/testlog logstash.conf: input { file { path => "/tmp/testlog" } }
  • 38. file input -> grok filter -> solr_http output logstash.conf: filter { grok { match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"] } } {"pid": "2", "hello":"world"}
  • 39. file input -> grok filter -> solr_http output logstash.conf: output { solr_http { # master or v1.2.3+ solr_url => "http://localhost:8983/solr" } }
  • 40. Fast and Cloud
  • 41. “It Depends” load test monitor: SPM 20% off: LR2013SPM20 http://www.bigskytech.com/wp-content/uploads/2011/02/guage.png
  • 42. |>>>>|Single Core: # of docs/update http://static.memrise.com.s3.amazonaws.com/uploads/blog-pictures/Simpsons_Updates.bmp
  • 43. |>>>>|Single Core: Commits <autoSoftCommit> <maxTime>... <autoCommit> <openSearcher>false <maxTime>??? <ramBufferSizeMB>??? http://cache.desktopnexus.com/thumbnails/1306-bigthumbnail.jpg http://www.musicfestivaljunkies.com/wp-content/uploads/2012/01/HardLogo.png
  • 44. |>>>>|Single Core: Size and Merges omitNorms="true" omitTermFreqAndPositions="true" <mergeFactor>?? http://sweetclipart.com/multisite/sweetclipart/files/scissors_blue_silver.png http://mergewords.com/gfx/logo-big.png
  • 45. |>>>>|Single Core: Caches facets <fieldValueCache ... size="???" autowarmCount="0" changing data to sort&facet docValues="true" http://vector-magz.com/wp-content/uploads/2013/06/diamond-clip-art4.png http://www.clker.com/cliparts/1/f/6/3/11971228961330048838SaraSara_Ice_cube_2.svg.med.png http://clipartist.info/RSS/openclipart.org/2011/May/02-Monday/migrating_penguin_penguinmigrating-555px.png
  • 46. SolrCloud: ZooKeeper bin/zkServer.sh start OR java -DzkRun … -jar start.jar http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png
  • 47. SolrCloud: ZooKeeper zkcli.sh -cmd upconfig -zkhost SERVER:2181 -confdir solr/collection1/conf/ -confname start -Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png
  • 48. SolrCloud: Start Nodes java -DzkHost=SERVER:2181 -jar start.jar
  • 49. Timed Collections optimize 04 Nov 05 Nov search latest 06 Nov search all 07 Nov index
  • 50. Collections API action=DELETE &name=05Nov 05 Nov 06 Nov 07 Nov 08 Nov action=CREATE &name=08Nov &numShards=4
  • 51. Aliases. Optimize 07Nov/update?optimize=true 05 Nov 06 Nov 07 Nov action=CREATEALIAS &name=LATEST &collections=08Nov 08 Nov action=CREATEALIAS &name=ALL &collections=06Nov,07Nov,08Nov
  • 52. logs = production data
  • 53. logs = production data Logstash
  • 54. commits docs/update mergeFactor logs = production data Logstash docValues caches omit*
  • 55. commits docs/update mergeFactor logs = production data Logstash docValues caches omit*
  • 56. commits docs/update mergeFactor logs = production data docValues omit* caches time Logstash Collections API aliases optimize
  • 57. We’re hiring! sematext.com/about/jobs
  • 58. Thank you! radu.gheorghe@sematext.com @radu0gheorghe @sematext And @ our booth :)
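
Slide 42 names the number of documents per update as a single-core tuning knob, while the omprog script on slide 25 sends one document per request. Below is a minimal sketch of batching, not the presenter's code. The helper names `batches` and `index_lines` are hypothetical, the batch size of 500 is an arbitrary placeholder, and `send` stands in for any callable that accepts a list of documents, such as pysolr's `solr.add`.

```python
import json
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def index_lines(lines, send, batch_size=500):
    """Parse one JSON document per line and hand them to `send` in batches.

    `send` is any callable taking a list of dicts (assumption: e.g. solr.add).
    Blank lines are skipped. Returns the number of documents passed to `send`.
    """
    total = 0
    docs = (json.loads(line) for line in lines if line.strip())
    for chunk in batches(docs, batch_size):
        send(chunk)
        total += len(chunk)
    return total
```

With a live Solr this might be called as `index_lines(sys.stdin, solr.add)`. Note the trade-off: a partial batch is only sent once it fills up or the input ends, so a low-volume log stream would probably also want a time-based flush.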
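
Slides 49 to 51 describe daily collections: create the new day's collection, point the LATEST and ALL aliases at the right collections, drop the oldest day, and optimize the day that just stopped receiving writes. The sketch below only builds the request URLs for one such rotation and does not send them. The base URL, the function names, and the three-day retention are assumptions for illustration; the `08Nov` naming and `numShards=4` follow the slides. Solr's CREATEALIAS action takes the collection list in the `collections` parameter.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def collection_name(day):
    """Name a daily collection the way the slides do, e.g. 08Nov."""
    return f"{day.day:02d}{MONTHS[day.month - 1]}"


def admin_url(action, **params):
    """Build a Collections API URL; commas stay unescaped for readability."""
    query = urlencode({"action": action, **params}, safe=",")
    return f"{SOLR}/admin/collections?{query}"


def rotation_urls(new_day, keep_days=3, num_shards=4):
    """Return, in order, the requests that roll the collections to `new_day`."""
    new = collection_name(new_day)
    kept = [collection_name(new_day - timedelta(days=i))
            for i in range(keep_days - 1, -1, -1)]
    expired = collection_name(new_day - timedelta(days=keep_days))
    previous = collection_name(new_day - timedelta(days=1))
    return [
        admin_url("CREATE", name=new, numShards=num_shards),
        admin_url("CREATEALIAS", name="LATEST", collections=new),
        admin_url("CREATEALIAS", name="ALL", collections=",".join(kept)),
        admin_url("DELETE", name=expired),
        # The previous day no longer receives writes, so it can be optimized.
        f"{SOLR}/{previous}/update?optimize=true",
    ]
```

Calling `rotation_urls(date(2013, 11, 8))` yields the slide 50 and 51 requests for 08Nov. The aliases are repointed before the oldest collection is deleted so that ALL never refers to a missing collection; that ordering is a choice made here, not something the slides prescribe.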