Using Solr to Search and Analyze Logs

Radu Gheorghe
@sematext

@radu0gheorghe
Logsene
Kibana
Elasticsearch API
Logstash
syslog
receiver

syslogd
What about

?
defining and handling logs in general

4 sets of tools to send logs to

Performance tuning and SolrCloud
Defining and Handling Logs
(story time!)
syslog

syslog

?
syslog

syslog
Requirements
1) What’s wrong?
(

for debugging)

http://eddysuaib.com/wp-content/uploads/2012/12/Keyword-icon.png
Problem

looooots of messages coming in

http://www.sciencesurvivalblog.com/getting-published/unfinished-manuscripts_2346
Solved with no indexing

BUT
Elasticsearch
Requirements
1) What’s wrong?

✓

2) What will go wrong?
(stats)
Parsing Raw Logs
still slow

BUT

user

format changes

item

time

mickey mouse 10
Parsing Raw Logs
still slow

BUT

format changes

add error code

mickey mouse 0 10
Facets. Logging in JSON
2013-11-06… mickey mouse

{
"date": "2013-11-06",
"message": "mickey mouse"
}
Facets. Logging in JSON
2013-11-06… mickey mouse

2013-11-06… @cee:{"user": "mickey"}

{
"date": "2013-11-06",
"message": "mickey mouse"
}

{
"date": "2013-11-06",
"user": "mickey"
}
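A minimal sketch of how a receiver might tell the two log styles on this slide apart, assuming the `@cee:` cookie marks a JSON payload as in the example line. The function name `parse_log_line` and the fallback `message` field are illustrative choices, not part of rsyslog or Solr.

```python
import json

CEE_COOKIE = "@cee:"

def parse_log_line(line):
    """Return a dict of fields for one log line.

    If the line carries the @cee: cookie, everything after it is parsed as
    JSON. Otherwise (or if the payload is not a JSON object) the whole line
    is kept as a single "message" field.
    """
    _, cookie, payload = line.partition(CEE_COOKIE)
    if cookie:
        try:
            doc = json.loads(payload)
            if isinstance(doc, dict):
                return doc
        except ValueError:
            pass  # malformed JSON: fall through and keep the raw text
    return {"message": line.strip()}

print(parse_log_line('@cee:{"user": "mickey"}'))  # {'user': 'mickey'}
print(parse_log_line("mickey mouse"))             # {'message': 'mickey mouse'}
```

Structured lines become documents with their own fields, which is what makes faceting on `user` possible; free-text lines remain searchable through `message`.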
Requirements
1) What’s wrong?

✓

2) What will go wrong? ✓
3) Handle logs like production data ✓
Requirements
1) What’s wrong?

✓

2) What will go wrong? ✓
What is a log?

3) Handle logs like production data ✓
How to handle logs?
4 Ways of Sending Logs to Solr
logger

Logstash

files
Schemaless

% cd solr-4.5.1/example/
% mv solr solr.bak

% cp -R example-schemaless/solr/ .
Automatic ID generation
solrconfig.xml
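<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
……..
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/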
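When the server-side chain is not available, the same idea (fill in `id` with a UUID when a document arrives without one) could be approximated on the client before sending. This is only a sketch: `with_generated_id` is a hypothetical helper, not a Solr or pysolr API.

```python
import uuid

def with_generated_id(doc, field_name="id"):
    """Return a copy of doc with a random UUID in field_name if it was empty."""
    enriched = dict(doc)
    if not enriched.get(field_name):
        enriched[field_name] = str(uuid.uuid4())
    return enriched

doc = with_generated_id({"hello": "world"})
print(sorted(doc))  # ['hello', 'id']
```

Documents that already carry an `id` pass through unchanged, so re-sending the same log event does not create a second identifier.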
mmjsonparse
/dev/log

logger

omprog + script
/dev/log -> parse -> format -> send to Solr
% logger '@cee: {"hello": "world"}'

rsyslog.conf
module(load="imuxsock") # version 7+
/dev/log -> parse -> format -> send to Solr
...

module(load="mmjsonparse")
action(type="mmjsonparse")
/dev/log -> parse -> format -> send to Solr
...
template(name="CEE" type="list") {
property(name="$!all-json")
constant(value="\n")
}
/dev/log -> parse -> format -> send to Solr
...
action(type="mmjsonparse")
template(name="CEE"
…

module(load="omprog")
if $parsesuccess == "OK" then
action(type="omprog" binary="/opt/json-to-solr.py" template="CEE")
/dev/log -> parse -> format -> send to Solr
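import json, pysolr, sys
solr = pysolr.Solr('http://localhost:8983/solr/')
while True:
    line = sys.stdin.readline()
    if not line:  # EOF: rsyslog closed the pipe
        break
    doc = json.loads(line)
    solr.add([doc])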
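The script on this slide assumes every line it receives is valid JSON. A hedged sketch of a more forgiving reader follows; `docs_from_lines` is a hypothetical helper, and silently dropping bad lines is one possible policy rather than the author's.

```python
import json

def docs_from_lines(lines):
    """Yield one dict per line that contains a JSON object; skip the rest."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except ValueError:
            continue  # not valid JSON: drop it instead of crashing the script
        if isinstance(doc, dict):
            yield doc

# Possible use inside the omprog script (assumes a pysolr.Solr instance `solr`):
#   for doc in docs_from_lines(iter(sys.stdin.readline, "")):
#       solr.add([doc])
```

Reading through `iter(sys.stdin.readline, "")` keeps the line-at-a-time behaviour of the original loop and stops cleanly when rsyslog closes the pipe.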
Morphline
Solr Sink
Avro
Avro -> buffer -> parse -> send to Solr
https://github.com/mpercy/flume-log4j-example
flume.conf
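agent.sources = avroSrc
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 41414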
Avro -> buffer -> parse -> send to Solr

flume.conf
agent.channels = solrMemoryChannel
agent.channels.solrMemoryChannel.type = memory
agent.sources.avroSrc.channels = solrMemoryChannel
Avro -> buffer -> parse -> send to Solr
flume.conf
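agent.sinks = solrSink
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.morphlineFile = conf/morphline.conf
agent.sinks.solrSink.channel = solrMemoryChannel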
Avro -> buffer -> parse -> send to Solr
morphline.conf
...
commands : [
{ readLine { charset : UTF-8 }}
{ grok {
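dictionaryFiles : [conf/grok-patterns]
expressions : {
message : """%{INT:pid} %{DATA:message}"""
...
https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries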
Avro -> buffer -> parse -> send to Solr
morphline.conf
SOLR_LOCATOR : {
collection : collection1
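#zkHost : "127.0.0.1:2181"
solrUrl : "http://localhost:8983/solr/"
}
...
commands : [
...
{ loadSolr { solrLocator : ${SOLR_LOCATOR}
...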
fluent-logger

fluent-plugin-solr
fluent-logger -> fluentd -> fluent-plugin-solr
% pip install fluent-logger

from fluent import sender,event
sender.setup('solr.test')
event.Event('forward', {'hello': 'world'})
fluent-logger -> fluentd -> fluent-plugin-solr
<source>
type forward
</source>
<match solr.**>
type solr
host localhost
port 8983
core collection1
</match>
fluent-logger -> fluentd -> fluent-plugin-solr
% gem install fluent-plugin-solr

https://github.com/btigit/fluent-plugin-solr
out_solr.rb
doc = Solr::Document.new(:hello => record["hello"])
grok filter
file input

file

solr_http output

Logstash
file input -> grok filter -> solr_http output
% echo '2 world' >> /tmp/testlog

logstash.conf:
input {
file { path => "/tmp/testlog" }
}
file input -> grok filter -> solr_http output
logstash.conf:
filter {
grok {
match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"]
}
}

{"pid": "2", "hello":"world"}
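To see what this grok filter does to the test line, here is a rough Python equivalent. The regular expression only approximates the `NUMBER` and `GREEDYDATA` patterns (the real Logstash definitions differ in detail), and `grok_like` is a hypothetical name.

```python
import re

# Approximate stand-ins for %{NUMBER:pid} and %{GREEDYDATA:hello}.
LINE_RE = re.compile(r"^(?P<pid>[+-]?\d+(?:\.\d+)?) (?P<hello>.*)$")

def grok_like(message):
    """Return the named captures as a dict, or None if the line does not match."""
    match = LINE_RE.match(message)
    return match.groupdict() if match else None

print(grok_like("2 world"))  # {'pid': '2', 'hello': 'world'}
```

As with grok, the captured `pid` is a string; converting it to a number would be a separate step.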
file input -> grok filter -> solr_http output
logstash.conf:
output {
solr_http { # master or v1.2.3+
solr_url => "http://localhost:8983/solr"
}
}
Fast and Cloud
“It Depends”

load test

monitor: SPM
20% off: LR2013SPM20
http://www.bigskytech.com/wp-content/uploads/2011/02/guage.png
|>>>>|Single Core: # of docs/update

http://static.memrise.com.s3.amazonaws.com/uploads/blog-pictures/Simpsons_Updates.bmp
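One way to experiment with the number of documents per update is to group documents on the client and send each group in a single call. The sketch below assumes an `add` callable such as pysolr's `solr.add`; the default batch size of 1000 is an arbitrary placeholder, since the right value comes from load testing.

```python
from itertools import islice

def batched(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def send_in_batches(docs, add, size=1000):
    """Call add(batch) once per batch and return the number of calls made."""
    calls = 0
    for batch in batched(docs, size):
        add(batch)
        calls += 1
    return calls

sent = []
print(send_in_batches(({"n": i} for i in range(5)), sent.append, size=2))  # 3
```

Fewer, larger updates reduce per-request overhead, at the cost of more memory per request and more documents lost if a single request fails.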
|>>>>|Single Core: Commits

<autoSoftCommit>
<maxTime>...

<autoCommit>
<openSearcher>false
<maxTime>???
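<ramBufferSizeMB>???
http://cache.desktopnexus.com/thumbnails/1306-bigthumbnail.jpg
http://www.musicfestivaljunkies.com/wp-content/uploads/2012/01/HardLogo.png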
|>>>>|Single Core: Size and Merges

omitNorms="true"
omitTermFreqAndPositions="true"

<mergeFactor>??

http://sweetclipart.com/multisite/sweetclipart/files/scissors_blue_silver.png
http://mergewords.com/gfx/logo-big.png
|>>>>|Single Core: Caches

facets

<fieldValueCache ...
size="???"
autowarmCount="0"

changing data
to sort&facet

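docValues="true"
http://vector-magz.com/wp-content/uploads/2013/06/diamond-clip-art4.png
http://www.clker.com/cliparts/1/f/6/3/11971228961330048838SaraSara_Ice_cube_2.svg.med.png
http://clipartist.info/RSS/openclipart.org/2011/May/02-Monday/migrating_penguin_penguinmigrating-555px.png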
SolrCloud: ZooKeeper

bin/zkServer.sh start
OR
java -DzkRun … -jar start.jar
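http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png
http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png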
SolrCloud: ZooKeeper

zkcli.sh -cmd upconfig 
-zkhost SERVER:2181 
-confdir solr/collection1/conf/ 
-confname start
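-Dbootstrap_confdir=solr/collection1/conf
-Dcollection.configName=start
http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png
http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png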
SolrCloud: Start Nodes

java -DzkHost=SERVER:2181 -jar start.jar
Timed Collections
optimize

04
Nov

05
Nov

search latest

06
Nov
search all

07
Nov

index
Collections API

action=DELETE
&name=05Nov

05
Nov

06
Nov

07
Nov

08
Nov

action=CREATE
&name=08Nov
&numShards=4
Aliases. Optimize

07Nov/update?optimize=true

05
Nov

06
Nov

07
Nov

action=CREATEALIAS
&name=LATEST
&collections=08Nov

08
Nov

action=CREATEALIAS
&name=ALL
&collections=06Nov,07Nov,08Nov
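The daily rotation across the last three slides (create today's collection, repoint the aliases, delete the oldest collection, optimize yesterday's) can be sketched as a function that only builds the request URLs. The base URL, the three-day retention, and the helper names are assumptions for illustration. The alias calls use `collections`, the parameter name in the Solr Collections API documentation.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def collection_name(day):
    """Name a daily collection the way the slides do, e.g. 08Nov."""
    return "%02d%s" % (day.day, MONTHS[day.month - 1])

def rotation_urls(today, base="http://localhost:8983/solr", keep_days=3, shards=4):
    """Return the URLs for one daily rotation, in the order they should run."""
    if keep_days < 2:
        raise ValueError("keep_days must be at least 2")
    kept = [collection_name(today - timedelta(days=n))
            for n in reversed(range(keep_days))]  # oldest first, today last
    expired = collection_name(today - timedelta(days=keep_days))
    calls = [
        {"action": "CREATE", "name": kept[-1], "numShards": shards},
        {"action": "CREATEALIAS", "name": "LATEST", "collections": kept[-1]},
        {"action": "CREATEALIAS", "name": "ALL", "collections": ",".join(kept)},
        {"action": "DELETE", "name": expired},
    ]
    urls = ["%s/admin/collections?%s" % (base, urlencode(call, safe=","))
            for call in calls]
    # Yesterday's collection no longer receives writes, so it can be optimized.
    urls.append("%s/%s/update?optimize=true" % (base, kept[-2]))
    return urls

for url in rotation_urls(date(2013, 11, 8)):
    print(url)
```

Building the URLs separately from sending them makes the rotation easy to dry-run from a cron job before pointing it at a live cluster.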
logs =
production
data
logs =
production
data

Logstash
commits
docs/update
mergeFactor
logs =
production
data

Logstash

docValues
caches

omit*
commits
docs/update
mergeFactor
logs =
production
data

Logstash

docValues
caches

omit*
commits
docs/update
mergeFactor
logs =
production
data

docValues

omit*

caches

time

Logstash

Collections API
aliases
optimize
We’re hiring!

sematext.com/about/jobs
Thank you!

radu.gheorghe@sematext.com
@radu0gheorghe

@sematext

And @ our booth :)
Solr for Indexing and Searching Logs
How to index logs from Logstash, rsyslog, Flume, Fluentd, via Morphlines, etc. into Solr and make them searchable.

Published in: Technology, Design


  1. 1. Using Solr to Search and Analyze Logs Radu Gheorghe @sematext @radu0gheorghe
  2. 2. Logsene Kibana Elasticsearch API Logstash syslog receiver syslogd
  3. 3. What about ?
  4. 4. defining and handling logs in general 4 sets of tools to send logs to Performance tuning and SolrCloud
  5. 5. Defining and Handling Logs (story time!) syslog syslog ? syslog syslog
  6. 6. Requirements 1) What’s wrong? ( for debugging) http://eddysuaib.com/wp-content/uploads/2012/12/Keyword-icon.png
  7. 7. Problem looooots of messages coming in http://www.sciencesurvivalblog.com/getting-published/unfinished-manuscripts_2346
  8. 8. Solved with no indexing BUT
  9. 9. Elasticsearch
  10. 10. Requirements 1) What’s wrong? ✓ 2) What will go wrong? (stats)
  11. 11. Parsing Raw Logs still slow BUT user format changes item time mickey mouse 10
  12. 12. Parsing Raw Logs still slow BUT format changes add error code mickey mouse 0 10
  13. 13. Facets. Logging in JSON 2013-11-06… mickey mouse { "date": "2013-11-06", "message": "mickey mouse" }
  14. 14. Facets. Logging in JSON 2013-11-06… mickey mouse { "date": "2013-11-06", "message": "mickey mouse" } 2013-11-06… @cee:{"user": "mickey"} { "date": "2013-11-06", "user": "mickey" }
  15. 15. Requirements 1) What’s wrong? ✓ 2) What will go wrong? ✓ 3) Handle logs like production data ✓
  16. 16. Requirements 1) What’s wrong? ✓ 2) What will go wrong? ✓ What is a log? 3) Handle logs like production data ✓ How to handle logs?
  17. 17. 4 Ways of Sending Logs to Solr logger Logstash files
  18. 18. Schemaless % cd solr-4.5.1/example/ % mv solr solr.bak % cp -R example-schemaless/solr/ .
  19. 19. Automatic ID generation solrconfig.xml <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"> …….. <processor class="solr.UUIDUpdateProcessorFactory"> <str name="fieldName">id</str> </processor> <processor class="solr.LogUpdateProcessorFactory"/> <processor class="solr.RunUpdateProcessorFactory"/> </updateRequestProcessorChain> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
  20. 20. mmjsonparse /dev/log logger omprog + script
  21. 21. /dev/log -> parse -> format -> send to Solr % logger '@cee: {"hello": "world"}' rsyslog.conf module(load="imuxsock") # version 7+
  22. 22. /dev/log -> parse -> format -> send to Solr ... module(load="mmjsonparse") action(type="mmjsonparse")
  23. 23. /dev/log -> parse -> format -> send to Solr ... template(name="CEE" type="list") { property(name="$!all-json") constant(value="\n") }
  24. 24. /dev/log -> parse -> format -> send to Solr ... action(type="mmjsonparse") template(name="CEE" … module(load="omprog") if $parsesuccess == "OK" then action(type="omprog" binary="/opt/json-to-solr.py" template="CEE")
  25. 25. /dev/log -> parse -> format -> send to Solr import json, pysolr, sys solr = pysolr.Solr('http://localhost:8983/solr/') while True: line = sys.stdin.readline() doc = json.loads(line) solr.add([doc])
  26. 26. Morphline Solr Sink Avro
  27. 27. Avro -> buffer -> parse -> send to Solr https://github.com/mpercy/flume-log4j-example flume.conf agent.sources = avroSrc agent.sources.avroSrc.type = avro agent.sources.avroSrc.bind = 0.0.0.0 agent.sources.avroSrc.port = 41414
  28. 28. Avro -> buffer -> parse -> send to Solr flume.conf agent.channels = solrMemoryChannel agent.channels.solrMemoryChannel.type = memory agent.sources.avroSrc.channels = solrMemoryChannel
  29. 29. Avro -> buffer -> parse -> send to Solr flume.conf agent.sinks = solrSink agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink agent.sinks.solrSink.morphlineFile = conf/morphline.conf agent.sinks.solrSink.channel = solrMemoryChannel
  30. 30. Avro -> buffer -> parse -> send to Solr morphline.conf ... commands : [ { readLine { charset : UTF-8 }} { grok { dictionaryFiles : [conf/grok-patterns] expressions : { message : """%{INT:pid} %{DATA:message}""" ... https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries
  31. 31. Avro -> buffer -> parse -> send to Solr morphline.conf SOLR_LOCATOR : { collection : collection1 #zkHost : "127.0.0.1:2181" solrUrl : "http://localhost:8983/solr/" } ... commands : [ ... { loadSolr { solrLocator : ${SOLR_LOCATOR} ...
  32. 32. fluent-logger fluent-plugin-solr
  33. 33. fluent-logger -> fluentd -> fluent-plugin-solr % pip install fluent-logger from fluent import sender,event sender.setup('solr.test') event.Event('forward', {'hello': 'world'})
  34. 34. fluent-logger -> fluentd -> fluent-plugin-solr <source> type forward </source> <match solr.**> type solr host localhost port 8983 core collection1 </match>
  35. 35. fluent-logger -> fluentd -> fluent-plugin-solr % gem install fluent-plugin-solr https://github.com/btigit/fluent-plugin-solr out_solr.rb doc = Solr::Document.new(:hello => record["hello"])
  36. 36. grok filter file input file solr_http output Logstash
  37. 37. file input -> grok filter -> solr_http output % echo '2 world' >> /tmp/testlog logstash.conf: input { file { path => "/tmp/testlog" } }
  38. 38. file input -> grok filter -> solr_http output logstash.conf: filter { grok { match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"] } } {"pid": "2", "hello":"world"}
  39. 39. file input -> grok filter -> solr_http output logstash.conf: output { solr_http { # master or v1.2.3+ solr_url => "http://localhost:8983/solr" } }
  40. 40. Fast and Cloud
  41. 41. “It Depends” load test monitor: SPM 20% off: LR2013SPM20 http://www.bigskytech.com/wp-content/uploads/2011/02/guage.png
  42. 42. |>>>>|Single Core: # of docs/update http://static.memrise.com.s3.amazonaws.com/uploads/blog-pictures/Simpsons_Updates.bmp
  43. 43. |>>>>|Single Core: Commits <autoSoftCommit> <maxTime>... <autoCommit> <openSearcher>false <maxTime>??? <ramBufferSizeMB>??? http://cache.desktopnexus.com/thumbnails/1306-bigthumbnail.jpg http://www.musicfestivaljunkies.com/wp-content/uploads/2012/01/HardLogo.png
  44. 44. |>>>>|Single Core: Size and Merges omitNorms="true" omitTermFreqAndPositions="true" <mergeFactor>?? http://sweetclipart.com/multisite/sweetclipart/files/scissors_blue_silver.png http://mergewords.com/gfx/logo-big.png
  45. 45. |>>>>|Single Core: Caches facets <fieldValueCache ... size="???" autowarmCount="0" changing data to sort&facet docValues="true" http://vector-magz.com/wp-content/uploads/2013/06/diamond-clip-art4.png http://www.clker.com/cliparts/1/f/6/3/11971228961330048838SaraSara_Ice_cube_2.svg.med.png http://clipartist.info/RSS/openclipart.org/2011/May/02-Monday/migrating_penguin_penguinmigrating-555px.png
  46. 46. SolrCloud: ZooKeeper bin/zkServer.sh start OR java -DzkRun … -jar start.jar http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png
  47. 47. SolrCloud: ZooKeeper zkcli.sh -cmd upconfig -zkhost SERVER:2181 -confdir solr/collection1/conf/ -confname start -Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png
  48. 48. SolrCloud: Start Nodes java -DzkHost=SERVER:2181 -jar start.jar
  49. 49. Timed Collections optimize 04 Nov 05 Nov search latest 06 Nov search all 07 Nov index
  50. 50. Collections API action=DELETE &name=05Nov 05 Nov 06 Nov 07 Nov 08 Nov action=CREATE &name=08Nov &numShards=4
  51. 51. Aliases. Optimize 07Nov/update?optimize=true 05 Nov 06 Nov 07 Nov action=CREATEALIAS &name=LATEST &collections=08Nov 08 Nov action=CREATEALIAS &name=ALL &collections=06Nov,07Nov,08Nov
  52. 52. logs = production data
  53. 53. logs = production data Logstash
  54. 54. commits docs/update mergeFactor logs = production data Logstash docValues caches omit*
  55. 55. commits docs/update mergeFactor logs = production data Logstash docValues caches omit*
  56. 56. commits docs/update mergeFactor logs = production data docValues omit* caches time Logstash Collections API aliases optimize
  57. 57. We’re hiring! sematext.com/about/jobs
  58. 58. Thank you! radu.gheorghe@sematext.com @radu0gheorghe @sematext And @ our booth :)