ZeroMQ logs


  • Mention state51, mention we’re hiring
  • Mention JFDI, and I really don’t care what language it’s in
  • The former has amazing documentation. The latter, well, bad luck. (Great reference material, but the docs are not so great. Good mailing list, though.)
  • grep is great, I love grep. It’s not very good for 100 servers at once, though. The solution needs to be just as good as grep for the simple case.
  • This is always the first thing suggested / thought of. It’s great for an audit trail, as your DB (should be!) is durable. But doing the simple thing costs (at least) one disk rotation (aka fsync) per log line.
  • This solves the performance problems, but gives you a load more moving parts. A table with id, date, message is likely to perform less well than grep. Your choice is one table with lots of NULL cols, or lots of tables (one per data type).
  • And how do we get data back from this pile? Not as easy as grep! And you’re stuffed as soon as you add more data types.
  • We played with this. We liked it, a lot.
  • ‘Enterprise’ means...
  • Spenny.
  • So, what does it do? You can just tip logs into it, and it’ll do the right thing... even after the fact. Searching is fast fast fast!
  • Really, it’s a great product. Shame about the pricing.
  • I’m also a little wary of using splunk as more than ‘turbo grep’. So, open source - someone else must have thought about this, right?
  • Isn’t he cute? And woody!
  • Sorry - just to go off at a tangent... Let’s make all our log messages JSON messages, as JSON is fast and easy to parse (and you can search it with grep!). Let’s throw it in ElasticSearch. Ponies and unicorns for everyone.
  • N.B. ElasticSearch storage will be MUCH larger than the byte size (as it’s indexed 90 ways). We post-process our logs before insertion, to pull out structured fields (e.g. dates & durations).
  • We can add new log message types (or start parsing things we currently add as simple text) and make schema changes any time we want. We just pour data into ElasticSearch, and then get better searching than grep! The more it’s split into fields, the more we win, but just writing log lines still gives us as good as grep. Anyway - back to the story...
  • Very simple model - input (pluggable), filtering (pluggable by type), output (pluggable). Lots of backends - AMQP and ElasticSearch, plus syslog and many others. Pre-built parser library for various line-based log formats. Comes with a web app for searches... Everything I need!
  • Let’s take a simple case here - I’ll shove my apache logs from N servers into ElasticSearch. I run a logstash on each host (writer), and one on each ElasticSearch server (reader).
  • So, that has two logstashes - one reading files and writing AMQP, and one reading AMQP and writing to ElasticSearch. However, my raw apache log lines need parsing (in the filter stage), to be able to do things like ‘all apache requests with 500 status’ rather than ‘all apache requests containing the string 500’.
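  The filter stage’s job can be sketched in a few lines. This is an illustrative stand-in (not logstash’s actual grok filter, and in Python rather than JRuby): turn a raw Apache combined-format line into a JSON document whose status is a number, so ‘status is 500’ becomes a different query from ‘contains the string 500’.

```python
import json
import re
from datetime import datetime

# Illustrative stand-in for the filter stage: parse an Apache
# combined-format line into structured fields before indexing.
LINE_RE = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_apache_line(line: str) -> dict:
    m = LINE_RE.match(line)
    if m is None:
        return {"message": line}  # unparsed lines stay "as good as grep"
    d = m.groupdict()
    d["status"] = int(d["status"])  # a number, not the string "500"
    d["bytes"] = 0 if d["bytes"] == "-" else int(d["bytes"])
    d["ts"] = datetime.strptime(d["ts"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
    return d

doc = parse_apache_line(
    '10.0.0.1 - - [01/Jun/2012:12:00:00 +0000] "GET /play HTTP/1.1" 500 2326'
)
print(json.dumps(doc, indent=2))
```

  The fallback branch matters: anything the regex doesn’t match still goes in as a plain message, so you never do worse than grep.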
  • Red indicates the filtering.
  • There we go - everyone got that?
  • Except I could instead do the filtering here, if I wanted to. It doesn’t really matter - it depends what’s best for me. Right, so... let’s try that then?
  • First problem...
  • Well then, I’m not going to be running this on the end nodes.
  • And it’s not tiny, even on machines dedicated to log parsing / filtering / indexing.
  • But sure, I spun it up on a couple of spare machines...
  • It works fairly well as advertised.
  • The JVM giveth (lots of awesome software), the JVM taketh away (any RAM you had). Ruby is generally slower than perl; JRuby is generally faster than perl. I’m not actually knocking the technology here - just saying it won’t work in this situation for me.
  • So, anyway, I’m totally stuffed... the previous plan is a non-starter. I need something to collect logs from each host and ship them to AMQP. OK, cool - I can write that in plain ruby or plain perl and it’s got to be slimmer, right?
  • But wait a second... I just want to get something ‘real’ running here... and I’m already tipping stuff into AMQP.
  • So I can just use my existing structured data, right? Well - no, sorry. And I got distracted at this point. For about 6 months.
  • So I come back to this, still needing something to munge my JSON into other JSON. But, right now, the easiest thing to try is:
  • A 30-line perl script - it works. I have data in ElasticSearch. I can view it in the logstash webapp.
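  The script itself isn’t reproduced in the deck. As a hedged sketch (in Python rather than the Perl original, with field names following the old @-prefixed logstash JSON event convention, and made-up input keys), the munging amounts to re-shaping one JSON document into another:

```python
import json
from datetime import datetime, timezone

# Hedged sketch of the ~30-line munger: re-shape an arbitrary structured
# app event into a logstash-style event. The @-prefixed field names follow
# the old logstash JSON event convention; the input keys are illustrative.
def to_logstash_event(app_event: dict, source_host: str) -> str:
    fields = dict(app_event)  # leave the caller's dict alone
    event = {
        "@timestamp": fields.pop("time", None)
                      or datetime.now(timezone.utc).isoformat(),
        "@message": fields.pop("message", ""),
        "@source_host": source_host,
        "@type": fields.pop("type", "app"),
        "@fields": fields,  # everything else becomes searchable fields
    }
    return json.dumps(event)

print(to_logstash_event(
    {"message": "stream started", "time": "2012-06-01T12:00:00Z",
     "type": "streamer", "duration_ms": 42},
    source_host="web01",
))
```

  Because everything left over lands in `@fields`, new event types need no code changes at the munging layer - exactly the ‘new types are trivial’ property claimed later.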
  • Going back a few slides - if RabbitMQ gets sick, everything goes bad. I ended up with a load of code to deal with this. It still didn’t work very well. In fact, the entire idea of using TCP/IP for this is probably bad.
  • Syslog is hateful. MOST of my log messages are under 1024 bytes, but I don’t want to throw them away (or throw an exception) if they aren’t.
  • ZeroMQ looked like the right answer. I played with it. It works REALLY well. I’d recommend you try it.
  • So let’s write this per-host collection daemon. Take our previous munging code, and run it per host in the aggregation process.
  • Ta-da! I have fixed all my woes with RabbitMQ, and at the same time I’ve got my app logs in logstash format for free.
  • I can reuse all the heavy-lifting parts of logstash. I can reuse my per-host ZMQ daemon as a log file tailer. Overhead on hosts is very small; the heavy lifting occurs entirely in the search cluster.
  • So, to recap... I’ve got...
  • I’ve got a solution for logging lots of stuff without blocking or falling over.
  • I’ve got a solution that has a minimal impact on my servers.
  • So, this is what it actually looks like. Raw app logs go to the agent via ZMQ; it munges them into processed logstash logs and emits them. The agent also tails files and emits raw logstash. Logstash does the parsing of apache logs.
  • It’s taken me over 6 months to get any of it running, so I don’t have time to re-write the web app. Someone else is already doing that. I love open source.
  • So, I’ve talked about these ‘raw’ and ‘processed’ log formats - they’re just conventions for what fields can be found in the JSON. This still needs to be better documented!

    1. Using ZeroMQ and ElasticSearch for log aggregation - Tomas Doran (t0m) <>
    2. Who are you?
       • CPAN Developer
         • Catalyst core team
         • Moose hacker
         • AnyEvent::RabbitMQ user
         • Ruby/Python/C as needed
       • Dayjob - dev-everything-ops - state51
         • 3/4 PB of MogileFS - online music
         • Thousands of streams a second
         • Lots of perl
         • Lots of servers
         • Lots of services
    3. Sorry!
       • This isn’t a ZeroMQ tutorial
       • This isn’t an ElasticSearch tutorial
    4. Debugging production
       • Is hard!
       • Especially interactions
       • Need to cross-correlate logs
    5. Naïve solution
       • “Let’s log into the database”
       • NO NO NO NO
       • 120 lines/s (7200 rpm disk)
       • 167 lines/s (10k rpm disk)
       • 250 lines/s (15k rpm disk)
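    The numbers on that slide fall straight out of rotational latency. A sketch of the arithmetic (mine, not the speaker’s): one fsync per log line costs at least one full platter rotation, so throughput is capped at rotations per second.

```python
# One fsync per log line costs at least one full platter rotation,
# so synchronous "log to the database" tops out at rotations/second.
def max_synced_lines_per_second(rpm: int) -> int:
    """Upper bound on fsync-per-line writes for a disk spinning at `rpm`."""
    return rpm // 60  # rotations per second

for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm} rpm -> ~{max_synced_lines_per_second(rpm)} lines/s")
# 7200 -> 120, 10000 -> 166 (the slide rounds to 167), 15000 -> 250
```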
    6. Less naïve solution
       • Queue before we log
       • Bulk insert
       • No good for unstructured data
       • No good for many different structures
    7. Still a stupid solution
       • Lots of UNION queries
       • OR an epic multi-way JOIN
       • Adding new data types is HARD
    8. Splunk
       • Splunk is enterprise software used to monitor, report and analyze the data produced by the applications, systems and infrastructure that run a business. - Wikipedia
    9. Splunk
       • Splunk is enterprise software used to monitor, report and analyze the data produced by the applications, systems and infrastructure that run a business. - Wikipedia
    10. $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
    11. Splunk
        • A small agent program on each host that you tell about your log files - ships them to a server
        • The server component analyzes / indexes your logs. Also a syslog server.
        • Builds structure from your data - in a GUI
    12. Splunk
        • Splunk is amazing.
        • You just tip logs into it; structure comes later.
        • If you can afford the license, use it, be happy!
    13. I cannot afford splunk
    14. I cannot afford splunk
    15. I cannot afford splunk
        • Sad panda!
        • Also, splunk isn’t extensible - it’s a black box.
        • So, a guy called Jordan Sissel invented:
    16. Logstash
    17. Diversion - ElasticSearch
        • Just tip JSON documents into it (with a type)
        • It figures out the structure for each type, and indexes appropriately
        • Free sharding and replication
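    Concretely, ‘tip JSON documents into it (with a type)’ is just an HTTP request to `/<index>/<type>`. This sketch only builds such a request rather than sending it; the daily `logstash-YYYY.MM.dd` index naming is the logstash convention, and the event itself is made up.

```python
import json

# "Tip JSON in with a type": ElasticSearch indexes whatever JSON you
# POST to /<index>/<type>, inferring field mappings as it goes.
# Daily logstash-YYYY.MM.dd indices are the logstash convention.
def index_request(event: dict, day: str) -> tuple[str, str]:
    path = f"/logstash-{day}/{event.get('@type', 'event')}"
    return path, json.dumps(event)

path, body = index_request(
    {"@type": "apache", "@message": "GET /play 200"}, day="2012.06.01"
)
print("POST", path)
print(body)
```

    One index per day is what makes retention cheap: expiring old logs is dropping whole indices, not deleting rows.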
    18. So
        • We post-process logs to be somewhat structured.
        • We can then search over them (fast!)
          • Free text (for text fields)
          • Numeric
          • Dates + ranges
    19. New types
        • Trivial!
        • Just emit it - it’s indexed and queryable.
        • Can hint elasticsearch for better queries (if needed)
    20. Logstash
        • In JRuby, by Jordan Sissel
        • Input → Filter → Output
        • Simple, Flexible, Extensible
        • Plays well with others
        • Pre-built web interface
    21. Logstash
    22. Logstash: INPUT FILTER OUTPUT
    23. Logstash
    24. Logstash: INPUT FILTER OUTPUT
    25. Logstash
    26. Logstash IS MASSIVE
    27. 440MB IDLE!
    28. 2+ GB working
    29. 440MB IDLE!
    30. OH HAI JVM
    31. Java (JRuby) decoding AMQP is, however, much much faster than perl doing that... JVM +/-
    32. Logstash on each host is totally out...
        • Running it on ElasticSearch servers which are already dedicated to this is fine.
        • I’d still like to reuse all of its parsing.
    33. Lots of my data is already JSON
        • Log::Message::Structured
        • AnyEvent::RabbitMQ
        • App logging relies on Rabbit being up
        • Can get slow if Rabbit is sick and blocks
    34. How about this then?
    35. But not in the right format...
        • So I can write a munger in ruby...
        • Or I can write one in perl.
        • I’m already (going to be) running a collection / aggregation daemon on each host (for apache logs).
    36. It works!!!
    37. Myxomatosis
        • If RabbitMQ gets really sick, the app slows down.
        • Multiple exponential backoffs
        • AMQP is crap for ‘fire and forget’
    38. Syslog
        • Yes, I could. But JSON in syslog - just no.
        • 1024 bytes - one UDP packet.
        • Not inventing my own protocol!
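    The 1024-byte limit hurts JSON more than plain text, which is worth spelling out: a truncated text line loses its tail, but a truncated JSON document stops parsing entirely. A sketch of the failure mode (not of any particular syslog daemon):

```python
import json

# A plain-text log line truncated at 1024 bytes loses its tail;
# a JSON document truncated at 1024 bytes loses *everything*,
# because it no longer parses.
event = {"message": "x" * 2000, "type": "app"}
wire = json.dumps(event).encode("utf-8")
assert len(wire) > 1024

truncated = wire[:1024]  # what a 1024-byte-limited transport delivers
try:
    json.loads(truncated)
except ValueError:
    print("truncated JSON is unparseable - the whole event is lost")
```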
    39. ZeroMQ has the correct semantics
        • Pub/Sub sockets
        • Never, ever blocking
        • Lossy! (If needed)
        • Buffer sizes / locations configurable
        • Arbitrary message size
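    The property being claimed can be modelled in a few lines of stdlib code: a send that never blocks, with a high-water mark past which messages are dropped. Real ZeroMQ implements this per peer, in a background I/O thread, via the socket high-water-mark options; this toy only demonstrates the behaviour the application sees.

```python
from collections import deque

# Toy model of ZeroMQ's "never block, drop past the high-water mark"
# semantics. Real ZeroMQ buffers per peer in a background I/O thread;
# this just shows the behaviour the app sees.
class LossySender:
    def __init__(self, hwm: int):
        self.queue = deque()
        self.hwm = hwm
        self.dropped = 0

    def send(self, msg: str) -> bool:
        """Never blocks: returns False (and drops the message) when full."""
        if len(self.queue) >= self.hwm:
            self.dropped += 1  # lose a log line, not the application
            return False
        self.queue.append(msg)
        return True

s = LossySender(hwm=3)
for i in range(5):
    s.send(f"log line {i}")
print(f"buffered={len(s.queue)} dropped={s.dropped}")  # buffered=3 dropped=2
```

    This is the trade the talk is making explicit: under pressure you lose some logs, but the app never blocks on its logging pipeline.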
    40. Subset of logstash
        • In perl
        • ZeroMQ receiver
        • Per-host aggregation
        • Push AMQP to RabbitMQ
        • Run logstash on a central server
    41. Subset of logstash
        • Small async process with a ZMQ receive socket.
        • Pull JSON from ZMQ, decode, munge, emit back to AMQP.
        • Slowness no longer blocks app servers.
    42. Subset of logstash
        • Use logstash at the other end to pop AMQP and insert into ElasticSearch
        • Keep the per-host cost small
        • The same process can tail logfiles and cast into AMQP
        • Reuse all the logstash parsing (at the server side) for apache logs etc.
    43. 100% drop-in compatible subset of logstash
    44. 100% drop-in compatible subset of logstash
        • In perl - making it easy for you to emit structured app events as JSON.
        • Everything is down - your app is still up (you lose some logs)
    45. 440MB IDLE!
    46. 24MB
    47. 24MB - and I used Moose; RAM use can and will go down.
    48. Current architecture
    49. Screenshot
    50. Yes, yes - I know
        • The web app is fugly
        • Other people already have alternate implementations
        • Keeping interoperable opens lots of choices
        • E.g. graylog2 as the event sink
    51. rfc3164
        • “This document describes the observed behavior of the syslog protocol”
        • This is not a good place to be.
        • Working with Jordan to document the message format.
        • End-to-end tests of both implementations to follow.
    52. Code •••
    53. Thanks!
        • <>
        • t0m on
        • And Freenode (idle in #logstash)
        • We are hiring!!!
          • Developers (learn ruby, or perl, or both!)
          • Front end people (play with websockets!)
    54. This is all now pointless
        • The latest logstash .jar will do all the munging for you.
        • And it (mostly) runs in MRI (C ruby), so my RAM thing is less bad.
        • N implementations are still a good thing!