Zero mq logs
4,355 views
Upload Details

Uploaded as Apple Keynote

Usage Rights

© All Rights Reserved

  • Mention state51, mention we’re hiring
  • Mention JFDI, and I really don’t care what language it’s in
  • The former has amazing documentation. The latter, well, bad luck. (Great reference material, but docs not so great. Good mailing list though.)
  • grep is great, I love grep. Not very good for 100 servers at once. The solution needs to be just as good as grep for the simple case.
  • This is always the first thing suggested / thought of. It’s great for an audit trail, as your DB is (or should be!) durable. But doing the simple thing costs (at least) one disk rotation (aka fsync) per log line.
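The "one disk rotation per fsync" ceiling is easy to work out, and it matches the lines/s numbers quoted later in the deck. A quick illustrative sketch:

```python
# If every log line forces an fsync, and each fsync costs (at least) one
# full disk rotation, then the ceiling on log lines per second is simply
# the number of platter rotations per second.
def max_lines_per_second(rpm: int) -> int:
    # rotations per second = max fsyncs per second = max log lines per second
    return round(rpm / 60)

for rpm in (7200, 10000, 15000):
    print(rpm, max_lines_per_second(rpm))
# 7200 rpm -> 120 lines/s, 10k rpm -> 167 lines/s, 15k rpm -> 250 lines/s
```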
  • This solves the performance problems, but gives you a load more moving parts. A table with id, date, message is likely to perform less well than grep. One table with lots of NULL cols, or lots of tables (one per data type).
  • And how do we get data back from this pile? Not as easy as grep! You’re stuffed as soon as you add more data types.
  • We played with this. We liked it, a lot.
  • Enterprise means
  • Spenny
  • So, what does it do? You can just tip logs into it, and it’ll do the right thing... even after the fact. Searching is fast fast fast!
  • Really, it’s a great product. Shame about the pricing.
  • I’m also a little wary of using splunk as more than ‘turbo grep’. So, open source - someone else must have thought about this, right?
  • Isn’t he cute? And woody!
  • Sorry - just to go off at a tangent... Let’s make all our log messages JSON messages, as JSON is fast, easy to parse (and you can search it with grep!). Let’s throw it in ElasticSearch. Ponies and unicorns for everyone.
  • N.B. ElasticSearch storage will be MUCH larger than the raw byte size (as it’s indexed 90 ways). We post-process our logs before insertion, to pull out structured fields (e.g. dates & durations).
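As a concrete illustration of "JSON log messages with structured fields pulled out", one event might look like the sketch below. The field names follow the logstash-style `@`-prefixed convention the talk leans on, but the exact keys here are an assumption for illustration:

```python
import json
from datetime import datetime, timezone

# A hypothetical structured log event: the raw message stays greppable,
# while dates, statuses and durations become real, typed, queryable fields.
event = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "@message": "GET /stream/1234 200 0.032s",
    "@fields": {"method": "GET", "status": 200, "duration": 0.032},
}
line = json.dumps(event)
print(line)  # one JSON document per line: still searchable with plain grep
```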
  • We can add new log message types (or start parsing things we currently add as simple text), and make schema changes any time we want. We just pour data into ElasticSearch, and then get better searching than grep! The more it’s split into fields, the more we win, but just writing log lines still gives us as good as grep. Anyway - back to the story...
  • Very simple model - input (pluggable), filtering (pluggable by type), output (pluggable). Lots of backends - AMQP and ElasticSearch + syslog and many others. Pre-built parser library for various line-based log formats. Comes with a web app for searches... Everything I need!
  • Let’s take a simple case here - I’ll shove my apache logs from N servers into ElasticSearch. I run a logstash on each host (writer), and one on each ElasticSearch server (reader)...
  • So, that has 2 logstashes - one reading files and writing AMQP, one reading AMQP and writing to ElasticSearch. However, my raw apache log lines need parsing (in the filter stage) - to be able to do things like ‘all apache requests with 500 status’, rather than ‘all apache requests containing the string 500’.
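The difference between "status 500" and "contains the string 500" is exactly what the filter stage buys you. Logstash itself uses grok patterns for this; the plain-regex stand-in below is a simplified illustration of the same idea:

```python
import re

# A stand-in for logstash's filter stage: parse an Apache common-log line
# into typed fields, so "status == 500" is a numeric match rather than a
# substring search. (logstash uses grok patterns; this regex is illustrative.)
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse(line: str) -> dict:
    fields = CLF.match(line).groupdict()
    fields["status"] = int(fields["status"])  # typed, not a string
    return fields

rec = parse('10.0.0.1 - - [01/Jan/2012:00:00:01 +0000] "GET /track/500 HTTP/1.0" 200 2326')
print(rec["status"])  # 200: the "500" in the path no longer matches a status search
```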
  • Red indicates the filtering
  • There we go, everyone got that?
  • Except I could instead do the filtering here, if I wanted to. Doesn’t really matter - depends what’s best for me... Right, so... Let’s try that then?
  • First problem...
  • Well then, I’m not going to be running this on the end nodes.
  • And it’s not tiny, even on machines dedicated to log parsing / filtering / indexing.
  • But sure, I spun it up on a couple of spare machines...
  • It works fairly well as advertised.
  • The JVM giveth (lots of awesome software), the JVM taketh away (any RAM you had). ruby is generally slower than perl; jruby is generally faster than perl. I’m not actually knocking the technology here - just saying it won’t work in this situation for me.
  • So, anyway, I’m totally stuffed... The previous plan is a non-starter. So I need something to collect logs from each host and ship them to AMQP. Ok, cool, I can write that in plain ruby or plain perl and it’s gotta be slimmer, right?
  • But wait a second... I just want to get something ‘real’ running here... So, I’m already tipping stuff into AMQP...
  • So I can just use my existing structured data, right? Well - no, sorry... And I got distracted at this point. For about 6 months.
  • So I come back to this, still needing something to munge my JSON into other JSON. But, right now, the easiest thing to try is:
  • 30-line perl script, it works. I have data in ElasticSearch. I can view it in the logstash webapp.
  • Going back a few slides - if RabbitMQ gets sick, everything goes bad. I ended up with a load of code to deal with this. It still didn’t work very well. In fact, the entire idea of using TCP/IP for this is probably bad.
  • Syslog is hateful. MOST of my log messages are under 1024 bytes, but I don’t want to throw them away (or throw an exception) if they aren’t.
  • ZeroMQ looked like the right answer. I played with it. It works REALLY well. I’d recommend you try it.
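"Never block the app" is the property that matters here. The talk's own code is perl; the sketch below uses pyzmq (the Python binding) to show roughly what that looks like, with a made-up collector endpoint:

```python
import zmq

# Sketch of the "never block the application" behaviour ZeroMQ gives you.
# With a bounded send buffer and a non-blocking send, a sick or absent
# collector costs you (at most) dropped log lines - never a stalled app.
ctx = zmq.Context.instance()
sock = ctx.socket(zmq.PUSH)
sock.setsockopt(zmq.SNDHWM, 1000)     # bound the in-memory buffer
sock.setsockopt(zmq.LINGER, 0)        # don't block on close either
sock.connect("tcp://127.0.0.1:5558")  # hypothetical per-host collector

try:
    sock.send_string('{"@message": "hello"}', flags=zmq.NOBLOCK)
except zmq.Again:
    pass  # buffer full: drop the log line rather than block the app
sock.close()
```

Lossiness is opt-in: with a PUSH/PULL or PUB/SUB pair you choose the high-water mark and what happens when it is hit.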
  • So let’s write this per-host collection daemon. Take our previous munging code, and run it per host in the aggregation process.
  • Tada! I have fixed all my woes with RabbitMQ, and at the same time I’ve got my app logs in logstash format for free.
  • I can reuse all the heavy-lifting parts of logstash. I can reuse my per-host ZMQ daemon as a log file tailer. Overhead on hosts is very small. Heavy lifting occurs entirely in the search cluster.
  • So, to recap... I’ve got...
  • I’ve got a solution to logging lots of stuff but not blocking or falling over.
  • I’ve got a solution that has a minimal impact on my servers.
  • So, this is what it actually looks like. Raw app logs go to the agent via ZMQ. It munges them into processed logstash logs, and emits them. The agent also tails files and emits raw logstash. Logstash does the parsing of apache logs.
  • It’s taken me over 6 months to get any of it running; I don’t have time to re-write the web app. Someone else is already doing that. I love open source.
  • So, I’ve talked about these ‘raw’ and ‘processed’ log formats - they’re just conventions for what fields can be found in the JSON. This still needs to be better documented!

Zero mq logs: Presentation Transcript

  • Using ZeroMQ and ElasticSearch for log aggregation. Tomas Doran (t0m) <bobtfish@bobtfish.net>
  • Who are you? • CPAN Developer • Catalyst core team • Moose hacker • AnyEvent::RabbitMQ user • Ruby/Python/C as needed • Dayjob - dev-everything-ops - state51 • 3/4 PB of MogileFS - online music • Thousands of streams a second • Lots of perl • Lots of servers • Lots of services
  • Sorry! • This isn’t a ZeroMQ tutorial • This isn’t an ElasticSearch tutorial
  • Debugging production • Is hard! • Especially interactions • Need to cross-correlate logs.
  • Naïve solution • “Let’s log into the database” • NO NO NO NO • 120 lines/s (7200 rpm disk) • 167 lines/s (10k disk) • 250 lines/s (15k disk)
  • Less naïve solution • Queue before we log • Bulk insert • No good for unstructured data • No good for many different structures
  • Still a stupid solution • Lots of UNION queries • OR epic multi-way JOIN • Adding new data types HARD
  • Splunk • Splunk is enterprise software used to monitor, report and analyze the data produced by the applications, systems and infrastructure that run a business. - Wikipedia
  • $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
  • Splunk • Small agent program on each host you tell about your log files - ships to server • Server component analyzes / indexes your logs. Also a syslog server. • Builds structure from your data - in a GUI.
  • Splunk • Splunk is amazing. • You just tip logs into it, structure later. • If you can afford the license, use it, be happy!
  • I cannot afford splunk
  • I cannot afford splunk • Sad panda! • Also, splunk isn’t extensible - black box. • So, a guy called Jordan Sissel invented:
  • Logstash
  • Diversion - ElasticSearch • Just tip JSON documents into it (with a type) • It figures out structure for each type, and indexes appropriately • Free sharding and replication
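"Just tip JSON documents into it" is fairly literal: ElasticSearch's bulk API takes alternating action and document lines, newline-delimited. A sketch of building such a payload with only the standard library (the index and type names are made up; `_type` reflects the ES versions of the talk's era):

```python
import json

# Build an ElasticSearch _bulk payload: an action line naming the index
# and type, then the document itself, one pair per event.
def bulk_payload(events, index="logs", doc_type="apache"):
    lines = []
    for ev in events:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(ev))
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline

payload = bulk_payload([{"status": 500, "path": "/stream/1"}])
print(payload)
```

POSTing that body to the cluster's `_bulk` endpoint is all an output plugin fundamentally does.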
  • So • We post-process logs to be somewhat structured. • We can then search over them (fast!) • Free text (for text fields) • Numeric • Dates + ranges
  • New types • Trivial! • Just emit it, it’s indexed and queryable. • Can hint ElasticSearch for better queries (if needed)
  • Logstash • In JRuby, by Jordan Sissel • Input / Filter / Output • Simple • Flexible • Extensible • Plays well with others • Pre-built web interface
  • Logstash • INPUT → FILTER → OUTPUT
  • Logstash IS MASSIVE
  • 440 MB IDLE!
  • 2+ GB working
  • OH HAI JVM
  • Java (JRuby) decoding AMQP is, however, much much faster than perl doing that... JVM: + and -
  • Logstash on each host is totally out... • Running it on ElasticSearch servers which are already dedicated to this is fine. • I’d still like to reuse all of its parsing
  • Lots of my data is already JSON • Log::Message::Structured • AnyEvent::RabbitMQ • App logging relies on Rabbit being up • Can get slow if rabbit is sick and blocks
  • How about this then?
  • But not in the right format... • So I can write a munger in ruby... • Or I can write one in perl. • I’m already (going to be) running a collection / aggregation daemon on each host (for apache logs).
  • It works!!!
  • Myxomatosis • If RabbitMQ gets really sick, the app slows down. • Multiple exponential backoffs • AMQP is crap for ‘fire and forget’
  • Syslog • Yes, I could. But JSON in syslog - just no. • 1024 bytes - UDP packet. • Not inventing my own protocol!
  • ZeroMQ has the correct semantics • Pub/Sub sockets • Never, ever blocking • Lossy! (If needed) • Buffer sizes / locations configurable • Arbitrary message size
  • Subset of logstash • In perl • ZeroMQ receiver • Per-host aggregation • Push AMQP to RabbitMQ • Run logstash on a central server
  • Subset of logstash • Small async process. ZMQ receive socket. • Pull JSON from ZMQ, decode, munge, emit back to AMQP • Slowness no longer blocks app servers
  • Subset of logstash • Use logstash at the other end to pop AMQP and insert into ElasticSearch • Keep per-host cost small • Same process can tail logfiles and cast into AMQP • Reuse all the logstash parsing (at server side) for apache logs etc.
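The "same process can tail logfiles" part is just a read-and-wait loop. The real per-host daemon is async perl; this Python generator only illustrates the idea (the `from_start` and `_max_idle_polls` parameters are hypothetical conveniences, the latter just a stopping hook):

```python
import time

# Minimal log-file tailer: yield lines appended to `path`, like `tail -f`,
# polling when we hit end-of-file. Illustrative only - the real agent is
# an async perl process.
def tail(path, poll_interval=0.5, from_start=False, _max_idle_polls=None):
    idle = 0
    with open(path) as f:
        if not from_start:
            f.seek(0, 2)  # jump to the end of the file, like `tail -f`
        while True:
            line = f.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                if _max_idle_polls is not None and idle >= _max_idle_polls:
                    return  # testing hook: stop instead of polling forever
                time.sleep(poll_interval)
```

Each yielded line would then be wrapped as a raw logstash-format JSON event and cast into AMQP.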
  • 100% drop-in compatible subset of logstash
  • 100% drop-in compatible subset of logstash • In perl - making it easy for you to emit structured app events as JSON. • Everything is down - your app is still up (you lose some logs)
  • 440 MB IDLE!
  • 24 MB
  • 24 MB • I used Moose - RAM use can and will go down.
  • Current architecture
  • Screenshot
  • Yes, yes - I know • The web app is fugly • Other people already have alternate implementations • Keeping interoperable opens lots of choices • E.g. graylog2 as the event sink
  • RFC 3164 • “This document describes the observed behavior of the syslog protocol” • This is not a good place to be. • Working with Jordan to document the message format. • End-to-end tests of both implementations to follow.
  • Code • http://logstash.net/ • https://github.com/bobtfish/Log-Stash • https://github.com/logstash/logstash
  • Thanks! • <bobtfish@bobtfish.net> • t0m on irc.perl.org • And Freenode (idle in #logstash) • We are hiring!!! • Developers (learn ruby, or perl, or both!) • Front-end people (play with websockets!)
  • This is all now pointless • The latest logstash .jar will do all the munging for you. • And it (mostly) runs in MRI (C ruby), so my RAM thing is less bad. • N implementations still a good thing!