Like loggly using open source


Streaming logs in cloud

  1. Stream your Cloud (Thomas Alrin)
  2. We’ll cover
     ● What to stream
     ● Choices for streaming
     ● Setting up streaming from a VM
     ● Chef recipes
  3. What to stream
     You can stream the following from the cloud:
     ● Traces (logs)
     ● Metrics
     ● Monitoring
     ● Status
  4. Scenario
     An app or service runs in the cloud.
     We need the log files of your app:
     ● Web server logs, container logs, app logs
     We need the log files of your service:
     ● Service logs
  5. SaaS vendors
     Log streaming is available as a hosted service from SaaS vendors such as Loggly and Papertrail.
  7. Choices for streaming
     ● Logstash
     ● Fluentd
     ● Beaver
     ● Logstash-Forwarder
     ● Woodchuck
     ● RSYSLOG
     ● Heka
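  8. Tool comparison

     | Name               | Language    | Collector | Shipper | Footprint | Ease of setting up   |
     |--------------------|-------------|-----------|---------|-----------|----------------------|
     | Logstash           | JRuby (JVM) | Yes       | No      | High      | > Easy               |
     | Fluentd            | Ruby        | Yes       | No      | High      | > Easy               |
     | Beaver             | Python      | No        | Yes     | Low       | Easy                 |
     | Logstash-Forwarder | Go          | No        | Yes     | Low       | Difficult (uses SSL) |
     | Woodchuck          | Ruby        | No        | Yes     | High      | > Easy               |
     | RSYSLOG            | C           | Yes       | Yes     | Low       | Difficult            |
     | Heka               | Go          | Yes       | Yes     | Low       | Easy                 |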
  9. Our requirements
     Two sets of logs to collect:
     ● All the trace output produced when the VM is spun up.
     ● All the trace output of the application or service inside the VM.
     Publish them to an in-memory store (queue) that can be accessed by a key.
  10. We tried: Logstash, Beaver, Logstash-forwarder, Woodchuck, Heka, RSYSLOG
      We use: Heka, Beaver, RSYSLOG
  11. [Diagram: a shipper agent in megamd reads the log files (howdy.log, howdy_err.log) under /usr/share/megam/megamd/logs and publishes them over AMQP to Queue#1 through Queue#4.]
  12. How does it work?
      Heka resides inside our Megam Engine (megamd). Its job is to collect the trace information when a VM is run:
      1. Reads the dynamically created VM execution log files.
      2. Formats the log contents as JSON for every VM execution.
      3. Publishes the log contents to a queue.
      Beaver resides in each of the VMs. It does the following:
      1. Reads the log files inside the VM.
      2. Formats the log contents as JSON.
      3. Publishes the log contents to a queue.
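The three steps that both shippers perform (read, format as JSON, publish to a keyed queue) can be sketched in Python. This is only an illustration, not Heka's or Beaver's code: the in-memory `queues` dictionary stands in for the AMQP broker, and the names `format_record`, `publish` and `ship_lines` are hypothetical. The JSON field names follow the Heka output shown later in the deck.

```python
import json
from collections import defaultdict, deque
from datetime import datetime, timezone


def format_record(line, logger, hostname, now=None):
    """Step 2: wrap one raw log line in a JSON document."""
    now = now or datetime.now(timezone.utc)
    millis = now.microsecond // 1000
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis
    return json.dumps({
        "Timestamp": stamp,
        "Type": "logfile",
        "Logger": logger,
        "Payload": line,
        "Hostname": hostname,
    })


def publish(queues, key, message):
    """Step 3: append the message to the queue addressed by key."""
    queues[key].append(message)


def ship_lines(lines, key, hostname, queues):
    """Steps 1 to 3 for an iterable of log lines (for example an open file)."""
    for line in lines:
        publish(queues, key, format_record(line, key, hostname))


# Stand-in for the broker: one FIFO queue per key (an assumption).
queues = defaultdict(deque)
ship_lines(["started\n", "listening on 8080\n"], "howdy_log", "vm1", queues)
print(len(queues["howdy_log"]))  # 2
```

Keeping one queue per key is what lets a consumer fetch only the logs of a single VM or application, which is the requirement stated on the earlier slide.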
  13. Logstash
      ● A centralized logging framework that can transfer logs from multiple hosts to a central location.
      ● Written in JRuby, hence it needs a JVM.
      ● The JVM consumes a lot of memory.
      ● Logstash is ideal as a centralized collector, not as a shipper.
  14. Logstash shipper scenario
      Let us ship logs from a VM, /usr/share/megam/megamd/logs/*/*, to Redis or AMQP, e.g.:
      ../megamd/logs/ Queue named “” in AMQP.
      ../megamd/logs/ Queue named “” in AMQP.
  15. Logstash Shipper - Sample conf

          /opt/logstash/agent/etc$ sudo cat shipper.conf
          input {
            file {
              type => "access-log"
              path => [ "/usr/local/share/megam/megamd/logs/*/*" ]
            }
          }
          filter {
            grok {
              type => "access-log"
              match => [ "@source_path", "(//usr/local/share/megam/megamd/logs/)(?<source_key>.+)(//*)" ]
            }
          }
          output {
            stdout { debug => true debug_format => "json" }
            redis {
              key => '%{source_key}'
              type => "access-log"
              data_type => "channel"
              host => ""
            }
          }

      Logs inside the <source_key> directory are shipped to the Redis key named <source_key>.
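To make the role of the grok filter concrete, here is a minimal Python sketch of the same idea: the directory directly below the log root becomes the key. It is an approximation rather than a translation of the grok pattern. It uses `[^/]+` instead of `.+` so that only the first path component is captured, and the names `LOG_ROOT`, `SOURCE_KEY` and `source_key()` are hypothetical.

```python
import re

# Assumed base directory, copied from the sample configuration.
LOG_ROOT = "/usr/local/share/megam/megamd/logs/"

# The first path component below LOG_ROOT becomes the key.
SOURCE_KEY = re.compile(re.escape(LOG_ROOT) + r"(?P<source_key>[^/]+)/")


def source_key(path):
    """Return the directory name used as the queue key, or None."""
    match = SOURCE_KEY.match(path)
    return match.group("source_key") if match else None


print(source_key(LOG_ROOT + "tom.com/howdy.log"))  # tom.com
```

Paths outside the log root return `None`, so such events would not be routed to any key.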
  16. Logstash: start the agent

          java -jar /opt/logstash/agent/lib/logstash-1.4.2.jar agent -f /opt/logstash/agent/etc/shipper.conf

      If you don’t have a JRE installed:

          sudo apt-get install openjdk-7-jre-headless
  17. Heka
      ● Mozilla uses it internally.
      ● Written in Go and compiled to a native binary.
      ● Ideal as both a centralized collector and a shipper.
      ● We picked Heka.
      ● We run our own modified version.
  18. Installation
      Download the deb package, or build from source:

          git clone …
          cd heka
          source …
          cd build
          make deb
          dpkg -i heka_0.6.0_amd64.deb
  19. Our Heka usage
      [Diagram: megamd (Megam Engine), Heka, RabbitMQ logs queue, Realtime Streamer]
  20. Heka configuration

          nano /etc/hekad.toml

          [TestWebserver]
          type = "LogstreamerInput"
          log_directory = "/usr/share/megam/heka/logs/"
          file_match = '(?P<DomainName>[^/]+)/(?P<FileName>[^/]+)'
          differentiator = ["DomainName", "_log"]

          [AMQPOutput]
          url = "amqp://guest:guest@localhost/"
          exchange = "test_tom"
          queue = true
          exchangeType = "fanout"
          message_matcher = 'TRUE'
          encoder = "JsonEncoder"

          [JsonEncoder]
          fields = [ "Timestamp", "Type", "Logger", "Payload", "Hostname" ]
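The interplay of `file_match` and `differentiator` is easier to see with a small Python sketch. It assumes that Heka matches the pattern against the path relative to `log_directory`, and that each differentiator item is replaced by the capture group of the same name or kept as a literal otherwise. The function name `logger_name` is hypothetical, and the use of a full match is an assumption, not a statement about Heka's internals.

```python
import re

# file_match and differentiator as given in the configuration above.
FILE_MATCH = re.compile(r"(?P<DomainName>[^/]+)/(?P<FileName>[^/]+)")
DIFFERENTIATOR = ["DomainName", "_log"]


def logger_name(relative_path):
    """Build the stream name for a path relative to log_directory."""
    match = FILE_MATCH.fullmatch(relative_path)
    if match is None:
        return None  # the file would not be picked up
    groups = match.groupdict()
    # Capture-group names are replaced by their values; other items are literal.
    return "".join(groups.get(part, part) for part in DIFFERENTIATOR)


print(logger_name("tom.com/howdy.log"))  # tom.com_log
```

Under these assumptions, every file below the `tom.com` directory ends up in the same stream, named `tom.com_log`.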
  21. Run Heka

          sudo hekad -config="/etc/hekad.toml"

      The output appears in the queue as shown below:

          {"Timestamp":"2014-07-08T12:53:44.004Z","Type":"logfile","Logger":"tom.com_log","Payload":"TEST\u000a","Hostname":"alrin"}
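A consumer on the other side of the queue only needs a JSON parser. The sketch below decodes the sample message, assuming the payload ends with the JSON escape `\u000a` (a newline) and that the `_log` suffix in `Logger` can be stripped to recover the domain. The function name `decode` is hypothetical.

```python
import json

# The sample message, with the newline escape written as \u000a.
RAW = (
    '{"Timestamp":"2014-07-08T12:53:44.004Z","Type":"logfile",'
    '"Logger":"tom.com_log","Payload":"TEST\\u000a","Hostname":"alrin"}'
)


def decode(raw):
    """Parse one queue message and return (domain, payload text)."""
    record = json.loads(raw)
    logger = record["Logger"]
    # Strip the "_log" suffix added by the differentiator (assumed convention).
    domain = logger[:-len("_log")] if logger.endswith("_log") else logger
    return domain, record["Payload"].rstrip("\n")


print(decode(RAW))  # ('tom.com', 'TEST')
```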
  22. Beaver
      ● Beaver is a lightweight Python log file shipper that sends logs to an intermediate broker for further processing.
      ● Beaver is ideal when the VM does not have enough memory to run a large JVM application as a shipper.
  23. Our Beaver usage
      [Diagram: Beaver in each VM (VM#1, VM#2, … VM#n); megamd (Megam Engine) with Heka; RabbitMQ logs queue; Realtime Streamer]
  24. Chef recipe: Beaver
      When a VM is run, recipe(megam_logstash::beaver) is included.

          node.set['logstash']['key'] = "#{}"
          node.set['logstash']['amqp'] = "#{}_log"
          node.set['logstash']['beaver']['inputs'] = [ "/var/log/upstart/nodejs.log", "/var/log/upstart/gulpd.log" ]
          include_recipe "megam_logstash::beaver"

      Attributes such as the node name and the log files are set dynamically.
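The template rendered by `megam_logstash::beaver` is not shown in the slides. As a rough sketch of what such a recipe might produce, the Python below turns recipe-like attributes into an INI-style Beaver configuration. Everything here is an assumption: the mapping of the node name to the queue and key, the host value, and the option names `rabbitmq_host`, `rabbitmq_queue` and `rabbitmq_key`, which are believed to match Beaver's RabbitMQ transport settings but should be verified against the installed version.

```python
import configparser
import io


def render_beaver_conf(node_name, amqp_host, log_files):
    """Render an INI-style Beaver config from recipe-like attributes."""
    conf = configparser.ConfigParser()
    conf["beaver"] = {
        "rabbitmq_host": amqp_host,
        "rabbitmq_queue": node_name + "_log",  # assumed to mirror "#{}_log"
        "rabbitmq_key": node_name,
    }
    for path in log_files:
        # One section per watched file; "type" labels the events.
        conf[path] = {"type": "log"}
    buffer = io.StringIO()
    conf.write(buffer)
    return buffer.getvalue()


print(render_beaver_conf(
    "example-node",            # hypothetical node name
    "rabbitmq.example.local",  # hypothetical broker host
    ["/var/log/upstart/nodejs.log", "/var/log/upstart/gulpd.log"],
))
```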
  25. RSYSLOG
      RSYSLOG is the rocket-fast system for log processing. It offers high performance, strong security features and a modular design. Megam uses RSYSLOG to ship logs from VMs to Elasticsearch.
  26. Chef recipe: Rsyslog
      When a VM is run, recipe(megam_logstash::rsyslog) is included.

          node.set['rsyslog']['index'] = "#{}"
          node.set['rsyslog']['elastic_ip'] = ""
          node.set['rsyslog']['input']['files'] = [ "/var/log/upstart/nodejs.log", "/var/log/upstart/gulpd.log" ]
          include_recipe "megam_logstash::rsyslog"

      Attributes such as the node name and the log files are set dynamically.
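As with Beaver, the rsyslog template behind the recipe is not part of the slides. The Python sketch below shows one plausible shape of the generated configuration, using rsyslog's `imfile` input module to tail the files and the `omelasticsearch` output module to index them. The function name, the tag convention (file name without extension) and the host and index values are hypothetical.

```python
def render_rsyslog_conf(index, elastic_host, log_files):
    """Render rsyslog directives that tail files and index them."""
    lines = ['module(load="imfile")', 'module(load="omelasticsearch")']
    for path in log_files:
        # Use the file name without its extension as the syslog tag.
        tag = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        lines.append('input(type="imfile" File="%s" Tag="%s:")' % (path, tag))
    lines.append(
        'action(type="omelasticsearch" server="%s" searchIndex="%s")'
        % (elastic_host, index)
    )
    return "\n".join(lines) + "\n"


print(render_rsyslog_conf(
    "example-node",       # hypothetical index name
    "es.example.local",   # hypothetical Elasticsearch host
    ["/var/log/upstart/nodejs.log", "/var/log/upstart/gulpd.log"],
))
```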
  27. For more details, contact us by email or on Twitter: @megamsystems