Stream your Cloud
Thomas Alrin
alrin@megam.co.in
We’ll cover
● What to stream
● Choices for streaming
● Setting up streaming from a VM
● Chef Recipes
What to stream
You can stream the following from the cloud:
● Traces (Logs)
● Metrics
● Monitoring
● Status
Scenario
App/service runs in Cloud
We need the log files of your App
● Web Server logs, container logs, app logs
We need the log files of your Service
● Service logs
SaaS Vendors
You can avail this as a SaaS service from vendors such as Loggly and Papertrail.
We plan to build a streamer...
Choices for streaming
Logstash : logstash.net/
Fluentd : www.fluentd.org/
Beaver : github.com/josegonzalez/beaver
Logstash-Forwarder : github.com/elasticsearch/logstash-forwarder
Woodchuck : github.com/danryan/woodchuck
RSYSLOG : http://rsyslog.com
Heka : http://hekad.readthedocs.org/en/latest/
Name                 Language     Collector  Shipper  Footprint  Ease of setup
Logstash             JRuby (JVM)  Yes        No       High       Easy
Fluentd              Ruby         Yes        No       High       Easy
Beaver               Python       No         Yes      Low        Easy
Logstash-Forwarder   Go           No         Yes      Low        Difficult (uses SSL)
Woodchuck            Ruby         No         Yes      High       Easy
RSYSLOG              C            Yes        Yes      Low        Difficult
Heka                 Go           Yes        Yes      Low        Easy
Our requirements
2 sets of logs to collect:
● All the trace emitted when the VM is spun up
● All the trace inside the VM, from the application or service
Publish them to an in-memory store (queue) that can be accessed by a key.
We tried
● Logstash
● Beaver
● Logstash-forwarder
● Woodchuck
● Heka
● RSYSLOG
We use
● Heka
● Beaver
● RSYSLOG
[Diagram: a shipper agent in megamd tails /usr/share/megam/megamd/logs/<domain>/howdy.log and howdy_err.log for fir/doe/gir/her.domain.com, and publishes each domain's logs to its own AMQP queue (Queue#1–Queue#4)]
How does it work?
Heka resides inside our Megam Engine (megamd). Its job is to collect the trace
information when a VM is run:
1. Reads the dynamically created VM execution log files
2. Formats the log contents as JSON for every VM execution
3. Publishes the log contents to a queue
Beaver resides in each of the VMs. It does the following:
1. Reads the log files inside the VM
2. Formats the log contents as JSON
3. Publishes the log contents to a queue
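The three steps above can be sketched in Python. This is only the shape of the pipeline, not how Heka or Beaver are actually implemented (both are configured declaratively); `format_event` and `publish` are hypothetical names, and `publish` assumes a pika-style AMQP channel.

```python
import json
import socket
from datetime import datetime, timezone

def format_event(queue_key, payload):
    """Wrap one log line in a JSON envelope.

    queue_key is the per-VM key (e.g. "doe.domain.com"); the field
    names mirror the sample queue output shown later in this deck.
    """
    return json.dumps({
        "Timestamp": datetime.now(timezone.utc).isoformat(),
        "Type": "logfile",
        "Logger": queue_key + "_log",
        "Payload": payload,
        "Hostname": socket.gethostname(),
    })

def publish(channel, queue_key, line):
    """Publish one formatted log line to the queue named after the VM.

    channel is assumed to be a pika channel connected to RabbitMQ.
    """
    channel.queue_declare(queue=queue_key)
    channel.basic_publish(exchange="", routing_key=queue_key,
                          body=format_event(queue_key, line))
```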
Logstash
● A centralized logging framework that can
transfer logs from multiple hosts to a central
location
● Written in JRuby, hence it needs a JVM
● The JVM consumes a lot of memory
● Logstash is ideal as a centralized collector,
not as a shipper
Logstash Shipper Scenario
Let us ship logs from a VM:
/usr/share/megam/megamd/logs/*/* to Redis or AMQP.
e.g.:
../megamd/logs/pogo.domain.com/howdy.log → queue named “pogo.domain.com” in AMQP
../megamd/logs/doe.domain.com/howdy.log → queue named “doe.domain.com” in AMQP
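The path-to-queue mapping above can be sketched as a tiny helper (`queue_for` is a hypothetical name; it mirrors what the grok filter in the shipper conf extracts):

```python
import os.path

# root under which megamd writes one directory per VM
LOG_ROOT = "/usr/share/megam/megamd/logs"

def queue_for(log_path):
    """Derive the AMQP queue name from a log file path.

    The queue is the per-VM directory name directly under LOG_ROOT,
    e.g. .../logs/pogo.domain.com/howdy.log -> "pogo.domain.com".
    """
    rel = os.path.relpath(log_path, LOG_ROOT)
    return rel.split(os.sep)[0]
```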
Logstash Shipper - Sample conf
input {
file {
type => "access-log"
path => [ "/usr/local/share/megam/megamd/logs/*/*" ]
}
}
filter {
grok {
type => "access-log"
match => [ "@source_path",
"(//usr/local/share/megam/megamd/logs/)(?<source_key>.+)(//*)" ]
}
}
output {
stdout { debug => true debug_format => "json"}
redis {
key => '%{source_key}'
type => "access-log"
data_type => "channel"
host => "my_redis_server.com"
}
}
Logs inside the <source_key> directory are shipped to the Redis key named <source_key>.
(The conf above lives at /opt/logstash/agent/etc/shipper.conf.)
Logstash : Start the agent
java -jar /opt/logstash/agent/lib/logstash-1.4.2.jar agent -f /opt/logstash/agent/etc/shipper.conf
If you don’t have a JRE, then:
sudo apt-get install openjdk-7-jre-headless
Heka
● Mozilla uses it internally.
● Written in Go; compiles to a native binary.
● Ideal as both a centralized collector and a
shipper.
● We picked Heka.
● Our modified version
○ https://github.com/megamsys/heka
Installation
Download the deb from https://github.com/mozilla-services/heka/releases
or build from source:
git clone https://github.com/megamsys/heka.git
cd heka
source build.sh
cd build
make deb
dpkg -i heka_0.6.0_amd64.deb
Our Heka usage
[Diagram: megamd (Megam Engine) writes logs → Heka tails them → publishes to a RabbitMQ queue → realtime streamer consumes]
Heka configuration
nano /etc/hekad.toml
[TestWebserver]
type = "LogstreamerInput"
log_directory = "/usr/share/megam/heka/logs/"
file_match = '(?P<DomainName>[^/]+)/(?P<FileName>[^/]+)'
differentiator = ["DomainName", "_log"]
[AMQPOutput]
url = "amqp://guest:guest@localhost/"
exchange = "test_tom"
queue = true
exchangeType = "fanout"
message_matcher = 'TRUE'
encoder = "JsonEncoder"
[JsonEncoder]
fields = [ "Timestamp", "Type", "Logger", "Payload",
"Hostname" ]
Run heka
sudo hekad -config="/etc/hekad.toml"
We can see output like the following in the queue:
{"Timestamp":"2014-07-08T12:53:44.004Z","Type":"logfile","Logger":"tom.com_log","Payload":"TEST\u000a","Hostname":"alrin"}
Beaver
● Beaver is a lightweight Python log file shipper
that sends logs to an intermediate broker for
further processing
● Beaver is ideal when the VM does not have
enough memory to run a large JVM application
as a shipper
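A minimal Beaver configuration for this setup might look like the sketch below. The host, exchange, and file paths are placeholders, and the option names should be checked against Beaver's own documentation:

```ini
; /etc/beaver.conf -- minimal sketch (placeholder host/exchange names)
[beaver]
transport: rabbitmq
rabbitmq_host: broker.example.com
rabbitmq_exchange: logs
format: json

; one section per log file to watch
[/var/log/upstart/gulpd.log]
type: gulpd
```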
Our Beaver usage
[Diagram: Beaver runs inside each VM (VM#1 … VM#n) and ships its logs to RabbitMQ, alongside Heka inside megamd (Megam Engine); the realtime streamer consumes the queues]
Chef Recipe : Beaver
When a VM is run, the recipe megam_logstash::beaver is
included.
node.set['logstash']['key'] = "#{node.name}"
node.set['logstash']['amqp'] = "#{node.name}_log"
node.set['logstash']['beaver']['inputs'] = [ "/var/log/upstart/nodejs.log",
"/var/log/upstart/gulpd.log" ]
include_recipe "megam_logstash::beaver"
Attributes (node name, log files) are set dynamically.
RSYSLOG
RSYSLOG is a rocket-fast system for log
processing. It offers high performance, strong
security features, and a modular design.
Megam uses RSYSLOG to ship logs from VMs
to Elasticsearch.
Chef Recipe : Rsyslog
When a VM is run, the recipe megam_logstash::rsyslog is
included.
node.set['rsyslog']['index'] = "#{node.name}"
node.set['rsyslog']['elastic_ip'] = "monitor.megam.co.in"
node.set['rsyslog']['input']['files'] = [ "/var/log/upstart/nodejs.log",
"/var/log/upstart/gulpd.log" ]
include_recipe "megam_logstash::rsyslog"
Attributes (node name, log files) are set dynamically.
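The recipe ultimately drops an rsyslog configuration of roughly this shape, using rsyslog's imfile and omelasticsearch modules. This is a hand-written sketch, not the recipe's actual template output; the file path and index name are placeholders:

```conf
# tail a log file with imfile
module(load="imfile")
input(type="imfile"
      File="/var/log/upstart/gulpd.log"
      Tag="gulpd:")

# ship everything to Elasticsearch with omelasticsearch
module(load="omelasticsearch")
action(type="omelasticsearch"
       server="monitor.megam.co.in"
       searchIndex="index_name")
```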
For more details
http://www.gomegam.com
email: gomegam@megam.co.in
twitter: @megamsystems
