6 Million Ways To Log In Docker - NYC Docker Meetup 12/17/2014
1. Six Million Ways To Log In Docker
Dwayne Hoover, Senior Field Engineer
Christian Beedgen, Co-Founder & CTO
December 17th, 2014
Sumo Logic Confidential
2. Introduction
Sumo Logic Background
What Our Customers Are Telling Us
A Catalog Of Ways To Log In Docker
What We Would Like To Build
Agenda
3. Dwayne
– Señor Field Engineer at Sumo Logic since 2013
– Former developer and data warehouse turned poly-structured data junkie
Christian
– Co-Founder & CTO, Sumo Logic since 2010
– Server guy, Chief Architect, ArcSight, 2001 – 2009
Let’s Make This Personal - Who We Are
4. The Machine Data Cloud
Search
Visualize
Predict
5. Sumo Logic is the only enterprise-grade 100% service-based offering
Sumo Logic Deployment “Architecture”
6. Use Cases
1. Availability & Performance
2. Security and Compliance
3. Customer Analytics
8. We have one process per container
We like to log to stdout
We have multiple processes per container
We run the Sumo Logic collector on the host
We are looking into using Beanstalk with Docker
We are waiting for Amazon ECS
Everyone here loves Docker
We are logging straight from the application
We are using /dev/log for Syslog
What Our Customers Are Telling Us
10. One size doesn’t (yet?) fit all
It’s not our job to judge
What does the community say?
Let’s figure out how to collect them all!
What We Are Hearing
11. Mailing list thread started in 2013
– https://groups.google.com/forum/#!searchin/docker-dev/logging/docker-dev/3paGTWD6xyw/hvZlnFD5x5sJ
Superseded by Logging Drivers proposal mid-2014
– https://github.com/docker/docker/issues/7195
However, as of now no clear path
– Extension proposal as the way forward for integrating log forwarders?
What Does The Community Say
13. Logs are…
– The actual message plus a bunch of meta data
– At scale, the meta data becomes very important
Timestamp
– With date, full year, down to at least milliseconds
– With time zone, ideally as an offset, or identifiable as straight UTC
Docker host info
– FQDN or IP address or both
– Correlate Docker daemon logs with container logs
Container ID
– Need a way to identify the unique instance of course
– With name if possible, sometimes we are just human…
Image ID
– To correlate, potentially, with logs from other containers from the same image
– Name would likely help the human operator as well
Process ID
– To correlate with logs from the process if there’s no other way to identify them
What Should Be In A Log
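To make the list above concrete, here is a minimal sketch of a log record that carries these fields. The field names, the `build_log_record` helper and the placeholder host and image values are assumptions for illustration only; the container ID is just an example value.

```python
import json
import os
from datetime import datetime, timezone


def build_log_record(message, docker_host, container_id, image_id,
                     container_name=None, now=None):
    """Wrap a raw message in the metadata discussed above (illustrative field names)."""
    now = now or datetime.now(timezone.utc)
    return {
        # Full year, millisecond precision, explicit UTC offset
        "timestamp": now.isoformat(timespec="milliseconds"),
        # FQDN or IP of the Docker host, supplied by the caller
        "docker_host": docker_host,
        "container_id": container_id,
        "container_name": container_name,
        "image_id": image_id,
        # PID of the logging process, as seen by that process
        "pid": os.getpid(),
        "message": message,
    }


record = build_log_record(
    "user login ok",
    docker_host="docker-host.example.com",  # placeholder
    container_id="43da9cc4d050",            # example ID
    image_id="example-image-id",            # placeholder
    now=datetime(2014, 12, 17, 19, 0, 0, 123000, tzinfo=timezone.utc),
)
print(json.dumps(record, sort_keys=True))
```

Passing the host in explicitly is deliberate: inside a container, the hostname is normally the container ID, not the Docker host.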
14. Docker captures container stdout to file in JSON format
In /var/lib/docker/containers/[ID]/[ID]-json.log
The docker logs command can spit back the logs
Each invocation returns the full logs all over again
But it can also be used to tail the logs
Careful! Stdout logs grow without bound on the host
Consider using logrotate on the Docker host
https://github.com/docker/docker/issues/7333
What Docker Provides
docker logs -tf --tail 0 [ID]
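Each line in the [ID]-json.log file is one JSON object with `log`, `stream` and `time` keys. A small sketch of parsing such a line follows; the sample line is hand-written for illustration, not captured from a real container.

```python
import json

# Hand-written sample in the shape Docker writes to [ID]-json.log
SAMPLE_LINE = '{"log":"Now!\\n","stream":"stdout","time":"2014-12-15T23:51:02.123456789Z"}'


def parse_docker_json_line(line):
    """Turn one JSON log line into timestamp, stream name and message text."""
    entry = json.loads(line)
    return {
        "time": entry["time"],
        "stream": entry["stream"],
        # Docker keeps the trailing newline of each write; drop it here
        "message": entry["log"].rstrip("\n"),
    }


parsed = parse_docker_json_line(SAMPLE_LINE)
print(parsed["time"], parsed["stream"], parsed["message"])
```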
17. Assuming you have control over the application
Use a library that can send Syslog
Or use a vendor library if HTTPS is required
This can work for other stack components as well
Apache can be coerced into sending Syslog
Nginx has an easy way to send error/access to Syslog
So does Postgres, and almost any Java-based app
Log Directly From The Application
18. If you want to use Sumo Logic…
There’s an image to quickly set up a Syslog collector
Configure your applications to send to the host at 514
Log Directly From The Application
docker run -d -p 514:514 -p 514:514/udp --name="sumo-logic-collector" \
sumologic/collector:latest-syslog [Access ID] [Access key]
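On the application side, sending to a Syslog listener on port 514 can be as small as the sketch below, which uses `SysLogHandler` from the Python standard library over UDP. The helper name `make_syslog_logger` and the message format are assumptions for illustration.

```python
import logging
import logging.handlers


def make_syslog_logger(host, port=514, name="app"):
    """Return a logger that ships records to a Syslog listener over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # A (host, port) tuple makes SysLogHandler use UDP by default
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the logger name as a simple Syslog tag
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Typical use would be `make_syslog_logger("[docker-host]").info("started")`, where `[docker-host]` is the address of the host that publishes port 514.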
19. Pros
– Conceptually pretty straightforward
– Might not even have to change anything
– Syslog includes the container ID as the hostname
Cons
– Need control over the code or at least the configuration
– Every component might need different situps
– HTTPS straight from the app might not include the container ID
– Logging to service without a collector loses data if link is down
Log Directly From The Application
20. Various application stacks
– http://help.papertrailapp.com/
Log4J
– https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/net/SyslogAppender.html
Apache Web Server
– http://httpd.apache.org/docs/trunk/mod/mod_syslog.html
– https://raymii.org/s/snippets/Apache_access_and_error_log_to_syslog.html
Nginx
– http://nginx.org/en/docs/syslog.html
Postgres
– http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html
Sumo Logic blog on official syslog collector image
– http://www.sumologic.com/blog/company/an-official-docker-image-for-the-sumo-logic-collector
– https://github.com/SumoLogic/sumologic-collector-docker
Log Directly From The Application
21. Install A File Collector In The Container
22. It is not terribly uncommon that logs go to files
There are many ways to tail logs and ship them off
Logstash, Rsyslog, Sumo Logic Collector, Splunk Forwarder, …
Log to volumes to bypass layered file system
Also, logs are not really container state?
Install A File Collector In The Container
23. Pros
– Conceptually pretty straightforward
– If everything logs to files already, not a big change
– Collectors can be configured as part of building the image
Cons
– One collector per container could be unacceptable overhead
– No container ID included unless collector picks up hostname
Install A File Collector In The Container
24. Install A File Collector As A Container
25. Normalize the collector-per-container idea
Create a container that has only the collector
Mount a host directory into that container to collect from
Mount the same directory into each container
Configure the container to write log files to the mount
Configure the collector container to recursively collect
Could run the collector on the host, but that is not Docker-native
For example, using the Sumo Logic file collector image
Install A File Collector As A Container
docker run -v /tmp/clogs:/tmp/clogs -d --name="sumo-logic-collector" \
sumologic/collector:latest-file [Access ID] [Access key]
26. What about name clashes in the shared mounted directory?
Create a sub directory named after the container ID!
Assume the Dockerfile ends in:
ENTRYPOINT ["/bin/bash", "run.sh"]
Then do this in run.sh:
# Create log directory
mkdir -p /tmp/clogs/$HOSTNAME
ln -s /tmp/clogs/$HOSTNAME /tmp/logs
# Do something
echo "ls -la /tmp/clogs"
ls -la /tmp/clogs
echo "ls -la /tmp/logs"
ls -la /tmp/logs
Install A File Collector As A Container
27. What about name clashes in the shared mounted directory?
Create a sub directory named after the container ID!
Assume the Dockerfile ends in:
ENTRYPOINT ["/bin/bash", "run.sh"]
Then do this in run.sh and observe:
ls -la /tmp/clogs
total 16
drwxr-xr-x 4 root root 4096 Dec 15 23:51 .
drwxrwxrwt 3 root root 4096 Dec 15 23:51 ..
drwxr-xr-x 2 root root 4096 Dec 15 23:51 43da9cc4d050
drwxr-xr-x 2 root root 4096 Dec 15 23:51 7df836a68214
ls -la /tmp/logs
lrwxrwxrwx 1 root root 23 Dec 15 23:51 /tmp/logs -> /tmp/clogs/43da9cc4d050
Install A File Collector As A Container
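If the application is written in Python, the same per-container sub directory trick could live in the application itself instead of run.sh. This is only a sketch: the helper name `container_log_path` and the file name `app.log` are made up, and it assumes Docker's default behaviour of setting the container hostname to the short container ID.

```python
import os
import socket


def container_log_path(shared_root="/tmp/clogs", filename="app.log", container_id=None):
    """Return a log file path inside a per-container sub directory of the shared mount."""
    # Assumption: inside a container the hostname defaults to the short container ID
    container_id = container_id or socket.gethostname()
    log_dir = os.path.join(shared_root, container_id)
    # Same effect as `mkdir -p /tmp/clogs/$HOSTNAME`
    os.makedirs(log_dir, exist_ok=True)
    return os.path.join(log_dir, filename)
```

The application would then open its log file at `container_log_path()`, and the collector container picks it up through the recursive file source.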
28. Sumo Logic blog on official collector images
– http://www.sumologic.com/blog/company/an-official-docker-image-for-the-sumo-logic-collector
– https://github.com/SumoLogic/sumologic-collector-docker
Rainer Gerhards on Rsyslog’s file input module
– http://www.slideshare.net/rainergerhards1/using-wildcards-with-rsyslogs-file-monitor-imfile
OWASP Log Injection
– https://www.owasp.org/index.php/Log_injection
Install A File Collector As A Container
29. Pros
– Not terribly hard to understand and set up
– File collection is very common collector functionality and can scale
Cons
– Have to expose a host directory to all containers
– Mounted directory might be considered an attack vector
– Unless performing described sit ups, name clashes likely
Install A File Collector As A Container
30. Install A Syslog Collector As A Container
31. If you want to use Syslog, and Sumo Logic…
There’s an image to quickly set up a Syslog collector
Use linking to configure the Syslog location in the containers
Easy to test with
Install A Syslog Collector As A Container
docker run -d --name="sumo-logic-collector" \
sumologic/collector:latest-syslog [Access ID] [Access key]
docker run -it --link sumo-logic-collector:sumo ubuntu /bin/bash
echo "I'm in ur linx" | nc -v -u -w 0 $SUMO_PORT_514_TCP_ADDR $SUMO_PORT_514_TCP_PORT
32. Pros
– Not terribly hard to understand and set up
– Will retain origin hostname and container ID
Cons
– Every component might need different situps for Syslog
Install A Syslog Collector As A Container
33. Use Host Syslog For Local Syslog
34. The process(es) in the container already do Syslog
There is some chance that the host is running a Syslog daemon
Configure the host Syslog daemon to forward
Mount /dev/log from the host to /dev/log in the container
docker run -d -v /dev/log:/dev/log [image]
Now tail the host syslog
tail -F /var/log/syslog
Run a container to test if it works
docker run -v /dev/log:/dev/log ubuntu logger -t schnitzel Now!
Should see something like this in the tail’ed file
Dec 14 16:33:49 ubuntu schnitzel: Now!
Use Host Syslog For Local Syslog
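For a Python process inside the container, writing to the mounted /dev/log works much like the `logger -t schnitzel` test. The sketch below uses the standard library's `SysLogHandler` with a Unix socket path; the helper name `make_local_syslog_logger` is an assumption for illustration.

```python
import logging
import logging.handlers


def make_local_syslog_logger(tag, socket_path="/dev/log"):
    """Return a logger that writes to the local Syslog socket, tagged like `logger -t`."""
    logger = logging.getLogger(tag)
    logger.setLevel(logging.INFO)
    # A string address makes SysLogHandler connect to a Unix domain socket
    handler = logging.handlers.SysLogHandler(address=socket_path)
    handler.setFormatter(logging.Formatter(tag + ": %(message)s"))
    logger.addHandler(handler)
    return logger
```

With /dev/log bind-mounted from the host, `make_local_syslog_logger("schnitzel").info("Now!")` should end up in the host's Syslog in much the same way as the `logger` test.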
35. Pros
– Nothing extra to install if the host has Syslog already
– Host’s Syslog will be collected as well
Cons
– Hostname is set to the receiver’s hostname, no container ID in the logs
Use Host Syslog For Local Syslog
36. Use A Syslog Container For Local Syslog
37. From Jérôme Petazzoni’s blog – use a bind mount!
Create a simple Rsyslog container, claim /dev as a volume
FROM ubuntu:14.04
RUN apt-get update -q
RUN apt-get install -y rsyslog
CMD rsyslogd -n
VOLUME /dev
VOLUME /var/log
Then run the Syslog container, capturing its /dev in /tmp/syslogdev
docker run --name syslog -d -v /tmp/syslogdev:/dev [image]
Finally, run the containers that log to local Syslog
docker run --name [image-name] -d -v /tmp/syslogdev/log:/dev/log [image]
Use A Syslog Container For Local Syslog
38. Jérôme Petazzoni’s Blog
– http://jpetazzo.github.io/2014/08/24/syslog-docker/
What is a bind mount?
– http://docs.1h.com/Bind_mounts
– http://man7.org/linux/man-pages/man8/mount.8.html
Use A Syslog Container For Local Syslog
39. Pros
– Removes the need to have and configure Syslog on the host
– Encapsulates Syslog collection in a Docker-native way
Cons
– Hostname is set to the receiver’s hostname, no container ID in the logs
Use A Syslog Container For Local Syslog
40. Containers model processes, not machines
Docker persists container stdout on the host
/var/lib/docker/containers/*/*-json.log
Simply point the collector’s file collection mechanism to this path
Collector can also be a container, if the above path is mounted
For example, the Sumo file collector image expects logs in /tmp/clogs
docker run -d -v /var/lib/docker/containers:/tmp/clogs \
sumologic/collector:latest-file [Access ID] [Access Key]
Log To Stdout And Use A File Collector
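A file collector that wants the container ID can take it from the directory and file name. A rough sketch of discovering the stdout logs under the Docker containers directory follows; `find_container_logs` is a made-up helper, and a real collector would also have to keep re-scanning as containers come and go.

```python
import glob
import os


def find_container_logs(root="/var/lib/docker/containers"):
    """Map container ID to the path of that container's JSON stdout log."""
    logs = {}
    for path in glob.glob(os.path.join(root, "*", "*-json.log")):
        # The parent directory is named after the container ID
        container_id = os.path.basename(os.path.dirname(path))
        logs[container_id] = path
    return logs
```

Inside the collector container from the example above, the same function would be called with `root="/tmp/clogs"`.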
41. Pros
– Relatively straightforward to set up
– Container ID available via filename
Cons
– Docker doesn’t bound the stdout logs on disk
– File collector needs to be able to deal with logrotate if used
– Must be willing to live with host directory mounted in container
Log To Stdout And Use A File Collector
42. Rainer Gerhards on Rsyslog’s file input module
– http://www.slideshare.net/rainergerhards1/using-wildcards-with-rsyslogs-file-monitor-imfile
Sumo Logic blog on official collector images and Github repo
– http://www.sumologic.com/blog/company/an-official-docker-image-for-the-sumo-logic-collector
– https://github.com/SumoLogic/sumologic-collector-docker
On using Logrotate with Docker
– https://github.com/docker/docker/issues/7333
Log To Stdout And Use A File Collector
43. Logspout is a very lightweight container that forwards stdout to syslog
Logspout uses the Docker Event API to track containers coming and going
For each container, Logspout gets the stdout from Docker via API
By default everything gets forwarded to the specified endpoint
Logspout supports routing to different endpoints
Routing rules can be expressed as filters on container name & ID
Logspout also exposes a little HTTP interface to bounce logs back live
We are hacking Logspout to forward to Sumo’s HTTP endpoint as well!
Log To Stdout And Use Logspout
docker run -d -p 8000:8000 -v /var/run/docker.sock:/tmp/docker.sock \
progrium/logspout syslog://[syslog-host]:[syslog-port]
curl localhost:8000/logs
44. Pros
– Trivial to set up and very lightweight
– Adds container ID and name to the logs
– Flexible, optionally persistent routing for complicated cases
Cons
– Docker doesn’t bound the stdout logs on disk
Log To Stdout And Use Logspout
45. Logspout Github repository
– https://github.com/progrium/logspout
Various Articles
– http://stackengine.com/docker-logs-aggregating-ease/
– http://blog.froese.org/2014/05/15/docker-logspout-and-nginx/
On using Logrotate with Docker
– https://github.com/docker/docker/issues/7333
Log To Stdout And Use Logspout
47. Ultimately, all files from container file systems end up on disk
One of my boxes is running AUFS and I can see all files in:
/var/lib/docker/aufs/mnt/[Container ID]
A simple test with tailing a file in a container from the host works…
Collect From Docker Filesystems
48. Unfortunately, this doesn’t work with Devicemapper
Another box is using devicemapper and I can see all files in:
/var/lib/docker/devicemapper/mnt/[Container ID]/rootfs/
A simple test with tailing a file in a container from the host works
So now you can slap a file collector on the host and configure it…?
With devicemapper, stopping a container while tailing leads to an error on start
Error response from daemon: Cannot start container 6f62be47025d:
Error getting container 6f62be47025d... from driver devicemapper:
Error mounting '/dev/mapper/docker-202:1-277656-6f62be47025d....' on
'/var/lib/docker/devicemapper/mnt/6f62be47025d...': device or
resource busy
This error will persist until the other process (tail) is stopped
And then, a manual umount is required before docker start
Collect From Docker Filesystems
49. Pros
– If legal, it means a lot of existing file collection tools can just be used
Cons
– Could just be a batshit crazy idea and the universe collapses into itself
– Need to find a way to configure file collector per image
Collect From Docker Filesystems
51. docker exec allows injection of a process into a container
A collector could live in a container, and talk to the Docker daemon
The collector could use the Event API to track containers coming and going
Basically, just like Logspout… or put it on the host, I guess
When a container appears, the Exec API could be used to inject a process
The process could run the collection logic, starting with watching paths, etc.
The process could also actually tail the files and send logs to a service
Or, it could send logs back to the collector container via stdout or something
The collector in the container could then do caching, compression, …
Inject Collector Via Docker Exec
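The bookkeeping part of this idea might look roughly like the sketch below. It assumes events arrive as dictionaries with `status` and `id` keys, modeled on the Docker events stream. The `on_start` and `on_stop` callbacks are hypothetical hooks where a collector would inject or clean up its process; no actual Docker API call is made here.

```python
def track_containers(events, on_start, on_stop):
    """Follow an event stream and call the hooks as containers come and go."""
    running = set()
    for event in events:
        container_id = event.get("id")
        status = event.get("status")
        if status == "start" and container_id not in running:
            running.add(container_id)
            # Hypothetical hook: this is where the collector process would be injected
            on_start(container_id)
        elif status in ("die", "stop") and container_id in running:
            running.discard(container_id)
            # Hypothetical hook: flush and forget per-container state
            on_stop(container_id)
    return running
```

Keeping the tracking separate from the injection step means the same loop works whether the collector lives in a container or on the host.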
52. Pros
– This could actually be a generic and non-crazy way to collect log files
– There’s a ton of tools that know how to collect from files
Cons
– In reality, will people accept/allow docker exec?
– It basically allows a container to access another container as root
Inject Collector Via Docker Exec
54. Something that catches stdout from all containers…
– Logspout does this already!
…and that can tail files in containers in a clean way…
– Container can define which path(s)
…and forward messages via different protocols
– Logspout does Syslog, we are adding HTTP POST
We think the extensions discussion is very relevant!
– More realistic than adding to core Docker codebase?
What We Would Like To Build
INTROS
For those of you new to Sumo Logic, we’re a Silicon Valley-based startup - founded by industry experts with strong backgrounds in Data Science, Enterprise Software & Internet Services and backed by some of the top VC firms in the Business Today.
We were founded with a simple but far-reaching goal:
To meet the challenge of the largest data explosion in history and help turn that data—whatever its type, location or volume—into actionable IT and business insights.
If you are in IT today you have a choice in front of you. You can choose to look at the machine data output of your infrastructure as just fumes and exhaust from your Apps, servers and Network OR you can look at it as the Life Blood of your Operation and Business. The Pulse.
We are here to talk about how Sumo Logic is disrupting the Status Quo and what that means to you. By Status Quo we are talking about the prior generation of On-Premise software, Home Grown solutions, and that one we all know… “Ignore and Wait.” Our intent is to break the barriers that were previously in front of you regarding Data Silos, inability to handle ever growing Volumes of data, Antiquated Architectures, and manual analytics.
From Top to Bottom here we have a distinct focus on Customer Satisfaction and doing things the Right Way. We do this all as a Service – Secure, Reliable, Flexible with a ground breaking Time to Value.
So let’s get started.
Desired State: Turning a Chaotic Situation into your Benefit and Advantage
Sumo Logic takes this chaos of information, thousands of sources every second of the day, and makes it Human Readable so that IT insights can inform business decisions.
How do we do it better than current solutions?