Tim Gross @0x74696d
“We have built mind-bogglingly
complicated systems that we cannot see,
allowing glaring performance problems
to hide in broad daylight in our systems.”
Bryan Cantrill, Joyent CTO
ACM Queue Vol 4, Issue 1, 2006 Feb 23
“Get used to interacting with your observability
tooling every day. As part of your release cycle, or
just out of curiosity. Honestly, things are broken all
the time — you don’t even know what normal looks
like unless you’re also interacting with your
observability tooling under “normal” circumstances.”
Charity Majors
Building Badass Engineers and Badass Teams
$ ssh ubuntu@${IP}
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-45-generic x86_64)
Certified Ubuntu Cloud Image
Instance (Ubuntu 16.04.1 LTS 20161020)
 * Documentation:
 * Management:
 * Support:
 Get cloud support with Ubuntu Advantage Cloud Guest:
Last login: Wed Nov  2 14:20:32 2016 from
ubuntu@05ca8420-3b56-43c5-9e28-209bd2eab154:~$ cd workshop/
ubuntu@05ca8420-3b56-43c5-9e28-209bd2eab154:~/workshop$ ls -lah
total 68K
drwxrwxr-x 8 ubuntu ubuntu 4.0K Nov  2 14:32 .
drwxr-xr-x 7 ubuntu ubuntu 4.0K Oct 31 19:51 ..
drwxrwxr-x 2 ubuntu ubuntu 4.0K Oct 31 18:55 consul
-rw-rw-r-- 1 ubuntu ubuntu 2.9K Nov  2 14:32 docker-compose.yml
drwxrwxr-x 3 ubuntu ubuntu 4.0K Nov  1 13:56 fortunes
drwxrwxr-x 8 ubuntu ubuntu 4.0K Nov  1 19:46 .git
-rw-rw-r-- 1 ubuntu ubuntu   28 Oct 31 18:55 .gitignore
-rw-rw-r-- 1 ubuntu ubuntu  16K Oct 31 17:26 LICENSE
-rw-rw-r-- 1 ubuntu ubuntu 3.1K Nov  1 19:46 local-compose.yml
drwxrwxr-x 2 ubuntu ubuntu 4.0K Oct 31 18:55 mysql
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov  1 15:26 nginx
-rw-rw-r-- 1 ubuntu ubuntu 6.1K Oct 31 18:55 
drwxrwxr-x 3 ubuntu ubuntu 4.0K Oct 31 20:31 setup
ubuntu@05ca8420-3b56-43c5-9e28-209bd2eab154:~/workshop$ docker images
REPOSITORY                TAG         IMAGE ID       CREATED        SIZE
workshop_nginx            latest      1e8c5dabd8d5   19 hours ago   249.6 MB
workshop_fortunes         latest      c94501cbbf3b   25 hours ago   61.42 MB
workshop_mysql            latest      504d8ce0ff06   44 hours ago   491.8 MB
workshop_consul           latest      a52cbd2b8c03   44 hours ago   54.69 MB
alpine                    3.4         baa5d63471ea   2 weeks ago    4.799 MB
autopilotpattern/nginx    1-r6.1.0    50ff23913232   2 weeks ago    249.6 MB
autopilotpattern/mysql    5.6r3.1.0   d9709015cd62   4 weeks ago    491.8 MB
autopilotpattern/consul   0.7r0.7     224d9f7134fa   6 weeks ago    54.69 MB
ubuntu@05ca8420-3b56-43c5-9e28-209bd2eab154:~/workshop$ docker-compose up -d
Creating workshop_mysql_1
Creating workshop_consul_1
Creating workshop_nginx_1
Creating workshop_fortunes_1
ubuntu@05ca8420-3b56-43c5-9e28-209bd2eab154:~/workshop$ docker-compose ps
Name                  Command                          State   Ports
workshop_consul_1     /usr/local/bin/containerpi ...   Up
workshop_fortunes_1   /bin/containerpilot -confi ...   Up
workshop_mysql_1      containerpilot mysqld --co ...   Up
workshop_nginx_1      /usr/local/bin/containerpi ...   Up




[Diagram: Service A, Service B]
Your KVM:
▸ Docker Engine
▸ Composed containers
▸ Host networking (shared IP)
Virtual Machine
SmartOS Container Hypervisor
Triton compute node:
▸ SmartOS
▸ Many customer containers
▸ VXLAN: 1 container = 1+ IPs
Bare-metal compute
SmartOS Container Hypervisor
Triton Cloud
▸ Many compute nodes
▸ Containers distributed
transparently across DC
Bare-metal compute
Docker Log drivers
‣ Captures stdout/stderr; easy for 12-Factor apps
‣ Mangles multi-line logs (stack traces)
‣ Wraps log line in log driver structure: makes
parsing structured logs really messy
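One way to keep stack traces intact through a line-oriented log driver is to fold each record into a single physical line before it reaches stdout/stderr. The sketch below is a hypothetical Python formatter (the class and logger names are illustrative, not from the workshop code) that escapes embedded newlines so a traceback travels as one line and can be unfolded downstream.

```python
import io
import logging


class SingleLineFormatter(logging.Formatter):
    """Fold multi-line output (such as tracebacks) into one physical line."""

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)
        # Escape backslashes first so the folding is reversible, then newlines.
        return text.replace("\\", "\\\\").replace("\n", "\\n")


def make_logger(stream) -> logging.Logger:
    """Build a logger that writes single-line records to `stream`."""
    logger = logging.getLogger("single_line_demo")  # hypothetical name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SingleLineFormatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# Demo: capture a traceback and confirm it stays on one line.
buffer = io.StringIO()
log = make_logger(buffer)
try:
    1 / 0
except ZeroDivisionError:
    log.exception("request failed")

output = buffer.getvalue()
```

In a container the stream would be `sys.stderr` rather than an in-memory buffer; the trade-off is that whatever reads the logs has to unescape `\n` again.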
Logging to File
‣ Mounting volume on host for logging to file isn’t
portable across platforms (e.g. PaaS)
‣ Log shippers as co-process in container can be
arbitrarily smart
Structured Logging
‣ Requires more storage, but less ingest processing
‣ More metadata to search on
‣ Docker log drivers blow away structured logs =(
‣ Consider logging directly from app to collector
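As a rough illustration of structured logging from the app itself, the sketch below emits one JSON object per line with a `time` field, which is the field the workshop's Logstash `json` filter later promotes to `log_timestamp`. The formatter, the `fields` convention for extra metadata, and the logger name are assumptions made for this example, not the fortunes application's actual code.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: searchable metadata is passed as
        # extra={"fields": {...}} and merged into the top-level object.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


buffer = io.StringIO()
logger = logging.getLogger("json_demo")  # hypothetical name
logger.handlers.clear()
logger.propagate = False
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "served fortune",
    extra={"fields": {"req_id": "5c18b677d628d9511818baab9f33ceb3",
                      "http_code": 200}},
)
entry = json.loads(buffer.getvalue())
```

Pointing the handler at a socket or HTTP collector instead of a stream would avoid the log driver wrapper entirely, at the cost of the app needing to know where the collector lives.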
<20>Nov 11 2016 14:52:01 nginx/edaac4e19616 [4996]: [2016-11-04T14:52:01+00:00] 5c18b677d628d9511818baab9f33ceb3 "GET / HTTP/1.1" 200 204 "-" "curl/
‣ <20>Nov 11 2016 14:52:01 nginx/edaac4e19616 [4996]: is the syslog wrapper, added by the log driver
‣ nginx/edaac4e19616 is the container identifier, via the Compose hostname flag
‣ [2016-11-04T14:52:01+00:00] is the Nginx access log timestamp
‣ 5c18b677d628d9511818baab9f33ceb3 is the Nginx $request_id field
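To make the layering concrete, here is a small Python sketch that peels the syslog wrapper off the sample line and then pulls the Nginx timestamp and request ID out of the remaining message. The regular expressions are written for this one sample (joined onto a single line, with spaces between fields, and still truncated at `"curl/` as on the slide); they are an illustration, not a port of the workshop's grok patterns.

```python
import re

# Outer layer: the syslog wrapper added by the Docker log driver.
SYSLOG_WRAPPER = re.compile(
    r"^<(?P<pri>\d+)>"
    r"(?P<syslog_timestamp>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}) "
    r"(?P<serviceid>\w+)/(?P<containerid>[0-9a-f]+) ?"
    r"\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)

# Inner layer: the start of the Nginx access log line.
NGINX_PREFIX = re.compile(
    r"^\[(?P<log_timestamp>[^\]]+)\] (?P<req_id>[0-9a-f]+) "
)


def parse_line(line):
    """Return a dict of fields, or None if the syslog wrapper is missing."""
    outer = SYSLOG_WRAPPER.match(line)
    if outer is None:
        return None
    fields = outer.groupdict()
    inner = NGINX_PREFIX.match(fields["msg"])
    if inner:
        fields.update(inner.groupdict())
    return fields


SAMPLE = (
    "<20>Nov 11 2016 14:52:01 nginx/edaac4e19616 [4996]: "
    "[2016-11-04T14:52:01+00:00] 5c18b677d628d9511818baab9f33ceb3 "
    '"GET / HTTP/1.1" 200 204 "-" "curl/'
)
parsed = parse_line(SAMPLE)
```

Every extra wrapper means another pattern to maintain, and the application's own timestamp has to be dug out from underneath the driver's.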
$ vi ~/workshop/setup/supporting/logstash/logstash.conf
filter {
  # first parse out the body from the syslog format
  # we've added tags via the Docker log drivers to identify the
  # specific container and the service identifier. See:
  grok {
    match => {
      "message" => '%{SYSLOG5424PRI:syslog5424_pri}+(?:%{SYSLOGTIMESTAMP:syslog_timestamp}|-) %{WORD:serviceid}/+(?:%{HOSTNAME:containerid}|-)\[+(?:%{POSINT:pid}|-)\]: %{GREEDYDATA:msg}'
    }
  }
  syslog_pri { }
  # failed to match, so parse as error
  if "_grokparsefailure" in [tags] {
    mutate {
      add_tag => "parse_error"
    }
  } else {
    mutate {
      # the raw message is redundant data at this point
      remove_field => [ "message", "@source_host" ]
    }
    mutate {
      # once we've got a valid syslog parse we can discard all this rubbish
      # because the Docker log driver stomped all over anything useful
      remove_field => [
        "syslog_hostname", "syslog_message", "syslog_timestamp",
        "syslog_severity", "syslog_facility_code", "syslog_severity_code",
        "syslog_facility", "syslog5424_pri"
      ]
    }
  }
  # filter to get the application-specific log format. lots of these have
  # their own timestamps, which we'll capture and overwrite the "outermost"
  # timestamp with
  grok {
    # nginx access log
    match => {
      "msg" => '\[%{TIMESTAMP_ISO8601:log_timestamp}\] %{WORD:req_id} "%{WORD:http_method} %{URIPATHPARAM:http_request} HTTP/%{NUMBER:http_version}" %{NUMBER:http_code} %{NUMBER:http_bytes_sent} (?:%{QUOTEDSTRING:http_referer}|-) %{IP:client} (?:%{QUOTEDSTRING:http_user_agent}|-)'
    }
    # nginx error msg
    match => {
      "msg" => '(?<log_timestamp>%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME}) (?<http_timestamp>%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME})? ?%{GREEDYDATA:msg}'
    }
    # mysql log messages
    match => {
      "msg" => '%{TIMESTAMP_ISO8601:log_timestamp} %{NUMBER} \[%{LOGLEVEL:level}\] ?%
    }
    # mysql log messages
    match => {
      "msg" => '(?<log_timestamp>%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME})? %{LOGLEVEL:level} manage %{GREEDYDATA:msg}'
    }
    # ContainerPilot log format
    match => {
      "msg" => '(?<log_timestamp>%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME})? ?%
    }
    # catchall
    match => { "msg" => "%{GREEDYDATA:msg}" }
    overwrite => [ "log_timestamp" ]
    overwrite => [ "msg" ]
  } # end grok
  # fortunes application
  json {
    source => "msg"
    add_field => [ "log_timestamp", "%{time}" ] # overwrites w/ timestamp from app
    remove_field => [ "time" ]
  }
  # anything other than our fortunes application will not be JSON
  # so we'll ignore this error
  if "_jsonparsefailure" in [tags] {
    mutate { remove_tag => "_jsonparsefailure" }
  }
}
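The last stage of the filter reads as "try JSON, and if that fails, carry on as plain text". The Python sketch below restates that logic under simplifying assumptions: events are plain dicts, and the app's `time` field replaces `log_timestamp` as the config comment intends. The function name is hypothetical and this is not how Logstash itself is implemented.

```python
import json


def apply_json_filter(event):
    """Expand a JSON `msg` into the event; leave non-JSON events untouched."""
    event = dict(event)  # work on a copy
    try:
        payload = json.loads(event["msg"])
    except (KeyError, TypeError, ValueError):
        # Not JSON (nginx, mysql, ...): the equivalent of dropping the
        # _jsonparsefailure tag and moving on.
        return event
    if not isinstance(payload, dict):
        return event
    event.update(payload)
    if "time" in event:
        # Prefer the application's own timestamp over the wrapper's.
        event["log_timestamp"] = event.pop("time")
    return event


app_event = apply_json_filter(
    {"serviceid": "fortunes",
     "msg": '{"time": "2016-11-04T14:52:01Z", "msg": "hello"}'}
)
nginx_event = apply_json_filter(
    {"serviceid": "nginx", "msg": '"GET / HTTP/1.1" 200 204'}
)
```

Only the JSON-emitting service gets its fields expanded; everything else passes through exactly as the earlier grok stage left it.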
‣ Ephemeral containers == higher dimensionality
‣ Use service discovery to find collector / targets
‣ Pulling metrics is safer and more scalable in multi-tenant
environments (tenants can’t DDoS the collector)
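A pull model means each container only has to expose its current numbers and wait to be scraped. The sketch below shows the container side using Python's standard library. The metric name, the `/metrics` path, port 9090, and the Prometheus-style text format are all assumptions for illustration, since the slides do not name a specific collector.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters the application would increment.
COUNTERS = {"fortunes_requests_total": 0}


def render_metrics(counters):
    """Render counters in a Prometheus-style text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters; the collector decides when to ask."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9090):
    """Block forever serving metrics (call from a background thread)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Because the collector sets the scrape interval, a misbehaving tenant cannot flood it with pushes; the worst it can do is answer slowly.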
I’m “application” at

I’m “telemetry” at

Where is
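The "Where is ...?" question can be answered by asking the service catalog rather than keeping a static target list. The sketch below queries Consul's health endpoint (`/v1/health/service/<name>?passing`) and turns the response into `host:port` targets. The `http://consul:8500` address and the `telemetry` service name in the usage note are assumptions about how the environment is wired.

```python
import json
import urllib.request


def targets_from_health(entries):
    """Convert a Consul health response into sorted host:port strings."""
    targets = []
    for entry in entries:
        service = entry["Service"]
        # Consul leaves Service.Address empty when the service did not
        # register its own address; fall back to the node's address.
        address = service.get("Address") or entry["Node"]["Address"]
        targets.append(f"{address}:{service['Port']}")
    return sorted(targets)


def discover(service_name, consul_url="http://consul:8500"):
    """Ask Consul for healthy instances of `service_name` (needs network)."""
    url = f"{consul_url}/v1/health/service/{service_name}?passing"
    with urllib.request.urlopen(url, timeout=5) as response:
        return targets_from_health(json.load(response))


# Offline example of the response shape this code expects.
SAMPLE_RESPONSE = [
    {"Node": {"Address": "192.168.1.10"},
     "Service": {"Address": "192.168.1.20", "Port": 9090}},
    {"Node": {"Address": "192.168.1.11"},
     "Service": {"Address": "", "Port": 9090}},
]
targets = targets_from_health(SAMPLE_RESPONSE)
```

A collector would call something like `discover("telemetry")` on each refresh, so containers that come and go are picked up without any config change.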


Container Design
‣ Large containers == slower deploys
‣ Mount common tooling read-only from the host
Distributed Request Tracing
‣ Zipkin / OpenTracing look promising, but there is little
support outside app code (load balancers, DBs)
‣ Nginx can inject request ID field
‣ Carry request ID in your logging
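Carrying the request ID through application logs can be as simple as stashing it per request and stamping it on every record. This sketch assumes Nginx forwards its `$request_id` to the app in an `X-Request-Id` header (for example via `proxy_set_header`); the header name, logger name, and `handle_request` function are illustrative and not taken from the workshop code.

```python
import contextvars
import io
import logging

# Holds the ID of the request currently being handled; "-" if none.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.req_id = request_id_var.get()
        return True


def handle_request(headers, logger):
    """Hypothetical request handler: bind the ID, log, then unbind."""
    token = request_id_var.set(headers.get("X-Request-Id", "-"))
    try:
        logger.info("handling request")
    finally:
        request_id_var.reset(token)


buffer = io.StringIO()
logger = logging.getLogger("request_id_demo")  # hypothetical name
logger.handlers.clear()
logger.propagate = False
handler = logging.StreamHandler(buffer)
handler.setFormatter(
    logging.Formatter("%(levelname)s req_id=%(req_id)s %(message)s")
)
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

handle_request({"X-Request-Id": "5c18b677d628d9511818baab9f33ceb3"}, logger)
handle_request({}, logger)
lines = buffer.getvalue().splitlines()
```

With the same ID in the Nginx access log and the app log, a single search ties the two together even without a full tracing system.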
ContainerDays NYC 2016: "Observability and Manageability in a Container Environment" (Tim Gross)