Making Operations Visible

Nick Galbreath  ニック ガルブレス

DevOpsDays Tokyo 2013
#devopsdays
@ngalbreath
nickg@client9.com
ngalbreath@iponweb.net
http://slidesha.re/1h9Aqye
http://www.client9.com/
It's also on video!

http://bit.ly/1gaEmDS

Making Operations Visible - Nick Galbreath

  1. 1. Making Operations Visible Nick Galbreath  ニック ガルブレス DevOpsDays Tokyo 2013
  2. 2. #devopsdays @ngalbreath nickg@client9.com ngalbreath@iponweb.net
  3. 3. http://slidesha.re/1h9Aqye http://www.client9.com/
  4. 4. It's also on video! http://bit.ly/1gaEmDS
  5. 5. Who is nickg? Nick Galbreath http://client9.com/20130501 @ngalbreath www.client9.com
  6. 6. Online Advertising Infrastructure • Tokyo • Moscow, Russia • http://www.iponweb.jp/
  7. 7. Continuous Deployment • In 2012, I spoke many times on continuous deployment. • But changing from release cycles to continuous deployment is too big a change for most organizations, and they don't have the tools to do it.
  8. 8. Goal • I'm hoping that adding new metrics to the application becomes so addictive that you'll want to shorten release cycles.
  9. 9. What is DevOps? • Puppet, Chef, Ansible? • GitHub? AWS? The Cloud? • Continuous Deployment? Yes, but these are tools. Great tools.
  10. 10. It's About Communication • Between machines • Between team members • Between Dev and Ops But in many companies there is a bigger problem
  11. 11. You're Invisible • If you are in Business, you are invisible to Development and Tech Operations • If you are in Operations, you are invisible to Business and Development • If you are in Development, you are invisible to Business and Operations.
  12. 12. Invisible Things Aren't Valued
  13. 13. Developer • "I don't know what my code will do in production. Let ops deal with it." • "Why doesn't ops fix these problems?" • "What does Ops do all day?"
  14. 14. Business • "Why do I have to wait till the end of the month for a report?" • "Did last week's release change anything?" • "Why don't they understand the impact of that bug, outage, etc.?"
  15. 15. Operations • "Why are they always bothering me?" • "I've got work to do!" • "Why do we have to do another release again... can't developers do a better job?" • "What does this company do?" (really)
  16. 16. This is really destructive To you To your Team To your company.
  17. 17. All of This Can Be Fixed By Making Operations Visible with data. Not just technical operations but company operations.
  18. 18. Your company is full of data! So Why Not Expose This Data? Here's a list of excuses I've heard
  19. 19. "But I already have graphing in my alerting system" • Maybe. But it's junk • Can't share • Can't do data mash-ups • Can't do data transformations
  20. 20. "They wouldn't understand." • "They won't understand the data so what's the point of sharing it." • First, "they" probably do. And the more people looking at ops metrics, the better. • Us vs. Them = Fail.
  21. 21. "They might break something." • "The data is in our alerting system, we don't want you to break it." • Assumes "they" are incompetent, or malicious. Learn to trust.
  22. 22. "It's not your job, so you don't need to know." "That information isn't important" • This excuse is typically caused by fear. • Why are you deciding what's important?
  23. 23. "I'm not making another system, duplicating data is bad." • For operational metrics, it is perfectly OK to have a redundant copy of the data. • Completely different goals. • Use as alerting-beta
  24. 24. "I'm too busy." "It's too dangerous." "I don't know how." • These are real problems. • So let's fix them!
  25. 25. One Machine, One Day, One Person Challenge! Let's get 100% of operational metrics in, and enable the application to make and share new metrics on demand without any help from you.
  26. 26. Graphite • https://github.com/graphite-project • http://graphite.readthedocs.org/ • Similar to RRDTool, Ganglia, Cacti • Uses specialized data storage • Uses specialized queries • Optimized for time series
  27. 27. Graphite isn't Perfect • Documentation isn't great (but getting better) • A few QA issues • Somewhat odd stack (python-twisted, django)
  28. 28. Graphite Ecosystem • Flexible input and output • REST API for graphs • Simple UI for mashups and dashboards • 3rd party, custom, client-side dashboards
  29. 29. Makes Sharing Easy • Do you have an interesting graph? It's just a URL! • Dashboards are easy since graphs are just URLs. Very easy to make HTML dashboards.
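Since each graph is just a URL, a static HTML dashboard can be generated in a few lines. The sketch below is in Python rather than the PHP-style client shown on the slides. It assumes a Graphite web host at the hypothetical address graphite.example.com and made-up metric names; only the /render endpoint and its target, from, width and height parameters come from Graphite itself.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical Graphite web host; replace with your own.
GRAPHITE = "http://graphite.example.com"


def graph_url(target, since="-24h", width=600, height=300):
    """Return a Graphite render URL for one metric or function expression."""
    query = urlencode(
        [("target", target), ("from", since), ("width", width), ("height", height)]
    )
    return f"{GRAPHITE}/render?{query}"


def dashboard_html(title, targets):
    """Build a static HTML page with one <img> per graph URL."""
    imgs = "\n".join(
        f'<p><img src="{escape(graph_url(t))}" alt="{escape(t)}"></p>'
        for t in targets
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"{imgs}\n</body></html>"
    )


if __name__ == "__main__":
    # Metric names below are illustrative, not taken from the talk.
    print(dashboard_html("Ops overview", ["servers.web01.cpu.user", "stats.logins"]))
```

Sharing a graph then amounts to pasting the string returned by `graph_url` into chat or email.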
  30. 30. One Machine One Day! • A single low-end machine should have capacity for a few thousand metrics per minute from 50+ machines. • Graphite is not CPU intensive, but needs fast disks and/or more memory.
  31. 31. One Day, One Person • Graphite is not hard to install, but it is a bit messy. • But might be as easy as "apt-get install graphite" on your system. • It would be good to have a workshop or prebuilt AMI for EC2 • But not today :-(
  32. 32. Operational Stats • You could parse /proc, ps, df, netstat, etc., and write your own custom scripts... • ...or use Diamond from Brightcove • https://github.com/BrightcoveOS/Diamond
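For a sense of what the "write your own custom scripts" route involves, here is a minimal Python sketch that reports disk usage in Graphite's plaintext format (one "name value timestamp" line per metric, sent to Carbon's plaintext listener, which defaults to TCP port 2003). The metric prefix servers.web01.disk and the host are assumptions for illustration; Diamond does this kind of collection for you across many subsystems.

```python
import shutil
import socket
import time


def disk_metrics(path="/", prefix="servers.web01.disk"):
    """Return Graphite plaintext lines ("name value timestamp") for one filesystem."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    now = int(time.time())
    return [
        f"{prefix}.total_bytes {usage.total} {now}",
        f"{prefix}.used_bytes {usage.used} {now}",
        f"{prefix}.free_bytes {usage.free} {now}",
    ]


def send_to_carbon(lines, host="127.0.0.1", port=2003, timeout=2.0):
    """Send lines to a Carbon plaintext listener over TCP, one metric per line."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    for line in disk_metrics():
        print(line)
    # send_to_carbon(disk_metrics())  # enable once a Carbon listener is reachable
```

Run from cron once a minute, a script like this covers one metric family; the maintenance cost of many such scripts is the argument for a ready-made collector.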
  33. 33. Metrics in Diamond now • Memory • CPU • Disk • Network • Apache • NGINX • MySQL • SNMP and many more
  34. 34. 100% of pure operational metrics are now shared! But what about your applications? And business metrics?
  35. 35. Enter StatsD • https://github.com/etsy/statsd • Your application sends event data to statsd, as it happens, in real-time. • StatsD collects this data and computes time-series metrics (sum, min, max, average) • Once a minute, it writes data to Graphite
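The collect-then-flush behavior described above can be modeled in a few lines. This is a toy Python sketch of the idea, not StatsD's actual implementation: it accepts datagrams in the "name:value|type" line format (c for counters, ms for timers), ignores sample rates, and returns the aggregates a flush would write to Graphite. The class and method names are made up for illustration.

```python
from collections import defaultdict


class TinyAggregator:
    """Toy model of StatsD-style aggregation; not the real StatsD code."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)

    def receive(self, packet):
        """Accept one 'name:value|type' datagram; 'c' = counter, 'ms' = timer."""
        name, rest = packet.split(":", 1)
        parts = rest.split("|")
        value, kind = float(parts[0]), parts[1]  # any sample rate is ignored
        if kind == "c":
            self.counters[name] += value
        elif kind == "ms":
            self.timers[name].append(value)

    def flush(self):
        """Return aggregated metrics for the interval and reset all state."""
        out = dict(self.counters)
        for name, values in self.timers.items():
            out[f"{name}.min"] = min(values)
            out[f"{name}.max"] = max(values)
            out[f"{name}.mean"] = sum(values) / len(values)
        self.counters.clear()
        self.timers.clear()
        return out


if __name__ == "__main__":
    agg = TinyAggregator()
    for packet in ["logins:1|c", "logins:1|c", "logins:1|c", "deploy:1|c"]:
        agg.receive(packet)
    print(agg.flush())
```

In a real deployment `flush` would run on a timer and its output would be written to Graphite; here it simply returns a dictionary so the aggregation step is easy to inspect.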
  36. 36. The Magic of UDP • Your application sends metrics in a UDP packet. • UDP is fire-and-forget: no connection, no exceptions, no timeouts. It cannot cause your application to crash. • It will not overload your network. • You may lose metrics, but on an intranet it's rare.
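A minimal sketch of what such a UDP send might look like, written in Python. It assumes a StatsD server on localhost at its default port 8125; the function names are illustrative and not taken from any particular client library. Any socket error is swallowed, which is how a client keeps metrics from ever breaking the caller.

```python
import socket


def format_counter(name, delta=1):
    """Encode a counter increment in the StatsD line format: name:delta|c."""
    return f"{name}:{delta}|c".encode("utf-8")


def increment(name, delta=1, host="127.0.0.1", port=8125):
    """Fire-and-forget counter increment; returns False instead of raising."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # No connection and no reply: sendto just hands the datagram to the OS.
            sock.sendto(format_counter(name, delta), (host, port))
        return True
    except OSError:
        # Swallow network errors so that metrics can never break the caller.
        return False


if __name__ == "__main__":
    increment("logins")  # safe to call even if no StatsD server is listening
```

The trade-off is the one the slide names: a dropped datagram means a lost data point, which is usually acceptable for counters.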
  37. 37. Let's Count Logins! • Most StatsD client APIs are one-file, no C, simple. • Add one line to your login code. StatsD::increment('logins'); • That's it!
  38. 38. Events! • You can also graph low-frequency events. • Just send another StatsD request in your batch script StatsD::increment("deploy", 1); • Do it on reboots, installs, core dumps. • New bugs, new hires, new code commits. • Use drawAsInfinite to display
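To see events against a metric, one option is a render URL with two targets, the second wrapped in Graphite's drawAsInfinite function so each non-zero point is drawn as a vertical line. The Python sketch below assumes a hypothetical host graphite.example.com and assumes StatsD publishes the counters as stats.logins and stats.deploy; the actual prefix depends on your StatsD configuration.

```python
from urllib.parse import urlencode

# Hypothetical Graphite web host; replace with your own.
GRAPHITE = "http://graphite.example.com"


def overlay_url(metric, event, since="-7d"):
    """Render URL that plots `metric` with `event` drawn as vertical lines."""
    query = urlencode(
        [
            ("target", metric),
            ("target", f"drawAsInfinite({event})"),
            ("from", since),
        ]
    )
    return f"{GRAPHITE}/render?{query}"


if __name__ == "__main__":
    # Metric paths are assumptions; check how your StatsD names its output.
    print(overlay_url("stats.logins", "stats.deploy"))
```

A graph built this way makes it easy to ask whether a change in logins lines up with a deploy.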
  39. 39. [Diagram] Three Servers each send (login,1) and the Deploy Script sends (deploy,1) to StatsD, which forwards the totals (login,3), (deploy,1) to Graphite.
  40. 40. Measure Anything, Measure Everything http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
  41. 41. Logins By Country! • Get the country code from the IP address • Make a new metric "login_country" instantly StatsD::increment('logins'); $kuni = geoip2country($ipv4); StatsD::increment("logins.$kuni");
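The same idea as a Python sketch: derive a per-country metric name from the IP address and clean it up so that an unexpected lookup result cannot create stray path segments in Graphite. The `lookup` callable stands in for the slide's geoip2country() and is hypothetical, as is the fake lookup table used for the demo.

```python
import re


def country_metric(ip, lookup, prefix="logins"):
    """Build a per-country metric name such as 'logins.JP' from an IP address.

    `lookup` is any callable mapping an IP string to a country code (or None);
    it is a stand-in for a real GeoIP library.
    """
    code = lookup(ip) or "unknown"
    # Keep only characters that are safe inside one Graphite path segment.
    code = re.sub(r"[^A-Za-z0-9_-]", "_", code)
    return f"{prefix}.{code}"


if __name__ == "__main__":
    # Fake lookup table for illustration; a real one would query a GeoIP database.
    fake_db = {"203.0.113.7": "JP", "198.51.100.9": "RU"}
    for ip in ["203.0.113.7", "198.51.100.9", "192.0.2.1"]:
        print(country_metric(ip, fake_db.get))
```

The resulting name would be passed to the StatsD client alongside the plain logins counter, so both the total and the per-country breakdown are graphed.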
  42. 42. Make Dashboards • And make frameworks that make new dashboards easy.
  43. 43. Default Dashboard Good for experiments
  44. 44. Dashboards Make it easy for your customers
  45. 45. Make Operations Visible • Make the company visible. • Enable communication • Do the One Machine, One Day, One Person Challenge!
  46. 46. Thanks!
  47. 47. DevOpsDays Tokyo 2013 is on video! • The entire event is at http://vimeo.com/album/2559722
  48. 48. DevOpsDays Tokyo 2013 Media Coverage • http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507682/ • http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507755/ • http://itpro.nikkeibp.co.jp/article/NEWS/20131001/507959/ • http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013.html • http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_1.html • http://www.publickey1.jp/blog/13/githubdevopsboxenhubotdevops_day_tokyo_2013.html • http://www.publickey1.jp/blog/13/githubboxenhubotdevops_day_tokyo_2013.html • http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_2.html
  49. 49. DevOpsDays Tokyo 2013 Attendee Coverage • http://mass.hatenablog.com/entry/2013/09/28/205309 • http://toshi-miura.hatenablog.com/entry/2013/09/29/222609 • http://codezine.jp/article/detail/7438 • http://d.hatena.ne.jp/n-sega/20130928/1380373634 • http://kazuph.hateblo.jp/entry/2013/09/28/152302 • http://jedipunkz.github.io/blog/2013/09/29/devops-day-tokyo-2013report/ • http://lewuathe.github.io/blog/2013/09/28/devopsday-tokyo-2013nixingtutekitayo/
