Making operations visible - devopsdays tokyo 2013

Published in: Technology

  1. 1. Making Operations Visible Nick Galbreath  ニック ガルブレス DevOpsDays Tokyo 2013
  2. 2. #devopsdays @ngalbreath nickg@client9.com ngalbreath@iponweb.net
  3. 3. http://slidesha.re/1h9Aqye http://www.client9.com/
  4. 4. It's also on video! http://bit.ly/1gaEmDS
  5. 5. Who is nickg? Nick Galbreath http://client9.com/20130501 @ngalbreath www.client9.com
  6. 6. Online Advertising Infrastructure • Tokyo • Moscow, Russia http://www.iponweb.jp/
  7. 7. Continuous Deployment • In 2012, I spoke many times on continuous deployment. • But changing from release cycles to continuous deployment is too big a change for most organizations, and they don't have the tools to do it.
  8. 8. Goal • I'm hoping that adding new metrics to the application becomes so addictive that you'll want to shorten release cycles.
  9. 9. What is DevOps? • Puppet, Chef, Ansible? • GitHub? AWS? The Cloud? • Continuous Deployment? Yes, but these are tools. Great tools.
  10. 10. It's About Communication • Between machines • Between team members • Between Dev and Ops But in many companies there is a bigger problem
  11. 11. You're Invisible • If you are in Business, you are invisible to Development and Tech Operations • If you are in Operations, you are invisible to Business and Development • If you are in Development, you are invisible to Business and Operations.
  12. 12. Invisible Things Aren't Valued
  13. 13. Developer • "I don't know what my code will do in production. Let ops deal with it." • "Why doesn't ops fix these problems?" • "What does Ops do all day?"
  14. 14. Business • "Why do I have to wait until the end of the month for a report?" • "Did last week's release change anything?" • "Why don't they understand the impact of that bug, outage, etc.?"
  15. 15. Operations • "Why are they always bothering me?" • "I've got work to do!" • "Why do we have to do another release again... can't developers do a better job?" • "What does this company do?" (really)
  16. 16. This is really destructive To you To your Team To your company.
  17. 17. All of This Can Be Fixed By Making Operations Visible with data. Not just technical operations but company operations.
  18. 18. Your company is full of data! So Why Not Expose This Data? Here's a list of excuses I've heard
  19. 19. "But I already have graphing in my alerting system" • Maybe. But it's junk • Can't share • Can't do data mash-ups • Can't do data transformations
  20. 20. "They wouldn't understand." • "They won't understand the data so what's the point of sharing it." • First, "they" probably do. And the more people looking at ops metrics, the better. • Us vs. Them = Fail.
  21. 21. "They might break something." • "The data is in our alerting system, we don't want you to break it." • Assumes "they" are incompetent, or malicious. Learn to trust.
  22. 22. "It's not your job, so you don't need to know." "That information isn't important" • This excuse is typically caused by fear. • Why are you deciding what's important?
  23. 23. "I'm not making another system, duplicating data is bad." • For operational metrics it is perfectly OK to have a redundant copy of the data. • Completely different goals. • Use as alerting-beta
  24. 24. "I'm too busy." "It's too dangerous" "I don't know how." • These are real problems. • So let's fix it!
  25. 25. One Machine, One Day, One Person Challenge! Let's get 100% of operational metrics in, and enable the application to make and share new metrics on demand without any help from you.
  26. 26. Graphite • https://github.com/graphite-project • http://graphite.readthedocs.org/ • Similar to RRDTool, Ganglia, Cacti • Uses specialized data storage • Uses specialized queries • Optimized for time series
  27. 27. Graphite isn't Perfect • Documentation isn't great (but getting better) • A few QA issues • Somewhat odd stack (python-twisted, django)
  28. 28. Graphite Ecosystem • Flexible input and output • REST API for graphs • Simple UI for mashups and dashboards • 3rd party, custom, client-side dashboards
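
As one illustration of the "flexible input" point, Graphite's Carbon daemon accepts a plaintext protocol with one data point per line, in the form `path value timestamp`. The following is a minimal sketch, not something from the talk: the host name `graphite.example.com` and the metric path are hypothetical, and port 2003 is assumed to be the plaintext listener (its usual default).

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one data point as 'path value timestamp\\n' (Carbon plaintext protocol)."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(lines, host="graphite.example.com", port=2003, timeout=2.0):
    """Send pre-formatted lines to a Carbon plaintext listener over TCP.

    The host is a placeholder; 2003 is assumed to be the plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path, printed instead of sent so this runs anywhere.
    print(carbon_line("servers.web01.loadavg.1min", 0.42, 1380326400), end="")
```

Because the format is a single text line, almost any script or cron job can become a metric source.
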
  29. 29. Makes Sharing Easy • Do you have an interesting graph? It's just a URL! • Dashboards are easy since graphs are just URLs. Very easy to make HTML dashboards.
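
To make the "graphs are just URLs" idea concrete, here is a rough sketch of a static HTML dashboard built from Graphite render URLs. It assumes the render endpoint lives at `/render` and accepts `target`, `from`, `width`, and `height` query parameters; the base URL and metric names are hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def render_url(base, target, since="-24h", width=400, height=250):
    """Build a Graphite render URL for a single target."""
    query = urlencode(
        {"target": target, "from": since, "width": width, "height": height}
    )
    return f"{base}/render?{query}"


def dashboard_html(base, targets):
    """Return a bare-bones HTML page with one <img> per graph."""
    imgs = "\n".join(
        f'  <img src="{escape(render_url(base, t))}" alt="{escape(t)}">'
        for t in targets
    )
    return f"<html><body>\n{imgs}\n</body></html>"


if __name__ == "__main__":
    # Hypothetical Graphite host and metric names.
    print(dashboard_html("http://graphite.example.com", ["stats.logins", "stats.deploy"]))
```

Saving the output to a file and opening it in a browser is enough for a first dashboard; no server-side code is needed.
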
  30. 30. One Machine One Day! • A single low-end machine should have capacity for a few thousand metrics per minute from 50+ machines. • Graphite is not CPU intensive, but needs fast disks and/or more memory.
  31. 31. One Day, One Person • Graphite is not hard to install, but it is a bit messy. • It might be as easy as "apt-get install graphite" on your system. • It would be good to have a workshop or prebuilt AMI for EC2 • But not today :-(
  32. 32. Operational Stats • You could parse /proc, ps, df, netstat, etc. and write your own custom scripts.... • ...or use Diamond from Brightcove • https://github.com/BrightcoveOS/Diamond
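
For a sense of what the "write your own custom scripts" route involves, here is a small sketch that parses the Linux `/proc/loadavg` file. It is only an illustration of the do-it-yourself approach and is not how Diamond is implemented; the function names are made up for this example.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg, e.g. '0.42 0.30 0.25 1/123 4567'."""
    one, five, fifteen = text.split()[:3]
    return {"1min": float(one), "5min": float(five), "15min": float(fifteen)}


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())


if __name__ == "__main__":
    # Use a fixed sample so the example also runs on non-Linux systems.
    print(parse_loadavg("0.42 0.30 0.25 1/123 4567\n"))
```

Multiply this by every file and command worth watching and the appeal of an existing collector becomes clear.
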
  33. 33. Metrics in Diamond now • Memory • CPU • Disk • Network • Apache • NGINX • MySQL • SNMP and many more
  34. 34. 100% of pure operational metrics are now shared! But what about your applications? And business metrics?
  35. 35. Enter StatsD • https://github.com/etsy/statsd • Your application sends event data to statsd, as it happens, in real-time. • StatsD collects this data and computes time-series metrics (sum, min, max, average) • Once a minute, it writes data to Graphite
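
The collect-then-flush idea can be sketched in a few lines. This is a toy model of the behaviour described on the slide, not StatsD's actual implementation; the class and method names are invented for illustration.

```python
from collections import defaultdict


class MiniAggregator:
    """Toy model: buffer raw events, then summarise them once per flush."""

    def __init__(self):
        self._values = defaultdict(list)

    def record(self, name, value):
        """Store one event value under a metric name."""
        self._values[name].append(value)

    def flush(self):
        """Return per-metric sum/min/max/average and reset the buffer."""
        summary = {}
        for name, values in self._values.items():
            summary[name] = {
                "sum": sum(values),
                "min": min(values),
                "max": max(values),
                "average": sum(values) / len(values),
            }
        self._values.clear()
        return summary


if __name__ == "__main__":
    agg = MiniAggregator()
    for _ in range(3):
        agg.record("login", 1)
    agg.record("deploy", 1)
    print(agg.flush())
```

In a real deployment a timer would call `flush()` at each interval and forward the summary to Graphite.
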
  36. 36. The Magic of UDP • Your application sends metrics in a UDP packet. • UDP is fire-and-forget: no exceptions, no timeouts. It cannot cause your application to crash. • It will not overload your network. • You may lose metrics, but in an intranet, it's rare.
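
A fire-and-forget sender might look roughly like the sketch below. It assumes the StatsD counter wire format `name:value|c` and the commonly used default port 8125; the function name is made up, and the broad `except` is a deliberate choice so that a metrics failure never reaches the caller.

```python
import socket


def send_metric(name, value=1, host="127.0.0.1", port=8125):
    """Best-effort counter increment sent as a single UDP datagram.

    Returns True if the datagram was handed to the OS, False on a socket error.
    There is no acknowledgement, so True does not mean the server received it.
    """
    message = f"{name}:{value}|c".encode("utf-8")
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message, (host, port))
        return True
    except OSError:
        # Metrics must never break the application: swallow and move on.
        return False


if __name__ == "__main__":
    # Works even when nothing is listening on the port.
    print(send_metric("logins"))
```

The trade-off is the one the slide names: a lost packet is a lost data point, and nothing will tell you about it.
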
  37. 37. Let's Count Logins! • Most StatsD client APIs are one-file, no C, simple. • Add one line to your login code. StatsD::increment('logins'); • That's it!
  38. 38. Events! • You can also graph low-frequency events. • Just send another StatsD request in your batch script StatsD::increment("deploy", 1); • Do it on reboots, installs, core dumps. • New bugs, new hires, new code commits. • Use drawAsInfinite to display
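
To show how an event overlay could be requested, the sketch below builds a render URL with two targets: a regular series plus the event series wrapped in Graphite's `drawAsInfinite()` function, which draws each non-zero point as a vertical line. The base URL and the `stats.logins` / `stats.deploy` metric paths are assumptions; the actual paths depend on how StatsD is configured.

```python
from urllib.parse import urlencode


def overlay_url(base, series, event, since="-7d"):
    """Build a render URL that overlays an event series on a regular series."""
    params = [
        ("target", series),
        ("target", f"drawAsInfinite({event})"),
        ("from", since),
    ]
    return f"{base}/render?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and metric paths.
    print(overlay_url("http://graphite.example.com", "stats.logins", "stats.deploy"))
```

A graph like this makes it easy to see whether a change in logins lines up with a deploy.
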
  39. 39. [Diagram] Three servers each send (login,1) to StatsD, and the deploy script sends (deploy,1). StatsD aggregates these and writes (login,3), (deploy,1) to Graphite.
  40. 40. Measure Anything, Measure Everything http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
  41. 41. Logins By Country! • Get the country code from the IP address • Make a new metric "login_country" instantly StatsD::increment('logins'); $kuni = geoip2country($ipv4); StatsD::increment("logins.$kuni");
  42. 42. Make Dashboards • and build frameworks that make creating new dashboards easy.
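
One detail worth handling when building metric names from data: Graphite treats dots as path separators, so an unexpected lookup result can create stray branches in the metric tree. The sketch below shows one way to normalise the country code first. The function name and the "unknown" fallback are assumptions for illustration, and the GeoIP lookup itself is left out.

```python
import re


def country_metric(prefix, country_code):
    """Build a metric name such as 'logins.jp' from a two-letter country code.

    Anything that is not exactly two ASCII letters is bucketed as 'unknown'.
    """
    code = (country_code or "").strip().lower()
    if not re.fullmatch(r"[a-z]{2}", code):
        code = "unknown"
    return f"{prefix}.{code}"


if __name__ == "__main__":
    for raw in ["JP", "ru", None, "a.b"]:
        print(country_metric("logins", raw))
```

The resulting name can then be passed to whatever StatsD client the application already uses.
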
  43. 43. Default Dashboard Good for experiments
  44. 44. Dashboards Make it easy for your customers
  45. 45. Make Operations Visible • Make the company visible. • Enable communication • Do the One Machine, One Day, One Person Challenge!
  46. 46. Thanks!
  47. 47. DevOpsDays Tokyo 2013 is on video! • The entire event is at http://vimeo.com/album/2559722
  48. 48. DevOpsDays Tokyo 2013 Media Coverage • http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507682/ • http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507755/ • http://itpro.nikkeibp.co.jp/article/NEWS/20131001/507959/ • http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013.html • http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_1.html • http://www.publickey1.jp/blog/13/githubdevopsboxenhubotdevops_day_tokyo_2013.html • http://www.publickey1.jp/blog/13/githubboxenhubotdevops_day_tokyo_2013.html • http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_2.html
  49. 49. DevOpsDays Tokyo 2013 Attendee Coverage • http://mass.hatenablog.com/entry/2013/09/28/205309 • http://toshi-miura.hatenablog.com/entry/2013/09/29/222609 • http://codezine.jp/article/detail/7438 • http://d.hatena.ne.jp/n-sega/20130928/1380373634 • http://kazuph.hateblo.jp/entry/2013/09/28/152302 • http://jedipunkz.github.io/blog/2013/09/29/devops-day-tokyo-2013report/ • http://lewuathe.github.io/blog/2013/09/28/devopsday-tokyo-2013nixingtutekitayo/