Making operations visible - devopsdays tokyo 2013

  • 1. Making Operations Visible Nick Galbreath  ニック ガルブレス DevOpsDays Tokyo 2013
  • 2. #devopsdays @ngalbreath
  • 3.
  • 4. It's also on video!
  • 5. Who is nickg? Nick Galbreath @ngalbreath
  • 6. Online Advertising Infrastructure • Tokyo • Russia • Moscow
  • 7. Continuous Deployment • In 2012, I spoke many times on continuous deployment. • But changing from release cycles to continuous deployment is too big a change for most organizations, and they don't have the tools to do it.
  • 8. Goal • I'm hoping that adding new metrics to the application becomes so addictive that you'll want to shorten release cycles.
  • 9. What is DevOps? • Puppet, Chef, Ansible? • GitHub? AWS? The Cloud? • Continuous Deployment? Yes, but these are tools. Great tools.
  • 10. It's About Communication • Between machines • Between team members • Between Dev and Ops But in many companies there is a bigger problem
  • 11. You're Invisible • If you are in Business, you are invisible to Development and Tech Operations • If you are in Operations, you are invisible to Business and Development • If you are in Development, you are invisible to Business and Operations.
  • 12. Invisible Things Aren't Valued
  • 13. Developer • "I don't know what my code will do in production, so let ops deal with it." • "Why doesn't ops fix these problems?" • "What does Ops do all day?"
  • 14. Business • "Why do I have to wait till the end of the month for a report?" • "Did last week's release change anything?" • "Why don't they understand the impact of that bug, outage, etc.?"
  • 15. Operations • "Why are they always bothering me?" • "I've got work to do!" • "Why do we have to do another release again... can't developers do a better job?" • "What does this company do?" (really)
  • 16. This is really destructive • To you • To your team • To your company
  • 17. All of This Can Be Fixed By Making Operations Visible with data • Not just technical operations but company operations.
  • 18. Your company is full of data! So Why Not Expose This Data? Here's a list of excuses I've heard
  • 19. "But I already have graphing in my alerting system" • Maybe. But it's junk • Can't share • Can't do data mash-ups • Can't do data transformations
  • 20. "They wouldn't understand." • "They won't understand the data so what's the point of sharing it." • First, "they" probably do. And the more people looking at ops metrics, the better. • Us vs. Them = Fail.
  • 21. "They might break something." • "The data is in our alerting system, we don't want you to break it." • Assumes "they" are incompetent, or malicious. Learn to trust.
  • 22. "It's not your job, so you don't need to know." "That information isn't important" • This excuse is typically caused by fear. • Why are you deciding what's important?
  • 23. "I'm not making another system, duplicating data is bad." • For operational metrics it is perfectly OK to have a redundant copy of data. • Completely different goals. • Use as alerting-beta
  • 24. "I'm too busy." "It's too dangerous" "I don't know how." • These are real problems. • So let's fix it!
  • 25. One Machine, One Day, One Person Challenge! Let's get 100% of operational metrics in, and enable the application to make and share new metrics on demand without any help from you.
  • 26. Graphite • Similar to RRDTool, Ganglia, Cacti • Uses specialized data storage • Uses specialized queries • Optimized for time series
  • 27. Graphite isn't Perfect • Documentation isn't great (but getting better) • A few QA issues • Somewhat odd stack (python-twisted, django)
  • 28. Graphite Ecosystem • Flexible input and output • REST API for graphs • Simple UI for mashups and dashboards • 3rd party, custom, client-side dashboards
  • 29. Makes Sharing Easy • Do you have an interesting graph? It's just a URL! • Dashboards are easy since graphs are just URLs. Very easy to make HTML dashboards.
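
The "it's just a URL" point can be sketched as follows. This is a minimal illustration, not the speaker's code: it assumes Graphite's `/render` endpoint with its `target`, `from`, `width` and `height` query parameters, and the host `graphite.example.com` and the metric names are hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def graph_url(base, target, frm="-24h", width=600, height=300):
    """Build a Graphite render URL for one metric (host and metric are assumptions)."""
    query = urlencode({"target": target, "from": frm, "width": width, "height": height})
    return f"{base}/render?{query}"


def dashboard_html(base, targets):
    """A dashboard is just a page of <img> tags, one per graph URL."""
    imgs = [
        f'<img src="{escape(graph_url(base, t))}" alt="{escape(t)}">'
        for t in targets
    ]
    return "<html><body>\n" + "\n".join(imgs) + "\n</body></html>"


if __name__ == "__main__":
    # Hypothetical host and metric names, for illustration only.
    print(dashboard_html("http://graphite.example.com", ["stats.logins", "stats.deploy"]))
```

Because each graph is addressed by its URL, sharing a graph means pasting a link, and a dashboard needs no server-side code beyond static HTML.
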
  • 30. One Machine One Day! • A single low-end machine should have capacity for a few thousand metrics per minute from 50+ machines. • Graphite is not CPU intensive, but needs fast disks and/or more memory.
  • 31. One Day, One Person • Graphite is not hard to install, but it is a bit messy. • It might be as easy as "apt-get install graphite" on your system. • It would be good to have a workshop or prebuilt AMI for EC2 • But not today :-(
  • 32. Operational Stats • You could parse /proc, ps, df, netstat, etc. and write your own custom scripts.... • ...or use Diamond from Brightcove • BrightcoveOS/Diamond
  • 33. Metrics in Diamond now • Memory • CPU • Disk • Network • Apache • NGINX • MySQL • SNMP and many more
  • 34. 100% of pure operational metrics are now shared! But what about your applications? And business metrics?
  • 35. Enter StatsD • Your application sends event data to StatsD, as it happens, in real-time. • StatsD collects this data and computes time-series metrics (sum, min, max, average) • Once a minute, it writes data to Graphite
  • 36. The Magic of UDP • Your application sends metrics in a UDP packet. • UDP is fire-and-forget: no connection, no waiting for a reply, no timeouts. A slow or down StatsD server cannot stall or crash your application. • It will not overload your network. • You may lose metrics, but in an intranet, it's rare.
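
A minimal sketch of what such a client does under the hood, assuming the StatsD counter wire format `name:value|c` and a StatsD daemon listening on UDP port 8125 on localhost (both the host and the helper names here are assumptions for illustration):

```python
import socket


def format_counter(name, value=1):
    """Encode a counter update in the StatsD line format: name:value|c"""
    return f"{name}:{value}|c".encode("ascii")


def increment(name, value=1, host="127.0.0.1", port=8125):
    """Send one counter update and never let a metrics failure reach the caller."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # No connect, no reply expected: the datagram is sent and forgotten.
            sock.sendto(format_counter(name, value), (host, port))
    except OSError:
        # Losing a metric is acceptable; breaking the application is not.
        pass


if __name__ == "__main__":
    increment("logins")
```

The `try`/`except` is a deliberate choice: socket calls can still raise locally (for example on a bad hostname), so the client swallows those errors to keep metrics strictly best-effort.
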
  • 37. Let's Count Logins! • Most StatsD client APIs are one-file, no C, simple. • Add one line to your login code. StatsD::increment('logins'); • That's it!
  • 38. Events! • You can also graph low-frequency events. • Just send another StatsD request in your batch script: StatsD::increment("deploy", 1); • Do it on reboots, installs, core dumps. • New bugs, new hires, new code commits. • Use Graphite's drawAsInfinite function to display them
  • 39. [Diagram] Three servers each send login,1 to StatsD; the deploy script sends deploy,1. StatsD flushes the totals (login,3), (deploy,1) to Graphite.
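
One way to put events on top of a regular graph, sketched under assumptions: Graphite's `drawAsInfinite()` function draws a vertical line wherever the wrapped series is non-zero, and a render URL may carry several `target` parameters. The host and the metric names `stats.logins` and `stats.deploy` are hypothetical.

```python
from urllib.parse import urlencode


def overlay_url(base, metric, event, frm="-7d"):
    """Render URL showing a metric with an event series drawn as vertical lines."""
    params = [
        ("target", metric),
        ("target", f"drawAsInfinite({event})"),  # event markers, e.g. deploys
        ("from", frm),
    ]
    return f"{base}/render?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and metric names.
    print(overlay_url("http://graphite.example.com", "stats.logins", "stats.deploy"))
```

A graph like this makes it easy to see whether a change in logins lines up with a deploy.
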
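
The flow on slide 39 can be approximated with a toy aggregator. This is a simplified sketch, not StatsD's actual implementation: it handles only plain counters in the `name:value|c` format, ignores sample rates and other metric types, and emits lines in Graphite's plaintext form `path value timestamp`. The class name is made up for illustration.

```python
from collections import Counter


class CounterAggregator:
    """Toy stand-in for the counting part of StatsD (illustrative only)."""

    def __init__(self):
        self.counts = Counter()

    def receive(self, packet):
        """Accept one 'name:value|c' packet; anything else is ignored."""
        name, rest = packet.split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "c":
            self.counts[name] += int(value)

    def flush(self, timestamp):
        """Return 'path value timestamp' lines for Graphite and reset the counters."""
        lines = [f"{name} {total} {timestamp}" for name, total in sorted(self.counts.items())]
        self.counts.clear()
        return lines


if __name__ == "__main__":
    agg = CounterAggregator()
    for packet in ["login:1|c", "login:1|c", "login:1|c", "deploy:1|c"]:
        agg.receive(packet)
    print(agg.flush(timestamp=1380000000))
```

Three separate `login` packets collapse into one data point per flush interval, which is why the application can emit an event per login without flooding Graphite.
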
  • 40. Measure Anything, Measure Everything
  • 41. Logins By Country! • get country code from IP address • make a new metric "login_country" instantly: StatsD::increment('logins'); $kuni = geoip2country($ipv4); StatsD::increment("logins.$kuni");
  • 42. Make Dashboards • and make frameworks that make new dashboards easy.
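
When a metric name is built from looked-up data like this, it helps to normalise the value first, because Graphite treats each dot as a level in the metric hierarchy. A small sketch, assuming the country code comes from some GeoIP lookup (the slide's `geoip2country`) and that failed lookups should land in an `unknown` bucket; the helper name is hypothetical:

```python
import re


def country_metric(country_code, prefix="logins"):
    """Build a per-country metric name such as 'logins.jp'.

    Strips anything that is not a letter or digit so a stray dot or space
    in the lookup result cannot create unexpected metric paths.
    """
    code = re.sub(r"[^a-z0-9]", "", (country_code or "").lower()) or "unknown"
    return f"{prefix}.{code}"


if __name__ == "__main__":
    print(country_metric("JP"))   # a successful lookup
    print(country_metric(None))   # a failed lookup
```

The resulting name is what would be passed to the StatsD client's increment call alongside the overall `logins` counter.
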
  • 43. Default Dashboard Good for experiments
  • 44. Dashboards Make it easy for your customers
  • 45. Make Operations Visible • Make the company visible. • Enable communication • Do the One Machine, One Day, One Person Challenge!
  • 46. Thanks!
  • 47. DevOpsDays Tokyo 2013 DevOpsDays is on video! Tokyo 2013 • The entire event is
  • 48. DevOpsDays Tokyo 2013 Media Coverage • githubdevopsboxenhubotdevops_day_tokyo_2013.html • githubboxenhubotdevops_day_tokyo_2013.html
  • 49. DevOpsDays Tokyo 2013 Attendee Coverage