Practical resource monitoring with munin (English edition)



Hello. I'm @zembutsu. I work at a server hosting company in Japan as a solution engineer, mainly in charge of server and network operations.
This presentation is about Munin, a resource monitoring tool.

The original version is available in Japanese.

Munin User Group Japan
Masahito Zembutsu @zembutsu
September 8, 2012 OpenSource Conference 2012 Tokyo/Fall, Japan (#osc12tk)

Published in: Technology, Design

Practical resource monitoring with munin (English edition)

  1. 1. Munin User Group Japan, Zembutsu @zembutsu. September 8, 2012, OpenSource Conference 2012 Tokyo/Fall (#osc12tk). “Practical Resource Monitoring with Munin - English Edition”
  2. 2. Nice to meet you. I’m @zembutsu. Thank you for giving me the opportunity to present! These are characters from Touhou Project, and "Please take it easy!!" (yukkuri site itte ne!) is a famous piece of slang in Japan.
  3. 3. Today’s topic is… Recipe for monitoring
  4. 4. I expect that using Munin will reduce the various burdens on operations engineers.
  5. 5. And now…
  6. 6. Why Am I Here? (this is me)• Masahito ZEMBUTSU @zembutsu – Solutions Engineer (an engineer with a fiery, zealous otaku mind) • Working as a server infrastructure engineer. • I want to give engineers relaxation and rest. (Operation/Monitoring/Automation) – Member of open source and cloud computing communities • My website – Experience • April 2000 - Support engineer for server hosting and an ISP • May 2008 - Internal company network management and support • November 2010 - Service development and upper-escalation operations • July 2012 - Operation, development, and research at a datacenter somewhere. (Caption: “Don’t mind the careful thing!”)
  7. 7. Filleting mackerel. It resembles server operations. DevOps!
  8. 8. “Don’t forget. Always, somewhere, someone is fighting for you. As long as you remember her, you are not alone.” (Reference: “Puella Magi Madoka Magica” Episode 12 “My Very Best Friend”) Operation / Monitoring
  9. 9. This is an illustrative photo of the kind of data center I work in. This photo is under a Creative Commons license, by torkildr
  10. 10. A Dedicated Hosting Services. A HUMAN WORK: Shutdown Attack, An Unfamiliar Specifications, Cloudcomputing’s Arrival in Japan, Shape of Server, A Business That’s Changing, My Purest Heart for Our Customers. Troubleshooting. DECISIVE BATTLE: The Phone That Never Stop Ringing, The Day a Datacenter Stood Still, The Choice of Priority, In sickness unto shutdown, and…, Sales Representative’s Invasion, Customer’s office the Throne of Souls, Tears. You’re a loser only when you fail to try. We can (not) advance. The Birth of Special task force, The Value of Miracles, At Least, Be Human.
  11. 11. If you are a server administrator, you have probably thought this at least once…
  12. 12. My Little Servers Can’t Be This Heavy... But Munin may help you work toward a solution to the problem.
  13. 13. This is what I want to share today.• I think resource monitoring needs to be built into the operational workflow.• As a result, it may reduce the burden on administrators. I’d be extremely happy. XD• What we need is a culture of leaving the office on time!! (Or is that just me?)
  14. 14. Agenda 1. What is Munin? 2. Munin’s Architecture 3. How to use Munin 4. Practical troubleshooting! 5. MY VERY BEST MONITORING TOOL
  15. 15. I hope… 1. Let’s obtain a weapon called “resource monitoring” for ourselves. (Wille zur Macht) 2. We improve the efficiency of our work (server and network operations). “Let’s find happiness together.” (Reference: Kiichi Goto, Patlabor: The Movie, 1989) I guess everybody’s happy, and that’s fine.
  16. 16. Munin User Group Japan Munin Community in Japan
  17. 17.• Munin User Group Japan –• Wiki –• Demo –• How to join us –
  18. 18. #1 What is Munin?
  19. 19. I dare to say, Munin is a resource monitoring tool. Munin is a networked resource monitoring tool. I dare to say!!
  20. 20. This is overview (frontpage) of Munin 2.0
  21. 21. When you click any graph, the day / week / month / year graphs are displayed.
  22. 22. This zooming function is convenient. It is useful when I write reports.
  23. 23. The vertical axis lists servers and the horizontal axis lists metrics. The grouping function is a distinctive feature.
  24. 24. The Melancholy of Server Administrators. Monitoring is like a box of chocolates.
  25. 25. By the Time we Realized It, It Had Already Begun.• Troubles that alert systems can’t detect are increasing – Mainly for social networking service clients – By the time an alert threshold is exceeded, it is already too late.• Client demands – Rapid response – Because the loss per second is orders of magnitude larger than before – A loss of several hundred dollars per minute :(
  26. 26. “There is something weird, will you check servers? :)” (A request from one of my customers)• A very difficult request... – Clearly identifying the cause often takes time.• I want to do my best! – Yes!! I rouse myself and go to work. (Administrators got exhausted…) – I want to aim at improving the service, but this way of thinking is bad. Why? Let’s see the next slide.
  27. 27. An old network configuration. One web server and one database server. It’s very simple!
  28. 28. An old network configuration. A typical web service looked like this, with BIND added at most. One web server and one database server. It’s very simple!
  29. 29. On the other hand, at present…
  30. 30. This Just Can’t Be Right!! (BIND) The number of servers and managed objects has increased compared with the past. Support therefore takes more time and is more difficult.
  31. 31. Why did this happen?• The environment keeps changing – Network – Server – Software – Middleware – Application – etc
  32. 32. Be freed from CONSOLE Ace Console: Fires of Liberation
  33. 33. The most important thing in troubleshooting• Investigating the cause has top priority. “When we act, the first requirement is to notice. Technique alone does not solve everything; noticing has to come before technique. Japan has any number of technical experts, yet problems are not readily solved. The reason is that nobody notices.” Soichiro Honda (2008) “Zakkubaran” (candidness), PHP Inc., 10pp.
  34. 34. You sure that’s enough armor(tools)?• “No problem. Everything’s fine.” – ps – top – vmstat – iostat – free – sar (sysstat) …etc Really?
  35. 35. Past Present day
  36. 36. Situation has changed
      Past:
      • One or several servers
      • Apache, Sendmail, Perl
      • PostgreSQL, MySQL
      • Network appliance (sometimes)
      • No scale
      • Upgrading is effective
      Now (present day, present time, hahaha!!):
      • Multiple servers in the same network (we assume)
      • Conventional software + nginx, Tomcat, Ruby, PHP, Python, memcached, key-value stores, Hadoop, Cassandra, MongoDB… etc.
      • The need for scalability
      • Upgrading is not effective
      I think that one of the answers to this problem is resource monitoring using Munin.
  37. 37. The essence of Munin is visualizing many resources. I Know What Your Server Did Last Summer
  38. 38. MRTG has declined. Is This MRTG? No, This Is Munin. We have lost a hero to our glorious and noble cause, but does this foreshadow our defeat? No. It is a new beginning. Compared to Cloud Computing Federation the national resources of Dedicated Server are less than one thirtieth of theirs. Despite this major difference, how is it that we have been able to fight the fight for so long? It is because our goal in this war is a righteous one. It’s been over fifty years since the elite of Cloud Computing, consumed by greed took control of the Cloud Computing Federation. We want our freedom. Never forget the times when the Federation has trampled us! We, the Principality of Dedicated Server, have had a long and arduous struggle to achieve freedom for all engineers of our great network. Our fight is sacred, our cause divine. My beloved brother, MRTG, was sacrificed. Why? The war is at a stalemate.
  39. 39. Comparing resource monitoring tool I’m interested, too!
  40. 40. Comparing resource monitoring tool snmpd C
  41. 41. Comparative table
      Tool name | Type                         | Datastore               | Config  | Web interface  | Alerting
      Munin     | Resource monitoring          | RRDTool                 | CUI     | Reference only |
      Cacti     | Resource monitoring          | RRDTool & MySQL         | CUI/GUI |                |
      MRTG      | Resource monitoring          | original                | CUI     | Reference only | ×
      Zabbix    | IT infrastructure monitoring | MySQL, PostgreSQL, etc. | GUI     |                |
      Nagios    | IT infrastructure monitoring | MySQL or PostgreSQL     | CUI/GUI |                |
      (Caption: “We are friends all the time...”) Each tool has both good and bad points. My team uses Munin and a Nagios-based tool, each where it fits.
  42. 42. What is Munin? The Munin We Saw That Day
  43. 43. About Munin Be alert!•• Resource monitoring tool – Munin can analyze resource trends – “what just happened to kill our performance?”• Plug and Play architecture – It can monitor many items by default Munin is a networked resource monitoring tool that can help analyze resource trends and "what just happened to kill our performance?" problems. It is designed to be very plug and play. A default installation provides a lot of graphs with almost no work.
  44. 44. Developers Munin project github• • Documents / FAQ / Trac / Wiki Repository / tools / plugins
  45. 45. Progress in development• Community based – Github • – Mailing list • – IRC • irc://• License – GNU General Public License version 2 – There is no commercial support
  46. 46. History• 2002 - project began – The original name was “LRRD”• 2004 - Munin 1.0 released – “munin-eye” was renamed “munin-node” – Development took a long time, with steady daily improvement• 2009 - Munin 1.4 released – Probably the most widely deployed of the 1.x versions• May 30, 2012 - Munin 2.0 (stable) released
  47. 47. Where is the Japanese information? • NOT YET! • Let’s make it together now! – How about writing something on the wiki first? • “Is the number of invitations to the Munin user group ZERO again this week? Hm? Are you really willing to do it?” “I’m sorry, my apologies…”
  48. 48. Munin? The Secret Of Munin
  49. 49. This Photo is under creative commons license
  50. 50. What obstacle factors there are!! Are you getting wise with me? Speaking munin-eye’s mind (now, munin-node)
  51. 51. Summarize the points• Munin is a resource monitoring tool. (GPL v2)• Simple and powerful architecture.• Munin frees us from the console. (effectiveness)• “Munin” means “memory”. You are never alone! Munin is always here for you 24x7x365
  52. 52. #2 Munin’s Architecture
  53. 53. An Amazing Simple Munin The Architecture Of Munin
  54. 54. User’s viewpoint: the structure is simple. The user views the data on the server.
  55. 55. User’s viewpoint
  56. 56. Client Server Model Munin Master And Munin Node
  57. 57. Let’s look at how it works in a little more detail.
  58. 58. This is the data we referred to a moment ago.
  59. 59. This is the main work of the Munin master; its programs are executed by cron. They carry out data collection, threshold checking, and the generation of HTML files and graphs, one after another.
  60. 60. This is the “munin-node” agent. The Munin master acquires data through its plugins.
  61. 61. munin-update connects to munin-node. munin-node listens on TCP port 4949.
  62. 62. Plugins are executed by munin-node; each plugin is a script that acquires some kind of data. munin-update stores the acquired data with RRDtool. Then munin-limits checks the thresholds.
  63. 63. munin-graph and munin-html then generate the graphs and HTML from the data (.rrd) stored by RRDtool.
  64. 64. This flow is the basic operation of Munin. I think that it is really simple and cool!
  65. 65. Components of Munin
      master (SERVER):
      • Perl Libs
        – Munin::Common
      • munin-cron
        – munin-update
        – munin-limits
        – munin-html
        – munin-graph
      • config: munin.conf
      munin-node (CLIENT):
      • Perl Libs
        – Munin::Common
      • munin-node
        – config: munin-node.conf
        – Plugins
      • Tools
        – munin-node-configure
        – munin-cron
  66. 66. About data collection
      • munin-node collects various data.
      • Port 4949 (TCP)
        – Munin protocol
          • LIST
          • CONFIG
          • FETCH
          • VERSION
          • QUIT
      (T_T) “4949” is a Japanese onomatopoeia for a "tearful face".
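The protocol on this slide is plain text, so it can be tried by hand. The following is a minimal sketch, assuming a netcat binary (`nc`) is installed and a munin-node is reachable; `NODE`, `PORT`, and `PLUGIN` are placeholder variables introduced here, not values from the slides. The commands are sent in lowercase, which is how munin-node expects them.

```shell
#!/bin/sh
# Sketch: talk to a munin-node by hand over the Munin protocol (TCP 4949).
# NODE, PORT and PLUGIN are placeholders introduced for this example.
NODE=${NODE:-localhost}
PORT=${PORT:-4949}
PLUGIN=${PLUGIN:-load}

# Print the commands of one session, one per line, in the order they are sent.
munin_session() {
    printf '%s\n' "version" "list" "config $1" "fetch $1" "quit"
}

# Send the session to the node and print whatever it answers.
# Some netcat variants close the connection early and may need extra options.
query_node() {
    munin_session "$PLUGIN" | nc "$NODE" "$PORT"
}

# Uncomment on a host that can reach a running munin-node:
# query_node
```

`list` returns the plugin names, `config <plugin>` the graph definition, and `fetch <plugin>` the current values; this is the same exchange munin-update performs on every run.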
  67. 67. Data storage and graph generation are the work of RRDtool
      • Data format is RRD (round robin database)
        – /var/lib/munin/<hostname>/<plugin’s name>.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-idle-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-iowait-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-irq-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-nice-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-softirq-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-steal-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-system-d.rrd
          -rw-r--r-- 1 munin munin 50612 Oct 18 2010 localhost-cpu-user-d.rrd
      • About 50 KB per RRD file
        – More than 200 KB per plugin (MUST)
        – 150 to 250 files per munin-node (about 8 to 15 MB per node in total)
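The sizing above is simple arithmetic and can be rechecked for your own node count. This is a small sketch: `rrd_footprint_kib` is a hypothetical helper, and the default of 50612 bytes is the file size from the listing on this slide. With 150 to 250 files it gives roughly 7 to 12 MiB per node, the same order of magnitude as the slide's estimate.

```shell
#!/bin/sh
# Sketch: estimate the RRD disk footprint of one munin-node on the master.
# rrd_footprint_kib is a hypothetical helper written for this example.
rrd_footprint_kib() {
    files=$1
    bytes_per_file=${2:-50612}   # size of one .rrd file in the listing
    echo $(( files * bytes_per_file / 1024 ))
}

rrd_footprint_kib 150   # prints 7413  (KiB, about 7 MiB)
rrd_footprint_kib 250   # prints 12356 (KiB, about 12 MiB)
```

Multiply the result by the number of nodes to size the master's data directory.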
  68. 68. generate graphs per plugin Munin Master And Munin Node
  69. 69. Munin provides many plugins• System resources – CPU, memory, Load Average, disk, S.M.A.R.T…• Network – Traffic, SNMP, HTTP loadtime, TCP, UDP, ICMP…• Applications, middleware – Apache, Nginx, Sendmail, Postfix, MySQL, PostgreSQL, MongoDB, memcached, PHP… etc
  70. 70. Ex) Load Average plugin
      • /etc/munin/plugins/load
        – “Load average” here is the five-minute average
        – It’s a symbolic link
          • The original is /usr/share/munin/plugins/load
        – Simple shell script:
            echo -n "load.value "
            cut -f2 -d' ' < /proc/loadavg
          Output:
            load.value 3.22
  71. 71. #3 How to use Munin
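The script on this slide only prints the value. A plugin is also asked for its graph definition, which is the `config` branch visible in the httping plugin later in this deck. The following is a minimal sketch of both halves for the load average, not the plugin shipped with Munin; `LOADAVG_FILE` is a hypothetical override added so the script can be tried against a sample file.

```shell
#!/bin/sh
# Sketch of a minimal load-average plugin (not the one shipped with Munin).
# LOADAVG_FILE is a hypothetical override; the value normally comes from
# /proc/loadavg.
LOADAVG_FILE=${LOADAVG_FILE:-/proc/loadavg}

load_plugin() {
    if [ "$1" = "config" ]; then
        # Graph definition, printed when the plugin is called with "config"
        echo "graph_title Load average"
        echo "graph_vlabel load"
        echo "load.label load"
        return 0
    fi
    # Current value: field 2 of /proc/loadavg is the five-minute average
    printf 'load.value %s\n' "$(cut -d' ' -f2 "$LOADAVG_FILE")"
}

load_plugin "$@"
```

Called with no argument it prints a single `load.value` line, as on the slide; called with `config` it prints the labels the master needs to draw the graph.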
  72. 72. Munin setup! Make My Day
  73. 73. Environment• Perl5• OS – Linux • Source code ( version 2.0.6 ) • Binary Package – Red Hat Enterprise Linux family ( EPEL ) – Debian – openSUSE – MacOS X – Windows
  74. 74. Setup flow• Install Munin and the Perl libraries• Edit the config file ( munin.conf )• Set up munin-node ( munin-node.conf )• Check the graphs
  75. 75. Case) Red Hat Enterprise Linux• Use the EPEL*1 (testing repository) package or the source• Procedure – 1. enable EPEL – 2. “yum install munin” – 3. configure munin.conf – 4. turn on munin-node and set it up – 5. check *1 Extra Packages for Enterprise Linux (EPEL)
  76. 76. Case) Debian / Ubuntu• Use apt (Debian PTS is testing) or the source• Procedure – 1. set up the Perl libraries (via apt-get) – 2. install munin – 3. configure munin.conf – 4. turn on munin-node and set it up – 5. check
  77. 77. Basic of the setting How To Configure
  78. 78. Config files
      Munin Master:
      • /etc/munin/munin.conf
        – Host tree (targeting nodes)
        – Graph strategy
          • Cron or realtime generation
        – Paths
          • RRD files
          • logfiles
      munin-node:
      • /etc/munin/munin-node.conf
        – Access control
          • Host (IP address)
          • Network CIDR
        – Node’s hostname
        – Port number
          • Default: TCP 4949 (T_T)
        – Plugin’s option
  79. 79. [munin.conf] set target node
          [GroupName;]
              address
              use_node_name yes
  80. 80. [munin-node.conf] Access control
      • allow ^$
        – Regular expression
      • cidr_allow
        – Not a regular expression
      • If you change these files, you must restart munin-node!
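Because `allow` takes a regular expression, the dots of an IP address have to be escaped and the pattern anchored, while `cidr_allow` takes ordinary CIDR notation. The following sketch shows the difference using `grep -E` as a stand-in for the Perl regular expressions munin-node uses (equivalent for simple anchored patterns like these); `matches_allow` is a hypothetical helper and the addresses are placeholders, not values from the slides.

```shell
#!/bin/sh
# Sketch: would this master address match an "allow" regular expression?
# matches_allow is a hypothetical helper; grep -E stands in for Perl regex.
matches_allow() {
    printf '%s\n' "$1" | grep -Eq "$2"
}

# Escaped dots and anchors match exactly one address.
matches_allow 127.0.0.1 '^127\.0\.0\.1$'  && echo "127.0.0.1 allowed"
matches_allow 127.0.0.10 '^127\.0\.0\.1$' || echo "127.0.0.10 rejected"

# Unescaped dots match any character, so the pattern is too permissive.
matches_allow 127a0b0c1 '^127.0.0.1$'     && echo "unescaped dots match too much"
```

For a whole subnet, a `cidr_allow` line with a CIDR range is usually easier to read than an equivalent regular expression.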
  81. 81. Basic of the plugin How To Configure plugin
  82. 82. Basic knowledge of Munin plugins• The original files are here ( shell or perl scripts ) – /usr/share/munin/plugins/• How to use – Make a symbolic link in /etc/munin/plugins – Configure munin-node.conf – Restart munin-node (MUST) – Check the graphs and HTML
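The steps above can be wrapped in a few lines of shell. This is a sketch only: `enable_plugin` is a hypothetical helper, the two directories default to the paths named on the slide but can be overridden for a dry run, and the restart command in the comment is an assumption that varies by distribution.

```shell
#!/bin/sh
# Sketch: enable a plugin by symlinking it into the active plugin directory.
# enable_plugin is a hypothetical helper; PLUGIN_SRC and PLUGIN_DST default
# to the paths on the slide and are overridable for testing.
PLUGIN_SRC=${PLUGIN_SRC:-/usr/share/munin/plugins}
PLUGIN_DST=${PLUGIN_DST:-/etc/munin/plugins}

enable_plugin() {
    name=$1
    link=${2:-$1}    # link name; differs for wildcard plugins such as httping_
    if [ ! -e "$PLUGIN_SRC/$name" ]; then
        echo "no such plugin: $PLUGIN_SRC/$name" >&2
        return 1
    fi
    ln -sf "$PLUGIN_SRC/$name" "$PLUGIN_DST/$link"
}

# Typical use as root, followed by the mandatory restart (the restart
# command is an assumption; use whatever your distribution provides):
# enable_plugin load && service munin-node restart
# munin-run load     # run the plugin once to check its output
```

The optional second argument covers wildcard plugins, where the link name (for example `httping_localhost`) selects which configuration section applies.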
  83. 83. How to debug plugin• /usr/sbin/munin-run <plugin-name> – “--debug” shows more detail – behavior is same as munin-node – useful• Command line tool ( I made ) – muninwalk & muninget ; perl script
  84. 84. Plugins catalog How To Configure plugin
  85. 85. Apache
      • Symbolic link
          # ln -s /usr/share/munin/plugins/apache_* /etc/munin/plugins/
      • munin-node.conf
          [apache_*]
          env.url
          env.ports 80
      • httpd.conf
          ExtendedStatus On
          <Location /server-status>
              SetHandler server-status
              Order deny,allow
              Deny from all
              Allow from
          </Location>
  86. 86. MySQL
      • Symbolic link
          # ln -s /usr/share/munin/plugins/mysql_* /etc/munin/plugins/
      • munin-node.conf
          [mysql*]
          env.mysqlopts -u root -pPASSWORD
          env.mysqladmin /usr/bin/mysqladmin
  87. 87. BIND
      • Symbolic link
          # ln -s /usr/share/munin/plugins/bind9_rndc /etc/munin/plugins/
      • munin-node.conf
          [bind9_rndc]
          env.rndc /usr/sbin/rndc
          env.querystats /var/named/chroot/var/named/data/named_stats.txt
          user root
      • named.conf
          statistics-file "/var/named/data/named_stats.txt";
  88. 88. Plugins I made Plugins Catalog
  89. 89. Web Services: estimated charges in real time via the API
  90. 90. Amount of power generated by the electric power company (TEPCO)
  91. 91. And the electricity consumption rate (TEPCO). Munin can turn anything into a plugin as long as it can be expressed as a number.
  92. 92. How to make plugin Take It Easy
  93. 93. Sample case: httping plugin
      • "httping" is a command-line tool that checks the response time of a web server, like the “ping” command.
      • If you set the -S option, you can check both the response time and the processing time.
          $ httping -S
          PING (
          connected to (380 bytes), seq=0 time=0.10+0.69=0.79 ms
          connected to (380 bytes), seq=1 time=0.08+0.47=0.55 ms
          connected to (380 bytes), seq=2 time=0.07+0.68=0.75 ms
          connected to (380 bytes), seq=3 time=0.12+0.66=0.77 ms
          Got signal 2
          --- ping statistics ---
          4 connects, 4 ok, 0.00% failed
  94. 94. Plugin: httping_
      This is the body of the httping plugin; the file itself is a simple shell script. It contains the graph definition and the commands that actually acquire the values. The point is only to acquire data, so a plugin can be written in any language, including Perl and PHP. The output format is “xxx.value ***”.

          #!/bin/sh
          #
          # Plugin to monitor HTTP response (httping)
          #
          #%# family=auto
          #%# capabilities=autoconf

          URL=${URL:-"http://localhost/"}
          COUNT=${COUNT:-"5"}
          httping_bin=$(which httping)

          if [ "$1" = "autoconf" ]; then
              echo yes
              exit 0
          fi

          # Define graphing
          if [ "$1" = "config" ] ; then
              echo "graph_args -r --lower-limit 0 ";
              echo "graph_title http response $URL";
              echo "graph_category httping";
              echo "graph_info httping response time: $URL";
              echo graph_vlabel msec
              echo "connect.label connect time"
              echo "connect.draw AREA"
              echo "connect.type GAUGE"
              echo "processing.label processing time"
              echo "processing.draw STACK"
              echo "processing.type GAUGE"
              exit 0
          fi

          # format for httping 1.5.3: after tr, fields 9 and 10 are the
          # connect and processing times; average them over COUNT requests
          $httping_bin -c "$COUNT" -G -S "$URL" | tr '+|=' '   ' | awk -v count="$COUNT" '{connect+=$9; processing+=$10} END{print "connect.value", connect/count "\n" "processing.value", processing/count}'
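The parsing step of the plugin can be tried without httping installed by feeding it sample lines. The following is a sketch of that step only: `httping_average` is a hypothetical helper, `192.0.2.10:80` is a placeholder host (the host in the slide's output is not shown), and unlike the slide's script this variant divides by the number of lines it actually parsed rather than by `COUNT`.

```shell
#!/bin/sh
# Sketch: average connect and processing times from "httping -S" output.
# httping_average is a hypothetical helper; it reads httping output on stdin
# and prints Munin "field.value" lines.
httping_average() {
    tr '+=' '  ' | awk '
        # Only per-request lines carry timings; after tr, fields 9 and 10
        # are the connect and processing times in milliseconds.
        $1 == "connected" { connect += $9; processing += $10; n++ }
        END {
            if (n > 0) {
                print "connect.value", connect / n
                print "processing.value", processing / n
            }
        }'
}

# Sample lines shaped like the httping output on the earlier slide.
# 192.0.2.10:80 is a placeholder host.
httping_average <<'EOF'
connected to 192.0.2.10:80 (380 bytes), seq=0 time=0.10+0.69=0.79 ms
connected to 192.0.2.10:80 (380 bytes), seq=1 time=0.08+0.47=0.55 ms
EOF
```

Dividing by the parsed line count keeps the average meaningful when some requests fail, at the cost of hiding those failures from the graph.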
  95. 95. Config: httping_
      • /etc/munin/plugin-conf.d/httping
          [httping_localhost]
          env.URL
          env.COUNT 5
          [httping_blog]
          env.URL
          env.COUNT 5
          [httping_node1]
          env.URL
          env.COUNT 5
      • # ln -s /usr/share/munin/plugins/httping_ /etc/munin/plugins/httping_localhost
  96. 96. httping live demo• This server has no problem with either response time or processing time. • For this server group, the blue part (processing time) is large. A certain CMS is taking up the processing time. On the other hand, I can see that the network is fine.
  97. 97. #4 Practical troubleshooting!
  98. 98. Never say never.• Agility is the core of the service (in my case) – Look out for trouble, find its cause, and resolve it • Hardware, software, or network? – We need to investigate promptly • where the problem is happening
  99. 99. Live Munin demo• – Let’s observe the resource situation through this Munin demonstration site. • Where is a bottleneck? Or where will one be? • Even if you do not log in to a server, you can refer to many of its resources.
  100. 100. This Just Can’t Be Right A Real Troubleshooting
  101. 101. Case) Identifying unauthorized access• By the Time we Realized It, It Had Already Begun.• Situation – 1. Error emails began to arrive at postmaster – 2. The monitoring tool raised no alert – 3. So I first checked the resources in Munin – 4. From the situation I identified that the CMS had a vulnerability, and acted promptly. Thanks to Munin, I was able to do all of this in a short time.
  102. 102. How I found it. Sendmail’s queue rose suddenly. Load average showed no problem.
  103. 103. I confirmed the time when the traffic was strange. MySQL’s queries rose suddenly, too. From the above situation, I suspected unauthorized access to the CMS. In fact, when I investigated the logs for that time, I found an attack on a specific URL. Identifying the cause and taking action would have taken more time if I had not used Munin.
  104. 104. #5 My Very Best Monitoring Tool
  105. 105. New Features of Munin 2.0 I think that Munin is truly wonderful
  106. 106. Munin 2.0 has new features!• Better UI and CGI integration – New look, Graph Zooming, FastCGI• asynchronous I/O support – Better performance• Native SSH transport – secure (port 22) & easy setup• asynchronous proxy support – async-server substitutes for munin-node• And more… – monitoring/munin/blob/devel/Announce-2.0
  107. 107. [RemoteNetwork;backend-DB]
           address ssh:// --spooldir /var/opt/munin/spool/ --spoolfetch
           use_node_name yes
  108. 108. No munin, No Troubleshoot. I’m Not Afraid of Anything Anymore
  109. 109. Munin changed my support flow (my case)• If I don’t use tools – Troubleshooting means running various commands (sysstat) and investigating log files. – But this method takes a long time and many people, and is bad for the service.• If I use Munin (now) – Even without logging in, I can understand the situation. – I can judge abnormality visually • “I see the ending of this troubleshooting!” – Agile support • Troubleshooting that follows Plan-Do-Check-Action (PDCA) cycles.
  110. 110. In my dedicated server hosting work• I really depend on Munin – I always set up Munin. – Munin runs on nearly all of the several hundred servers that I manage directly. – I think Munin is indispensable for improving our service quality. (Captions: “Neat”, “I cannot part with Munin for my work.”, “You believe it!”, “BAM BAM!”)
  111. 111. Troubleshoot PDCA: Law of Cycles. “Presage!!”
  112. 112. Troubleshoot PDCA (Law of Cycles). Plan: detect the problem and the situation. “What are these alerts? For real?” “Presage!!”
  113. 113. Troubleshoot PDCA (Law of Cycles). Plan: detect the problem and the situation. Do: suppose a cause. “OK, Munin. Please tell me where the trouble lies hidden.” “Fire!” “Please stop!!”
  114. 114. Troubleshoot PDCA (Law of Cycles). Plan: detect the problem and the situation. Do: suppose a cause. Check: check resources remotely. “I’m just talking about what I just saw in Munin!!”
  115. 115. Troubleshoot PDCA (Law of Cycles). Plan: detect the problem and the situation. Do: suppose a cause. Check: check resources remotely. Action: log in and execute commands. “Wow! click-clack click-clack”
  116. 116. You are never alone! Munin is always here for you 24 x 7 x 365. The Only Thing I Have Left To Guide Me
  117. 117. Munin’s overview・Munin is a resource monitoring tool that specializes in helping you notice problems through visualization.・Simple architecture, and many plug-ins.・It is most suitable for systems that need quick support in a short time.
  118. 118. Conclusion * This is my personal impression. No munin, No Operation. While there’s Munin, there’s hope. MY VERY BEST MONITORING TOOL. Thank you for MUNIN. Good-bye to MRTG.
  119. 119. I wish…• If my presentation made you interested in Munin, I would appreciate it if you tried it.• Tomorrow is another day. It’s up to you. Squidn’t you use Munin? (Shouldn’t)
  120. 120. Questions?• Do you have any questions about Munin? I’m glad you asked. As a reward, I’ll give you the right to buy Opoona. (But it’s in the bargain bin...)
  121. 121. References• Munin –• Munin User Group Japan – –• Website – Waiting for Munin 2.0 – Introduction – Personal Workflow Blog • – /tags/2.0.0/ChangeLog – Munin – Trac • Please send feedback to me, @zembutsu (Twitter). Thank you for reading!