
OpenNebulaConf 2013 - Monitoring Large-scale Cloud Infrastructures with OpenNebula by Simon Boulet

Efficient monitoring is crucial when managing your Cloud infrastructure. The metrics collected by OpenNebula can be used to trigger automatic scaling, or quickly detect failures to automatically restart virtual machines. During this talk, I will show how OpenNebula can be used to efficiently monitor thousands of virtual machines at sub-1 minute interval. I will show how OpenNebula can be enhanced and optimized, and how different metrics collection tools such as Ganglia and Host-sFlow can be used with OpenNebula to monitor large-scale Cloud infrastructures.

Bio:
Simon Boulet is an entrepreneur and IT consultant from Montreal, Canada. He has worked on various Cloud infrastructure projects, including projects for the CBC/Radio-Canada public television that had important scaling needs for hosting online interactive TV shows. Prior to becoming an IT consultant, Simon was IT Director at iWeb, Canada’s largest Web Hosting company, where he led iWeb’s first steps into Cloud Computing with the development of the Smart Servers. Simon is also an active and frequent contributor to OpenNebula, with a deep understanding of OpenNebula internals, and has contributed several enhancements and bug fixes that made it into the official releases of OpenNebula.

Published in: Technology, Design

Transcript of "OpenNebulaConf 2013 - Monitoring Large-scale Cloud Infrastructures with OpenNebula by Simon Boulet"

  1. Monitoring Large-scale Cloud Infrastructures with OpenNebula
     Simon Boulet, OpenNebula Consultant, Co-founder of the Cloudnorth.com Project
     simon@nostalgeek.com
  2. Goals
     1. Show how to configure OpenNebula to achieve a sub-1 minute monitoring interval
     2. Demonstrate the use of OpenNebula in large-scale cloud infrastructures
     3. Suggest enhancements to OpenNebula performance and monitoring
  3. How Big Exactly is Large-scale?
     How many hosts? 1,000? 2,000? 10,000 VMs?
  4. Monitoring in OpenNebula
     ● Detects when a VM or host changes status (Running, Stopped, etc.)
     ● Built-in metrics: CPU, memory and network usage
     ● You can add as many metrics as you like by customizing the driver
     ● Can be used to perform various tasks (auto scaling, high-availability redeployment, etc.)
  5. Don't Expect the Default Configuration to Perform Optimally
     ● Database: Use the MySQL database backend, not the default SQLite
     ● Logs: Use the syslog log system, and disable debug logging (debug_level=1)
     ● Number of threads: Adjust the number of driver threads (see the -t option in your *MAD config options)
  6. Use OpenNebula >= 4.0
     Prior versions did monitoring in two phases:
     1. The IM Monitor action monitored Hosts
     2. The VMM Poll action monitored VMs
     (100 Hosts + 1,000 VMs) × 4 polls per minute (15-second interval) = 4,400 actions per minute
     Since OpenNebula 4.0, the IM Monitor action can return the information of the VMs running on the monitored host
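The arithmetic on this slide can be checked with a short sketch. The function name and the `combined` flag are illustrative only, not part of OpenNebula; `combined=True` assumes one IM Monitor action per host also covers that host's VMs, as the slide describes for 4.0 and later.

```python
def actions_per_minute(hosts, vms, interval_s, combined):
    """Estimate monitoring actions issued per minute.

    combined=False models the pre-4.0 scheme (one action per host plus
    one per VM); combined=True models one action per host only.
    """
    polls_per_minute = 60 / interval_s
    monitored = hosts if combined else hosts + vms
    return monitored * polls_per_minute


# Slide example: 100 hosts, 1,000 VMs, 15-second interval.
print(actions_per_minute(100, 1000, 15, combined=False))
print(actions_per_minute(100, 1000, 15, combined=True))
```

Under these assumptions, the per-minute action count depends only on the number of hosts once VM information is returned with the host monitoring action.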
  7. Monitoring History
     By default OpenNebula keeps 24h of monitoring history
     15-second interval × 24h = 5,760 records per VM
     Average record size: 4KB
     23MB of monitoring history per VM
     100 VMs = 2.3GB
     10,000 VMs = 230GB
     See the HOST_MONITORING_EXPIRATION_TIME and VM_MONITORING_EXPIRATION_TIME config options
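The storage figures above follow from a simple multiplication, sketched here. The 4KB record size is the slide's average, taken as 4,000 bytes (decimal units) as an assumption; the function name is illustrative.

```python
def history_bytes(vms, interval_s, retention_s, record_bytes=4_000):
    """Estimate the monitoring history size in bytes for a number of VMs."""
    records_per_vm = retention_s // interval_s
    return vms * records_per_vm * record_bytes


DAY = 24 * 60 * 60  # default retention from the slide: 24 hours

for vms in (1, 100, 10_000):
    size_gb = history_bytes(vms, interval_s=15, retention_s=DAY) / 1e9
    print(f"{vms:>6} VMs: {size_gb:.3f} GB")
```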
  8. Monitoring History (continued)
     ● Reduce history to 30 minutes (1800 seconds)
     ● Use the MySQL MEMORY storage engine for the vm_monitoring and host_monitoring tables
     It's OK to lose monitoring history when MySQL is restarted: the most recent monitoring values are stored in the VM template
     Set MySQL max_heap_table_size large enough to hold all your monitoring history
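A rough way to pick a value for max_heap_table_size is to estimate the size of one monitoring table at the reduced retention. This is a sketch under stated assumptions: the 4KB average record size comes from the previous slide (taken as 4,000 bytes), and the function name and the `headroom` safety factor are hypothetical. MEMORY tables store rows at a fixed length, so actual usage may differ from this estimate.

```python
def heap_table_size_bytes(entities, interval_s, retention_s,
                          record_bytes=4_000, headroom=1.5):
    """Estimate a max_heap_table_size value for one monitoring table.

    entities is the number of VMs (for vm_monitoring) or hosts (for
    host_monitoring); headroom is an assumed safety margin.
    """
    records = entities * (retention_s // interval_s)
    return int(records * record_bytes * headroom)


# 4,000 VMs, 15-second interval, 30 minutes of history.
size = heap_table_size_bytes(4_000, interval_s=15, retention_s=1800)
print(f"{size / 1e9:.2f} GB")
```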
  9. Watch your Load Average
     As of 4.2, the maximum number of simultaneous XML-RPC API connections is limited to 15
     Overloaded OpenNebula = Slow XML-RPC API response = API Limit / Timeout
     ● Reduce load at deployment time by adjusting the number of VMs simultaneously deployed by the scheduler
     ● Watch the next release (4.4) for XML-RPC API concurrency enhancements
  10. Local Caching Nameserver
      OpenNebula uses DNS names for monitoring hosts (unless you named your hosts using their IP addresses instead of names)
      ● Use a local caching nameserver (such as dnsmasq) to speed up DNS lookups
  11. Beware of SSH Transport
      Most OpenNebula drivers (KVM, Xen, etc.) use SSH connections to perform actions
      OK for deploying a new VM, but expensive when doing VM monitoring
  12. Meet Ganglia
      << Ganglia is a scalable distributed system monitor tool for high-performance computing systems such as clusters and grids. >> - Wikipedia
      OpenNebula has built-in support for Ganglia
      By default Ganglia and OpenNebula must run on the same machine
      Set GANGLIA_HOST in /var/lib/one/remotes/im/ganglia.d/ganglia_probe and /var/lib/one/remotes/vmm/kvm/poll_ganglia
  13. Meet Ganglia (continued)
  14. Ganglia Driver Limitations
      1. Currently only 1 Ganglia Collector is supported
      2. Need to run a script on each host to export the OpenNebula-specific metric (OPENNEBULA_VMS_INFORMATION)
      3. Ganglia has a maximum length of 1392 bytes for string metrics
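One generic way to stay under a per-metric length limit like the one in item 3 is to encode the payload as ASCII and split it into fixed-size chunks. The sketch below is hypothetical and is not how the OpenNebula Ganglia driver works; the function names are made up for illustration, and only the 1392-byte limit comes from the slide.

```python
import base64

GANGLIA_MAX_STRING_BYTES = 1392  # limit quoted on the slide


def split_metric(value, limit=GANGLIA_MAX_STRING_BYTES):
    """Base64-encode a string and split it into chunks of at most `limit` bytes."""
    # Base64 output is pure ASCII, so one character is exactly one byte.
    payload = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


def join_metric(chunks):
    """Reassemble the original string from chunks produced by split_metric."""
    return base64.b64decode("".join(chunks)).decode("utf-8")


vm_info = "VM info payload " * 500  # hypothetical oversized metric value
chunks = split_metric(vm_info)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk would then need to be exported as its own metric, which adds naming and ordering concerns that this sketch leaves out.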
  15. Host sFlow
      << The Host sFlow agent exports physical and virtual server performance metrics using the sFlow protocol. The agent provides scalable, multi-vendor, multi-OS performance monitoring with minimal impact on the systems being monitored. >> - http://host-sflow.sourceforge.net/
      Exports a standard set of hypervisor and VM metrics
      Official support for Xen, KVM and Hyper-V, but uses Libvirt to gather metrics (and Libvirt supports LXC, OpenVZ, VMware, etc.)
  16. Host sFlow (continued)
      Source: http://blog.sflow.com/2012/02/ganglia-33-released.html
  17. Host sFlow (continued)
      Not currently supported in OpenNebula. Contact me if you're interested.
      Sample Metrics
      Hosts Metrics
      ● vnode_mem_total: Hypervisor Total Memory
      ● vnode_domains: Hypervisor VM Count
      VMs Metrics
      ● <VM ID>.vcpu_state: VM State (Running, Stopped, etc.)
      ● <VM ID>.vmem_util: VM Memory Utilization
      ● <VM ID>.vdisk_free: VM Free Disk Space
  18. 4,000 VMs at Sub-1 Minute Interval
      OpenNebula 4.2 + xml-rpc patch (upcoming in 4.4)
      Experimental Host sFlow Driver
      ● 1 OpenNebula Core (EC2 High-CPU XLarge instance)
      ● 1 Sunstone Web Server (EC2 Standard Medium instance)
      ● 1 Ganglia Collector (EC2 Standard Medium instance)
      ● 100 Hosts (EC2 High-CPU Medium instances)
      ● ~40 VMs per Host
      ● ~4,000 VMs (OpenVZ)
      ● 15 - 60 second monitoring interval
  19. 4,000 VMs at Sub-1 Minute Interval
  20. 4,000 VMs at Sub-1 Minute Interval
  21. 4,000 VMs at Sub-1 Minute Interval
  22. Looking Forward
      There’s room for optimizations
      ● The command line tools can get very slow when returning very large result sets (but not the API…)
      ● Distributed driver, for example using ZeroMQ for distributing tasks to multiple workers
      ● Investigate PoolSQL locks being held for long periods and blocking other threads (discussed in bug #1818)
      ● Gather metrics about OpenNebula internals: lock waits, effective monitoring interval, memory footprint, etc.
      ● Investigate very large Sunstone memory usage
  23. Thank you! Questions?
      “OpenNebula captured my interest for several technical reasons besides the fact that it is truly open. It's architecture is very elegant; it has C++ bones, ruby muscles and bash tendons. It's extensible and understandable. It has no peer as far as I can tell.”
      Christopher Barry, Infrastructure Engineer, RJMetrics, September 2012
      http://opennebula.org/users:testimonials