
Openstack Nova deep dive

4,074 views


This is a presentation that was conducted by me for Openstack bootcamp

Published in: Technology


  1. NOVA, by Anand Nande
  2. AGENDA ITEMS ● What is OpenStack NOVA? ● Its components ● AMQP and NOVA ● VM state transitions ● Host aggregates: NUMA, CPU pinning ● Server groups: affinity, anti-affinity ● Migration: stages, types, security, how to interact with live migration? ● Debugging and troubleshooting
  3. What's NOVA? ● It's the big daddy of all the OpenStack projects ● One of the core components, in OpenStack since the beginning ● Provides compute-as-a-service and all the juice required to run VMs on top ● Inherits time-tested qemu/kvm virtualization features and principles ● Built on top of a messaging-based architecture ● Pluggable/hybrid hypervisor support: Xen, LXC, Hyper-V, ESX, Docker
  4. NOVA components: NOVA API, NOVA Conductor, NOVA Compute, NOVA ConsoleAuth, NOVA novncproxy, NOVA Scheduler, message queue. Components interact via the MQ; external services interact via the REST API.
  5. ➢ NOVA API: ○ nova-api provides the API through which users and services interact with NOVA, e.g. spawning an instance from Horizon or the NOVA CLI.
  6. ➢ NOVA Scheduler: ○ Uses filters to dispatch requests for new virtual machines to the correct node.
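The filter-then-weigh behaviour described above can be sketched as a chain of predicates over candidate hosts. This is an illustrative simplification, not Nova's real code: the filter names, host attributes, and the "most free RAM wins" weigher are stand-ins for the classes in nova.scheduler.filters.

```python
# Sketch of a filter scheduler: each filter eliminates unsuitable hosts,
# then a weigher ranks the survivors. Attribute names are invented.

def ram_filter(host, req):
    # Host must have enough free RAM for the requested flavor.
    return host["free_ram_mb"] >= req["ram_mb"]

def compute_filter(host, req):
    # Host's nova-compute service must be up and enabled.
    return host["service_up"]

FILTERS = [ram_filter, compute_filter]

def select_host(hosts, req):
    candidates = [h for h in hosts if all(f(h, req) for f in FILTERS)]
    if not candidates:
        return None  # NoValidHost in real Nova
    # Weighing step simplified: prefer the host with the most free RAM.
    return max(candidates, key=lambda h: h["free_ram_mb"])["name"]

hosts = [
    {"name": "compute-a", "free_ram_mb": 512,   "service_up": True},
    {"name": "compute-b", "free_ram_mb": 8192,  "service_up": True},
    {"name": "compute-c", "free_ram_mb": 16384, "service_up": False},
]
print(select_host(hosts, {"ram_mb": 2048}))  # compute-b
```

Note how compute-c, despite having the most free RAM, is filtered out before weighing because its service is down.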
  7. ➢ openstack-nova-compute: Runs on each node to create and terminate virtual instances. The compute service interacts with the hypervisor to launch new instances, and ensures that the instance state is maintained in the Compute database.
  8. ➢ openstack-nova-conductor: Provides database-access support for Compute nodes (thereby reducing security risks). ➢ openstack-nova-consoleauth: Handles console authentication. ➢ openstack-nova-novncproxy: Provides a VNC proxy for browsers (enabling VNC consoles to access virtual machines).
  9. What are keypairs and security groups? ➢ Keypair: On standard cloud images of Linux operating systems like Ubuntu and Fedora, SSH access is restricted to public-key authentication. Instead of authenticating with a password, you authenticate with a private key that corresponds to a public key installed on the instance. ➢ Security groups are sets of IP filter rules applied to an instance's networking, i.e. they decide which network traffic to allow or deny, for example denying SSH access to a specific instance. They are project-specific; project members can edit the default rules for their group and add new rule sets. All projects have a "default" security group, which is applied to instances that have no other security group defined.
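The security-group semantics above (default deny, rules allow) can be sketched as follows. This is a conceptual model only: the rule fields (proto, port range, source CIDR) and the "default" group contents are simplified stand-ins, not Nova's real rule schema.

```python
import ipaddress

# Sketch of security-group semantics: ingress traffic is denied unless
# some rule in the instance's groups allows it.

def allowed(rules, proto, port, src_ip):
    for r in rules:
        if (r["proto"] == proto
                and r["port_min"] <= port <= r["port_max"]
                and ipaddress.ip_address(src_ip) in ipaddress.ip_network(r["cidr"])):
            return True
    return False  # default deny

# Hypothetical "default" group: SSH from the tenant subnet, ping from anywhere.
default_group = [
    {"proto": "tcp", "port_min": 22, "port_max": 22, "cidr": "10.0.0.0/24"},
    {"proto": "icmp", "port_min": 0, "port_max": 255, "cidr": "0.0.0.0/0"},
]

print(allowed(default_group, "tcp", 22, "10.0.0.5"))  # SSH from tenant subnet: True
print(allowed(default_group, "tcp", 22, "8.8.8.8"))   # SSH from elsewhere: False
```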
  10. ➢ NOVA call for authentication with Keystone: ○ It provides an authentication token along with the service catalog. REQ: curl -i 'http://10.65.234.1:5000/v2.0/tokens' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "{SHA1}121c3faea23dd4467fc992f1b77f6eacf8587ed5"}}}'
  11. ➢ Keystone response (token + service catalog): RESP BODY: {"access": {"token": {"issued_at": "2015-05-30T11:05:03.054462", "expires": "2015-05-30T12:05:03Z", "id": "{SHA1}7781e321bfbfbf909ae44027ef60cb92ccce8f2e", "tenant": {"enabled": true, "description": "admin tenant", "name": "admin", "id": "97787e34dc0d4f2b8fc04034eed3594c"}, "serviceCatalog": [{"endpoints_links": [], "endpoints": [{"adminURL": "http://10.65.234.1:8774/v2/97787e34dc0d4f2b8fc04034eed3594c", "region": "RegionOne", "publicURL": "http://10.65.234.1:8774/v2/97787e34dc0d4f2b8fc04034eed3594c", "internalURL": "http://10.65.234.1:8774/v2/97787e34dc0d4f2b8fc04034eed3594c", "id": "42142cca01fd4bc382ac9f95c204e116"}], "type": "compute", "name": "nova"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://10.65.234.1:9696/", "region": "RegionOne", "publicURL": "http://10.65.234.1:9696/", "internalURL": "http://10.65.234.1:9696/", "id": "466354cac1094127ac0617cf75dd1494"}], "type": "network", "name": "neutron"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://10.65.234.1:9292", "region": "RegionOne", "publicURL": "http://10.65.234.1:9292", "internalURL": "http://10.65.234.1:9292", "id": "43c49fe7dd8f4315af848b48a53021c1"}], "type": "image", "name": "glance"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://10.65.234.1:8776/v1/97787e34dc0d4f2b8fc04034eed3594c", "region": "RegionOne", "publicURL": "http://10.65.234.1:8776/v1/97787e34dc0d4f2b8fc04034eed3594c", "internalURL": "http://10.65.234.1:8776/v1/97787e34dc0d4f2b8fc04034eed3594c", "id": "30ce33a6d05e4a80b8a0e22ada52abdb"}], "type": "volume", "name": "cinder"}, [...]
  12. ➢ Required details to boot an instance: ○ instance name ○ glance image ○ flavor ID ○ network ID ○ security group ➢ NOVA call to boot an instance: [root@dhcp209-220 ~]# nova boot --flavor 1 --image 2d946232-5773-48df-b8bb-7677f8b6e0fe --nic net-id=97bd405a-77e3-4ef8-836e-8ad1ddb3ee63 --security-groups default pratik_test_instance [...] REQ: curl -i 'http://10.65.209.220:8774/v2/27513fe577364ce594d48f629f7b74fd/servers' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Project-Id: admin" -H "X-Auth-Token: {SHA1}fde39ed28acaf2d30788fced000970f9c7f65dfb" -d '{"server": {"name": "pratik_test_instance", "imageRef": "2d946232-5773-48df-b8bb-7677f8b6e0fe", "flavorRef": "1", "max_count": 1, "min_count": 1, "networks": [{"uuid": "97bd405a-77e3-4ef8-836e-8ad1ddb3ee63"}], "security_groups": [{"name": "default"}]}}' [...]
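The request body in the curl above is ordinary JSON, so it can be built programmatically. A minimal sketch, reusing the example UUIDs from this slide:

```python
import json

# Build the JSON body that `nova boot` posts to POST /v2/{tenant_id}/servers.
# The image/network UUIDs below are the example values from the slide.
body = {
    "server": {
        "name": "pratik_test_instance",
        "imageRef": "2d946232-5773-48df-b8bb-7677f8b6e0fe",
        "flavorRef": "1",
        "min_count": 1,
        "max_count": 1,
        "networks": [{"uuid": "97bd405a-77e3-4ef8-836e-8ad1ddb3ee63"}],
        "security_groups": [{"name": "default"}],
    }
}

# This serialized payload is what goes into curl's -d argument.
print(json.dumps(body, indent=2))
```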
  13. NOVA and AMQP interaction ● The API services process REST requests, which typically involve database reads/writes, optionally sending RPC messages to other Nova services, and generating responses to the REST calls. ● RPC messaging is done via the oslo.messaging library, an abstraction on top of message queues. ● Most of the major Nova components can be run on multiple servers, and have a manager that listens for RPC messages. The one major exception is nova-compute, where a single process runs on the hypervisor it is managing. (Diagram: REST → oslo.messaging RPC → oslo_messaging.rpc.dispatcher → oslo.messaging._drivers.impl_rabbit) Dig further? On the compute node: * nova-manage logs errors | grep oslo * nova.conf > [oslo_messaging_rabbit] section
  14. NOVA and AMQP (RabbitMQ) interaction * Nova uses direct, fanout, and topic-based exchanges. [1] * Each Nova service (for example Compute, Scheduler, etc.) creates two queues at initialization time: one accepts messages with routing key 'NODE-TYPE.NODE-ID' (example: compute.hostname); * the other accepts messages with the generic routing key 'NODE-TYPE' (for example compute). The former is used specifically when Nova-API needs to direct commands to a specific node, like 'destroy instance'. [1] https://www.rabbitmq.com/tutorials/amqp-concepts.html
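The two-queue pattern above boils down to choosing a routing key per message: the bare service name for work any worker may pick up, or service.host for node-directed commands. A tiny sketch (names are illustrative, not Nova's actual API):

```python
# Sketch of Nova's per-service routing keys: a shared topic queue
# ("compute") consumed by every worker of that type, and a per-host
# queue ("compute.<hostname>") for commands that must reach one node,
# such as destroying a specific instance.

def routing_key(node_type, host=None):
    return f"{node_type}.{host}" if host else node_type

# A generic request any compute worker may handle (e.g. via the scheduler):
print(routing_key("compute"))               # compute
# A command that must reach one specific node (e.g. 'destroy instance'):
print(routing_key("compute", "compute-1"))  # compute.compute-1
```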
  15. VM State Transitions
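The transition diagram on this slide can be modelled as a lookup table of allowed (state, event) pairs. This is an illustrative subset with invented event names, not the complete table from nova.compute.vm_states:

```python
# Illustrative subset of Nova VM state transitions. Event names are
# made up for the sketch; the real definitions live in Nova's
# vm_states/task_states modules.
TRANSITIONS = {
    ("BUILDING", "spawn_ok"):   "ACTIVE",
    ("BUILDING", "spawn_fail"): "ERROR",
    ("ACTIVE",   "pause"):      "PAUSED",
    ("PAUSED",   "unpause"):    "ACTIVE",
    ("ACTIVE",   "stop"):       "STOPPED",
    ("STOPPED",  "start"):      "ACTIVE",
    ("ACTIVE",   "delete"):     "DELETED",
}

def transition(state, event):
    # Reject transitions the table does not allow.
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}")

print(transition("ACTIVE", "pause"))  # PAUSED
```

Modelling transitions as data rather than scattered if/else branches makes "which states can I reach from here?" a simple query.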
  16. Let's see how they dance!
  17. Instance boot flow:
      1. Horizon sends the API request to NOVA API
      2. NOVA API sends an authentication request to Keystone
      3. Keystone ACKs authentication and validates that the provided data is correct
      4-5. NOVA Conductor updates the database
      6-7. NOVA Scheduler selects a compute host (Compute 'A', 'B' or 'C')
      8. Update the database
      9-10. Request and download the Glance image (openstack-glance-api, openstack-glance-registry)
      11. Neutron creates the port (allocates MAC and IP)
      12. Neutron notifies the L2 agent
      13. The L2 agent configures the local VLAN and OVS flows
      14. The L2 agent sends a port-up notification to Neutron (RPC)
      15. Neutron notifies NOVA that the port is up (RPC)
      16. Instance booted.
      (Other services in the diagram: Cinder: openstack-cinder-api, openstack-cinder-scheduler, openstack-cinder-volume; Neutron: neutron-server, neutron-l3-agent, neutron-dhcp-agent, L2 agent; compute nodes run openstack-nova-compute.)
  18. Host Aggregates and Availability Zones http://wordpress-anande.rhcloud.com/2016/05/24/digging-into-aggregate-groups-and-availability-zones/
  19. NUMA and CPU Pinning
  20. How to interact with NUMA nodes and monitor usage? Example showing sub-optimal memory:CPU alignment on the NUMA nodes of a compute
  21. After running `numad`, the memory:CPU alignment is adjusted to the best fit
  22. But we don't use 'numad' in OpenStack http://bit.ly/1suSHm5
  23. Now let's understand NOVA flavor extra_specs: hw_cpu_policy=shared vs. hw_cpu_policy=dedicated; cpu_pinning + NUMA + live_migration + ? Coffee. More on NUMA: - https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-NUMA-NUMA_and_libvirt.html - https://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html - https://access.redhat.com/solutions/2046753 More on CPU pinning: https://access.redhat.com/solutions/2191071
  24. What are Server Groups http://wordpress-anande.rhcloud.com/2016/03/17/host-aggregate-groups-and-server-groups/
  25. What's live migration from a qemu perspective? ● Pick guest state from one QEMU process on one hypervisor and transfer it to another QEMU process, while the guest is running, on another hypervisor. ● The guest shouldn't realize the world is changing beneath its feet. ● The guest might notice some degraded performance inside it, though, ideally only for a few seconds, due to the dirty-page tracking taking place. During the migration an identical qemu process runs on each hypervisor (PID 19997 on Hypervisor/Compute-1, PID 21200 on Hypervisor/Compute-2):
      qemu 19997 1 0 May17 ? 00:04:06 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name mini1,debug-threads=on -S -machine pc-i440fx-2.4,accel=kvm,usb=off,vmport=off -cpu Haswell-noTSX -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 05708fb7-672e-4493-a316-e3765f37eedc -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-mini1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/mininet-vm-x86_64.vmdk,format=vmdk,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a0:1b:bb,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device vmware-svga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
  26. Let's drill down further… Stage 1: Mark all RAM dirty. Stage 2: Keep sending the RAM pages dirtied since the last iteration; stop when some low watermark or condition is reached. Stage 3: Stop the guest, transfer the remaining dirty RAM, and continue execution on the destination qemu process. … but this is just what happens at the qemu level
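The three stages above can be illustrated with a toy simulation of iterative pre-copy. The page count, dirty rate, and watermark are made-up numbers chosen only to show the convergence behaviour, not anything qemu actually uses:

```python
import random

# Toy simulation of qemu's iterative pre-copy live migration:
# stage 1 marks all RAM dirty, stage 2 re-sends pages dirtied during
# the previous iteration until few enough remain, stage 3 pauses the
# guest and transfers the rest.
random.seed(0)
TOTAL_PAGES, WATERMARK = 10_000, 100

dirty = set(range(TOTAL_PAGES))   # stage 1: all RAM is dirty
iterations = 0
while len(dirty) > WATERMARK:     # stage 2: iterate until the watermark
    sent = dirty
    # While those pages were being sent, the still-running guest
    # dirtied some pages again (here: ~5% of what was just sent).
    dirty = {random.randrange(TOTAL_PAGES) for _ in range(len(sent) // 20)}
    iterations += 1

# stage 3: guest paused, remaining dirty pages sent, execution
# resumes on the destination qemu process.
print(f"converged after {iterations} iterations, "
      f"{len(dirty)} pages left for the stop-and-copy phase")
```

If the guest dirties pages faster than the network drains them, this loop never converges, which is exactly why the `virsh suspend` trick on a later slide works: pausing the CPUs drops the dirty rate to zero.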
  27. Nova-scheduler, SRC → DEST: does DEST meet the SRC specs to accommodate the incoming instance? - Free RAM - Time sync - cpu_map XML - Same subnet
  28. DEST matches; start migrating the VM.
  29-31. (Diagram slides: the VM's state is copied from SRC to DEST while it keeps running on SRC.)
  32. Send a RARP so the VM retains its IP address.
  33. Many ways to migrate; it's your call: [stack@instack ~]$ nova help | grep -E '(migr|eva)' evacuate Evacuate server from failed host. live-migration Migrate running server to a new machine. migrate Migrate a server. The new host will be host-evacuate Evacuate all instances from failed host. migration-list Print a list of migrations. host-servers-migrate Migrate all instances of the specified host to host-evacuate-live Live migrate all instances of the specified
  34. Credits: Stephen Gordon
  35. Credits: Stephen Gordon
  36. When security is of concern ● By default, live migration moves device state over unsecured TCP. ● There are a few ways to replace unsecured TCP for libvirtd socket communication: – TLS for encryption and X.509 client certificates for authentication – GSSAPI/Kerberos for authentication and encryption – TLS for encryption and Kerberos for authentication
  37. live_migration_uri=qemu+ACCESSTYPE://USER@%s/system ACCESSTYPE = tcp (unencrypted) or tls (encrypted) USER = who has access to the compute service: "nova" Example: live_migration_uri=qemu+tls://nova@%s/system
  38. PCI passthrough ● nova.conf > scheduler_default_filters=…, PciPassthroughFilter ● Add the device whitelist on the compute node ● Create a flavor with a PCI property ● Boot an instance ISSUE: no live-migration support; the admin needs to detach/attach the pNIC. ● Proposal (not accepted, though): detach_nic_from_src + emulate + migrate + attach_nic_on_dest.
  39. How to interact with live migration? Check the status of the migration using 'virsh': # virsh domjobinfo <domain> Job type: Unbounded Time elapsed: 4000ms Data processed: 5.093 MiB Data remaining: 1015.707 MiB Data total: 1.008 GiB Memory processed: 5.093 MiB Memory remaining: 1015.707 MiB Memory total: 1.008 GiB Constant pages: 460504 Normal pages: 78809 Normal data: 307.848 MiB Expected downtime: 30ms
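Output like the `virsh domjobinfo` above is easy to post-process into a progress figure for monitoring scripts. A sketch, with naive parsing (it assumes the "Data processed"/"Data total" lines are present and use MiB/GiB units, and the sample text mirrors the slide):

```python
# Turn `virsh domjobinfo` text into a migration-progress percentage.
SAMPLE = """\
Job type:         Unbounded
Time elapsed:     4000         ms
Data processed:   5.093 MiB
Data remaining:   1015.707 MiB
Data total:       1.008 GiB
"""

def field_mib(text, name):
    # Find a "name: value unit" line and normalize the value to MiB.
    for line in text.splitlines():
        if line.startswith(name):
            value, unit = line.split(":")[1].split()
            mib = float(value)
            return mib * 1024 if unit == "GiB" else mib

processed = field_mib(SAMPLE, "Data processed")
total = field_mib(SAMPLE, "Data total")
print(f"{100 * processed / total:.1f}% transferred")  # 0.5% transferred
```

In practice the text would come from `subprocess.run(["virsh", "domjobinfo", domain], ...)` on the source hypervisor; the hard-coded sample stands in for that here.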
  40. # virsh suspend <instance-name> The simplest and crudest mechanism for ensuring a guest migration completes is to simply pause the guest CPUs. This prevents the guest from continuing to dirty memory, so even on the slowest network, migration completes in a finite amount of time. # virsh domjobabort <instance-name> Cancel an ongoing live migration. # virsh migrate-set-speed <domain> <speed_in_Mbps> Trade-off: multiple small VMs with less RAM vs. one large VM with more RAM.
  41. Future of live migration: - Instances with direct PCI passthrough - Split network plane for live migration - Abort an ongoing live migration - Pause a migration - Check the destination host when migrating or evacuating
  42. Debugging ● LIVE MIGRATION: virsh qemu-monitor-command, virsh qemu-monitor-event ● If the instance crashes, take a coredump: virsh dump ● Enable libvirt logging to understand the lower-level interactions between libvirt and qemu, in libvirtd.conf: – log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util 1:qemu_monitor" – log_outputs="1:file:/var/log/libvirt/libvirtd.log" ● NOVA debug (and verbose, if required) logging: – sed -i 's/debug=False/debug=True/g' /etc/nova/nova.conf – sed -i 's/verbose=False/verbose=True/g' /etc/nova/nova.conf – Get compute.log, conductor.log, scheduler.log for analysis
  43. Troubleshooting ● Enable debugging and check for keywords like WARN, ERROR, STOP, FAIL in the logs and on STDOUT with --debug/--verbose on the commands (if the CLI can be used to reproduce the issue). ● Check for existing bugs (bugzilla.redhat.com and bugs.launchpad.net). ● Try to reproduce the issue in your test environment with the exact component versions (and hardware, if necessary). ● Discuss it with your team. ● Reach out to mailing lists and IRC channels. ● Reach out directly to engineering by opening a bugzilla with all the information and regularly following up on it.
  44. QUESTIONS
  45. THANK YOU
