Moving to Nova Cells without Destroying the World

Note: Video recording of this presentation at the OpenStack Liberty Summit in Vancouver is available here: https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/moving-to-nova-cells-without-destroying-the-world

Your cloud has been growing for a while and you've realized you need Nova Cells to scale further. But you've already got thousands of VMs and hundreds of active users. What to do?

This talk describes Go Daddy's experience live-converting its production cloud to Nova Cells, including tips and recommendations to help you do it, too.

- Brief overview of Nova Cells' theory of operation and basic configuration

- Environment preparation to get ready for the conversion

- Specific steps to complete the conversion with minimal service interruption

- Caveats and lessons learned

- Introduction to Cells v2, and why you might want to wait for Liberty to convert.

  1. CELLS INTRODUCTION: How to scale nova?
       http://docs.openstack.org/openstack-ops/content/scaling.html
  2. CELLS INTRODUCTION: Use cells to overcome …
       • Large number of nova-computes
       • Single message queue instance
       • Complicated scheduling
       • Multi-site behind one API
  3. CELLS INTRODUCTION: Cells defined
       • Hierarchy of Nova instances
       • Each has its own database, message queue, scheduler, and compute
       • Message routing between cells to perform operations
       • Top-level API cell for nova-api and cell scheduling
       • Overrides the default compute API class
       • Lots of caveats
       • This is cells v1 (v2 in Liberty)
  4. CELLS INTRODUCTION (diagram from http://comstud.com/cells.pdf)
  5. CELLS INTRODUCTION (diagram from http://comstud.com/cells.pdf)
  6. CELLS INTRODUCTION: More details to get started
       • Nova cells configuration reference: http://docs.openstack.org/juno/config-reference/content/section_compute-cells.htm
       • openstack-dev cells discussions: http://www.gossamer-threads.com/lists/openstack/dev/16277
       • CERN's cells architecture: http://openstack-in-production.blogspot.com/2014/03/cern-cloud-architecture-update-for.html
       • Folsom cells design summit slides: http://comstud.com/FolsomCells.pdf
       • Exploring OpenStack Nova Cells: http://www.dorm.org/blog/exploring-openstack-nova-cells/
       • Talks by Rackspace, CERN, NeCTAR
  7. PLANNING THE CONVERSION: Goals
       • Get to cells before scaling becomes a fire drill
       • Keep nova RMQ and DB close to the compute nodes
       • Maintain existing instance state
       • Little or no downtime
  8. PLANNING THE CONVERSION: Basic plan
       • Existing nova becomes the first compute cell
       • Split the RMQ cluster
       • Create a new nova instance for the API cell
       • Import data into the API cell
       • Keep the existing nova-api service running until the final cutover
  9. PLANNING THE CONVERSION: Basic plan (diagram)
  10. ENVIRONMENT PREP: Getting ready
       • New servers for the API cell services
       • Database for the nova API cell
       • Migrate non-nova services to the new machines
       • Network ACLs
       • Check DNS
  11. ENVIRONMENT PREP: Extra credit: split the RabbitMQ cluster
       • Not strictly necessary!
       • Done to minimize downtime and maintain state
       • First add new nodes
       • Then split and contract the cluster
  12. ENVIRONMENT PREP: Expand RabbitMQ cluster (diagram: Original RMQ/App Servers (to be: compute cell); services: heat, neutron, glance, nova, ceilometer)
  13. ENVIRONMENT PREP: Expand RabbitMQ cluster (diagram: Original RMQ/App Servers (to be: compute cell) and New RMQ/App Servers (to be: API cell); services: heat, neutron, glance, nova, ceilometer)
  14. ENVIRONMENT PREP: Expand RabbitMQ cluster (diagram: Original RMQ/App Servers (to be: compute cell) and New RMQ/App Servers (to be: API cell); services: heat, neutron, glance, nova, ceilometer)
  15. ENVIRONMENT PREP: Reconfigure non-nova services (diagram: Original RMQ/App Servers (to be: compute cell) and New RMQ/App Servers (to be: API cell); services: heat, neutron, glance, nova, ceilometer)
  16. ENVIRONMENT PREP: Split brain (diagram: Original RMQ/App Servers (to be: compute cell) and New RMQ/App Servers (to be: API cell); services: heat, neutron, glance, nova, ceilometer)
  17. ENVIRONMENT PREP: Remove opposite nodes (diagram: Compute Cell Servers (Original RMQ/App Servers) and API Cell Servers (New RMQ/App Servers); services: heat, neutron, glance, nova, ceilometer)
  18. CONFIGURE COMPUTE CELL: Set up record for parent cell
         nova-manage cell create --name=api --cell_type=parent --username=api_rmq_user --password=api_rmq_pass --hostname=api_rmq_host --virtual_host=api_rmq_vhost
       • Use the API cell RMQ servers!
       • Or use the cells_config option and put this in JSON: http://docs.openstack.org/juno/config-reference/content/section_compute-cells.html#cell-config-optional-json
  19. CONFIGURE COMPUTE CELL (diagram from http://comstud.com/cells.pdf)
  20. CONFIGURE COMPUTE CELL: Enable nova-cells in compute cell
         [cells]
         enable = true
         name = cell_01
         cell_type = compute
       • Start up nova-cells and verify its connections to RMQ
       • Do not restart nova-api after this!
  21. CONFIGURE COMPUTE CELL: Disable quotas in compute cell
       • Quotas will be enforced by the API cell
         [DEFAULT]
         quota_driver=nova.quota.NoopQuotaDriver
  22. BOOTSTRAP NOVA FOR API CELL: Install & configure nova as usual
       • Install packages, db sync
       • Use the API cell RMQ servers!
       • Configure cells options:
         [cells]
         enable = true
         name = api
         cell_type = api
       • Don't start services yet (need to import data)
  23. BOOTSTRAP NOVA FOR API CELL: Set up record for child cell
         nova-manage cell create --name=cell_01 --cell_type=child --username=comp_rmq_user --password=comp_rmq_pass --hostname=comp_rmq_host --virtual_host=comp_rmq_vhost
       • Use the compute cell RMQ servers!
       • Remember the cells_config/JSON option
  24. BOOTSTRAP NOVA FOR API CELL (diagram from http://comstud.com/cells.pdf)
  25. IMPORT NOVA DATA: Seed API cell data
       • API cell needs flavor, quota, instance, etc. data
       • Must do this directly in SQL
       • Shut down nova-api to prevent changes while you do this
         mysqldump nova_orig_db table_name | mysql nova_api_cell_db
  26. IMPORT NOVA DATA: Tables to import
       • instance_types, instance_type_extra_specs, instance_type_projects, instances, instance_info_caches, block_device_mapping, instance_system_metadata, instance_groups, instance_group_member, instance_group_metadata, instance_group_policy, key_pairs, quota_classes, quota_usages, quotas, snapshots, snapshot_id_mappings, virtual_interfaces, volumes
       • There may be others you need!
  27. RESTART SERVICES: Start up all nova services
       • API cell: nova-cells, nova-api, nova-consoleauth*, nova-spicehtml5proxy, nova-serialproxy
       • Compute cell: nova-cells, nova-cert, nova-conductor, nova-console, nova-scheduler, nova-network, nova-compute (maybe nova-api)
       * http://blog.mgagne.ca/nova-cells-and-console-access/
  28. CAVEATS: YMMV
       • nova-cells is considered experimental
       • Test it, so it won't blow up in your face!
  29. CAVEATS: Things that just don't work
       • Neutron VIF plugging notifications to nova; work around with:
         vif_plugging_is_fatal = false
         vif_plugging_timeout = 5
         (but this causes a race condition)
       • Any notifications between cells and other services, e.g. ceilometer: http://openstack-in-production.blogspot.com/2014/03/cern-cloud-architecture-update-for.html
  30. CAVEATS: Things that just don't work
       • nova cells-list "circular reference detected" bug: https://bugs.launchpad.net/nova/+bug/1312002 and https://review.openstack.org/#/c/106991/2/nova/cells/state.py
       • Console auth: make sure to set cells/enable=true on all node types: http://blog.mgagne.ca/nova-cells-and-console-access/
  31. CAVEATS: Some objects are not cell-aware
       • Flavors and server groups: must exist in both the API cell and compute cell DBs (with the same IDs!): https://github.com/NeCTAR-RC/nova/commit/5abc8847dc89b162b6ae678176a5cfe4989144a9
       • Block devices: http://blog.mgagne.ca/nova-cells-and-block-device-mapping/
       • Security groups
       • ???
  32. CAVEATS: Host aggregates and availability zones
       • nova-api server read cell state from DB: https://github.com/NeCTAR-RC/nova/commit/6fe7057fb4957485d3bac06579ddc38c93458064
       • Add AZ support for cells: https://github.com/NeCTAR-RC/nova/commit/048bd2d6d438fb8fa9ad7d3e0d57e7d03c546f6f
       • Support aggregate API in cells: https://github.com/NeCTAR-RC/nova/commit/8ca8828d191bc271460eb80567717fd15ef6167c
       • Ability to filter cells capacity report: https://github.com/NeCTAR-RC/nova/commit/97921ef1010c5e5bca357d77682bd0ee42d6ffcc
       • Print cell name in cell timeout exceptions: https://github.com/NeCTAR-RC/nova/commit/60f669ba1ed5221d71138a72fb2cf3b34c07a970
       • Use sysmetadata to get instances AZ in API cell: https://github.com/NeCTAR-RC/nova/commit/95e4cccac623c601e074a618ea71d121a359e00f
       • Use sysmetadata to get instance_name in API cell: https://github.com/NeCTAR-RC/nova/commit/6bf1cf78b86bed99733e1119b891397dee15a65e
  33. FOSS FTW! Thanks!
  34. CAVEATS: Other issues
       • nova.cells.messaging errors:
         nova.cells.messaging OperationalError: (OperationalError) (1048, "Column 'instance_uuid' cannot be null") 'UPDATE instance_extra SET updated_at=%s, instance_uuid=%s WHERE instance_extra.id = %s'
         No clue on this, but it doesn't seem to break anything
       • Database consistency between API and compute cells: communication interruptions between cells can cause inconsistencies; this is a use case for running nova-api in compute cells
  35. CELLS V2: A better way forward for nova
       • Cells is the default mode
       • No nova-cells service
       • nova-api talks directly to each cell's DB and message queue
       https://wiki.openstack.org/wiki/Nova-Cells-v2
       https://etherpad.openstack.org/p/kilo-nova-cells-manifesto
  36. CELLS V2: Give me Liberty or give me death!
       • Experimental in Liberty
       • Transition from no cells to v2 should be seamless
       • Unclear how cells v1 will migrate to v2
       • Unless you really need to go to cells right now … wait for Liberty
  37. Thank you!
       @misterdorm
       Freenode: mdorman
       mdorman@godaddy.com
       http://x.co/yvrcells
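The "Cells defined" slide describes a hierarchy of nova deployments with message routing between cells. To make that concrete, here is a minimal toy model of how a message addressed to a cell path could travel hop by hop from the top-level API cell down to a compute cell. It is not nova's messaging code. The `Cell` class and `route` function are hypothetical, and the `!`-separated path (such as `api!cell_01`) is assumed to mirror the cell-path convention in cells v1, so confirm it against your nova release.

```python
from dataclasses import dataclass, field

# Assumption: cells v1 joins cell names into a path with "!" (e.g. "api!cell_01").
PATH_SEP = "!"


@dataclass
class Cell:
    """Hypothetical stand-in for one nova deployment (its own DB, MQ, scheduler)."""
    name: str
    children: dict = field(default_factory=dict)

    def add_child(self, child: "Cell") -> "Cell":
        self.children[child.name] = child
        return child


def route(top: Cell, target_path: str) -> list:
    """Return the names of the cells a message visits on its way to target_path."""
    names = target_path.split(PATH_SEP)
    if names[0] != top.name:
        raise ValueError(f"path {target_path!r} does not start at {top.name!r}")
    hops = [top.name]
    current = top
    # Each hop only knows its direct children, so walk the path one level at a time.
    for name in names[1:]:
        try:
            current = current.children[name]
        except KeyError:
            raise LookupError(f"no child cell {name!r} under {current.name!r}") from None
        hops.append(current.name)
    return hops


if __name__ == "__main__":
    api = Cell("api")                 # top-level API cell (runs nova-api)
    api.add_child(Cell("cell_01"))    # the converted, pre-existing nova
    print(route(api, "api!cell_01"))  # ['api', 'cell_01']
```

In this conversion the tree is only two levels deep, but the same walk works if a compute cell later gets children of its own.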
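The RabbitMQ "extra credit" sequence (expand the cluster, reconfigure non-nova services, split brain, remove opposite nodes) is easiest to reason about as cluster membership at each phase. The sketch below only does that bookkeeping with sets, following one reading of the diagrams. It issues no rabbitmqctl commands, the `plan_rmq_split` helper is hypothetical, and the node names in the usage are made up. The model assumes that after the split each half still lists the other half as (down) members until those nodes are explicitly removed.

```python
def plan_rmq_split(original: set, new: set) -> list:
    """Return (phase, membership) pairs for the expand/split/contract sequence.

    membership maps a cluster label to the set of nodes that cluster
    considers its members during that phase.
    """
    overlap = original & new
    if overlap:
        raise ValueError(f"nodes listed in both groups: {sorted(overlap)}")
    everyone = original | new
    return [
        # Start: only the original nodes form the cluster.
        ("start", {"shared": set(original)}),
        # Expand: new nodes join, so one cluster spans both groups.
        ("expand", {"shared": set(everyone)}),
        # Reconfigure: non-nova services are repointed at the new nodes;
        # cluster membership itself does not change.
        ("reconfigure", {"shared": set(everyone)}),
        # Split brain: the halves are cut off from each other, but each
        # still lists the other half as (down) members.
        ("split", {"compute_cell": set(everyone), "api_cell": set(everyone)}),
        # Contract: each half removes the opposite nodes.
        ("contract", {"compute_cell": set(original), "api_cell": set(new)}),
    ]


if __name__ == "__main__":
    for phase, membership in plan_rmq_split({"rmq1", "rmq2"}, {"rmq3", "rmq4"}):
        print(phase, {label: sorted(nodes) for label, nodes in membership.items()})
```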
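Both "Set up record" slides mention the cells_config JSON option as an alternative to `nova-manage cell create`. The sketch below turns the same RMQ parameters (username, password, hostname, virtual host) into a neighbor-cell record. It assumes the JSON layout shown in the Juno configuration reference: one object per neighbor cell, with `name`, `api_url`, `transport_url`, `weight_offset`, `weight_scale`, and `is_parent`. Several details are assumptions: the default port 5672, the example `api_url`, and the helper names. Check the key names against the reference for your release.

```python
import json
from urllib.parse import quote


def transport_url(username, password, hostname, virtual_host, port=5672):
    """Build a rabbit:// transport URL from the nova-manage style fields.

    Credentials are percent-encoded; the vhost is assumed to be a plain name
    such as the slides' api_rmq_vhost and is inserted as-is.
    """
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"rabbit://{user}:{pw}@{hostname}:{port}/{virtual_host}"


def cell_entry(name, is_parent, url, api_url, weight_offset=0.0, weight_scale=1.0):
    """One neighbor-cell record, keyed by cell name, for the cells_config file."""
    return {
        name: {
            "name": name,
            "api_url": api_url,
            "transport_url": url,
            "weight_offset": weight_offset,
            "weight_scale": weight_scale,
            "is_parent": is_parent,
        }
    }


if __name__ == "__main__":
    # Compute cell side: the parent (API) cell record, pointing at the API cell RMQ.
    parent = cell_entry(
        "api",
        True,
        transport_url("api_rmq_user", "api_rmq_pass", "api_rmq_host", "api_rmq_vhost"),
        api_url="http://api.example.com:8774",  # hypothetical endpoint
    )
    print(json.dumps(parent, indent=4))
```

On the API cell side, the same helper with `is_parent=False` and the compute cell RMQ values (`cell_01`, `comp_rmq_user`, and so on) produces the child record.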
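The compute cell and API cell slides each call for a small set of nova.conf settings. Checking the file against them can catch a typo before you start nova-cells. This minimal sketch uses Python's configparser. The `EXPECTED` table is transcribed from the slides, with `cell_01` as the example compute cell name, and `check_nova_conf` is a hypothetical helper. nova itself reads the file with oslo.config, which may accept spellings that configparser treats differently, so treat a mismatch as a prompt to look rather than a verdict.

```python
import configparser

# Settings from the slides; the compute cell name is the slides' example cell_01.
EXPECTED = {
    "compute": {
        ("cells", "enable"): True,
        ("cells", "name"): "cell_01",
        ("cells", "cell_type"): "compute",
        ("DEFAULT", "quota_driver"): "nova.quota.NoopQuotaDriver",
    },
    "api": {
        ("cells", "enable"): True,
        ("cells", "name"): "api",
        ("cells", "cell_type"): "api",
    },
}


def check_nova_conf(text, role):
    """Return a list of problems with a nova.conf (given as text) for a cell role."""
    # interpolation=None so '%' in real nova.conf values does not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for (section, option), want in EXPECTED[role].items():
        if not parser.has_option(section, option):
            problems.append(f"missing [{section}] {option}")
            continue
        if isinstance(want, bool):
            try:
                got = parser.getboolean(section, option)
            except ValueError:
                problems.append(f"[{section}] {option} is not a boolean")
                continue
        else:
            got = parser.get(section, option)
        if got != want:
            problems.append(f"[{section}] {option} = {got!r}, expected {want!r}")
    return problems
```

For example, an API cell file that still says `cell_type = compute` yields `["[cells] cell_type = 'compute', expected 'api'"]`.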
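The "Seed API cell data" slide shows `mysqldump nova_orig_db table_name | mysql nova_api_cell_db` for the tables on the "Tables to import" slide. mysqldump accepts several table names after the database name, so the whole list can go through one pipeline. The sketch below builds that pipeline from the slide's table list and runs it with subprocess only when asked. The slides do not show connection options such as user, host, or a defaults file, so they are left to `dump_args`/`load_args`. The `build_pipeline`, `as_shell`, and `run_pipeline` helpers are hypothetical.

```python
import shlex
import subprocess

# Table list transcribed from the "Tables to import" slide; extend as needed.
TABLES = [
    "instance_types", "instance_type_extra_specs", "instance_type_projects",
    "instances", "instance_info_caches", "block_device_mapping",
    "instance_system_metadata", "instance_groups", "instance_group_member",
    "instance_group_metadata", "instance_group_policy", "key_pairs",
    "quota_classes", "quota_usages", "quotas", "snapshots",
    "snapshot_id_mappings", "virtual_interfaces", "volumes",
]


def build_pipeline(src_db, dst_db, tables, dump_args=(), load_args=()):
    """Return (dump_argv, load_argv) for `mysqldump src tables... | mysql dst`."""
    dump = ["mysqldump", *dump_args, src_db, *tables]
    load = ["mysql", *load_args, dst_db]
    return dump, load


def as_shell(dump, load):
    """Render the pipeline as a shell command line for review."""
    return f"{shlex.join(dump)} | {shlex.join(load)}"


def run_pipeline(dump, load):
    """Stream mysqldump output straight into mysql; raise if either side fails."""
    with subprocess.Popen(dump, stdout=subprocess.PIPE) as producer:
        consumer = subprocess.run(load, stdin=producer.stdout)
        producer.stdout.close()
        producer.wait()
    if producer.returncode != 0:
        raise subprocess.CalledProcessError(producer.returncode, dump)
    if consumer.returncode != 0:
        raise subprocess.CalledProcessError(consumer.returncode, load)


if __name__ == "__main__":
    dump, load = build_pipeline("nova_orig_db", "nova_api_cell_db", TABLES)
    print(as_shell(dump, load))  # review first; run_pipeline(dump, load) executes it
```

mysqldump's default options include DROP TABLE/CREATE TABLE statements, so each imported table replaces the empty one created by db sync. That is only safe if both databases are at the same schema version. As on the slide, stop nova-api before running it.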
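The "Some objects are not cell-aware" slide warns that flavors and server groups must exist in both the API cell and compute cell databases with the same IDs. A small diff confirms this after the import and after any later flavor change. This sketch compares two id-to-row mappings. Fetching the rows is left out of the code; for example, you might select id, flavorid, name, memory_mb, and vcpus from instance_types in each database. `diff_by_id` is a hypothetical helper.

```python
def diff_by_id(api_rows, compute_rows):
    """Compare {id: row} mappings from the API cell and compute cell databases.

    Returns the ids missing on either side and the ids whose rows differ.
    """
    api_ids, compute_ids = set(api_rows), set(compute_rows)
    return {
        "missing_in_api": sorted(compute_ids - api_ids),
        "missing_in_compute": sorted(api_ids - compute_ids),
        "different": sorted(
            i for i in api_ids & compute_ids if api_rows[i] != compute_rows[i]
        ),
    }


if __name__ == "__main__":
    # Hypothetical rows: id -> (flavorid, name, memory_mb, vcpus)
    api = {1: ("1", "m1.tiny", 512, 1), 2: ("2", "m1.small", 2048, 1)}
    compute = {
        1: ("1", "m1.tiny", 512, 1),
        2: ("2", "m1.small", 4096, 1),
        6: ("6", "custom", 1024, 2),
    }
    print(diff_by_id(api, compute))
    # {'missing_in_api': [6], 'missing_in_compute': [], 'different': [2]}
```

The same comparison applies to the "Database consistency between API and compute cells" caveat. Key the `instances` rows by uuid and compare a few columns, such as vm_state and task_state, to spot drift caused by interrupted cell communication.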
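The cells v2 slide says there is no nova-cells service; nova-api talks directly to each cell's DB and message queue. One way to picture that is a lookup in the API layer that maps each instance to its cell and each cell to its connection details. The sketch below models this with plain Python classes. The names loosely echo the cell and instance mapping idea described on the Nova-Cells-v2 wiki page, but `CellMapping`, `CellDirectory`, their fields, and the URLs are illustrative assumptions, not nova's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMapping:
    """Where one cell lives: its own database and message queue."""
    name: str
    database_connection: str
    transport_url: str


class CellDirectory:
    """Hypothetical API-level index: instance uuid -> cell -> connections."""

    def __init__(self):
        self._cells = {}
        self._instances = {}

    def add_cell(self, cell):
        self._cells[cell.name] = cell

    def place(self, instance_uuid, cell_name):
        """Record which cell an instance was scheduled into."""
        if cell_name not in self._cells:
            raise KeyError(f"unknown cell {cell_name!r}")
        self._instances[instance_uuid] = cell_name

    def target(self, instance_uuid):
        """Return the cell whose DB and MQ a request for this instance should use."""
        return self._cells[self._instances[instance_uuid]]


if __name__ == "__main__":
    directory = CellDirectory()
    directory.add_cell(CellMapping(
        "cell_01",
        "mysql://nova:secret@cell01-db/nova",     # hypothetical
        "rabbit://nova:secret@cell01-rmq:5672/",  # hypothetical
    ))
    directory.place("00000000-0000-0000-0000-000000000001", "cell_01")
    print(directory.target("00000000-0000-0000-0000-000000000001").name)  # cell_01
```

Compared with the cells v1 hop-by-hop routing, the lookup happens once in the API layer, which is why a separate nova-cells service is no longer needed.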
