OpenStack Trove in Production at HP - TroveDay 2014

Presentation by Vipul Sabhaya, Software Development Lead, HP Cloud at OpenStack Trove Day 2014

  1. OpenStack Trove Day, August 19, 2014. Trove in Production at HP. Vipul Sabhaya, Software Development Lead, HP Cloud
  2. What is this about? • Trove • How to deploy Trove with HA • How we do config management • Monitoring Trove • Operating Trove 8/19/14 tesora.com 2
  3. Trove • Database as a Service • MySQL • MongoDB • Cassandra • Postgres • … • Integrated OpenStack Project • Icehouse Release
  4. Architecture
  5. Which Cloud? • Trove has only API dependencies • Overcloud (bare-metal)? • In-Cloud (VMs)?
  6. HA Trove • HA OverCloud • Availability Zones • HA Trove Control Plane • Control Plane across availability zones • Galera Cluster • RabbitMQ Cluster • Multiple Trove API, TaskManager, Conductors
  7. How did we get here? • Salt Stack • Salt-based Trove deployment • https://github.com/saurabhsurana/trove-installer/tree/master/saltstack • Salt-based OpenStack deployment • https://github.com/EntropyWorks/salt-openstack
  8. Configuration Management • Helps define/control • Packages and dependencies to be installed • Configuration files to be copied • Users / groups • Gives a reproducible state of the infrastructure • Highstate Trove-managed VMs on first boot
  9. Remote Execution • No SSH • Can control infrastructure from a single machine • Can define user and resource level access • Specifically useful for Trove to help manage DB instances
  10. trove-api.sls

      trove:
        user.present:
          - name: trove

      trove-package:
        pip.installed:
          - name: trove
          - require:
            - user: trove

      /etc/trove/trove.conf:
        file.managed:
          - source: salt://trove/api/trove.conf
          - template: jinja
          - user: trove
          - require:
            - pip: trove-package
            - user: trove

      trove-api:
        service:
          - running
          - enable: True
          - watch:
            - pip: trove-package
            - file: /etc/trove/trove.conf
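The `require` and `watch` entries in the state file above imply an order in which the states can be applied. The following is a minimal sketch of that idea only, not Salt's actual ordering logic: it feeds the requisites from the slide into Python's standard `graphlib` module (Python 3.9 or later). The `REQUISITES` mapping and the `apply_order` helper are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter

# Each state maps to the set of states it requires or watches,
# transcribed by hand from trove-api.sls (illustrative, not Salt's API).
REQUISITES = {
    "user: trove": set(),
    "pip: trove-package": {"user: trove"},
    "file: /etc/trove/trove.conf": {"pip: trove-package", "user: trove"},
    "service: trove-api": {"pip: trove-package", "file: /etc/trove/trove.conf"},
}


def apply_order(requisites):
    """Return the states in an order that satisfies every requisite."""
    return list(TopologicalSorter(requisites).static_order())


if __name__ == "__main__":
    for state in apply_order(REQUISITES):
        print(state)
```

Because every state here depends on the one before it, only one valid order exists: the user, then the package, then the config file, then the service.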
  11. trove.conf

      # Number of child processes to run
      trove_api_workers = {{ pillar['trove_worker_threads']}}

      # AMQP Connection info
      rabbit_password = {{ pillar['trove_rabbit_password'] }}
      rabbit_hosts = {{ pillar['trove_rabbit_hosts'] }}
      rabbit_userid = {{ pillar['trove_rabbit_userid'] }}

      sql_connection = {{ pillar['trove_mysql_connection']}}

      {% if not pillar['devstack_setup'] %}
      # Updates service and instance task statuses if instance failed to become active
      update_status_on_fail = True

      # how long to wait for guest agent to become active (in sec) (default is 300)
      usage_sleep_time = 30
      usage_timeout = {{ salt['pillar.get']('trove_guestagent_active_timeout', 600) }}
      {% endif %}

      # Path to the extensions
      api_extensions_path = {{ pillar['trove_path'] }}/extensions/routes
  12. Trove @ HP Helion • Image-based Deploys • TripleO • Trove Heat Templates • Trove Image Elements • Saltcloud / Nova wrapper -> Salt Master -> Trove • Seed -> Under -> Over -> Heat -> Trove
  13. Operations - SaltStack • Most of the DBaaS operations are based on SaltStack • HA Deployment of Salt Masters • Control access to infrastructure with SaltStack • Control access to customer instances • To help debug issues • But protect the data and access to the MySQL database • Each Trove guest instance becomes a minion
  14. Trove Upgrades • Trove Datastore must be usable during all upgrades • Upgrades usually involve downtime • RPC Versioning • Upgrade sequence that we follow: • Upgrade all the guest agents first (trove service) • Upgrade Task Manager and Conductor • Upgrade API servers • If a new RPC method is introduced, it must be available on the Guest before an API operation is performed
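The upgrade sequence on this slide can be sketched as a small planner. This is an illustration under stated assumptions, not Trove code: the component names, the tuple-based version numbers, and the `plan_upgrade` and `api_may_call` helpers are all hypothetical, and the relative order of Task Manager and Conductor is an arbitrary choice since the slide upgrades them together.

```python
# Order taken from the slide: guest agents, then Task Manager and
# Conductor, then the API servers. Names are illustrative only.
UPGRADE_ORDER = ["guestagent", "taskmanager", "conductor", "api"]


def plan_upgrade(components):
    """Return the given components sorted into the slide's upgrade order."""
    unknown = set(components) - set(UPGRADE_ORDER)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return sorted(components, key=UPGRADE_ORDER.index)


def api_may_call(required_rpc_version, guest_rpc_versions):
    """True only if every guest agent already supports the RPC version
    that the new API operation depends on."""
    return all(v >= required_rpc_version for v in guest_rpc_versions)


if __name__ == "__main__":
    print(plan_upgrade(["api", "guestagent", "conductor"]))
    print(api_may_call((1, 2), [(1, 1), (1, 2)]))
```

The second helper captures the last bullet: an API operation that relies on a new RPC method stays blocked while any guest still runs an older version.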
  15. Security of key Trove components • Use SSL • Trove API • RabbitMQ • Security Group • Database • Only Control Plane components need access • RabbitMQ • Control Plane and all guest agents need access, but restrict to an address range wherever possible • Use separate DB and RMQ credentials for each service
  16. Monitoring of Trove Service / Instances • Trove doesn’t ship with monitoring • Upstart scripts respawn Trove services • Monitor Trove API ports with Nagios • Monitor RabbitMQ and DB connectivity from Control plane nodes
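A port check of the kind mentioned on this slide could look roughly like the sketch below. It follows the usual Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL. The `check_port` helper and its injectable `connect` parameter are hypothetical, the host is a placeholder, and 8779 is assumed here as the Trove API port (its default); adjust it to the actual deployment.

```python
import socket

# Nagios plugin exit codes
OK, CRITICAL = 0, 2


def check_port(host, port, timeout=5.0, connect=socket.create_connection):
    """Try a TCP connect and return (exit_code, message).

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable: {exc}"
    conn.close()
    return OK, f"OK - {host}:{port} is accepting connections"


def main(argv):
    """Entry point for a plugin wrapper: print the message, return the code."""
    host = argv[1] if len(argv) > 1 else "127.0.0.1"  # placeholder host
    port = int(argv[2]) if len(argv) > 2 else 8779    # assumed API port
    code, message = check_port(host, port)
    print(message)
    return code
```

A wrapper script would call `sys.exit(main(sys.argv))` so that Nagios sees the exit code.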
  17. Monitoring of key Trove components • RabbitMQ • Number of Queues • Number of Sockets used • Number of Established Connections • Cluster Status • Failed access attempts • Database • MySQL standard monitoring • Cluster status • Slow query log • error.log for unauthorized/failed access attempts
  18. Monitoring of key Trove components • Trove Guest Agent Heartbeat status • Trove Instance Audit (catch failed instances to help identify service issues) • Connectivity to Trove instances from outside
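Watching guest agent heartbeats comes down to flagging instances whose last heartbeat is too old. A minimal sketch follows, assuming the heartbeat timestamps have already been collected into a dictionary keyed by instance id; the `stale_guests` helper, the data shape, and the five-minute threshold are illustrative assumptions, not Trove's API or defaults.

```python
from datetime import datetime, timedelta


def stale_guests(heartbeats, now, max_age=timedelta(minutes=5)):
    """Return the sorted ids of instances whose last heartbeat is older
    than max_age.

    heartbeats: dict mapping instance id -> datetime of last heartbeat.
    """
    return sorted(
        instance_id
        for instance_id, last_seen in heartbeats.items()
        if now - last_seen > max_age
    )


if __name__ == "__main__":
    now = datetime(2014, 8, 19, 12, 0, 0)
    seen = {
        "db-1": now - timedelta(minutes=1),
        "db-2": now - timedelta(minutes=10),
    }
    print(stale_guests(seen, now))
```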
  19. What did we learn?
  20. OpenStack Trove: RabbitMQ • RabbitMQ • Raise the default socket descriptor limit (it is exhausted quickly) • The number of queues and sockets keeps growing unless you enable heartbeats on RabbitMQ connections • Monitoring is key when running a RabbitMQ cluster configured with mirrored queues
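Both RabbitMQ lessons lend themselves to simple checks once the numbers are collected. The sketch below assumes the descriptor usage and queue counts are gathered elsewhere (for example from the broker's status output); the helper names and the 80% and 90% thresholds are hypothetical choices, not values from the talk.

```python
def descriptor_status(used, limit, warn=0.8, crit=0.9):
    """Classify socket descriptor usage against warning/critical ratios."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio >= crit:
        return "CRITICAL"
    if ratio >= warn:
        return "WARNING"
    return "OK"


def keeps_growing(samples):
    """True if every sample is strictly larger than the previous one,
    which is the pattern expected when connections are never reaped."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )


if __name__ == "__main__":
    print(descriptor_status(900, 1024))
    print(keeps_growing([120, 135, 160, 210]))
```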
  21. OpenStack Trove • Guest agent heartbeats (service status notifications) should be monitored for failure • Upgrading the Guest Agent is tricky on xsmall • Quota mismatches between Trove and Nova are the biggest cause of instance failures • Resource mismatch between Trove and Nova • Schedule jobs to correct things
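A scheduled job that corrects resource mismatches first has to find them. The sketch below shows one way to compare the two views, assuming the Trove instance records and the Nova server ids have already been fetched; the `find_mismatches` helper and its input shapes are hypothetical and do not call either service's API.

```python
def find_mismatches(trove_instances, nova_servers):
    """Compare Trove's view of its instances with Nova's server list.

    trove_instances: dict mapping Trove instance id -> Nova server id.
    nova_servers: set of Nova server ids that actually exist.
    """
    referenced = set(trove_instances.values())
    return {
        # Trove believes these instances exist, but Nova has no server.
        "missing_in_nova": sorted(
            trove_id
            for trove_id, server_id in trove_instances.items()
            if server_id not in nova_servers
        ),
        # Nova runs these servers, but no Trove instance references them.
        "orphaned_in_nova": sorted(nova_servers - referenced),
    }


if __name__ == "__main__":
    trove = {"t1": "n1", "t2": "n2"}
    nova = {"n1", "n3"}
    print(find_mismatches(trove, nova))
```

What the job does with each category (delete, mark failed, adjust quota usage) is a policy decision left out of the sketch.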
  22. Thank you