
Abhishek Kumar - CloudStack Locking Service

CloudStack cannot currently work with any MySQL clustering solution, so it is time to explore a new locking service, manager and pluggable interface that would allow the CloudStack DB to be HA enabled with multi-master read/write. The talk will focus on:
- The need for a locking service and the challenges with the existing CloudStack architecture
- The different clustering solutions that could be adopted
- A PoC for a future implementation, with minimal changes to the existing architecture, using Percona XtraDB or any other clustering solution
- Additionally, the idea of getting rid of the mshost table and using the locking service to find out about other management servers


  1. CloudStack Locking Service
     Abhishek Kumar, Software Developer, ShapeBlue, abhishek.kumar@shapeblue.com
  2. About me
     • Software Developer at ShapeBlue
     • From Meerut, India
     • Previously developed applications for desktop and mobile
     • Worked on CloudStack features: domain- and zone-specific offerings, VM ingestion, container service plugin
     • Love going to the gym, watching action-thriller movies, discussing politics
  3. Objective
     • New locking service, manager and pluggable interface with ZooKeeper (using the Curator framework), Hazelcast or other distributed lock managers
     • Outcome: the CloudStack DB can be HA enabled with multi-master read/write, using a clustering solution
     • Peer discovery
  4. Why?
     • CloudStack can control 100s of hosts with 1000s of virtual machines
     • Can support multiple management servers
     • But not so for the database!
     • Limited support for replication and high availability; cannot use multi-master replication
     • Implementing an active-active or active-passive configuration becomes difficult
     • Database clustering is not possible
  5. Topics
     • Locking
       - Introduction
       - Database locking
       - Locking in CloudStack and its limitations
     • Distributed locks
       - Introduction
       - Different distributed lock managers
       - Overview of Apache ZooKeeper
       - Overview of Hazelcast
     • Demo
     • Implementation of the new locking service, pluggable interface with Apache ZooKeeper (Curator) and Hazelcast
     • Comparison, current limitations, future work
     • Q & A
  6. Lock
     • A lock or mutex is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual-exclusion concurrency control policy
     • Locks: usually threads of the same process; mutex: threads from different processes
     • Can be advisory or mandatory
     • Granularity: a measure of the amount of data the lock is protecting. Fine for smaller, specific data and coarse for larger data
     • Issues: overhead, contention, deadlock
  7. Database locks
     Ensure transaction synchronicity
     • Mainly two types:
       - Pessimistic: the record is locked until the lock is released
       - Optimistic: the system keeps a copy of the initial read and later verifies the data on release, accepting or rejecting the update. Wikipedia uses optimistic locking for document editing
     • Different granularity: database level, file level, table level, page or block level, row level, column level
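The optimistic scheme from the database locks slide can be sketched in plain Java. This is a minimal in-memory illustration and not CloudStack code: the class names `OptimisticLockingDemo` and `VersionedRecord` are invented for this example, and a version counter stands in for the "copy of the initial read" that is verified before an update is accepted.

```java
import java.util.concurrent.atomic.AtomicReference;

final class OptimisticLockingDemo {
    public static void main(String[] args) {
        VersionedRecord record = new VersionedRecord("draft");
        // Two editors read the same version without taking any lock.
        VersionedRecord.Snapshot alice = record.read();
        VersionedRecord.Snapshot bob = record.read();
        System.out.println(record.update(alice, "alice's edit")); // accepted
        System.out.println(record.update(bob, "bob's edit"));     // rejected: bob's version is stale
    }
}

/** Hypothetical in-memory stand-in for a row guarded by a version number. */
final class VersionedRecord {
    /** Immutable snapshot: a value plus the version it was read at. */
    static final class Snapshot {
        final String value;
        final long version;

        Snapshot(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedRecord(String initialValue) {
        current = new AtomicReference<>(new Snapshot(initialValue, 0L));
    }

    /** Optimistic read: nothing is locked, the caller just keeps the snapshot. */
    Snapshot read() {
        return current.get();
    }

    /**
     * Optimistic write: succeeds only if the record still has the version the
     * caller read; otherwise the update is rejected and the caller must re-read.
     */
    boolean update(Snapshot seen, String newValue) {
        Snapshot now = current.get();
        if (now.version != seen.version) {
            return false;
        }
        return current.compareAndSet(now, new Snapshot(newValue, seen.version + 1));
    }
}
```

A pessimistic variant would instead block the second editor at read time until the first one releases the record.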
  8. DB Locks Issues – Lock contention
     Many sessions require frequent access to the same lock for a short amount of time, resulting in a “single lane bridge”
     Example: deploying 100s of VMs simultaneously
  9. DB Locks Issues – Long Term Blocking
     Many sessions require frequent access to the same lock for a long period of time, resulting in blocking of all dependent sessions
  10. DB Locks Issues – Database Deadlocks
      Occur when two or more transactions hold dependent locks and none can continue until another releases its lock
  11. DB Locks Issues – contd.
      Other issues:
      • Overhead
      • Difficult to debug
      • Priority inversion
      • Convoying
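The deadlock and long-term blocking cases above have a common application-level mitigation: always acquire locks in one global order, and bound every wait with a timeout. The sketch below shows that general technique with `java.util.concurrent` locks. It is not taken from the talk or from CloudStack; `RankedLock`, `withBoth` and the rank values are hypothetical, and ranks are assumed to be unique per lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderingDemo {
    /** A lock with a fixed rank; ranks define the global acquisition order. */
    static final class RankedLock {
        final int rank;
        final ReentrantLock lock = new ReentrantLock();

        RankedLock(int rank) {
            this.rank = rank;
        }
    }

    /**
     * Runs the action while holding both locks. The lower-ranked lock is always
     * taken first, whatever the argument order, so two callers can never hold
     * one lock each while waiting for the other. Returns false on timeout.
     */
    static boolean withBoth(RankedLock a, RankedLock b, long timeoutMs, Runnable action) {
        RankedLock first = a.rank <= b.rank ? a : b;
        RankedLock second = (first == a) ? b : a;
        if (!tryAcquire(first, timeoutMs)) {
            return false;
        }
        try {
            if (!tryAcquire(second, timeoutMs)) {
                return false;
            }
            try {
                action.run();
                return true;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    private static boolean tryAcquire(RankedLock l, long timeoutMs) {
        try {
            return l.lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RankedLock vm = new RankedLock(1);
        RankedLock volume = new RankedLock(2);
        AtomicInteger completed = new AtomicInteger();
        // The two threads name the locks in opposite order on purpose.
        Thread t1 = new Thread(() -> withBoth(vm, volume, 1000, completed::incrementAndGet));
        Thread t2 = new Thread(() -> withBoth(volume, vm, 1000, completed::incrementAndGet));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("completed: " + completed.get());
    }
}
```

The timeout does not prevent contention or convoying; it only turns an unbounded wait into a failure the caller can handle.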
  12. Locking in CloudStack
      • Uses MySQL lock functions to acquire and release locks on database connections
      • A hashmap of all acquired locks and their connections is kept in the code
      • Fast and effective, as locking takes place in the database itself
  13. Locking in CloudStack – contd.
      Limitations of the current design:
      • Cannot work with MySQL clustering solutions. This is because the locking functions GET_LOCK() and RELEASE_LOCK() are not supported by clustering solutions like Percona XtraDB, https://www.percona.com/doc/percona-xtradb-cluster/LATEST/limitation.html
      • An HA-enabled, multi-master DB cannot be implemented
      A solution could be to implement distributed locks using available distributed locking services
  14. Distributed Locks
      • Synchronize access to shared resources for applications distributed across a cluster on multiple machines
      • Coordination between different nodes
      • Ensure that only one server can write to a database or write to a file
      • Ensure that only one server can perform a particular action
      • Ensure that there is a single master that processes all writes
  15. Distributed Locking - Implementation
      • More complex than conventional OS or relational DB locking, as more variables are present: the network, and different nodes that could individually fail at any time
      • Different algorithms: Redis, Paxos, etc.
      • Implementation of a Distributed Lock Manager (DLM)
      • Different types of lock a DLM can grant: Null, Concurrent Read, Concurrent Write, Protected Read, Protected Write, Exclusive
  16. Distributed Locking - Implementation
      Lock modes: Null (NL), Concurrent Read (CR), Concurrent Write (CW), Protected Read (PR), Protected Write (PW), Exclusive (EX)
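The six lock modes listed above are usually paired with a compatibility table that says which modes can be held on the same resource at the same time. The transcript only preserves the mode names, so the table below is an assumption: it encodes the classic VMS-style DLM compatibility rules, and the `DlmLockMode` enum and its method names are invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

final class DlmLockModeDemo {
    public static void main(String[] args) {
        // Print the compatibility matrix: rows are requested modes, columns are held modes.
        System.out.print("    ");
        for (DlmLockMode held : DlmLockMode.values()) {
            System.out.print(held + "  ");
        }
        System.out.println();
        for (DlmLockMode requested : DlmLockMode.values()) {
            System.out.print(requested + "  ");
            for (DlmLockMode held : DlmLockMode.values()) {
                System.out.print(requested.isCompatibleWith(held) ? " y  " : " -  ");
            }
            System.out.println();
        }
    }
}

/** Lock modes of a VMS-style distributed lock manager (assumed table, see lead-in). */
enum DlmLockMode {
    NL, CR, CW, PR, PW, EX;

    /** Modes that may be held by others while this mode is granted. */
    Set<DlmLockMode> compatibleModes() {
        switch (this) {
            case NL: return EnumSet.allOf(DlmLockMode.class); // null lock conflicts with nothing
            case CR: return EnumSet.of(NL, CR, CW, PR, PW);    // everything except exclusive
            case CW: return EnumSet.of(NL, CR, CW);
            case PR: return EnumSet.of(NL, CR, PR);            // shared read, no writers
            case PW: return EnumSet.of(NL, CR);                // single writer, unprotected readers
            case EX: return EnumSet.of(NL);                    // exclusive
            default: throw new AssertionError(this);
        }
    }

    boolean isCompatibleWith(DlmLockMode held) {
        return compatibleModes().contains(held);
    }
}
```

A lock manager built on this table would grant a request only if the requested mode is compatible with every mode already granted on that resource.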
  17. Distributed Locking Manager
      • Apache ZooKeeper: high-performance coordination service for distributed systems; can be used for distributed locks
      • Redis: advanced key-value cache and store; can be used to implement the Redis algorithm for distributed lock management
      • Hazelcast: distributed In-Memory Data Grid platform for Java
      • Chubby: lock service for loosely coupled distributed systems, developed by Google
      • Etcd, Consul
  18. Apache ZooKeeper
      • An open source, high-performance coordination service for distributed applications
      • Exposes common services through a simple interface, so developers don't have to write them from scratch:
        - naming
        - configuration management
        - locks & synchronization
        - group services
      • Build your own on it for specific needs
      • Apache Curator: Java client library
  19. Apache ZooKeeper contd.
      • The ZooKeeper service is replicated over a set of machines
      • All machines store a copy of the data (in memory)
      • A leader is elected on service startup
      • A client connects to only a single ZooKeeper server and maintains a TCP connection to it
      • Clients can read from any ZooKeeper server; writes go through the leader and need majority consensus
  20. Apache ZooKeeper Implementation
      Used with the Curator framework. Different implementation recipes are available: https://github.com/apache/zookeeper/tree/master/zookeeper-recipes
      • Start an embedded server and create a client to connect to this server:
          File dir = new File(tempDirectory, "zookeeper").getAbsoluteFile();
          zooKeeperServer = new ZooKeeperServer(dir, dir, tickTime);
          serverFactory = new NIOServerCnxnFactory();
          serverFactory.configure(new InetSocketAddress(clientPort), numConnections);
          serverFactory.startup(zooKeeperServer);
          …
          // Retry with exponential backoff: 1000 ms base sleep, at most 3 retries
          RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
          curatorClient = CuratorFrameworkFactory.newClient(String.format("127.0.0.1:%d", clientPort), retryPolicy);
          curatorClient.start();
      • Locks can be acquired and released for a given name:
          InterProcessMutex lock = new InterProcessMutex(curatorClient, String.format("%s%s", tempDirectory, name));
          // acquire() returns false if the lock was not obtained within the timeout
          lock.acquire(timeoutSeconds, TimeUnit.SECONDS);
          …
          lock.release();
  21. Hazelcast
      • The Hazelcast IMDG operational in-memory computing platform helps leading companies worldwide manage their data and distribute processing using in-memory storage and parallel execution for breakthrough application speed and scale
      • Hazelcast implements distributed versions of some Java data structures, such as Map, Set, List, Queue and Lock
      • ILock is the distributed implementation of java.util.concurrent.locks.Lock
  22. Hazelcast - Implementation
      • Define the config, set the CPSubsystem member count, create HazelcastInstance objects:
          Config config = new Config();
          CPSubsystemConfig cpSubsystemConfig = config.getCPSubsystemConfig();
          // Enable the CP subsystem with 3 CP members
          cpSubsystemConfig.setCPMemberCount(3);
          hazelcastInstance = Hazelcast.newHazelcastInstance(config);
          ...
      • Locks can be acquired and released:
          FencedLock lock = hazelcastInstance.getCPSubsystem().getLock(name);
          // tryLock() returns false if the lock was not obtained within the timeout
          lock.tryLock(timeoutSeconds, TimeUnit.SECONDS);
          ...
          lock.unlock();
  23. Locking Service in CloudStack
      • Pluggable service implementation, using existing distributed lock managers for the different locking service plugins
      • Global setting to control the locking service: db.locking.service.plugin
      • Current implementation uses Apache ZooKeeper and Hazelcast
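A pluggable locking service of this kind could be shaped roughly as follows. This is a sketch under stated assumptions, not the interface from the PoC: `LockingService`, `LockingServiceRegistry`, `InProcessLockingService` and the plugin name `in-process` are all hypothetical, and the stand-in plugin uses JVM-local locks only so that the example runs without ZooKeeper or Hazelcast. Only the setting name `db.locking.service.plugin` comes from the slide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockingServiceDemo {
    public static void main(String[] args) {
        LockingServiceRegistry registry = new LockingServiceRegistry();
        registry.register(new InProcessLockingService());
        // In CloudStack the plugin name would come from the db.locking.service.plugin setting.
        LockingService service = registry.forPlugin("in-process");
        if (service.acquire("vm-42", 5)) {
            try {
                System.out.println("holding lock vm-42 via " + service.name());
            } finally {
                service.release("vm-42");
            }
        }
    }
}

/** Hypothetical plugin contract: named locks with a timeout on acquire. */
interface LockingService {
    String name();

    boolean acquire(String lockName, int timeoutSeconds);

    boolean release(String lockName);
}

/** Stand-in plugin backed by JVM-local locks; a real plugin would call a distributed lock manager. */
final class InProcessLockingService implements LockingService {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    @Override
    public String name() {
        return "in-process";
    }

    @Override
    public boolean acquire(String lockName, int timeoutSeconds) {
        ReentrantLock lock = locks.computeIfAbsent(lockName, n -> new ReentrantLock());
        try {
            return lock.tryLock(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    @Override
    public boolean release(String lockName) {
        ReentrantLock lock = locks.get(lockName);
        if (lock == null || !lock.isHeldByCurrentThread()) {
            return false; // nothing to release for this caller
        }
        lock.unlock();
        return true;
    }
}

/** Selects the active plugin by its configured name. */
final class LockingServiceRegistry {
    private final Map<String, LockingService> plugins = new ConcurrentHashMap<>();

    void register(LockingService service) {
        plugins.put(service.name(), service);
    }

    LockingService forPlugin(String configuredName) {
        LockingService service = plugins.get(configuredName);
        if (service == null) {
            throw new IllegalArgumentException("No locking plugin named " + configuredName);
        }
        return service;
    }
}
```

With this shape, a ZooKeeper-backed or Hazelcast-backed plugin would implement the same three methods, and callers would not change when the setting switches between them.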
  24. Demo
  25. Why generic framework design
      • Choice
      • Easier to develop
      • Performance difference
  26. Locking Service in CloudStack - Issues
      • Apart from the traditional issues with a locking service, speed will be a major issue compared to the existing database locking in CloudStack. Since locking will be managed by a server, it creates additional overhead
      [Chart: Lock acquire performance during VM deployment. X-axis: locks (Lock 1 to Lock 15); y-axis: time in milliseconds; series: current DB locking, ZooKeeper, Hazelcast]
  27. Future work
      • Current state: basic implementation with Hazelcast and ZooKeeper
      • Testing with database clustering
      • Optimization for better performance
      • Implement peer discovery to get rid of the mshost table, using the locking service to discover the different management server nodes
      • Code cleanup and start a PR
      • Target 4.15 (if not 4.14)
  28. Thank You! Thoughts and questions
