CloudStack Locking Service
Abhishek Kumar
Software Developer, ShapeBlue
abhishek.kumar@shapeblue.com
About me
 Software Developer at ShapeBlue
 From Meerut, India
 Previously developed desktop and mobile applications
 Worked on CloudStack features – Domain, zone specific offerings, VM
ingestion, container service plugin
 Love going to the gym, watching action-thriller movies, discussing politics
Objective
A new locking service, manager and pluggable interface with ZooKeeper (using the Curator framework), Hazelcast or other distributed lock managers.
Outcome: the CloudStack DB can be made HA-enabled with multi-master read/write, using a clustering solution.
Peer discovery
Why?
 CloudStack can control 100s of hosts with 1000s of virtual machines
 Can support multiple management servers
 But what about the database?
 Limited support for replication and high availability; cannot use multi-master replication
 Implementing active-active, active-passive configuration becomes
difficult
 Database clustering not possible
Topics
 Locking Introduction
 Database locking
 Locking in CloudStack and its limitations
 Distributed locks
 Introduction
 Different Distributed Lock Managers
 Overview of Apache ZooKeeper
 Overview of Hazelcast
 Demo
 Implementation of the new locking service, pluggable interface with Apache ZooKeeper-Curator and Hazelcast
 Comparison, current limitation, future work
 Q & A
Lock
 Lock or mutex is a synchronization
mechanism for enforcing limits on access to a
resource in an environment where there are
many threads of execution. A lock is designed
to enforce a mutual exclusion concurrency
control policy
 Locks – usually between threads of the same process;
mutexes – between threads of different processes
 Can be advisory or mandatory
 Granularity - measure of the amount of data
the lock is protecting. Fine for smaller,
specific data and coarse for larger data
 Issues –
 Overhead
 Contention
 Deadlock
Database locks
Ensuring transaction synchronization
 Mainly two types,
 Pessimistic – Record is locked until the lock is released
 Optimistic – the system keeps a copy of the initial read and, on release, verifies the
data before accepting or rejecting the update (see the sketch below)
Wikipedia uses optimistic locking for document editing
 Different granularity
 Database level
 File level
 Table level
 Page or block level
 Row level
 Column level
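A minimal sketch of optimistic locking using a version column (purely illustrative; the table and column names are assumptions, not anything from CloudStack). The UPDATE only succeeds if no other writer has changed the row since it was read:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdate {
    // Returns true if the update was applied, false if another writer modified the row first
    public static boolean updateName(Connection conn, long id, long readVersion, String newName) throws SQLException {
        String sql = "UPDATE document SET name = ?, version = version + 1 WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, newName);
            ps.setLong(2, id);
            ps.setLong(3, readVersion);
            return ps.executeUpdate() == 1; // 0 rows updated means a conflicting update happened
        }
    }
}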
DB Locks Issues – Lock contention
Many sessions require frequent access to the same lock for short amounts of time, resulting
in a “single-lane bridge”
Example: Deploying 100s of VMs simultaneously
DB Locks Issues – Long Term Blocking
Many sessions require frequent access to the same lock for long periods of time, resulting in
blocking of all dependent sessions
DB Locks Issues – Database Deadlocks
Occurs when two or more transactions hold dependent locks and neither can continue until the other releases
DB Locks Issues – contd.
Other issues,
 Overhead
 Difficult to debug
 Priority inversion
 Convoying
Locking in CloudStack
 Uses MySQL lock functions to acquire and release locks on database
connections
 A hashmap of all acquired locks and their connections is kept in the code
 Fast and effective, as locking takes place in the database itself (a sketch of the GET_LOCK()-based approach follows)
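A minimal sketch of the GET_LOCK()/RELEASE_LOCK() idea over JDBC (illustrative only, not CloudStack's actual code; the class and method names are assumptions). Because the lock belongs to the MySQL session that acquired it, the connection has to be tracked alongside the lock, which is what the hashmap above does:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class MySqlNamedLock {
    // Returns true if the named lock was acquired within timeoutSeconds (GET_LOCK returns 1 on success)
    public static boolean acquire(Connection conn, String name, int timeoutSeconds) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT GET_LOCK(?, ?)")) {
            ps.setString(1, name);
            ps.setInt(2, timeoutSeconds);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getInt(1) == 1;
            }
        }
    }

    // Releases the named lock; must be called on the same connection that acquired it
    public static void release(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT RELEASE_LOCK(?)")) {
            ps.setString(1, name);
            ps.executeQuery();
        }
    }
}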
Locking in CloudStack – contd.
Limitations with current design,
 Cannot work with MySQL clustering solutions
This is because the locking functions GET_LOCK() and RELEASE_LOCK() are not supported by
clustering solutions like Percona XtraDB Cluster, https://www.percona.com/doc/percona-xtradb-cluster/LATEST/limitation.html
 An HA-enabled, multi-master DB cannot be implemented
A solution could be to implement distributed locks using an available distributed
locking service
Distributed Locks
 Synchronize access to shared resources for applications distributed
across a cluster of multiple machines
 Coordination between different nodes
 Ensure only one server can write to a database or write to a file.
 Ensure that only one server can perform a particular action.
 Ensure that there is a single master that processes all writes
Distributed Locking - Implementation
 More complex than conventional OS or relational DB locking, as more
variables are present: the network and different nodes, each of which could fail at
any time
 Different algorithms – Redlock (Redis), Paxos, etc.
 Implementation of Distributed Locking Manager (DLM)
 Different types of lock DLM can grant,
Null, Concurrent Read, Concurrent Write, Protected Read, Protected Write, Exclusive
Distributed Locking - Implementation
Null (NL)
Concurrent Read (CR)
Concurrent Write (CW)
Protected Read (PR)
Protected Write (PW)
Exclusive (EX)
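For reference, the six modes above could be modeled as a simple enum; the descriptions follow the traditional VMS/Linux DLM semantics and are not tied to any particular lock manager:

public enum DlmLockMode {
    NL, // Null: no access, only registers interest in the resource
    CR, // Concurrent Read: read while others may read or write
    CW, // Concurrent Write: read/write while others may also read or write
    PR, // Protected Read: shared read, no writers allowed
    PW, // Protected Write: write, other holders limited to concurrent read
    EX  // Exclusive: sole read/write access
}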
Distributed Locking Manager
 Apache ZooKeeper – high performance
coordination service for distributed
systems, can be used for distributed
locks
 Redis - advanced key-value cache and
store, can be used to implement the
Redlock algorithm for distributed lock
management
 Hazelcast - distributed In-Memory Data
Grid platform for Java
 Chubby - lock service for loosely
coupled distributed systems developed
by Google
 Etcd, Consul
Apache ZooKeeper
 An open source, high-performance coordination service for distributed
applications.
 Exposes common services in a simple interface:
 naming
 configuration management
 locks & synchronization
 group services
… developers don't have to write them from scratch
 Build your own on it for specific needs.
 Apache Curator – Java client library
Apache ZooKeeper contd.
• ZooKeeper Service is replicated over a set of machines
• All machines store a copy of the data (in memory)
• A leader is elected on service startup
• Clients connect to a single ZooKeeper server and
maintain a TCP connection.
• Clients can read from any ZooKeeper server; writes go
through the leader and need majority consensus.
Apache ZooKeeper Implementation
The Curator framework is used with it; different implementation recipes are
available: https://github.com/apache/zookeeper/tree/master/zookeeper-recipes
 Start an embedded server and create a client to connect to it:
// Embedded ZooKeeper server: the directory is used for both snapshots and the transaction log
File dir = new File(tempDirectory, "zookeeper").getAbsoluteFile();
zooKeeperServer = new ZooKeeperServer(dir, dir, tickTime);
serverFactory = new NIOServerCnxnFactory();
serverFactory.configure(new InetSocketAddress(clientPort), numConnections);
serverFactory.startup(zooKeeperServer);
…
// Curator client pointed at the embedded server, retrying with exponential backoff (1s base, 3 retries)
RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
curatorClient = CuratorFrameworkFactory.newClient(String.format("127.0.0.1:%d", clientPort), retryPolicy);
curatorClient.start();
 Locks can be acquired and released for a given name:
// znode path for the lock is built from the temp directory and the lock name
InterProcessMutex lock = new InterProcessMutex(curatorClient, String.format("%s%s", tempDirectory, name));
// timed acquire: returns true if the lock was obtained within the timeout
lock.acquire(timeoutSeconds, TimeUnit.SECONDS);
…
lock.release();
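A possible usage pattern around the calls above (a sketch that assumes lock and timeoutSeconds as defined earlier, inside a method that declares the checked exceptions Curator throws): check the boolean result of the timed acquire and release in a finally block.

if (lock.acquire(timeoutSeconds, TimeUnit.SECONDS)) {
    try {
        // critical section: work on the resource protected by this named lock
    } finally {
        lock.release(); // always release, even if the critical section throws
    }
} else {
    // lock not obtained within the timeout; the caller can retry or fail the operation
}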
Hazelcast
 The Hazelcast IMDG operational in-memory computing
platform helps leading companies worldwide manage their
data and distribute processing using in-memory storage
and parallel execution for breakthrough application speed
and scale.
 Hazelcast implements distributed versions of some Java
data structures like Map, Set, List, Queue and Lock
 ILock is the distributed implementation of
java.util.concurrent.locks.Lock.
Hazelcast - Implementation
 Define the config, set the CP Subsystem member count, create HazelcastInstance objects
Config config = new Config();
CPSubsystemConfig cpSubsystemConfig = config.getCPSubsystemConfig();
// enable the CP subsystem by setting a member count (it requires at least 3 members)
cpSubsystemConfig.setCPMemberCount(3);
hazelcastInstance = Hazelcast.newHazelcastInstance(config);
...
 Locks can be acquired and released:
FencedLock lock = hazelcastInstance.getCPSubsystem().getLock(name);
// timed tryLock: returns true if the lock was obtained within the timeout
lock.tryLock(timeoutSeconds, TimeUnit.SECONDS);
...
lock.unlock();
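As with Curator, a hedged usage sketch around the calls above (assuming lock and timeoutSeconds as defined earlier): check the result of tryLock and unlock in a finally block.

if (lock.tryLock(timeoutSeconds, TimeUnit.SECONDS)) {
    try {
        // critical section: work on the resource protected by this FencedLock
    } finally {
        lock.unlock(); // release the lock even if the critical section throws
    }
} else {
    // lock not obtained within the timeout; the caller can retry or fail the operation
}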
Locking Service in CloudStack
 Pluggable service implementation using
existing distributed lock managers for
different locking service plugins
 Global setting to control the locking
service, db.locking.service.plugin
 Current implementation uses Apache
ZooKeeper and Hazelcast (a possible interface sketch follows)
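A hypothetical sketch of what the pluggable interface could look like (the names DistributedLockingService, acquireLock and releaseLock are illustrative assumptions, not the actual CloudStack code). The plugin selected by db.locking.service.plugin would implement it on top of Curator, Hazelcast or plain MySQL locks:

import java.util.concurrent.TimeUnit;

public interface DistributedLockingService {
    // Name used to match the db.locking.service.plugin global setting
    String getName();

    // Try to acquire a named lock within the given timeout; returns true on success
    boolean acquireLock(String name, long timeout, TimeUnit unit);

    // Release a previously acquired named lock
    void releaseLock(String name);
}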
Demo
Why generic framework design
 Choice
 Easier to develop
 Performance difference
Locking Service in CloudStack - Issues
 Apart from the traditional issues with a locking service, speed will be a major issue
compared to the existing database locking in CloudStack; since locking will be
managed by a separate server, it will create additional overhead
[Chart: Lock acquire performance during VM deployment – time in milliseconds to acquire Locks 1–15, comparing Current DB Locking, ZooKeeper and Hazelcast]
Future work
 Current state – basic implementation with Hazelcast and ZooKeeper
 Testing with database clustering
 Optimization for better performance
 Implement peer discovery to get rid of the mshost table, using the locking
service to discover different management server nodes
 Code cleanup and open a PR
 Target 4.15 (if not 4.14)
Thank You!
Thoughts and Questions
