3. My Collaborators
▪ Abhishek Kumar
▪ Basheeruddin Ahmed
▪ Colin Dixon
▪ Harman Singh
▪ Kamal Rameshan
▪ Robert Varga
▪ Tony Tkacik
▪ Tom Pantelis
▪ Luis Gomez
▪ Phillip Shea
▪ Radhika Hirannaiah
▪ and many more…
26. RPC Registry Replication - Gossip
[Diagram] RPC registry replication via gossip:
▪ Local bucket updates change that bucket's version (e.g. buckets m1,v1 / m2,v5 / m3,v7)
▪ All buckets and their versions are known to all members
▪ Every 1 second members send all known bucket versions to any one peer
▪ If the receiver's local versions are higher, it sends an update
▪ If its local versions are lower, it sends its status back to the sender (the exchange is sketched below)
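The comparison rule from the diagram can be sketched in plain Java. The class and member names below (GossipSketch, onStatus, Peer) are hypothetical and not the actual BucketStore API; the sketch only shows the decision: push the buckets where the local version is higher, and reply with our status where the peer is ahead.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the gossip comparison rule, not the actual BucketStore API.
public class GossipSketch {

    /** Version of every bucket known to this member, keyed by bucket owner. */
    private final Map<String, Long> knownVersions = new HashMap<>();

    /** Minimal peer abstraction for the sketch. */
    interface Peer {
        void sendUpdate(Map<String, Long> newerBucketVersions);
        void sendStatus(Map<String, Long> knownVersions);
    }

    /** Handle a peer's version map (its "status"), received on a gossip tick. */
    public void onStatus(Map<String, Long> remoteVersions, Peer sender) {
        Map<String, Long> newerLocally = new HashMap<>();
        boolean remoteHasNewer = false;

        for (Map.Entry<String, Long> entry : knownVersions.entrySet()) {
            long remote = remoteVersions.getOrDefault(entry.getKey(), -1L);
            if (entry.getValue() > remote) {
                newerLocally.put(entry.getKey(), entry.getValue()); // local version higher
            } else if (entry.getValue() < remote) {
                remoteHasNewer = true;                              // local version lower
            }
        }
        // Buckets the peer knows about that we have never seen also count as newer remotely.
        for (String bucket : remoteVersions.keySet()) {
            if (!knownVersions.containsKey(bucket)) {
                remoteHasNewer = true;
            }
        }

        if (!newerLocally.isEmpty()) {
            sender.sendUpdate(newerLocally);   // push the buckets we are ahead on
        }
        if (remoteHasNewer) {
            sender.sendStatus(knownVersions);  // prompt the sender to push its newer buckets
        }
    }
}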
29. sal-clustering-commons
▪ Some common messages
▪ Actor base classes
▪ The Protobuf messages used in Helium
▪ The Protobuf NormalizedNode serialization code
▪ The NormalizedNode streaming code
▪ Other miscellaneous utility classes
30. sal-akka-raft
▪ Implementation of the Raft algorithm on top of akka
▪ Uses akka-persistence for durability
▪ Provides a base class called RaftActor which can be extended by anyone who wants to replicate state (see the sketch below)
▪ See sal-akka-raft-example, which provides a simple implementation of a replicated HashMap
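As a rough illustration of that pattern: the base class and hook names below are invented for this sketch and are not the real RaftActor contract (see sal-akka-raft-example for the actual one). The idea is that the base class replicates submitted commands via Raft, and the subclass applies each command to its local state only once it has been committed.

import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RaftActor: the real class integrates with akka and
// akka-persistence; only the extension pattern is shown here.
abstract class ReplicatedStateActor {
    // Called by the base class once a command has been committed by the Raft majority.
    protected abstract void applyCommand(Object command);
}

// A replicated map in the spirit of sal-akka-raft-example's replicated HashMap.
class ReplicatedMapActor extends ReplicatedStateActor {
    static final class Put {
        final String key;
        final String value;
        Put(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> state = new HashMap<>();

    @Override
    protected void applyCommand(Object command) {
        if (command instanceof Put) {
            Put put = (Put) command;
            state.put(put.key, put.value);  // applied in the same order on every member
        }
    }
}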
31. sal-distributed-datastore
▪ ConcurrentDOMDataBroker
▪ DistributedDataStore
▪ Implementation of the DOMStore SPI
▪ Shard built on top of RaftActor
▪ Creates Shards based on the sharding strategy
▪ Code for a client to interact with the Shard Leader
32. sal-remoterpc-connector
▪ RemoteRpcProvider
▪ Default RPC provider, invoked when an RPC is not found in the local MD-SAL registry
▪ Code for BucketStore, which provides a mechanism to replicate state based on Gossip
▪ Code for RpcBroker, which allows invoking a remote RPC
36. Waiting for Ready
▪ Recovery must be complete
▪ All Shard Leaders must be known
▪ Three messages are monitored by the ShardManager:
  ▪ Cluster.MemberStatusUp – used to figure out the address of a cluster member
  ▪ LeaderStateChanged – used to figure out if a Follower has a different Leader
  ▪ ShardRoleChanged – used to figure out any changes in a Shard's role
▪ Waiting is not infinite; by default it lasts only 90 seconds, but this is configurable (see the sketch below)
▪ Will block the config subsystem
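A minimal, self-contained sketch of the waiting behaviour (this is not the ShardManager code; the class below is invented for illustration): readiness is reached once a leader is known for every local shard, and the wait gives up after a configurable timeout.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical readiness helper: counts down once per shard as leaders become known,
// and blocks callers until every shard has a leader or the timeout expires.
public class ShardReadinessWaiter {
    private final ConcurrentHashMap<String, String> leaders = new ConcurrentHashMap<>();
    private final CountDownLatch allLeadersKnown;

    public ShardReadinessWaiter(int shardCount) {
        this.allLeadersKnown = new CountDownLatch(shardCount);
    }

    // Would be driven by LeaderStateChanged / ShardRoleChanged style notifications.
    public void onLeaderChanged(String shardName, String leaderId) {
        if (leaderId != null && leaders.put(shardName, leaderId) == null) {
            allLeadersKnown.countDown();
        }
    }

    // 90 seconds by default in the real implementation, but configurable.
    public boolean awaitReady(long timeoutSeconds) throws InterruptedException {
        return allLeadersKnown.await(timeoutSeconds, TimeUnit.SECONDS);
    }
}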
67. Suggested Next Steps…
▪ Deploy a cluster
▪ Run the clustering integration tests
▪ Write an application that works in the cluster
▪ File bugs to report features which you find missing
▪ Try running dsBenchMark on a cluster
▪ Test out replication using the dummy data store
▪ Check out the code
▪ Send email to controller-dev@lists.opendaylight.org with questions
There are two subsystems supported by the MD-SAL clustering implementation.
The distributed datastore ensures that data stored in the datastore is accessible to all members of the cluster, and it allows the MD-SAL data tree to be broken up into sub-trees so that the data can be distributed around the cluster.
The Remote RPC connector ensures that an RPC implemented by a remote RPC provider is accessible from anywhere in the cluster.
The MD-SAL clustering implementation is built on top of akka. Akka is a library which allows us to create autonomous components called actors, which you interact with by sending them asynchronous messages. Since akka ensures that only one message is processed by an actor at a time, it makes it easy to create thread-safe components.
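A minimal Akka classic actor in Java illustrates the model (the actor and message names here are made up for the example, not taken from the controller code): state is mutated only inside message handlers, and because messages are processed one at a time the mutable counter needs no locking.

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

// A tiny actor: akka delivers its messages one at a time, so no synchronization is needed.
public class CounterActor extends AbstractActor {
    private long count = 0;

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .matchEquals("increment", msg -> count++)
            .matchEquals("get", msg -> getSender().tell(count, getSelf()))
            .build();
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("example");
        ActorRef counter = system.actorOf(Props.create(CounterActor.class), "counter");
        counter.tell("increment", ActorRef.noSender());  // asynchronous message send
        counter.tell("get", ActorRef.noSender());        // reply goes to dead letters here, since no sender was given
    }
}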
Akka also has several modules which make it easy to build a clustered application or infrastructure on top of it.
The distributed datastore, for example, uses akka-persistence. This is a module which helps the datastore persist state to disk and recover from that persisted state. Every modification that is made to the datastore gets persisted to disk, and occasionally we create a snapshot of the state (tree) so that when we restart the data store we can recover the state faster.
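A small akka-persistence sketch of that pattern, assuming the classic Java API (the actor below is illustrative, not the datastore's own persistence code): each change is persisted as an event before it is applied, and a snapshot is saved now and then so recovery does not have to replay the whole journal.

import akka.persistence.AbstractPersistentActor;
import akka.persistence.SnapshotOffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative persistent actor: events are journaled, state is occasionally snapshotted,
// and recovery replays the latest snapshot plus the events written after it.
public class JournalingActor extends AbstractPersistentActor {
    private List<String> state = new ArrayList<>();

    @Override
    public String persistenceId() {
        return "journaling-actor-example";   // identifies this actor's journal
    }

    @Override
    public Receive createReceiveRecover() {
        return receiveBuilder()
            .match(String.class, event -> state.add(event))                        // replay journaled events
            .match(SnapshotOffer.class, s -> state = (List<String>) s.snapshot())  // restore from snapshot
            .build();
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(String.class, change -> persist(change, persisted -> {
                state.add(persisted);                      // apply only after it is durable
                if (state.size() % 100 == 0) {
                    saveSnapshot(new ArrayList<>(state));  // occasional snapshot speeds up recovery
                }
            }))
            .build();
    }
}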
The Distributed Data Store and Remote RPC Connector both use akka-remoting. This module lets one cluster member send a message to an actor on a remote cluster member and is an essential building block of our clustering implementation. Remoting in turn uses Netty.
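With akka-remoting enabled, an actor on another member can be addressed by its full path, for example as below. The system name, host, port, and actor path are placeholders for this sketch, not the controller's actual actor paths.

import akka.actor.ActorRef;
import akka.actor.ActorSelection;
import akka.actor.ActorSystem;

public class RemoteLookupExample {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("example");
        // Look up an actor running inside a remote member's actor system by its path.
        ActorSelection remotePeer = system.actorSelection(
                "akka.tcp://example@10.0.0.2:2550/user/some-actor");
        // Messages to it are sent exactly like messages to a local actor.
        remotePeer.tell("hello from another member", ActorRef.noSender());
    }
}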
Finally both the Distributed Data Store and Remote RPC connector use akka-clustering to discover new members in an akka cluster so that they can then communicate with their peers on other members.
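Discovery with akka-clustering can be sketched like this (illustrative only, not the ShardManager or RpcRegistry code): a component subscribes to membership events and learns the address of each member that comes up, so it can later talk to its peers there.

import akka.actor.AbstractActor;
import akka.cluster.Cluster;
import akka.cluster.ClusterEvent;
import akka.cluster.ClusterEvent.MemberRemoved;
import akka.cluster.ClusterEvent.MemberUp;

// Subscribes to cluster membership events and logs the address of each member.
public class MemberWatcher extends AbstractActor {
    private final Cluster cluster = Cluster.get(getContext().getSystem());

    @Override
    public void preStart() {
        cluster.subscribe(getSelf(), ClusterEvent.initialStateAsEvents(),
                MemberUp.class, MemberRemoved.class);
    }

    @Override
    public void postStop() {
        cluster.unsubscribe(getSelf());
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(MemberUp.class, up ->
                getContext().getSystem().log().info("Member is up: {}", up.member().address()))
            .match(MemberRemoved.class, removed ->
                getContext().getSystem().log().info("Member removed: {}", removed.member().address()))
            .build();
    }
}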
The Distributed Data Store and Remote RPC Connector each use their own akka actor system.
An akka actor system consists of three things:
An actor hierarchy. It's called a hierarchy because actors can have supervisors and children. Failures flow up the supervision chain. An actor can also delegate some work to its children so that more work can be done in parallel.
Configuration.
Dispatchers, which schedule messages to be delivered to an actor.
The Distributed Data Store is a strongly consistent store. This strong consistency is ensured by the use of the Raft distributed consensus algorithm, which requires that all data modifications are initiated by a Leader which replicates data to its followers, thus ensuring that the order in which modifications are done on all cluster members is the same.
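A toy illustration of why a single leader yields a single order (this is not the sal-akka-raft code; the class and method names are invented): the leader assigns the next log index to every modification and marks an entry committed only once a majority has stored it, so every member applies the same entries in the same index order.

import java.util.ArrayList;
import java.util.List;

// Toy leader-side bookkeeping: entries get indices in arrival order and are committed
// only after a majority of members have acknowledged them.
public class LeaderOrderingSketch {
    private final int clusterSize;
    private final List<String> log = new ArrayList<>();        // entries in leader order
    private final List<Integer> ackCount = new ArrayList<>();  // replication acks per entry
    private int commitIndex = -1;                              // highest entry safe to apply

    public LeaderOrderingSketch(int clusterSize) {
        this.clusterSize = clusterSize;
    }

    /** Leader assigns the next index; this fixes the global order of modifications. */
    public int append(String modification) {
        log.add(modification);
        ackCount.add(1);                    // the leader itself already has the entry
        return log.size() - 1;
    }

    /** Called when a follower confirms it has stored the entry at the given index. */
    public void acknowledge(int index) {
        ackCount.set(index, ackCount.get(index) + 1);
        int majority = clusterSize / 2 + 1;
        while (commitIndex + 1 < log.size() && ackCount.get(commitIndex + 1) >= majority) {
            commitIndex++;                  // committed entries are applied in index order everywhere
        }
    }

    public List<String> committedEntries() {
        return new ArrayList<>(log.subList(0, commitIndex + 1));
    }
}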
The Remote RPC connector also replicates data. It replicates all the registered RPCs on each node to all the other nodes in the cluster.
RpcBroker executes RPC requests received from remote nodes.
RpcRegistry manages the replication of the RPC registry; it extends BucketStore.
RemoteRpcImpl is the default delegate which is invoked when an RPC registration is not found in the MD-SAL core RpcBroker's registry.
RpcListener receives notifications whenever an RPC is registered by a Provider.