Distributed	Transactions	Are	Dead,	
Long	Live	Distributed	Transactions!
Support	for	transactions	in	Orleans	2.0
Sergey	Bykov
Microsoft,	@sergeybykov
@msftorleans
https://github.com/dotnet/orleans/
Plan
§ Basics
§ Challenges	of	Distributed	Transactions
§ Existing	Approaches
§ Challenges	for	Orleans	developers
§ Orleans	transactions
§ How	they	work
§ Work	in	Progress
§ Conclusion
Basics:	Canonical	Example
Transfer	$100	from account A	to account B
Atomicity: all or nothing
Consistency: constraints hold, e.g. can’t end up with a negative balance
Isolation: tx2 can’t see the extra $100 in B until tx1 completes
Durability: data is reliably stored
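A minimal sketch of the canonical example on a single machine, assuming a hypothetical in-memory `Ledger` class (not part of any library). It only illustrates atomicity, consistency and isolation; durability would need a persistent log, which is omitted here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory ledger, for illustration only.
public sealed class Ledger
{
    private readonly object gate = new object();
    private readonly Dictionary<string, long> balances = new Dictionary<string, long>();

    public void Open(string account, long initialBalance)
    {
        lock (gate) { balances[account] = initialBalance; }
    }

    public long BalanceOf(string account)
    {
        lock (gate) { return balances[account]; }
    }

    // Moves `amount` between two different accounts as one unit of work.
    public void Transfer(string from, string to, long amount)
    {
        if (amount <= 0) throw new ArgumentOutOfRangeException(nameof(amount));
        if (from == to) throw new ArgumentException("Accounts must differ.");

        lock (gate) // Isolation: no reader can observe a half-applied transfer.
        {
            long newFrom = balances[from] - amount;
            long newTo = balances[to] + amount;

            // Consistency: reject the transfer before any state is changed.
            if (newFrom < 0)
                throw new InvalidOperationException("Insufficient funds.");

            // Atomicity: both writes happen together, after all checks have passed.
            balances[from] = newFrom;
            balances[to] = newTo;
        }
    }
}
```

With two accounts opened at $500 each, `Transfer("A", "B", 100)` leaves A at $400 and B at $600, while an attempt to move $1000 throws and changes nothing.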
DBMS	Takes	Care	of	Everything	…	Locally
Data	is	local
No	network	involved
Failures	are	mostly	correlated
The key focus is the tradeoff between performance and isolation/consistency
Serializable
Repeatable	read
Read	committed
Read	uncommitted	
Snapshot	isolation
…
Distributed	Tx	is	a	different	ball	game
SQL	Server
Oracle
X/Open-compliant coordinator (DTC)
Resource managers
Begin	Tx
Commit	Tx
Prepare
Commit
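The Prepare/Commit exchange in the diagram is the classic two-phase commit. Below is a minimal sketch of the coordinator's decision logic, assuming a hypothetical `IParticipant` interface that stands in for a resource manager. It ignores coordinator crashes, timeouts and logging, which are exactly the parts that make real deployments hard.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical participant contract, standing in for a resource manager.
public interface IParticipant
{
    bool Prepare(Guid txId);   // Vote: true = ready to commit, false = cannot commit.
    void Commit(Guid txId);
    void Abort(Guid txId);
}

public static class TwoPhaseCommit
{
    // Returns true if the transaction committed on every participant.
    public static bool Run(Guid txId, IReadOnlyList<IParticipant> participants)
    {
        var prepared = new List<IParticipant>();

        // Phase 1: collect votes. One "no" vote (or an exception) aborts everyone.
        foreach (var p in participants)
        {
            bool vote;
            try { vote = p.Prepare(txId); }
            catch (Exception) { vote = false; }

            if (!vote)
            {
                // Participants that already voted "yes" must be told to roll back.
                foreach (var q in prepared) q.Abort(txId);
                return false;
            }
            prepared.Add(p);
        }

        // Phase 2: every participant voted yes, so the decision is commit.
        foreach (var p in participants) p.Commit(txId);
        return true;
    }
}

// Hypothetical in-memory participant used only to exercise the coordinator.
public sealed class RecordingParticipant : IParticipant
{
    private readonly bool voteYes;
    public string Outcome { get; private set; } = "pending";

    public RecordingParticipant(bool voteYes) { this.voteYes = voteYes; }

    public bool Prepare(Guid txId)
    {
        if (!voteYes) Outcome = "aborted"; // A "no" voter aborts on its own.
        return voteYes;
    }

    public void Commit(Guid txId) { Outcome = "committed"; }
    public void Abort(Guid txId) { Outcome = "aborted"; }
}
```

Each phase costs a round trip to every participant, and a participant that voted "yes" has to hold its locks until the decision arrives.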
Configuration	is	very	hard
Compatibility	is	hit	and	miss,	mostly	miss
Most	modern	vendors	don’t	support	integration
Performance	is	bad
Latency	– roundtrips	with	coordinator
Throughput	– locks	and	single	coordinator
Reliability	is	hard	to	achieve
https://blogs.msdn.microsoft.com/distributedservices/2011/11/22/troubleshooting-msdtc-communication-checklist/
CAP
Distributed	Transactions
Are	Dead
“Life	Beyond	Distributed	Transactions”
CQRS/ES	to	the	Rescue
Transfer	$100	with	Event	Sourcing
$100 A -> B
Bank entity
Append-only log
Transfer $100 from Account A to Account B
Account A entity
Account B entity
-$100
+$100
Withdraw $100
Ack
Ack
CQRS	+	Event	Sourcing
No	Atomicity
Eventual	Consistency
No	Isolation
Durable
Works	well	for	many	scenarios.
What	to	do	when	you	need	more	than	that?
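A minimal sketch of why the event-sourced transfer gives up atomicity and isolation, assuming a hypothetical `EventSourcedBank` class that keeps the log and both balances in memory. The withdrawal and the deposit are applied as separate steps, so an observer can see the state in between.

```csharp
using System;
using System.Collections.Generic;

// One immutable entry in the append-only log (hypothetical shape).
public sealed class TransferRequested
{
    public TransferRequested(string from, string to, long amount)
    {
        From = from;
        To = to;
        Amount = amount;
    }

    public string From { get; }
    public string To { get; }
    public long Amount { get; }
}

public sealed class EventSourcedBank
{
    private readonly List<TransferRequested> log = new List<TransferRequested>();
    private readonly Dictionary<string, long> balances = new Dictionary<string, long>();
    private int withdrawn; // Number of log entries whose withdrawal has been applied.
    private int deposited; // Number of log entries whose deposit has been applied.

    public void Open(string account, long balance) { balances[account] = balance; }
    public long BalanceOf(string account) { return balances[account]; }

    // The request counts as accepted as soon as it is appended to the log.
    public void Append(TransferRequested e) { log.Add(e); }

    // Each account consumes the log independently, at its own pace.
    public void ApplyNextWithdrawal()
    {
        if (withdrawn == log.Count) return;
        var e = log[withdrawn++];
        balances[e.From] -= e.Amount;
    }

    public void ApplyNextDeposit()
    {
        if (deposited == log.Count) return;
        var e = log[deposited++];
        balances[e.To] += e.Amount;
    }
}
```

After `ApplyNextWithdrawal()` but before `ApplyNextDeposit()`, the two balances add up to $100 less than they should. The system converges only once both steps have run, which is the eventual consistency named on the slide.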
Strong	Consistency	Strikes	Back
or
New	Kids	on	The	Block
Spanner
Cloud	Spanner
Eric	Brewer,	https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45855.pdf
Cosmos	DB
Cosmos	DB:	Consistency	Options
Still	A	Single	Database!
Microsoft Orleans
Orleans:	Framework	for	the	Cloud*
*In addition to cloud services, Orleans is equally well suited to building a wide variety of distributed applications
For	all	engineers
Simple	yet	powerful
3x–10x less code
No inherent bottlenecks
No single points of failure
Known best patterns and practices
Programming model
Scalable by default
Programming	Model	of	Orleans
Grain	is	an	object	with	a	stable	ID
Lives	forever,	virtually
Encapsulates	state
No	direct	external	access
Message	passing
Grain	manages	its	own	state
Multiple	storage	systems	can	be	used
Great	fit	for	Event	Sourcing
Isolated	logs	of	state	changes
Excellent	scalability
No	built-in	coordination
Frontends, Business Logic, Storage
A C I D
Transfer	of	$100	in	Orleans
$100	A	->	B	
Account	Grain
A
Account	Grain
B
Bank	Grain
Persist
Persist
Ack
Ack
Ack
$100	A	->	B
Account	Grain
B
What	We	Are	Forced	To	Do
$100	A	->	B	
Bank	Grain
Ack
$100	A	->	B	
Record request with a unique ID to dedup
Recovery mechanism
Record completion
Account	Grain
A
Persist
Persist
Ack
Ack
Observation	by Martin	Kleppmann
Account	Grain
B
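The bookkeeping on this slide can be sketched as follows, assuming hypothetical `IdempotentAccount` and `TransferCoordinator` classes that keep everything in memory. In a real system each record would have to be persisted, and this is the code every developer has to write by hand when there are no transactions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical account that remembers which request IDs it has already applied.
public sealed class IdempotentAccount
{
    private readonly HashSet<Guid> applied = new HashSet<Guid>();
    public long Balance { get; private set; }

    public IdempotentAccount(long balance) { Balance = balance; }

    // Applies the change once per request ID; retries of the same ID are no-ops.
    public void Apply(Guid requestId, long delta)
    {
        if (!applied.Add(requestId)) return;
        Balance += delta;
    }
}

public sealed class TransferCoordinator
{
    private readonly Dictionary<Guid, (IdempotentAccount From, IdempotentAccount To, long Amount)> pending =
        new Dictionary<Guid, (IdempotentAccount From, IdempotentAccount To, long Amount)>();
    private readonly HashSet<Guid> completed = new HashSet<Guid>();

    // Step 1: record the request under its unique ID before touching any account.
    public void Record(Guid requestId, IdempotentAccount from, IdempotentAccount to, long amount)
    {
        if (ReferenceEquals(from, to)) throw new ArgumentException("Accounts must differ.");
        if (completed.Contains(requestId) || pending.ContainsKey(requestId)) return; // Duplicate.
        pending[requestId] = (from, to, amount);
    }

    // Step 2: perform both halves, then record completion.
    // Safe to repeat after a crash at any point, because each half is idempotent.
    public void Execute(Guid requestId)
    {
        if (!pending.TryGetValue(requestId, out var t)) return;
        t.From.Apply(requestId, -t.Amount);
        t.To.Apply(requestId, t.Amount);
        pending.Remove(requestId);
        completed.Add(requestId);
    }

    // Recovery mechanism: re-drive every request that was recorded but not completed.
    public void Recover()
    {
        foreach (var id in new List<Guid>(pending.Keys)) Execute(id);
    }
}
```

Note that this gives at most eventual completion of the transfer. It provides no isolation, and it does not enforce constraints such as a non-negative balance.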
What	We	Want	for	Transfer	Operation
$100	A	->	B	
Bank	Grain
Ack
[Tx(RequiresNew)]
$100	A	->	B	
Account	Grain
A
Persist
Persist
Ack
Ack
Account	Grain
B
We	Want	ACID	Guarantees
$100	A	->	B	
Bank	Grain
Ack
[Tx(RequiresNew)]
$100	A	->	B	
Account	Grain
A
Persist
Persist
Ack
Ack
$500
$500
$400
$600
That’s	What	Orleans	Transactions	Provide
Bank	Grain	Interface
public interface IBankGrain : IGrainWithIntegerKey
{
    [Transaction(TransactionOption.RequiresNew)]
    Task Transfer(Guid fromAccount, Guid toAccount, uint amount);
}
Account	Grain	Interface
public interface IAccountGrain : IGrainWithGuidKey
{
    [Transaction(TransactionOption.Required)]
    Task Withdraw(uint amount);
    [Transaction(TransactionOption.Required)]
    Task Deposit(uint amount);
    [Transaction(TransactionOption.Required)]
    Task<uint> GetBalance();
}
Transfer	Operation
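public class BankGrain : Grain, IBankGrain
{
    // Must be public to implement IBankGrain.Transfer implicitly.
    public Task Transfer(Guid fromAccount, Guid toAccount, uint amount)
    {
        var from = GrainFactory.GetGrain<IAccountGrain>(fromAccount);
        var to = GrainFactory.GetGrain<IAccountGrain>(toAccount);
        // Withdraw and Deposit are marked Required, so they join the
        // transaction that Transfer (RequiresNew) starts.
        Task t1 = from.Withdraw(amount);
        Task t2 = to.Deposit(amount);
        return Task.WhenAll(t1, t2);
    }
}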
Account	Grain:	Balance	State	Facet
public class Balance
{
    public uint Value { get; set; } = 1000;
}
public class AccountGrain : Grain, IAccountGrain
{
    private readonly ITransactionalState<Balance> balance;
    public AccountGrain(
        [TransactionalState("balance")]
        ITransactionalState<Balance> balance)
    {
        this.balance = balance;
    }
}
Account	Grain:	Operations
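Task IAccountGrain.Deposit(uint amount)
{
    this.balance.State.Value += amount;
    this.balance.Save();
    return Task.CompletedTask;
}
Task<uint> IAccountGrain.GetBalance()
{
    // Nothing to await, so return a completed task instead of marking the method async.
    return Task.FromResult(this.balance.State.Value);
}
Task IAccountGrain.Withdraw(uint amount)
{
    // uint subtraction wraps around on underflow, so reject overdrafts explicitly:
    // the call then fails instead of silently producing a huge balance.
    if (this.balance.State.Value < amount)
        throw new InvalidOperationException("Insufficient funds.");
    this.balance.State.Value -= amount;
    this.balance.Save();
    return Task.CompletedTask;
}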
This	Is	Pretty	Much	It!
Just	need	to	add	some	configuration	to	silos
var builder = new SiloHostBuilder()
    .UseLocalhostClustering()
    …
    .AddMemoryGrainStorageAsDefault()
    .UseInClusterTransactionManager()
    .UseInMemoryTransactionLog()
    .UseTransactionalState();
var host = builder.Build();
await host.StartAsync();
How	does	it	work?
Architecture
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/10/EldeebBernstein-TransactionalActors-MSR-TR-1.pdf
Architecture
Transaction Manager
Storage
Cluster of Silos
Silo
Transaction Agent
TM
Account	Grain
B
Bank Grain [Tx(RequiresNew)]
$100	A	->	B	
Account	Grain
A
TC
Transaction	log
TM
Tx	N:
A(N-5)
B(N-2)
Validate(N,	{A,	B},	{N-5,	N-2})
Committed
Wait for N-5 and N-2
N	depends	on
Success
2-Phase	Commit
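One way to read the diagram: transaction N touched A after transaction N-5 and B after transaction N-2, so N can commit only once both of those have committed, and must abort if either of them aborts. The sketch below illustrates that dependency check only. `DependencyValidator` and `TxStatus` are hypothetical names, and this is not the Orleans implementation.

```csharp
using System;
using System.Collections.Generic;

public enum TxStatus { Pending, Committed, Aborted }

// Hypothetical validator that tracks the outcome of each transaction by ID.
public sealed class DependencyValidator
{
    private readonly Dictionary<long, TxStatus> status = new Dictionary<long, TxStatus>();

    public TxStatus StatusOf(long txId)
    {
        return status.TryGetValue(txId, out var s) ? s : TxStatus.Pending;
    }

    public void MarkCommitted(long txId) { status[txId] = TxStatus.Committed; }
    public void MarkAborted(long txId) { status[txId] = TxStatus.Aborted; }

    // Decides txId given the transactions whose effects it observed.
    // Returns Pending while any dependency is undecided; the caller waits and retries.
    public TxStatus Validate(long txId, IEnumerable<long> dependsOn)
    {
        bool mustWait = false;
        foreach (long dep in dependsOn)
        {
            switch (StatusOf(dep))
            {
                case TxStatus.Aborted:
                    // Cascading abort: txId saw state that will be rolled back.
                    status[txId] = TxStatus.Aborted;
                    return TxStatus.Aborted;
                case TxStatus.Pending:
                    mustWait = true;
                    break;
            }
        }

        if (mustWait) return TxStatus.Pending;

        status[txId] = TxStatus.Committed;
        return TxStatus.Committed;
    }
}
```

With N = 10, `Validate(10, new long[] { 5, 8 })` returns `Pending` until transactions 5 and 8 are both marked committed, and `Aborted` as soon as either is marked aborted.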
What	if	something	fails?
Account	Grain
B
Bank Grain [Tx(RequiresNew)]
$100	A	->	B	
Account	Grain
A
TC
Transaction	log
TM
Tx	N:
A(N-5)
B(N-2)
Validate(N,	{A,	B},	{N-5,	N-2})
Abort
Wait for N-5 and N-2
N	depends	on
Error
2PC	Abort
Drawbacks
1. Cascading	aborts
§ Can	happen	only	due	to	server/OS	failures,	which	are	rare
2. Single-grain	transactions	require	validation
§ Only	affects	latency,	not	throughput
3. Stand-alone	TM	is	extra	operational	cost
4. Centralized	TM	is	a	scalability	bottleneck
§ Nice	problem	to	have
§ Can	apply	some	old	techniques…
Account	Grain
B
Bank Grain [Tx(RequiresNew)]
$100	A	->	B	
Account	Grain
A
TC
Transaction	log
TM
Tx	N:
A(N-5)
B(N-2)
Validate({N,	O,	P,	Q,	X,	Y,	Z})
Committed	({N,O,P,Q,X})	Aborted({Y,Z})
Success
Batching
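Batching here means sending many validation requests to the TM in one call and getting back the committed and aborted sets together. A minimal sketch of the idea, assuming a hypothetical `ValidationBatcher` whose `validateMany` delegate stands in for the single round trip to the TM:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical batcher; the delegate returns the IDs that committed.
public sealed class ValidationBatcher
{
    private readonly List<long> queued = new List<long>();
    private readonly Func<IReadOnlyList<long>, ISet<long>> validateMany;

    public int RoundTrips { get; private set; }

    public ValidationBatcher(Func<IReadOnlyList<long>, ISet<long>> validateMany)
    {
        this.validateMany = validateMany;
    }

    public void Enqueue(long txId) { queued.Add(txId); }

    // Sends everything queued so far as one request and splits the answer.
    public (List<long> Committed, List<long> Aborted) Flush()
    {
        long[] batch = queued.ToArray();
        queued.Clear();

        var committed = new List<long>();
        var aborted = new List<long>();
        if (batch.Length == 0) return (committed, aborted);

        RoundTrips++; // One call regardless of how many transactions are in the batch.
        ISet<long> ok = validateMany(batch);
        foreach (long id in batch)
        {
            (ok.Contains(id) ? committed : aborted).Add(id);
        }
        return (committed, aborted);
    }
}
```

Seven queued transactions cost one round trip instead of seven, at the price of each transaction waiting for the next flush.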
Throughput
8-core	VM
That	was	the	beginning	(Beta)
Partnership	with	Research
Christopher Meiklejohn, Alejandro Tomsic, Sebastian Burckhardt, Phil Bernstein
TM	v2
Account	Grain
B
Bank Grain [Tx(RequiresNew)]
$100	A	->	B	
Account	Grain
A
TC
Transaction	log
TM
Tx	N:
A(N-5)
B(N-2)
Validate(N,	{A,	B},	{N-5,	N-2})
Committed
Success
TM
Drawbacks	of	TM	v1
1. Cascading	aborts
§ Can	happen	only	due	to	server/OS	failures,	which	are	rare
2. Single-grain	transactions	require	validation
§ Only	affects	latency,	not	throughput
3. Stand-alone	TM	is	extra	operational	cost
4. Centralized	TM	is	a	scalability	bottleneck
§ Nice	problem	to	have
Best	of	all…
All	Code	Is	on	GitHub
Conclusion
§ There’s still space for innovation, even in decades-old problems
§ SQL	<->	NoSQL	<->	Distributed	SQL	<->	Distributed	Transactions
§ A middle-tier stack lets you do quite a bit
§ While	staying	agnostic	of	storage
§ Open	Source	all	the	way
§ You	are	welcome	to	participate!
Thank you!
