Hybrid STM/HTM for Nested Transactions on OpenJDK

Hybrid STM/HTM for Nested
Transactions on OpenJDK
Keith Chapman
Purdue U
Tony Hosking
ANU/Data61, Purdue U
Eliot Moss
UMass

Motivation
STM has been around for ages
• But STM is slow
Commodity hardware for transactions available now
• But HTM approaches are only best effort
Out goal: Accelerate STM with HTM when possible

Nested Transactions
Allow composition of transactions
Two ﬂavors
• Closed Nested Transactions
• HTM inside STM not useful; must do all the same work
• Open Nested Transactions
• HTM inside STM avoids most TM work – HTM well-suited!

XJ: Transactional Java
XJ Language
• Supports ﬂat and closed/open/boosted nested transactions
The implementation supports hybrid transactions
• HTM support via Intel TSX
• HTM and STM can co-exist
Uses an Optimistic-reads / Pessimistic-writes protocol

TM Metadata
Txn Log
0 0Version # 0 0 0Txn ID 1
One metadata word for every object

STM Transaction Protocol (Read)
T1 Log
0 00 0

T1 Log
0 00 0
Check Mode

T1 Log
0 0
0 0
T1 Read

T1 Log
0 0
0 0
T1 Validate

T1 Log
0 0
T1 Commit

0 0
STM Transaction Protocol (Write)
T1 Log

0 0
T1 Log
Check Mode

T1 1
0 0
T1 Log
T1 Write

T1 11 0
T1 Log
T1 Commit

STM Conﬂict Detection
T1 Log
0 00 0
T2 Log
0 0

T1 Log
0 0
0 0
T2 Log
0 0
T1 Read

T1 Log
0 0
0 0
T2 Log
0 0
T2 Write
T2 1

T1 Log
0 0
0 0
T2 Log
0 0
T1 Validate
T2 1

T1 Log
0 0
T2 Log
0 0
T1 Abort
T2 1

Hybrid Transaction Protocol
STM – HTM conﬂicts detected by lock word accesses
• Explicit XABORT if locked by another transaction
• HTM reads – Read the metadata word
• STM writes modify the metadata word
• Causes HTM to abort
• HTM writes – Increment version number
• Causes STM read invalidation / HTM abort

Abstract Locking
&
Undo Operations

Abstract Locks for STM
Open / Boosted Atomic Method
Body
Acquire abstract locks
Release abstract locks
If top level transaction
Body
Log abstract locks
If nested transaction
Log undo operations

Abstract Locks for Hybrid TM
Body
Validate abstract locks
Body
Log abstract locks
If top level is HTM
Log undo operations

Why Validation Works
• HTM vs STM
• If abstract locks conﬂict they must touch some same
physical words in the abstract locking data structure
— otherwise they could not detect the conﬂict
Body

Why Validation Works
• HTM vs STM
• If abstract locks conflict they must touch some same
physical words in the abstract locking data structure
— otherwise they could not detect the conflict
• HTM vs HTM
• No conflict in the locking data structure because all
accesses to it are reads
• Any real conflicts that exist will occur on the actual
data structure
Body

STM and HTM Methods
STM needs logging HTM doesn't
Different actions during read/write
Different actions for abstract locks
HTM should fall back to STM

STM and HTM Methods
STM needs logging HTM doesn't
Different actions during read/write
Different actions for abstract locks
HTM should fall back to STM
Maintain separate HTM and STM versions of methods

STM Method Variants
Original
(Non-txnal)

STM Method Variants
Original
(Non-txnal)
Can call
A B

STM Method Variants
Original
(Non-txnal)
Top-level Txn
STM
Can call
A B

STM Method Variants
STM
Transactionalized
Original
(Non-txnal)
Top-level Txn
STM
Can call
A B

STM Method Variants
STM
Transactionalized
Original
(Non-txnal)
New nested Txn
STM
Top-level Txn
STM
Can call
A B

Hybrid TM Method Variants
Original
(Non-txnal)

Original
(Non-txnal)Can call
A B

Router
Original
(Non-txnal)Can call
A B

Top-level Txn
STM
Router
Original
(Non-txnal)Can call
A B

Top-level Txn
HTM
Top-level Txn
STM
Router
Original
(Non-txnal)Can call
A B

HTM
Transactionalized
Top-level Txn
HTM
Top-level Txn
STM
Router
Original
(Non-txnal)Can call
A B

Optimized
New nested Txn
HTM
HTM
Transactionalized
Top-level Txn
HTM
Top-level Txn
STM
Router
Original
(Non-txnal)Can call
A B

Optimized
New nested Txn
HTM
HTM
Transactionalized
Top-level Txn
HTM
STM
Transactionalized
Top-level Txn
STM
Router
Original
(Non-txnal)Can call
A B

Optimized
New nested Txn
HTM
HTM
Transactionalized
Top-level Txn
HTM
STM
Transactionalized
Top-level Txn
STM
New nested
Txn STM
Router
Original
(Non-txnal)Can call
A B
Parent is closed
A B

Optimized
New nested Txn
HTM
HTM
Transactionalized
Top-level Txn
HTM
STM
Transactionalized
Top-level Txn
STM
New nested
Txn STM
Router Nested Router
Original
(Non-txnal)Can call
A B
Parent is closed
A B
Parent is open
A B

Optimized
New nested Txn
HTM
HTM
Transactionalized
Top-level Txn
HTM
New nested
Txn HTM
STM
Transactionalized
Top-level Txn
STM
New nested
Txn STM
Router Nested Router
Original
(Non-txnal)Can call
A B
Parent is closed
A B
Parent is open
A B

XJ System Architecture
XJ source code

XJ source code XJ Compiler
standard Java
bytecode
compile

XJ source code XJ Compiler
standard Java
bytecode XJ Rewriter
bytecode
+ run-time calls
compile load

XJ source code
XJ run-time
library
XJ Compiler
standard Java
bytecode
+ run-time calls
HTM-enabled
JVM
compile load run

HTM 4-5 times faster than STM
XJ source code
XJ run-time
library
XJ Compiler
standard Java
bytecode
+ run-time calls
HTM-enabled
JVM
compile load run

VM Modiﬁcations
Kept to a minimum
Modiﬁcations done on OpenJDK:
• Native methods to begin, end, and abort a HW transaction
• Made them intrinsic to the HotSpot C1/C2 compilers
Had to go through several hoops to get HTM to work with HotSpot’s
optimizing compilers

Synchrobench
Micro-benchmarks to evaluate synchronization performance on various
data structures
Added ability to run multiple operations within a single transaction
(group size)
Included XJ versions of the benchmarks
• TransactionalFriendlyTreeSet
48-way, Intel Xeon E5-2690 v3 machine with 2 sockets of 12
hyperthreaded cores

Throughput(Normalized)
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
Open nestedClosed nested
Group size 1
5% Updates

0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1
Group size 2
Open nestedClosed nested 5% Updates

0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1
Group size 2
Group size 4

0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1
Group size 2
Group size 4
Group size 8

0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1
Group size 2
Group size 4
Group size 8
Group size 16

0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
0
0.2
0.4
0.6
0.8
1
Threads
1 2 4 8 12 16 20 24 28 32 36 40 44 48
Group size 1
Group size 2
Group size 4
Group size 8
Group size 16
Group size 32

0
50
100
150
200
1
2
4
8
12
16
20
24
28
32
36
40
44
48
1
2
4
8
12
16
20
24
28
32
36
40
44
48
1
2
4
8
12
16
20
24
28
32
36
40
44
48
Group size 1 Group size 2 Group size 4
committedopsandabortedtxns(10
6
) open htm commits
closed htm commits
open stm commits
closed stm commits
htm aborts

Conclusions
STM and HTM can co-exist for nested transactions in Java
• Closed nesting — Similar to previous schemes
• Open nesting — Novel validation mechanism
• Implemented in OpenJDK on Intel TSX — Artifact evaluated

Conclusions
When it works, HTM is ~4-5× faster than STM

Conclusions
Open nesting increases the envelope of effectiveness for HTM

Conclusions
Open nesting increases the envelope of effectiveness for HTM
Production VM would need deeper modiﬁcation

Hybrid STM/HTM for Nested Transactions on OpenJDK

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (7)

Similar to Hybrid STM/HTM for Nested Transactions on OpenJDK

Similar to Hybrid STM/HTM for Nested Transactions on OpenJDK (20)

Recently uploaded

Recently uploaded (20)

Hybrid STM/HTM for Nested Transactions on OpenJDK