SlideShare a Scribd company logo
1 of 43
Jade Alglave – Architecture and Technology
Monica Beckwith – Infrastructure LOB; Advanced Server Team
Applying Concurrency Cookbook
Recipes to SPEC JBB
2 © 2019 Arm Limited
About Us
Dr. Jade Alglave
• Memory model architect at Arm
• Co-developer and maintainer of the
herd+diy toolsuite with Luc Maranget
(INRIA, France)
• Co-developer and maintainer of the
Linux kernel memory model
Monica Beckwith
Managed runtime performance architect
at Arm
• Experience with OpenJDK HotSpot JIT, GC
• Experience with JMM and with strong and weakly
ordered architecture such as x86-64, SPARC, Arm64
and (very briefly) PPC64
3 © 2019 Arm Limited
What We Will Cover Today
Introduction to –
Memory Models (Java Relaxed)
Performance Methodology using Litmus Tests and Tools
Performance Analysis and Measurement using Java Micro-Benchmark Harness (JMH)
Performance Study on Scaling CPU Cores and Simultaneous Multithreading (SMT)
4 © 2019 Arm Limited
What We Will NOT Cover Today
Details of –
Java Relaxed memory model
JMH benchmarking
SPEC JBB 2015 benchmarking
CPU cores and simultaneous multithreading (SMT)
5 © 2019 Arm Limited
Memory Models
- What value can a load read?
6 © 2019 Arm Limited
Multi-threaded hardware with shared memory
structure
A multi-threaded, concurrency-aware program
Processor Threads and Software Threads …
The Ideal Concurrent World of Hardware and Software
* This drawing is heavily inspired by “timethreads“ concept in
Doug Lea’s ‘Concurrent Programming in Java: Design Principles and
Patterns, Second Edition ‘
Object 2Object 1
Thread 1
LockThread 2 help
Thread1 Threadn
Shared Memory
W R W R
* This drawing is heavily inspired by ‘A Tutorial Introduction to the
ARM and POWER Relaxed Memory Models’ by Sewell et. al.
7 © 2019 Arm Limited
Sequentially Consistent Shared Memory
Execution Order == Program Order == Sequential Order
Object
2
Object
1
Thread 1
Lock
Thread 2 help
=+
Thread 1 Thread n
Shared Memory
W R W R
Timeline (Program Order)Timeline (Single Global Execution Order)
A Sequentially
Consistent Machine
• No local reordering
• Writes become visible
simultaneously to all threads
8 © 2019 Arm Limited
Sequential Consistency in Practice
Store Buffering Example
Initially, X and Y are 0 in memory; foo and bar are local (register) variables:
p0 p1
a: X = 1; c: Y = 1;
b: foo = Y; d: bar = X;
What are the permissible values for foo and bar?
On Sequential Consistency, they are the values reachable by interleavings:
{a,b,c,d} {c,d,a,b} {a,c,b,d}
Therefore we cannot have foo and bar both equal to 0.
9 © 2019 Arm Limited
The Real Concurrent World of Hardware
Multi Processor Threads/Cores with Tiered Memory Structure
CPUn
CPU0
L1D$
L2
LLC
L1 I$
CPU1
L2
L2
Usually shared
between all
cores
Could be multi-threaded
(SMT)
Usually private
Memory Controller
DDR Banks
L1D$L1 I$
L1D$L1 I$
IO
Can have Load and Store
buffers
Can have out of order
execution
10 © 2019 Arm Limited
Strong Models based on
Total Store Ordering
(TSO)
CPUn
CPU0
L1
D$ L2
LLC
L1
I$
CPU1
L2
L2
Memory Controller
L1
D$
L1
I$
L1
D$
L1
I$
IO
Life In The Real World Without Sequential Consistency
Relaxed vs Strong Memory Model
Strong Memory Models Weaker Memory Models
X86, SPARC POWER, Arm v7
• A thread can see it’s own
write before other threads
• All other threads see the
write simultaneously:
Multiple Copy Atomic Model
• Local reordering is allowed
• All threads are not
guaranteed to see the write
simultaneously: Not Multiple
Copy Atomic Model
11 © 2019 Arm Limited
Strong Models based on
Total Store Ordering
(TSO)
CPUn
CPU0
L1
D$ L2
LLC
L1
I$
CPU1
L2
L2
Memory Controller
L1
D$
L1
I$
L1
D$
L1
I$
IO
Life In The Real World Without Sequential Consistency
Relaxed vs Strong Memory Model
Weaker Memory Models
X86, SPARC ARM v8
• A thread can see it’s own
write before other threads
• All other threads see the
write simultaneously:
Multiple Copy Atomic Model
• Local reordering is allowed
• All threads are guaranteed to
see the write simultaneously:
Multiple Copy Atomic Model
12 © 2019 Arm Limited
• Can we reason about our concurrent programs following Sequential Consistency?
• Probably if we had a formal, preferably executable, memory models to ensure that we
understand the guarantees given by architectures and programming languages.
• Here’s where Jade would come in talking about her cool tools that allow programmers
to explore the consequences of a given memory model or generate vast families of
litmus tests to run against hardware.litmus tests
Going Back To Our Store Buffer Example
herd
litmus testcat model
Is this behavior allowed by the cat model?
Yes/No
litmus
on HW
litmus test
Is this behavior observed on HW?
Yes/No
litmus test
configuration file (~cat model)
diy
diy.inria.fr
13 © 2019 Arm Limited
X86 SB
{x=0; y=0;}
P0 | P1 ;
MOV [x],$1 | MOV [y],$1 ;
MOV EAX,[y] | MOV EAX,[x] ;
exists (0:EAX=0 / 1:EAX=0)
Hardware architecture and test name
Initial state (x and y are shared memory location)
Thread names
Sequence of instructions displayed as
columns
Question: can we observe this final state of
given that x=0; y=0?
Store Buffer Litmus Test on a TSO Hardware
14 © 2019 Arm Limited
Armed With Knowledge
On TSO hardware, can we observe the final state of foo=0 and bar=0; given that
X=0; Y=0?
...
...
Yes! All production architectures allow the outcome where both foo and bar equal 0.
So, what do we do? …
Use mfence as needed.
15 © 2019 Arm Limited
Performance
Methodology
- Using Litmus Tests and Tools
To Avoid Barriers Where-ever
Possible
16 © 2019 Arm Limited
What & The Why Of Barriers / Fences?
Barriers ensure ordering properties
Barriers enforce strong order
Barriers (when inserted correctly) restore sequential consistency
Barriers can be potentially expensive
Data Memory Barriers on Arm:
DMB SY (full system)
DMB ST (wait for store to complete)
DMB LD (wait for only loads to complete)
17 © 2019 Arm Limited
Normal Load-Stores
No Barriers
Litmus test
AArch64 MP
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | LDR W2,[X3] ;
MOV W2,#1 | ;
STR W2,[X3] | ;
exists
(1:X0=1 / 1:X2=0)
Check for any reorder
Check if X0 = 1 and X2 = 0 can exist on P1.
18 © 2019 Arm Limited
Normal Stores
Load Barrier
Litmus test
AArch64 MP+DMB.LD
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | DMB LD;
MOV W2,#1 | LDR W2,[X3] ;
STR W2,[X3] | ;
exists
(1:X0=1 / 1:X2=0)
Check for Store reorder
19 © 2019 Arm Limited
Load & Store Barriers
Litmus test
AArch64 MP+DMB.LD+DMB.ST
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | DMB LD;
DMB ST | LDR W2, [X3];
MOV W2,#1 | ;
STR W2,[X3] | ;
exists
(1:X0=1 / 1:X2=0)
Generated assembler
#START _litmus_P1
ldr w4,[x1]
dmb ld
ldr w5,[x2]
#START _litmus_P0
mov w7,#1
str w7,[x0]
dmb st
mov w6,#1
str w6,[x2]
Test MP+DMB.ST+DMB.LD Allowed
Histogram (3 states)
499999:>1:X0=0; 1:X2=0;
20 :>1:X0=0; 1:X2=1;
499981:>1:X0=1; 1:X2=1;
No
Witnesses
Positive: 0, Negative: 1000000
Condition exists (1:X0=1 / 1:X2=0) is NOT vali
Hash=4d15dccdb1da0ce51fac17dea068d047
Observation MP+DMB.ST+DMB.LD Never 0 1000000
20 © 2019 Arm Limited
But Aren’t Barriers Expensive?
Acquire – Release (implicit barrier) semantic – One way barriers
LDAR - All loads and stores that are after an LDAR in program order, … must be observed
after the LDAR
STLR - All loads and stores preceding an STLR …, must be observed before the STLR
Thinking about lock-free?
21 © 2019 Arm Limited
Performance Analysis and
Measurement
- Using Java Micro-
Benchmarking Harness (JMH)
22 © 2019 Arm Limited
JMM Rule 1 for Volatile
Stores
Order 1
Normal/Volatile Load
Normal/Volatile Store
Can’t
Reorder
Order 2 Volatile Store
Any load/store
(normal or volatile)
followed by a ‘volatile
store’ can’t be
reordered.
23 © 2019 Arm Limited
Volatile Stores - Barriers With DMBs
JMM rule: Any load/store followed by Volatile Store can’t be re-ordered
Litmus test
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | DMB LD ;
LDR W2,[X1] | LDR W2,[X3] ;
DMB SY | ;
STR W2,[X3] | ;
exists
(1:X0=1 / 1:X2=0)
Results
Test MP Allowed
States 3
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (1:X0=1 / 1:X2=0)
Observation MP Never 0 3
24 © 2019 Arm Limited
Volatile Stores - Can Barriers Be Replaced By STLR?
JMM rule: Any load/store followed by Volatile Store can’t be re-ordered
Litmus test
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
MOV W0,#1 | LDR W0,[X1] ;
STR W0,[X1] | DMB LD ;
MOV W2,#1 | LDR W2,[X3] ;
STLR W2,[X3] | ;
exists
(1:X0=1 / 1:X2=0)
Results
Test MP Allowed
States 3
1:X0=0; 1:X2=0;
1:X0=0; 1:X2=1;
1:X0=1; 1:X2=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (1:X0=1 / 1:X2=0)
Observation MP Never 0 3
25 © 2019 Arm Limited
JMM Rule 2 for Volatile
Stores
• A ‘volatile store’ followed by any
normal load/store CAN be
reordered.
• A ‘volatile store’ followed by any
volatile load/store CANNOT be
reordered.
Order 1
Can Reorder
Volatile Store
Can’t
Reorder
Order 2
Normal Load
Normal Store
Volatile Load
Volatile Store
26 © 2019 Arm Limited
Volatile Stores – Can Barriers Be Replaced By STLR?
STLR doesn’t guarantee that a subsequent volatile load/store will not be reordered
static class
TestNormalLoadPostVolatileStores {
volatile int intField1;
int intNorm1;
public
TestNormalLoadPostVolatileStores() {
intField1 = 32;
intNorm1 = intField1;
}
}
JMH Test Code
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
LDR W0,[X1] | LDR W0,[X1] ;
MOV W2,#1 | MOV W2,#1 ;
STLR W2,[X3] | STR W2,[X3] ;
exists
(0:X0=1 / 1:X0=1)
Litmus Test Positive Event Structure
27 © 2019 Arm Limited
Volatile Stores – Can Barriers Be Replaced By STLR?
Success! STLR + LDAR of volatiles provide that guarantee
Litmus test
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
LDR W0,[X1] | LDAR W0,[X1] ;
MOV W2,#1 | MOV W2,#1 ;
STLR W2,[X3] | STR W2,[X3] ;
exists
(0:X0=1 / 1:X0=1)
Results
Test LB Allowed
States 3
0:X0=0; 1:X0=0;
0:X0=0; 1:X0=1;
0:X0=1; 1:X0=0;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:X0=1 / 1:X0=1)
Observation LB Never 0 3
28 © 2019 Arm Limited
Volatile Stores – JMH Profiles
Success! STLR + LDAR of volatiles provide that guarantee
Load Acquire – Store Release Pair
stlr w11, [x10] ;*putfield intField1 ;
add x10, x2, #0xc
ldar w11, [x10] ;*getfield intField1
Data Memory Barrier (inner share-ability domain
str w10, [x2,#12]
dmb ish ;*putfield intField1
ldr w11, [x2,#12]
dmb ishld ;*getfield intField1
36% faster on max SMT count!!
29 © 2019 Arm Limited
JMM Rule 1 for Volatile
Loads
Order 1 Volatile Load
Can’t
Reorder
Order 2
Normal/Volatile Load
Normal/Volatile Store
A ‘volatile load’ followed by any
load/store (normal or volatile) can’t be
reordered.
30 © 2019 Arm Limited
Volatile Load - Barriers With DMBs
JMM rule: A Volatile Load followed by any load/store can’t be re-ordered
Litmus test
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
LDR W0,[X1] | LDR W0,[X1] ;
DMB SY | MOV W2,#1 ;
MOV W2,#1 | DMB SY ;
STR W2,[X3] | STR W2,[X3] ;
exists
(0:X0=1 / 1:X0=1)
Results
Test LB Allowed
States 3
0:X0=0; 1:X0=0;
0:X0=0; 1:X0=1;
0:X0=1; 1:X0=0;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:X0=1 / 1:X0=1)
Observation LB Never 0 3
31 © 2019 Arm Limited
Volatile Stores – Can Barriers Be Replaced By LDAR?
Success! LDAR provides the right guarantee
Litmus test
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
P0 | P1 ;
LDAR W0,[X1] | LDR W0,[X1] ;
| MOV W2,#1 ;
MOV W2,#1 | DMB SY ;
STR W2,[X3] | STR W2,[X3] ;
exists
(0:X0=1 / 1:X0=1)
Results
Test LB Allowed
States 3
0:X0=0; 1:X0=0;
0:X0=0; 1:X0=1;
0:X0=1; 1:X0=0;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:X0=1 / 1:X0=1)
Observation LB Never 0 3
32 © 2019 Arm Limited
Performance Study
- Scaling CPU Cores /
Simultaneous Multithreading
(SMT)
33 © 2019 Arm Limited
0.9
0.95
1
1.05
1.1
1 4 Max
lse wolse wdmb
Applying Cook Book Recipes to SPECJBB
Bigger is Better
Core or
Thread
Count
With LSE;
With LDAR
(baseline)
Without
LSE;
With LDAR
With LSE;
With DMB
1 1.00 0.97 0.92
4 1.00 1.00 0.95
Max 1.00 1.01 1.00
•Fences/Barriers(e.g. DMB ST, DMB LD, DMB SY)
•Atomics/LSE (e.g. LDREX/STREX or CAS)
34 © 2019 Arm Limited
Single Core Performance
The Quest and Guarantee of Sequential Consistency
Hardware improvements measured on Java micro-benchmarks (OpenJDK JDK11):
• Object/memory allocations up to 2.4x faster
• Object/array initializations up to 5x faster
– Smart issuing and cost reduction of
SW barriers (i.e. DMB) required
by Arm’s relaxed memory model
• Copy chars up to 1.6x faster
• New atomic instructions improve locking
throughput and contention latency by up to 2x
Cortex-A72 Code Neoverse N1
0.21% dmb ishst ;*new (0 cycles) 0.00%
7.73% (~3.5 cycles) ldr x11, [sp,#8]
1.75% ldr w17, [x11,#12];*getfield 0.06%
mov x2, x0
0.51% ldp w0, w18,[x11,#16];*getfield 0.11%
0.42% ldp w3, w1, [x11,#24];*getfield
org.openjdk.bench.vm.compiler.generated.StoreAfterStore_testAllocAndZeroStore_jmhTest::testAllocAndZeroStore_avgt_jmhStub
35 © 2019 Arm Limited
Ares Single Core Performance
Hardware improvements measured on SPECJBB (OpenJDK JDK11):
• Neoverse N1 CPU improves performance from Cortex-A72 by 1.7x
Software improvements measured on SPECJBB:
• JDK11 improves performance vs JDK8 on Arm by up to 14%
36 © 2019 Arm Limited
Resources
http://g.oswego.edu/dl/jmm/cookbook.html
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
http://hg.openjdk.java.net/code-tools/jmh-jdk-
microbenchmarks/file/92c55597888e/README.md
http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_an
d_Cookbook_A08.pdf
37 © 2019 Arm Limited
Appendix
38 © 2019 Arm Limited
The herd+diy Toolsuite
The tool suite supports and provides a formal consistency model for all of the following:
- Arm, IBM Power, Intel x86
- Nvidia GPUs
- C/C++
- Linux C
This means that a user can experiment with the concurrency implemented at all these levels and
generate systematic families of tests to probe implementations.
diy.inria.fr
39 © 2019 Arm Limited
A Store Barrier Litmus Test
PodWW Rfe PodRR Fre
Fre PodWR Fre PodWR
A litmus test source has three main sections:
The initial state defines the initial values of registers and memory locations. Initialisation
to zero may be omitted.
The code section defines the code to be run concurrently — above there are two threads.
Yes we know, our X86 assembler syntax is a mistake.
The final condition applies to the final values of registers and memory locations.
40 © 2019 Arm Limited
Executing the model: herd
herd
litmus testcat model
Is this behavior allowed by the cat model?
Yes/No
The herd tool allows a user to execute a formal model, written in the cat language.
Given a litmus test and a cat model, herd runs the litmus test against the cat model:
herd tries to determine whether the model allows the final state given in the test can be reached.
41 © 2019 Arm Limited
Running tests on hardware: litmus
litmus
on HW
litmus test
Is this behavior observed on HW?
Yes/No
The litmus tool allows a user to run a litmus test against hardware.
The tool gathers all the final states that were observed on hardware during multiple runs of the test.
We can then compare the output of herd and litmus, to check whether they are in accord.
42 © 2019 Arm Limited
Generating tests: diy
litmus test
configuration file (~cat model)
diy
The diy tool allows a user to generate interesting families of litmus tests.
It takes as input a configuration file, where a user should list the features of interest to them.
We can use families of diy-generated tests to run validation campaigns,
comparing the cat model and prototypes.
The Cloud to Edge Infrastructure Foundation
for a World of 1T Intelligent Devices
Thank You!

More Related Content

What's hot

JVM JIT-compiler overview @ JavaOne Moscow 2013
JVM JIT-compiler overview @ JavaOne Moscow 2013JVM JIT-compiler overview @ JavaOne Moscow 2013
JVM JIT-compiler overview @ JavaOne Moscow 2013Vladimir Ivanov
 
Thesis - LLVM toolchain support as a plug-in for Eclipse CDT
Thesis - LLVM toolchain support as a plug-in for Eclipse CDTThesis - LLVM toolchain support as a plug-in for Eclipse CDT
Thesis - LLVM toolchain support as a plug-in for Eclipse CDTTuononenP
 
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, UkraineVladimir Ivanov
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLinaro
 
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP IntegrationBKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP IntegrationLinaro
 
Intrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMIntrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMKris Mok
 
BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)
BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)
BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)Linaro
 
HKG15-301: OVS implemented via ODP & vendor SDKs
HKG15-301: OVS implemented via ODP & vendor SDKsHKG15-301: OVS implemented via ODP & vendor SDKs
HKG15-301: OVS implemented via ODP & vendor SDKsLinaro
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsSamsung Open Source Group
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovZeroTurnaround
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLinaro
 
Kickstarting IOT using NodeRED
Kickstarting IOT using NodeREDKickstarting IOT using NodeRED
Kickstarting IOT using NodeREDRajesh Sola
 
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...Linaro
 
Las16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - statusLas16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - statusLinaro
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bspNeil Armstrong
 
JMC/JFR: Kotlin spezial
JMC/JFR: Kotlin spezialJMC/JFR: Kotlin spezial
JMC/JFR: Kotlin spezialMiro Wengner
 
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...AMD Developer Central
 
LAS16-TR03: Upstreaming 201
LAS16-TR03: Upstreaming 201LAS16-TR03: Upstreaming 201
LAS16-TR03: Upstreaming 201Linaro
 
BKK16-502 Suspend to Idle
BKK16-502 Suspend to IdleBKK16-502 Suspend to Idle
BKK16-502 Suspend to IdleLinaro
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control InterfaceLinaro
 

What's hot (20)

JVM JIT-compiler overview @ JavaOne Moscow 2013
JVM JIT-compiler overview @ JavaOne Moscow 2013JVM JIT-compiler overview @ JavaOne Moscow 2013
JVM JIT-compiler overview @ JavaOne Moscow 2013
 
Thesis - LLVM toolchain support as a plug-in for Eclipse CDT
Thesis - LLVM toolchain support as a plug-in for Eclipse CDTThesis - LLVM toolchain support as a plug-in for Eclipse CDT
Thesis - LLVM toolchain support as a plug-in for Eclipse CDT
 
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
 
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP IntegrationBKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
 
Intrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VMIntrinsic Methods in HotSpot VM
Intrinsic Methods in HotSpot VM
 
BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)
BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)
BKK16-308 The tool called Auto-Tuned Optimization System (ATOS)
 
HKG15-301: OVS implemented via ODP & vendor SDKs
HKG15-301: OVS implemented via ODP & vendor SDKsHKG15-301: OVS implemented via ODP & vendor SDKs
HKG15-301: OVS implemented via ODP & vendor SDKs
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir Ivanov
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96Boards
 
Kickstarting IOT using NodeRED
Kickstarting IOT using NodeREDKickstarting IOT using NodeRED
Kickstarting IOT using NodeRED
 
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
 
Las16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - statusLas16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - status
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bsp
 
JMC/JFR: Kotlin spezial
JMC/JFR: Kotlin spezialJMC/JFR: Kotlin spezial
JMC/JFR: Kotlin spezial
 
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
 
LAS16-TR03: Upstreaming 201
LAS16-TR03: Upstreaming 201LAS16-TR03: Upstreaming 201
LAS16-TR03: Upstreaming 201
 
BKK16-502 Suspend to Idle
BKK16-502 Suspend to IdleBKK16-502 Suspend to Idle
BKK16-502 Suspend to Idle
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control Interface
 

Similar to Applying Concurrency Cookbook Recipes to SPEC JBB

Advanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISAAdvanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISAGanesan Narayanasamy
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best PracticesJeff Larkin
 
Advanced High-Performance Computing Features of the OpenPOWER ISA
 Advanced High-Performance Computing Features of the OpenPOWER ISA Advanced High-Performance Computing Features of the OpenPOWER ISA
Advanced High-Performance Computing Features of the OpenPOWER ISAGanesan Narayanasamy
 
Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...
Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...
Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...KTN
 
20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Jorisimec.archive
 
IBM SAN Volume Controller Performance Analysis
IBM SAN Volume Controller Performance AnalysisIBM SAN Volume Controller Performance Analysis
IBM SAN Volume Controller Performance Analysisbrettallison
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
Kvm for ibm_z_systems_v1.1.2_limits
Kvm for ibm_z_systems_v1.1.2_limitsKvm for ibm_z_systems_v1.1.2_limits
Kvm for ibm_z_systems_v1.1.2_limitsKrystel Hery
 
G108277 ds8000-resiliency-lagos-v1905c
G108277 ds8000-resiliency-lagos-v1905cG108277 ds8000-resiliency-lagos-v1905c
G108277 ds8000-resiliency-lagos-v1905cTony Pearson
 
Optimization in Programming languages
Optimization in Programming languagesOptimization in Programming languages
Optimization in Programming languagesAnkit Pandey
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5PRADEEP
 
Embedded system Design introduction _ Karakola
Embedded system Design introduction _ KarakolaEmbedded system Design introduction _ Karakola
Embedded system Design introduction _ KarakolaJohanAspro
 
Fall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docx
Fall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docxFall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docx
Fall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docxlmelaine
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kazuhito Ohkawa
 
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...Yuji Kubota
 

Similar to Applying Concurrency Cookbook Recipes to SPEC JBB (20)

Advanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISAAdvanced High-Performance Computing Features of the Open Power ISA
Advanced High-Performance Computing Features of the Open Power ISA
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best Practices
 
Advanced High-Performance Computing Features of the OpenPOWER ISA
 Advanced High-Performance Computing Features of the OpenPOWER ISA Advanced High-Performance Computing Features of the OpenPOWER ISA
Advanced High-Performance Computing Features of the OpenPOWER ISA
 
Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...
Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...
Implementing AI: High Performance Architectures: Arm SVE and Supercomputer Fu...
 
20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris
 
Arm architecture
Arm architectureArm architecture
Arm architecture
 
IBM SAN Volume Controller Performance Analysis
IBM SAN Volume Controller Performance AnalysisIBM SAN Volume Controller Performance Analysis
IBM SAN Volume Controller Performance Analysis
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
Kvm for ibm_z_systems_v1.1.2_limits
Kvm for ibm_z_systems_v1.1.2_limitsKvm for ibm_z_systems_v1.1.2_limits
Kvm for ibm_z_systems_v1.1.2_limits
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
 
x86_1.ppt
x86_1.pptx86_1.ppt
x86_1.ppt
 
G108277 ds8000-resiliency-lagos-v1905c
G108277 ds8000-resiliency-lagos-v1905cG108277 ds8000-resiliency-lagos-v1905c
G108277 ds8000-resiliency-lagos-v1905c
 
Optimization in Programming languages
Optimization in Programming languagesOptimization in Programming languages
Optimization in Programming languages
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
Embedded system Design introduction _ Karakola
Embedded system Design introduction _ KarakolaEmbedded system Design introduction _ Karakola
Embedded system Design introduction _ Karakola
 
RISC-V 30908 patra
RISC-V 30908 patraRISC-V 30908 patra
RISC-V 30908 patra
 
opt-mem-trx
opt-mem-trxopt-mem-trx
opt-mem-trx
 
Fall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docx
Fall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docxFall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docx
Fall 2016 Insurance Case Study – Finance 360Loss ControlLoss.docx
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例
 
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
HeapStats: Troubleshooting with Serviceability and the New Runtime Monitoring...
 

More from Monica Beckwith

The ilities of software engineering.pptx
The ilities of software engineering.pptxThe ilities of software engineering.pptx
The ilities of software engineering.pptxMonica Beckwith
 
Enabling Java: Windows on Arm64 - A Success Story!
Enabling Java: Windows on Arm64 - A Success Story!Enabling Java: Windows on Arm64 - A Success Story!
Enabling Java: Windows on Arm64 - A Success Story!Monica Beckwith
 
Intro to Garbage Collection
Intro to Garbage CollectionIntro to Garbage Collection
Intro to Garbage CollectionMonica Beckwith
 
OpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsOpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsMonica Beckwith
 
OPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORS
OPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORSOPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORS
OPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORSMonica Beckwith
 
The Performance Engineer's Guide to Java (HotSpot) Virtual Machine
The Performance Engineer's Guide to Java (HotSpot) Virtual MachineThe Performance Engineer's Guide to Java (HotSpot) Virtual Machine
The Performance Engineer's Guide to Java (HotSpot) Virtual MachineMonica Beckwith
 
Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!Monica Beckwith
 
JFokus Java 9 contended locking performance
JFokus Java 9 contended locking performanceJFokus Java 9 contended locking performance
JFokus Java 9 contended locking performanceMonica Beckwith
 
Java Performance Engineer's Survival Guide
Java Performance Engineer's Survival GuideJava Performance Engineer's Survival Guide
Java Performance Engineer's Survival GuideMonica Beckwith
 
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...Monica Beckwith
 
The Performance Engineer's Guide To HotSpot Just-in-Time Compilation
The Performance Engineer's Guide To HotSpot Just-in-Time CompilationThe Performance Engineer's Guide To HotSpot Just-in-Time Compilation
The Performance Engineer's Guide To HotSpot Just-in-Time CompilationMonica Beckwith
 
Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!Monica Beckwith
 
Game of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GCGame of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GCMonica Beckwith
 
Way Improved :) GC Tuning Confessions - presented at JavaOne2015
Way Improved :) GC Tuning Confessions - presented at JavaOne2015Way Improved :) GC Tuning Confessions - presented at JavaOne2015
Way Improved :) GC Tuning Confessions - presented at JavaOne2015Monica Beckwith
 
GC Tuning Confessions Of A Performance Engineer - Improved :)
GC Tuning Confessions Of A Performance Engineer - Improved :)GC Tuning Confessions Of A Performance Engineer - Improved :)
GC Tuning Confessions Of A Performance Engineer - Improved :)Monica Beckwith
 
GC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance EngineerGC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance EngineerMonica Beckwith
 
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Monica Beckwith
 

More from Monica Beckwith (20)

The ilities of software engineering.pptx
The ilities of software engineering.pptxThe ilities of software engineering.pptx
The ilities of software engineering.pptx
 
A G1GC Saga-KCJUG.pptx
A G1GC Saga-KCJUG.pptxA G1GC Saga-KCJUG.pptx
A G1GC Saga-KCJUG.pptx
 
ZGC-SnowOne.pdf
ZGC-SnowOne.pdfZGC-SnowOne.pdf
ZGC-SnowOne.pdf
 
QCon London.pdf
QCon London.pdfQCon London.pdf
QCon London.pdf
 
Enabling Java: Windows on Arm64 - A Success Story!
Enabling Java: Windows on Arm64 - A Success Story!Enabling Java: Windows on Arm64 - A Success Story!
Enabling Java: Windows on Arm64 - A Success Story!
 
Intro to Garbage Collection
Intro to Garbage CollectionIntro to Garbage Collection
Intro to Garbage Collection
 
OpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsOpenJDK Concurrent Collectors
OpenJDK Concurrent Collectors
 
OPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORS
OPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORSOPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORS
OPENJDK: IN THE NEW AGE OF CONCURRENT GARBAGE COLLECTORS
 
The Performance Engineer's Guide to Java (HotSpot) Virtual Machine
The Performance Engineer's Guide to Java (HotSpot) Virtual MachineThe Performance Engineer's Guide to Java (HotSpot) Virtual Machine
The Performance Engineer's Guide to Java (HotSpot) Virtual Machine
 
Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!
 
JFokus Java 9 contended locking performance
JFokus Java 9 contended locking performanceJFokus Java 9 contended locking performance
JFokus Java 9 contended locking performance
 
Java Performance Engineer's Survival Guide
Java Performance Engineer's Survival GuideJava Performance Engineer's Survival Guide
Java Performance Engineer's Survival Guide
 
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...
The Performance Engineer's Guide To (OpenJDK) HotSpot Garbage Collection - Th...
 
The Performance Engineer's Guide To HotSpot Just-in-Time Compilation
The Performance Engineer's Guide To HotSpot Just-in-Time CompilationThe Performance Engineer's Guide To HotSpot Just-in-Time Compilation
The Performance Engineer's Guide To HotSpot Just-in-Time Compilation
 
Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!Java 9: The (G1) GC Awakens!
Java 9: The (G1) GC Awakens!
 
Game of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GCGame of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GC
 
Way Improved :) GC Tuning Confessions - presented at JavaOne2015
Way Improved :) GC Tuning Confessions - presented at JavaOne2015Way Improved :) GC Tuning Confessions - presented at JavaOne2015
Way Improved :) GC Tuning Confessions - presented at JavaOne2015
 
GC Tuning Confessions Of A Performance Engineer - Improved :)
GC Tuning Confessions Of A Performance Engineer - Improved :)GC Tuning Confessions Of A Performance Engineer - Improved :)
GC Tuning Confessions Of A Performance Engineer - Improved :)
 
GC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance EngineerGC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance Engineer
 
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Applying Concurrency Cookbook Recipes to SPEC JBB

  • 1. Jade Alglave – Architecture and Technology Monica Beckwith – Infrastructure LOB; Advanced Server Team Applying Concurrency Cookbook Recipes to SPEC JBB
  • 2. 2 © 2019 Arm Limited About Us Dr. Jade Alglave • Memory model architect at Arm • Co-developer and maintainer of the herd+diy toolsuite with Luc Maranget (INRIA, France) • Co-developer and maintainer of the Linux kernel memory model Monica Beckwith Managed runtime performance architect at Arm • Experience with OpenJDK HotSpot JIT, GC • Experience with JMM and with strong and weakly ordered architecture such as x86-64, SPARC, Arm64 and (very briefly) PPC64
  • 3. 3 © 2019 Arm Limited What We Will Cover Today Introduction to – Memory Models (Java Relaxed) Performance Methodology using Litmus Tests and Tools Performance Analysis and Measurement using Java Micro-Benchmark Harness (JMH) Performance Study on Scaling CPU Cores and Simultaneous Multithreading (SMT)
  • 4. 4 © 2019 Arm Limited What We Will NOT Cover Today Details of – Java Relaxed memory model JMH benchmarking SPEC JBB 2015 benchmarking CPU cores and simultaneous multithreading (SMT)
  • 5. 5 © 2019 Arm Limited Memory Models - What value can a load read?
  • 6. 6 © 2019 Arm Limited Multi-threaded hardware with shared memory structure A multi-threaded, concurrency-aware program Processor Threads and Software Threads … The Ideal Concurrent World of Hardware and Software * This drawing is heavily inspired by “timethreads“ concept in Doug Lea’s ‘Concurrent Programming in Java: Design Principles and Patterns, Second Edition ‘ Object 2Object 1 Thread 1 LockThread 2 help Thread1 Threadn Shared Memory W R W R * This drawing is heavily inspired by ‘A Tutorial Introduction to the ARM and POWER Relaxed Memory Models’ by Sewell et. al.
  • 7. 7 © 2019 Arm Limited Sequentially Consistent Shared Memory Execution Order == Program Order == Sequential Order Object 2 Object 1 Thread 1 Lock Thread 2 help =+ Thread 1 Thread n Shared Memory W R W R Timeline (Program Order)Timeline (Single Global Execution Order) A Sequentially Consistent Machine • No local reordering • Writes become visible simultaneously to all threads
  • 8. 8 © 2019 Arm Limited Sequential Consistency in Practice Store Buffering Example Initially, X and Y are 0 in memory; foo and bar are local (register) variables: p0 p1 a: X = 1; c: Y = 1; b: foo = Y; d: bar = X; What are the permissible values for foo and bar? On Sequential Consistency, they are the values reachable by interleavings: {a,b,c,d} {c,d,a,b} {a,c,b,d} Therefore we cannot have foo and bar both equal to 0.
  • 9. 9 © 2019 Arm Limited The Real Concurrent World of Hardware Multi Processor Threads/Cores with Tiered Memory Structure CPUn CPU0 L1D$ L2 LLC L1 I$ CPU1 L2 L2 Usually shared between all cores Could be multi-threaded (SMT) Usually private Memory Controller DDR Banks L1D$L1 I$ L1D$L1 I$ IO Can have Load and Store buffers Can have out of order execution
  • 10. 10 © 2019 Arm Limited Strong Models based on Total Store Ordering (TSO) CPUn CPU0 L1 D$ L2 LLC L1 I$ CPU1 L2 L2 Memory Controller L1 D$ L1 I$ L1 D$ L1 I$ IO Life In The Real World Without Sequential Consistency Relaxed vs Strong Memory Model Strong Memory Models Weaker Memory Models X86, SPARC POWER, Arm v7 • A thread can see it’s own write before other threads • All other threads see the write simultaneously: Multiple Copy Atomic Model • Local reordering is allowed • All threads are not guaranteed to see the write simultaneously: Not Multiple Copy Atomic Model
  • 11. 11 © 2019 Arm Limited Strong Models based on Total Store Ordering (TSO) CPUn CPU0 L1 D$ L2 LLC L1 I$ CPU1 L2 L2 Memory Controller L1 D$ L1 I$ L1 D$ L1 I$ IO Life In The Real World Without Sequential Consistency Relaxed vs Strong Memory Model Weaker Memory Models X86, SPARC ARM v8 • A thread can see it’s own write before other threads • All other threads see the write simultaneously: Multiple Copy Atomic Model • Local reordering is allowed • All threads are guaranteed to see the write simultaneously: Multiple Copy Atomic Model
  • 12. 12 © 2019 Arm Limited • Can we reason about our concurrent programs following Sequential Consistency? • Probably if we had a formal, preferably executable, memory models to ensure that we understand the guarantees given by architectures and programming languages. • Here’s where Jade would come in talking about her cool tools that allow programmers to explore the consequences of a given memory model or generate vast families of litmus tests to run against hardware.litmus tests Going Back To Our Store Buffer Example herd litmus testcat model Is this behavior allowed by the cat model? Yes/No litmus on HW litmus test Is this behavior observed on HW? Yes/No litmus test configuration file (~cat model) diy diy.inria.fr
  • 13. 13 © 2019 Arm Limited X86 SB {x=0; y=0;} P0 | P1 ; MOV [x],$1 | MOV [y],$1 ; MOV EAX,[y] | MOV EAX,[x] ; exists (0:EAX=0 / 1:EAX=0) Hardware architecture and test name Initial state (x and y are shared memory location) Thread names Sequence of instructions displayed as columns Question: can we observe this final state of given that x=0; y=0? Store Buffer Litmus Test on a TSO Hardware
  • 14. 14 © 2019 Arm Limited Armed With Knowledge On TSO hardware, can we observe the final state of foo=0 and bar=0; given that X=0; Y=0? ... ... Yes! All production architectures allow the outcome where both foo and bar equal 0. So, what do we do? … Use mfence as needed.
  • 15. 15 © 2019 Arm Limited Performance Methodology - Using Litmus Tests and Tools To Avoid Barriers Where-ever Possible
  • 16. 16 © 2019 Arm Limited What & The Why Of Barriers / Fences? Barriers ensure ordering properties Barriers enforce strong order Barriers (when inserted correctly) restore sequential consistency Barriers can be potentially expensive Data Memory Barriers on Arm: DMB SY (full system) DMB ST (wait for store to complete) DMB LD (wait for only loads to complete)
  • 17. 17 © 2019 Arm Limited Normal Load-Stores No Barriers Litmus test AArch64 MP { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | LDR W2,[X3] ; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 / 1:X2=0) Check for any reorder Check if X0 = 1 and X2 = 0 can exist on P1.
  • 18. 18 © 2019 Arm Limited Normal Stores Load Barrier Litmus test AArch64 MP+DMB.LD { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB LD; MOV W2,#1 | LDR W2,[X3] ; STR W2,[X3] | ; exists (1:X0=1 / 1:X2=0) Check for Store reorder
  • 19. 19 © 2019 Arm Limited Load & Store Barriers Litmus test AArch64 MP+DMB.LD+DMB.ST { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB LD; DMB ST | LDR W2, [X3]; MOV W2,#1 | ; STR W2,[X3] | ; exists (1:X0=1 / 1:X2=0) Generated assembler #START _litmus_P1 ldr w4,[x1] dmb ld ldr w5,[x2] #START _litmus_P0 mov w7,#1 str w7,[x0] dmb st mov w6,#1 str w6,[x2] Test MP+DMB.ST+DMB.LD Allowed Histogram (3 states) 499999:>1:X0=0; 1:X2=0; 20 :>1:X0=0; 1:X2=1; 499981:>1:X0=1; 1:X2=1; No Witnesses Positive: 0, Negative: 1000000 Condition exists (1:X0=1 / 1:X2=0) is NOT vali Hash=4d15dccdb1da0ce51fac17dea068d047 Observation MP+DMB.ST+DMB.LD Never 0 1000000
  • 20. 20 © 2019 Arm Limited But Aren’t Barriers Expensive? Acquire – Release (implicit barrier) semantic – One way barriers LDAR - All loads and stores that are after an LDAR in program order, … must be observed after the LDAR STLR - All loads and stores preceding an STLR …, must be observed before the STLR Thinking about lock-free?
  • 21. 21 © 2019 Arm Limited Performance Analysis and Measurement - Using Java Micro- Benchmarking Harness (JMH)
  • 22. 22 © 2019 Arm Limited JMM Rule 1 for Volatile Stores Order 1 Normal/Volatile Load Normal/Volatile Store Can’t Reorder Order 2 Volatile Store Any load/store (normal or volatile) followed by a ‘volatile store’ can’t be reordered.
  • 23. 23 © 2019 Arm Limited Volatile Stores - Barriers With DMBs JMM rule: Any load/store followed by Volatile Store can’t be re-ordered Litmus test { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB LD ; LDR W2,[X1] | LDR W2,[X3] ; DMB SY | ; STR W2,[X3] | ; exists (1:X0=1 / 1:X2=0) Results Test MP Allowed States 3 1:X0=0; 1:X2=0; 1:X0=0; 1:X2=1; 1:X0=1; 1:X2=1; No Witnesses Positive: 0 Negative: 3 Condition exists (1:X0=1 / 1:X2=0) Observation MP Never 0 3
  • 24. 24 © 2019 Arm Limited Volatile Stores - Can Barriers Be Replaced By STLR? JMM rule: Any load/store followed by Volatile Store can’t be re-ordered Litmus test { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; MOV W0,#1 | LDR W0,[X1] ; STR W0,[X1] | DMB LD ; MOV W2,#1 | LDR W2,[X3] ; STLR W2,[X3] | ; exists (1:X0=1 / 1:X2=0) Results Test MP Allowed States 3 1:X0=0; 1:X2=0; 1:X0=0; 1:X2=1; 1:X0=1; 1:X2=1; No Witnesses Positive: 0 Negative: 3 Condition exists (1:X0=1 / 1:X2=0) Observation MP Never 0 3
  • 25. 25 © 2019 Arm Limited JMM Rule 2 for Volatile Stores • A ‘volatile store’ followed by any normal load/store CAN be reordered. • A ‘volatile store’ followed by any volatile load/store CANNOT be reordered. Order 1 Can Reorder Volatile Store Can’t Reorder Order 2 Normal Load Normal Store Volatile Load Volatile Store
  • 26. 26 © 2019 Arm Limited Volatile Stores – Can Barriers Be Replaced By STLR? STLR doesn’t guarantee that a subsequent volatile load/store will not be reordered static class TestNormalLoadPostVolatileStores { volatile int intField1; int intNorm1; public TestNormalLoadPostVolatileStores() { intField1 = 32; intNorm1 = intField1; } } JMH Test Code { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; LDR W0,[X1] | LDR W0,[X1] ; MOV W2,#1 | MOV W2,#1 ; STLR W2,[X3] | STR W2,[X3] ; exists (0:X0=1 / 1:X0=1) Litmus Test Positive Event Structure
  • 27. 27 © 2019 Arm Limited Volatile Stores – Can Barriers Be Replaced By STLR? Success! STLR + LDAR of volatiles provide that guarantee Litmus test { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; LDR W0,[X1] | LDAR W0,[X1] ; MOV W2,#1 | MOV W2,#1 ; STLR W2,[X3] | STR W2,[X3] ; exists (0:X0=1 / 1:X0=1) Results Test LB Allowed States 3 0:X0=0; 1:X0=0; 0:X0=0; 1:X0=1; 0:X0=1; 1:X0=0; No Witnesses Positive: 0 Negative: 3 Condition exists (0:X0=1 / 1:X0=1) Observation LB Never 0 3
  • 28. 28 © 2019 Arm Limited Volatile Stores – JMH Profiles Success! STLR + LDAR of volatiles provide that guarantee Load Acquire – Store Release Pair stlr w11, [x10] ;*putfield intField1 ; add x10, x2, #0xc ldar w11, [x10] ;*getfield intField1 Data Memory Barrier (inner share-ability domain str w10, [x2,#12] dmb ish ;*putfield intField1 ldr w11, [x2,#12] dmb ishld ;*getfield intField1 36% faster on max SMT count!!
  • 29. 29 © 2019 Arm Limited JMM Rule 1 for Volatile Loads Order 1 Volatile Load Can’t Reorder Order 2 Normal/Volatile Load Normal/Volatile Store A ‘volatile load’ followed by any load/store (normal or volatile) can’t be reordered.
  • 30. 30 © 2019 Arm Limited Volatile Load - Barriers With DMBs JMM rule: A Volatile Load followed by any load/store can’t be re-ordered Litmus test { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; LDR W0,[X1] | LDR W0,[X1] ; DMB SY | MOV W2,#1 ; MOV W2,#1 | DMB SY ; STR W2,[X3] | STR W2,[X3] ; exists (0:X0=1 / 1:X0=1) Results Test LB Allowed States 3 0:X0=0; 1:X0=0; 0:X0=0; 1:X0=1; 0:X0=1; 1:X0=0; No Witnesses Positive: 0 Negative: 3 Condition exists (0:X0=1 / 1:X0=1) Observation LB Never 0 3
  • 31. 31 © 2019 Arm Limited Volatile Stores – Can Barriers Be Replaced By LDAR? Success! LDAR provides the right guarantee Litmus test { 0:X1=x; 0:X3=y; 1:X1=y; 1:X3=x; } P0 | P1 ; LDAR W0,[X1] | LDR W0,[X1] ; | MOV W2,#1 ; MOV W2,#1 | DMB SY ; STR W2,[X3] | STR W2,[X3] ; exists (0:X0=1 / 1:X0=1) Results Test LB Allowed States 3 0:X0=0; 1:X0=0; 0:X0=0; 1:X0=1; 0:X0=1; 1:X0=0; No Witnesses Positive: 0 Negative: 3 Condition exists (0:X0=1 / 1:X0=1) Observation LB Never 0 3
  • 32. 32 © 2019 Arm Limited Performance Study - Scaling CPU Cores / Simultaneous Multithreading (SMT)
  • 33. 33 © 2019 Arm Limited 0.9 0.95 1 1.05 1.1 1 4 Max lse wolse wdmb Applying Cook Book Recipes to SPECJBB Bigger is Better Core or Thread Count With LSE; With LDAR (baseline) Without LSE; With LDAR With LSE; With DMB 1 1.00 0.97 0.92 4 1.00 1.00 0.95 Max 1.00 1.01 1.00 •Fences/Barriers(e.g. DMB ST, DMB LD, DMB SY) •Atomics/LSE (e.g. LDREX/STREX or CAS)
  • 34. 34 © 2019 Arm Limited Single Core Performance The Quest and Guarantee of Sequential Consistency Hardware improvements measured on Java micro-benchmarks (OpenJDK JDK11): • Object/memory allocations up to 2.4x faster • Object/array initializations up to 5x faster – Smart issuing and cost reduction of SW barriers (i.e. DMB) required by Arm’s relaxed memory model • Copy chars up to 1.6x faster • New atomic instructions improve locking throughput and contention latency by up to 2x Cortex-A72 Code Neoverse N1 0.21% dmb ishst ;*new (0 cycles) 0.00% 7.73% (~3.5 cycles) ldr x11, [sp,#8] 1.75% ldr w17, [x11,#12];*getfield 0.06% mov x2, x0 0.51% ldp w0, w18,[x11,#16];*getfield 0.11% 0.42% ldp w3, w1, [x11,#24];*getfield org.openjdk.bench.vm.compiler.generated.StoreAfterStore_testAllocAndZeroStore_jmhTest::testAllocAndZeroStore_avgt_jmhStub
  • 35. 35 © 2019 Arm Limited Ares Single Core Performance Hardware improvements measured on SPECJBB (OpenJDK JDK11): • Neoverse N1 CPU improves performance from Cortex-A72 by 1.7x Software improvements measured on SPECJBB: • JDK11 improves performance vs JDK8 on Arm by up to 14%
  • 36. 36 © 2019 Arm Limited Resources http://g.oswego.edu/dl/jmm/cookbook.html https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf http://hg.openjdk.java.net/code-tools/jmh-jdk- microbenchmarks/file/92c55597888e/README.md http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_an d_Cookbook_A08.pdf
  • 37. 37 © 2019 Arm Limited Appendix
  • 38. 38 © 2019 Arm Limited The herd+diy Toolsuite The tool suite supports and provides a formal consistency model for all of the following: - Arm, IBM Power, Intel x86 - Nvidia GPUs - C/C++ - Linux C This means that a user can experiment with the concurrency implemented at all these levels and generate systematic families of tests to probe implementations. diy.inria.fr
  • 39. 39 © 2019 Arm Limited A Store Barrier Litmus Test PodWW Rfe PodRR Fre Fre PodWR Fre PodWR A litmus test source has three main sections: The initial state defines the initial values of registers and memory locations. Initialisation to zero may be omitted. The code section defines the code to be run concurrently — above there are two threads. Yes we know, our X86 assembler syntax is a mistake. The final condition applies to the final values of registers and memory locations.
  • 40. 40 © 2019 Arm Limited Executing the model: herd herd litmus testcat model Is this behavior allowed by the cat model? Yes/No The herd tool allows a user to execute a formal model, written in the cat language. Given a litmus test and a cat model, herd runs the litmus test against the cat model: herd tries to determine whether the model allows the final state given in the test can be reached.
  • 41. 41 © 2019 Arm Limited Running tests on hardware: litmus litmus on HW litmus test Is this behavior observed on HW? Yes/No The litmus tool allows a user to run a litmus test against hardware. The tool gathers all the final states that were observed on hardware during multiple runs of the test. We can then compare the output of herd and litmus, to check whether they are in accord.
  • 42. 42 © 2019 Arm Limited Generating tests: diy litmus test configuration file (~cat model) diy The diy tool allows a user to generate interesting families of litmus tests. It takes as input a configuration file, where a user should list the features of interest to them. We can use families of diy-generated tests to run validation campaigns, comparing the cat model and prototypes.
  • 43. The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent Devices Thank You!