The document discusses wait-free data structures and task scheduling algorithms for embedded multi-core systems. It presents the objectives of evaluating existing approaches, implementing real-time compliant wait-free data structures, and defining benchmark scenarios. Specific data structures covered include pools, queues, and stacks. Pools are adapted using a compartment approach. The Kogan-Petrank queue is modified to remove its phase counter. Evaluation focuses on latency and buffering scenarios.
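The compartment approach for pools can be sketched as follows (a hypothetical illustration, not the document's actual implementation; all names are invented): the pool's slots are statically partitioned into per-thread compartments, so each thread allocates and releases only within its own compartment, never contends with others, and every operation completes in a bounded number of steps, which is the essence of wait-freedom.

```python
class CompartmentPool:
    """Pool whose slots are partitioned into per-thread compartments."""

    def __init__(self, num_threads, slots_per_thread):
        # Each compartment is a private free-list of slot indices.
        self.compartments = [
            list(range(t * slots_per_thread, (t + 1) * slots_per_thread))
            for t in range(num_threads)
        ]

    def allocate(self, thread_id):
        """Take a slot from the calling thread's own compartment (O(1))."""
        free = self.compartments[thread_id]
        return free.pop() if free else None  # None: compartment exhausted

    def release(self, thread_id, slot):
        """Return a slot to the owning thread's compartment (O(1))."""
        self.compartments[thread_id].append(slot)
```

Since no compartment is ever touched by two threads, no synchronization is needed at all in this simplified model; a real implementation must still handle cross-thread frees.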
Efficient Dynamic Scheduling Algorithm for Real-Time MultiCore Systems iosrjce
An imprecise computation model is used in a dynamic scheduling algorithm with a heuristic function to schedule task sets. A task is characterized by its ready time, worst-case computation time, deadline, and resource requirements. A task that cannot meet its deadline and resource requirements on time is split into a mandatory part and an optional part. These sub-tasks can execute concurrently on multiple cores, exploiting the parallelism provided by the multi-core system. The mandatory part produces an acceptable result, while the optional part refines it further. To study the effectiveness of the proposed scheduling algorithm, extensive simulation studies were carried out, comparing its performance with the myopic and improved myopic scheduling algorithms. The simulations show that the schedulability of the task-split myopic algorithm is consistently higher than that of the myopic and improved myopic algorithms.
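The split described in this abstract can be illustrated with a small sketch (the split policy and the 60/40 fraction are invented for illustration, not the paper's heuristic): a task that cannot meet its deadline on one core is divided into a mandatory part and an optional part that run concurrently on two cores, and the task is considered schedulable if the longer part still finishes before the deadline.

```python
def can_meet_deadline(ready, wcet, deadline):
    """True if the whole task fits before its deadline on one core."""
    return ready + wcet <= deadline

def split_task(ready, wcet, deadline, mandatory_fraction=0.6):
    """Split a failing task into (mandatory, optional) computation times.
    The fraction is an assumed tuning knob, not the paper's value."""
    if can_meet_deadline(ready, wcet, deadline):
        return wcet, 0.0  # no split needed
    mandatory = wcet * mandatory_fraction
    return mandatory, wcet - mandatory

def schedulable_after_split(ready, wcet, deadline):
    """Both parts run concurrently on separate cores, so the task meets
    its deadline if the longer part does."""
    mandatory, optional = split_task(ready, wcet, deadline)
    return ready + max(mandatory, optional) <= deadline
```

For example, a task with ready time 0, worst-case time 10, and deadline 8 fails on one core but becomes schedulable once its 6-unit mandatory part and 4-unit optional part run in parallel.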
(Paper) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
Shohei Gotoda, Naoki Shibata and Minoru Ito : "Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2012), pp.260-267, DOI:10.1109/CCGrid.2012.23, May 15, 2012.
In this paper, we propose a task scheduling algorithm for a multicore processor system which reduces the recovery time in case of a single fail-stop failure of a multicore processor. Many recently developed processors have multiple cores on a single die, so one failure of a computing node results in the failure of many processors. In the case of a failure of a multicore processor, all tasks which have been executed on the failed processor have to be recovered at once. The proposed algorithm is based on an existing checkpointing technique, and we assume that state is saved when nodes send results to the next node. If a series of computations that depends on former results is executed on a single die, all parts of that series must be executed again when the processor fails. The proposed scheduling algorithm therefore tries not to concentrate tasks on the processors of a single die. We designed our algorithm as a parallel algorithm that achieves O(n) speedup, where n is the number of processors. We evaluated our method using simulations and experiments with four PCs. Compared with an existing scheduling method, in the simulation the execution time including recovery time in the case of a node failure is reduced by up to 50%, while the overhead in the case of no failure was a few percent in typical scenarios.
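The paper's key observation can be reproduced with a toy model (the numbers and the cost model are illustrative assumptions, not the paper's): checkpoints are taken when a task's result leaves its die, so a dependent chain placed entirely on one die must be re-executed in full when that die fails, whereas spreading the chain across dies bounds the un-checkpointed work lost to any single failure.

```python
def worst_case_recovery(assignment):
    """assignment: {die_id: [task_time, ...]} for segments of a dependent
    chain.  Work is only checkpointed when it leaves a die, so losing one
    die loses all work currently on it; the worst case is the largest
    amount of work resident on any single die."""
    return max(sum(times) for times in assignment.values())

# A chain of four dependent tasks, 5 time units each, on a 2-die system.
concentrated = {0: [5, 5, 5, 5], 1: []}   # whole chain on die 0
spread       = {0: [5, 5], 1: [5, 5]}     # chain alternates between dies
```

Under this model the concentrated placement risks 20 units of recovery work while the spread placement risks only 10, mirroring the paper's motivation for not concentrating tasks on one die.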
(Slides) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Naoki Shibata
Yosuke Wakisaka, Naoki Shibata, Keiichi Yasumoto, Minoru Ito, and Junji Kitamichi : Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading, In Proc. of The 2014 International Conference on Parallel and Distributed Processing Techniques and Applications(PDPTA'14), pp. 229-235
In this paper, we propose a task scheduling algorithm for multiprocessor systems with Turbo Boost and Hyper-Threading technologies. The proposed algorithm minimizes the total computation time, taking into account the dynamic changes of processing speed caused by the two technologies, in addition to the network contention among the processors. We constructed a clock speed model with which the changes of processing speed under Turbo Boost and Hyper-Threading can be estimated for various processor usage patterns. We then constructed a new scheduling algorithm that minimizes the total execution time of a task graph, considering network contention and the two technologies. We evaluated the proposed algorithm by simulations and experiments with a multiprocessor system consisting of 4 PCs. In the experiment, the proposed algorithm produced a schedule that reduces the total execution time by 36% compared to conventional methods, which are straightforward extensions of an existing method.
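A clock-speed model of the kind this abstract describes might look as follows (all coefficients are invented for illustration and are not the paper's measured values): Turbo Boost raises the clock when few cores are active, and Hyper-Threading lets two threads share one physical core at reduced per-thread throughput, so a task's estimated duration depends on the whole processor's usage pattern.

```python
BASE_GHZ = 3.0     # nominal clock (assumed)
TURBO_STEP = 0.2   # extra GHz per idle core (assumed)
SMT_FACTOR = 0.6   # per-thread throughput when a core runs two threads (assumed)

def thread_speed(active_cores, total_cores, threads_on_core=1):
    """Estimated effective speed (GHz) of one thread under Turbo Boost
    and Hyper-Threading, for a given processor usage pattern."""
    clock = BASE_GHZ + TURBO_STEP * (total_cores - active_cores)
    return clock * (SMT_FACTOR if threads_on_core == 2 else 1.0)

def task_time(work_gcycles, active_cores, total_cores, threads_on_core=1):
    """Estimated duration of a task given its work in giga-cycles."""
    return work_gcycles / thread_speed(active_cores, total_cores, threads_on_core)
```

A scheduler using such a model can, for example, see that packing one more task onto an otherwise idle chip costs little, while waking a second thread on a busy core slows both threads.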
Optimization of Remote Core Locking Synchronization in Multithreaded Programs...ITIIIndustries
This paper proposes algorithms for optimizing the Remote Core Locking (RCL) synchronization method in multithreaded programs. An algorithm for the initialization of RCL locks and algorithms for thread-affinity optimization are developed. The algorithms take into account the structure of hierarchical computer systems and non-uniform memory access (NUMA) to minimize the execution time of RCL programs. Experimental results on multi-core computer systems presented in the paper show a reduction in the execution time of RCL programs.
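The core idea of RCL can be sketched in a few lines (a simplified model: real RCL uses per-client cache-line-sized request slots and busy-waiting rather than a queue, and pins the server to a dedicated core): instead of each thread acquiring a lock, threads post their critical sections to a dedicated server thread, which executes them serially on its own core.

```python
import threading
import queue

class RCLServer:
    """Toy remote-core-locking server: one thread executes all critical
    sections posted by clients, serializing them without a shared lock."""

    def __init__(self):
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        while True:
            func, done = self.requests.get()
            if func is None:          # shutdown sentinel
                break
            func()                    # critical section runs on the server
            done.set()

    def execute(self, critical_section):
        """Client side: post the critical section and wait for completion."""
        done = threading.Event()
        self.requests.put((critical_section, done))
        done.wait()

    def shutdown(self):
        self.requests.put((None, None))
        self.thread.join()
```

Because all critical sections run on one core, the shared data they touch tends to stay in that core's cache, which is the locality benefit the NUMA-aware affinity optimizations in this paper build on.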
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...IJCNCJournal
The rapid development of diverse computer architectures and hardware accelerators means that the design of parallel systems faces new problems arising from their heterogeneity. Our implementation of a parallel system called KernelHive allows applications to run efficiently in a heterogeneous environment consisting of multiple collections of nodes with different types of computing devices. The execution engine of the system is open to optimizer implementations focusing on various criteria. In this paper, we propose a new optimizer for KernelHive that utilizes distributed databases and performs data prefetching to optimize the execution time of applications that process large input data. Employing a versatile data management scheme that allows combining various distributed data providers, we propose using NoSQL databases for this purpose. We support our solution with experimental results from real executions of our OpenCL implementation of a regular-expression matching application in various hardware configurations. Additionally, we propose a network-aware scheduling scheme for selecting hardware for the proposed optimizer and present simulations that demonstrate its advantages.
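The prefetching optimization described here amounts to double-buffering (a schematic sketch; the KernelHive API is not shown, and `fetch`/`compute` are stand-ins): while chunk i is being processed, chunk i+1 is fetched from the (possibly remote) data provider in the background, hiding transfer latency behind computation.

```python
import threading

def process_all(chunks, fetch, compute):
    """fetch(chunk_id) -> data; compute(data) -> result.
    Overlaps fetching of the next chunk with computing the current one."""
    results = []
    data = fetch(chunks[0])                      # first chunk fetched eagerly
    for i in range(len(chunks)):
        prefetched = {}
        worker = None
        if i + 1 < len(chunks):
            # Bind the next chunk id now to avoid late-binding surprises.
            worker = threading.Thread(
                target=lambda c=chunks[i + 1]: prefetched.update(d=fetch(c)))
            worker.start()
        results.append(compute(data))            # compute overlaps the fetch
        if worker is not None:
            worker.join()
            data = prefetched["d"]
    return results
```

With a slow `fetch`, the total runtime approaches max(fetch, compute) per chunk rather than their sum, which is the effect the paper's optimizer seeks for large input data.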
These slides explain how to use the WEKA tool for association rule mining, including a brief overview of how to prepare a dataset for WEKA and how to visualize it.
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Editor IJCATR
Grid computing provides a framework and deployment environment that enables resource sharing, access, aggregation, and management. It allows the coordinated use of various resources in dynamic, distributed virtual organizations. Grid scheduling is responsible for resource discovery, resource selection, and job assignment over a decentralized heterogeneous system. In the existing system, a primary-backup approach is used for fault tolerance in a single environment. In this approach, each task has a primary copy and a backup copy on two different processors. For dependent tasks, the precedence constraints among tasks must be considered when scheduling backup copies and overloading backups. Two algorithms have been developed to schedule backups of dependent and independent tasks. The proposed work manages resource failures in grid job scheduling. In this method, data sources and resources are integrated from different geographical environments. Fault-tolerant scheduling with the primary-backup approach is used to handle job failures in the grid environment. The impact of communication protocols is also considered: protocols such as the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are used to distribute each task's messages to grid resources.
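The placement rule at the heart of primary-backup fault tolerance can be shown in miniature (a simplified illustration: a real scheduler must also respect precedence constraints and decide when backups may be overloaded): each task gets a primary and a backup copy on two different processors, so no single processor failure can lose a task.

```python
def place_primary_backup(tasks, processors):
    """Round-robin placement guaranteeing primary != backup processor.
    Requires at least two processors."""
    placement = {}
    n = len(processors)
    for i, task in enumerate(tasks):
        primary = processors[i % n]
        backup = processors[(i + 1) % n]   # always a different processor
        placement[task] = (primary, backup)
    return placement

def survives_failure(placement, failed):
    """True if every task keeps at least one live copy after 'failed' dies."""
    return all(p != failed or b != failed for p, b in placement.values())
```

Because the two copies of a task never share a processor, `survives_failure` holds for any single failure; the interesting scheduling problems begin when backups of dependent tasks must also preserve precedence order.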
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
This talk covers the current parallel capabilities in MATLAB*. Learn about its parallel language and distributed and tall arrays. Interact with GPUs both on the desktop and in the cluster. Combine this information into an interesting algorithmic framework for data analysis and simulation.
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big DataPingCAP
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with Pivotal's own original research, resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
Implementation of linear regression and logistic regression on SparkDalei Li
This presentation was developed for a course project at the Technical University of Madrid. The course, Massively Parallel Machine Learning, was supervised by Alberto Mozo and Bruno Ordozgoiti.
Multi-cores have become ubiquitous in both general-purpose computing and the embedded domain. Current technology trends show that the number of on-chip cores is rapidly increasing, while their complexity is decreasing due to power and thermal constraints. The increasing number of simple cores enables parallel applications to benefit from abundant thread-level parallelism (TLP), while sequential fragments suffer from poor exploitation of instruction-level parallelism (ILP). Recent research has proposed adaptive multi-core architectures that can coalesce simple physical cores into more complex virtual cores so as to accelerate sequential code. Such adaptive architectures can seamlessly exploit both ILP and TLP. The goal of this paper is to quantitatively characterize the performance potential of adaptive multi-core architectures. Previous research has primarily focused on purely sequential workloads on adaptive multi-cores. We address a more realistic scenario where parallel and sequential applications co-exist on an adaptive multi-core platform. Scheduling tasks on adaptive architectures poses challenging resource allocation problems for existing schedulers. We construct offline and online schedulers that intelligently reconfigure and allocate the cores to the applications so as to minimize the overall makespan under the constraints of a realistic adaptive multi-core architecture. Experimental results reveal that adaptive multi-core architectures can substantially decrease the makespan compared to both static symmetric and static asymmetric multi-core architectures.
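A back-of-the-envelope model shows why coalescing can shrink the makespan (the speedup numbers are Amdahl-style assumptions, not the paper's measurements): with a sequential application and a parallel application co-scheduled on four simple cores, an adaptive architecture can fuse two cores into a faster virtual core for the sequential code, trading a little parallel throughput for a large sequential gain.

```python
def makespan_static(seq_work, par_work, cores):
    """Static design: one simple core runs the sequential app; the
    remaining cores share the parallel app."""
    return max(seq_work, par_work / (cores - 1))

def makespan_adaptive(seq_work, par_work, cores, fused, fused_speedup):
    """Adaptive design: 'fused' simple cores form one virtual core with
    sequential speedup 'fused_speedup'; the rest run the parallel app."""
    return max(seq_work / fused_speedup, par_work / (cores - fused))
```

With 100 units of sequential work, 120 of parallel work, 4 cores, and an assumed 1.8x speedup from fusing two cores, the static makespan is 100 while the adaptive one is 60; the sequential app stops being the bottleneck.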
As the demand for computing power quickly increases in the automotive domain, car manufacturers and tier-one suppliers are gradually introducing multicore ECUs in their electronic architectures. These multicore ECUs offer new features, such as higher levels of parallelism, which ease compliance with the safety requirements introduced by ISO 26262 and can be taken advantage of in various other automotive use cases. These new features also involve more complexity in the design, development, and verification of software applications. Hence, OEMs and suppliers will require new tools and methodologies for deployment and validation. In this paper, we present the main use cases for multicore ECUs and then focus on one of them. Precisely, we address the problem of scheduling numerous elementary software components (called runnables) on a limited set of identical cores. In the context of an automotive design, we assume the use of the static task partitioning scheme, which provides simplicity and better predictability for ECU designers by comparison with a global scheduling approach. We show how the global scheduling problem can be addressed as two sub-problems: partitioning the set of runnables and building the schedule on each core. We prove that neither sub-problem can be solved optimally due to its algorithmic complexity, and we then present low-complexity heuristics to partition the runnable set and build a schedule on each core, before discussing schedulability verification methods. Finally, we assess the performance of our approach on realistic case studies.
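The two sub-problems named in this abstract can be sketched with generic bin-packing heuristics (the choice of worst-fit decreasing and the utilization bound are illustrative assumptions, not necessarily the paper's heuristics): first partition the runnables onto cores, then check a simple utilization-based schedulability condition on each core.

```python
def partition_worst_fit(runnables, num_cores):
    """runnables: {name: utilization}.  Worst-fit decreasing: place each
    runnable, largest first, on the currently least-loaded core, which
    tends to balance load across cores."""
    cores = [dict(load=0.0, members=[]) for _ in range(num_cores)]
    for name, util in sorted(runnables.items(), key=lambda kv: -kv[1]):
        target = min(cores, key=lambda c: c["load"])
        target["members"].append(name)
        target["load"] += util
    return cores

def schedulable(cores, bound=1.0):
    """Necessary condition only: no core may be over-utilized.  A full
    verification would build the actual per-core schedule."""
    return all(c["load"] <= bound for c in cores)
```

Both steps are fast heuristics precisely because, as the paper proves, solving either sub-problem optimally is computationally intractable.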
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
PhD Candidate,
Department of Computer science
Mälardalen University
Time: Tuesday, Dec. 30, 2014, 11:30 a.m.
Location: Computer Engineering Department, Urmia University
Abstract:
The processor is the brain of a computer system. Usually, one or more programs run on a processor, where each program is typically responsible for performing a particular task or function of the system. The performance of all the tasks together results in the system functionality. In many computer systems, it is not enough that all tasks deliver correct output; it is also crucial that they do so at the proper time. Systems with such timing requirements are known as real-time systems. A scheduler is responsible for scheduling all tasks on the processor, i.e., it dictates which task runs and when, to ensure that all tasks are carried out on time. Typically, such tasks/programs need to use the computer system's hardware and software resources to perform their calculations. Examples of resources shared among programs are I/O devices, buffers, and memories. The technology used to manage shared resources is known as a resource-sharing synchronization protocol.
In recent years, a shift from single-processor platforms to multiprocessor platforms has become inevitable due to the availability of processor chips and requirements for increased performance. Scheduling and resource sharing protocols have been well studied for uniprocessor systems; in the context of multiprocessors, however, such techniques are still not fully mature. The shift towards multi-core technology has revealed the demand for real-time scheduling algorithms, along with synchronization protocols, to support real-time applications on multiprocessors, both with and without dependencies.
In this talk, we first give an introduction to real-time embedded systems. Next, we look at scheduling and resource sharing policies on uniprocessor platforms. Further, we discuss the extension of scheduling and resource sharing policies to multiprocessor platforms and present the recent challenges that have arisen in this context.
Biography:
Sara Afshar is a PhD student at Mälardalen University. She received her B.Sc. degree in Electrical Engineering from Tabriz University, Iran, in 2002 and worked at various engineering companies until 2009. In 2010 she started her M.Sc. in Embedded Systems at Mälardalen University, obtained her Master's degree in 2012, and began her PhD studies in the same year. She is currently working on resource sharing in multiprocessor systems and is part of the Complex Real-Time Embedded Systems group at Mälardalen University.
Optimization of Remote Core Locking Synchronization in Multithreaded Programs...ITIIIndustries
This paper proposes the algorithms for optimization of Remote Core Locking (RCL) synchronization method in multithreaded programs. The algorithm of initialization of RCL-locks and the algorithms for threads affinity optimization are developed. The algorithms consider the structures of hierarchical computer systems and non-uniform memory access (NUMA) to minimize execution time of RCLprograms. The experimental results on multi-core computer systems represented in the paper shows the reduction of RCLprograms execution time.
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...IJCNCJournal
Rapid development of diverse computer architectures and hardware accelerators caused that designing parallel systems faces new problems resulting from their heterogeneity. Our implementation of a parallel
system called KernelHive allows to efficiently run applications in a heterogeneous environment consisting
of multiple collections of nodes with different types of computing devices. The execution engine of the
system is open for optimizer implementations, focusing on various criteria. In this paper, we propose a new
optimizer for KernelHive, that utilizes distributed databases and performs data prefetching to optimize the
execution time of applications, which process large input data. Employing a versatile data management
scheme, which allows combining various distributed data providers, we propose using NoSQL databases
for our purposes. We support our solution with results of experiments with real executions of our OpenCL
implementation of a regular expression matching application in various hardware configurations.
Additionally, we propose a network-aware scheduling scheme for selecting hardware for the proposed
optimizer and present simulations that demonstrate its advantages.
This slide will help to understand how to use WEKA tool for association rule mining. It has a brief overview of how to prepare dataset for using it in WEKA and how to visualize it.
Efficient Resource Management Mechanism with Fault Tolerant Model for Computa...Editor IJCATR
Grid computing provides a framework and deployment environment that enables resource
sharing, accessing, aggregation and management. It allows resource and coordinated use of various
resources in dynamic, distributed virtual organization. The grid scheduling is responsible for resource
discovery, resource selection and job assignment over a decentralized heterogeneous system. In the
existing system, primary-backup approach is used for fault tolerance in a single environment. In this
approach, each task has a primary copy and backup copy on two different processors. For dependent
tasks, precedence constraint among tasks must be considered when scheduling backup copies and
overloading backups. Then, two algorithms have been developed to schedule backups of dependent and
independent tasks. The proposed work is to manage the resource failures in grid job scheduling. In this
method, data source and resource are integrated from different geographical environment. Faulttolerant
scheduling with primary backup approach is used to handle job failures in grid environment.
Impact of communication protocols is considered. Communication protocols such as Transmission
Control Protocol (TCP), User Datagram Protocol (UDP) which are used to distribute the message of
each task to grid resources.
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
This talk covers the current parallel capabilities in MATLAB*. Learn about its parallel language and distributed and tall arrays. Interact with GPUs both on the desktop and in the cluster. Combine this information into an interesting algorithmic framework for data analysis and simulation.
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big DataPingCAP
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with own original research resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
Implementation of linear regression and logistic regression on SparkDalei Li
This presentation was developed for a course project at Technical University of Madrid. The course is massively parallel machine learning supervised by Alberto Mozo and Bruno Ordozgoiti.
Multi-cores have become ubiquitous both in the general-purpose computing and the
embedded domain. The current technology trends show that the number of on-chip cores is
rapidly increasing, while their complexity is decreasing due to power and thermal constraints.
Increasing number of simple cores enable parallel applications benefit from abundant thread-level parallelism (TLP), while sequential fragments suffer from poor exploitation of instruction-level parallelism (ILP). Recent research has proposed adaptive multi-core architectures that are
capable of coalescing simple physical cores to create more complex virtual cores so as to
accelerate sequential code. Such adaptive architectures can seamlessly exploit both ILP and TLP.
The goal of this paper is to quantitatively characterize the performance potential of adaptive
multi-core architectures. Previous research have primarily focused on only sequential
Workload on adaptive multi-cores. We address a more realistic scenario where parallel and
sequential applications co-exist on an adaptive multi-core platform. Scheduling tasks on adaptive
architectures reveal challenging resource allocation problems for the existing schedulers. We
construct offline and online schedulers that intelligently reconfigure and allocate the cores to the
applications so as to minimize the overall makespan under the constraints of a realistic adaptive
multi-core architecture. Experimental results reveal that adaptive multi-core architectures can
substantially decrease the makespan compared to both static symmetric and asymmetric multi-core architectures.
As the demand for computing power is quickly
increasing in the automotive domain, car manufactur-ers and tier-one suppliers are gradually introducing mul-ticore ECUs in their electronic architectures. Additionally, these multicore ECUs offer new features such as higher levels of parallelism which eases the respect of
the safety requirements introduced by the ISO 26262 and can be taken advantage of in various other automotive use-cases. These new features involve also more complexity in the design, development and verification of the software applications. Hence, OEMs and suppliers will require new tools and methodologies for deployment and
validation. In this paper, we present the main use cases
for multicore ECUs and then focus on one of them. Pre-
cisely, we address the problem of scheduling numerous
elementary software components (called runnables) on
a limited set of identical cores. In the context of an au-
tomotive design, we assume the use of the static task
partitioning scheme which provides simplicity and bet-
ter predictability for the ECU designers by comparison
with a global scheduling approach. We show how the
global scheduling problem can be addressed as two sub-
problems: partitioning the set of runnables and building
the schedule on each core. At that point, we prove that
each of the sub-problems cannot be solved optimally due
to their algorithmic complexity. We then present low com-
plexity heuristics to partition and build a schedule of the
runnable set on each core before discussing schedula-
bility verification methods. Finally, we assess the perfor-
mance of our approach on realistic case-studies.
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
PhD Candidate,
Department of Computer science
Mälardalen University
Time: Tuesday, Dec. 30, 2014, 11:30 a.m.
Location: Computer Engineering Department, Urmia University
Abstract:
The processor is the brain of a computer system. Usually, one or more programs run on a processor where each program is typically responsible for performing a particular task or function of the system. The performance of all the tasks together results in the system functionality. In many computer systems, it is not only enough that all tasks deliver correct output, but it is also crucial that these activities are delivered in a proper time. This type of systems that have timing requirements are known as real-time systems. A scheduler is responsible for scheduling all tasks on the processor, i.e., it dictates which task to run and when to run to ensure that all tasks are carried out on time. Typically, such tasks/programs need to use the computer system’s hardware and software resources to perform their calculation. Examples of such type of resources that are shared among programs are I/O devices, buffers and memories. Technology that is used for the management of shared resources is known as resource sharing synchronization protocol.
In recent years, a shift from single-processor platforms to multiprocessor platforms has become inevitable due to the availability of multi-core processor chips and the demand for increased performance. Scheduling and resource-sharing protocols have been well studied for uniprocessor systems. In the context of multiprocessors, however, such techniques are not yet fully mature. The shift towards multi-core technology has revealed the demand for real-time scheduling algorithms, along with synchronization protocols, to support real-time applications on multiprocessors, both with and without dependencies.
In this talk, we first give an introduction to real-time embedded systems. Next, we look at scheduling and resource-sharing policies on uniprocessor platforms. We then discuss the extension of scheduling and resource-sharing policies to multiprocessor platforms and present recent challenges that have arisen in this context.
Biography:
Sara Afshar is a PhD student at Mälardalen University. She received her B.Sc. degree in Electrical Engineering from Tabriz University, Iran, in 2002 and worked at different engineering companies until 2009. In 2010 she started her M.Sc. in Embedded Systems at Mälardalen University; she obtained her Master's degree in 2012 and began her PhD studies at Mälardalen University in the same year. She is currently working on the topic of resource sharing in multiprocessor systems and is part of the Complex Real-Time Embedded Systems group at Mälardalen University.
In this presentation, Dr. Cliff Click describes a totally lock-free hashtable with extremely low-cost and near perfect scaling. Readers pay no more than HashMap readers: just the cost of computing the hash, loading and comparing the key, and returning the value. Writers must use AtomicUpdate instead of a simple assignment but otherwise pay the same as readers. In particular, there is no required order between loads and stores; correctness is assured, no matter how the hardware orders memory operations. A state-based technique demonstrates the correctness of the algorithm. This novel approach is very straightforward and much easier to understand than the usual "happens-before" memory-order-based reasoning.
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
Building real-time notification systems is often limited to basic filtering and pattern matching against incoming records. Allowing users to query incoming documents using Solr’s full range of capabilities is much more powerful. In our environment we needed a way to allow for tens of thousands of such query subscriptions, meaning we needed to find a way to distribute the query processing in the cloud. By creating in-memory Lucene indices from our Solr configuration, we were able to parallelize our queries across our cluster. To achieve this distribution, we wrapped the processing in a Storm topology to provide a flexible way to scale and manage our infrastructure. This presentation will describe our experiences creating this distributed, real-time inverted search notification framework.
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012TEST Huddle
EuroSTAR Software Testing Conference 2012 presentation on Innovations for Testing Parallel Software by Mike Bartley.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
Public vs. Private Cloud Performance by Flex StackIQ
This is a presentation given by Hugh Ma and Michael O'Rourke from Flex at the Stacki San Jose Meetup on September 15, 2016. Learn about the differences between public and private cloud performance, their OpenStack-Ansible & FlexBench environment, and how they use Stacki.
CS 301 Computer Architecture Student # 1 EID 09 Kingdom of .docx faithxdunce63732
CS 301 Computer Architecture
Student # 1, ID: 09, Kingdom of Saudi Arabia, Royal Commission at Yanbu, Yanbu University College, Yanbu Al-Sinaiyah
Student # 2, ID: 09, Kingdom of Saudi Arabia, Royal Commission at Yanbu, Yanbu University College, Yanbu Al-Sinaiyah
1. Introduction
High-performance processor design has recently taken two distinct approaches. One approach is to increase the execution rate by increasing the clock frequency of the processor or by reducing the execution latency of the operations. While this approach is important, much of its performance gain comes as a consequence of circuit and layout improvements and is beyond the scope of this research. The other approach is to directly exploit the instruction-level parallelism (ILP) in the program and to issue and execute multiple operations concurrently. This approach requires both compiler and microarchitecture support.
Traditional processor designs that issue and execute at most one operation per cycle are often called scalar designs. Static and dynamic scheduling techniques have been used to achieve better-than-scalar performance by issuing and executing more than one operation per cycle. While Johnson [7] defines a superscalar processor as any design that achieves better-than-scalar performance, popular usage of this term refers exclusively to processors that use dynamic scheduling techniques. For clarity, we use the term instruction-level parallel processors to refer to the general class of processors that execute more than one operation per cycle.
The primary static scheduling technique uses the compiler to determine sets of operations that have their source operands ready and have no dependencies within the set. These operations can then be scheduled within the same instruction, subject only to hardware resource limits. Since each of the operations in an instruction is guaranteed by the compiler to be independent, the hardware is able to issue and execute these operations directly with no dynamic analysis. These multi-operation instructions are very long in comparison with traditional single-operation instructions.
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
During the last years, except from the traditional CPU based hardware servers, hardware accelerators are widely used in various HPC application areas. More specifically, Graphics Processing Units (GPUs), Many Integrated Cores (MICs) and Field-Programmable Gate Arrays (FPGAs) have shown a great potential in HPC and have been widely mobilised in supercomputing and in HPC-Clouds. This presentation focuses on the development of a cloud simulation framework that supports hardware accelerators. The design and implementation of the framework are also discussed.
This presentation was given by Dr. Konstantinos Giannoutakis (CERTH) at the CloudLightning Conference on 11th April 2017.
Scott Callaghan from the Southern California Earthquake Center presented this deck in a recent Blue Waters Webinar.
"I will present an overview of scientific workflows. I'll discuss what the community means by "workflows" and what elements make up a workflow. We'll talk about common problems that users might be facing, such as automation, job management, data staging, resource provisioning, and provenance tracking, and explain how workflow tools can help address these challenges. I'll present a brief example from my own work with a series of seismic codes showing how using workflow tools can improve scientific applications. I'll finish with an overview of high-level workflow concepts, with an aim to preparing users to get the most out of discussions of specific workflow tools and identify which tools would be best for them."
Watch the video: http://wp.me/p3RLHQ-gtH
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This chapter discusses various classifications attributed to parallel architectures. It also introduces related parallel programming models and presents how these models map onto parallel architectures. Notions covered include data parallelism, task parallelism, tightly and loosely coupled systems, UMA/NUMA, multicore computing, symmetric multiprocessing, distributed computing, cluster computing, shared memory with and without threads, etc.
Wait-free data structures on embedded multi-core systems
1. Tobias Fuchs
Evaluation of Task Scheduling
Algorithms and Wait-Free Data
Structures for Embedded Multi-Core
Systems
• Talk on the Master's thesis
• Thesis supervisor: Prof. Dr. Dieter Kranzlmüller
• Advisors: Dr. Karl Fürlinger (LMU)
Dr. Tobias Schüle (Siemens CT)
• Date of the talk: 05.11.2014
2. Structure of this talk
1. Introduction
1. Motivation
2. Problem Statement and Objectives
2. Wait-free data structures
1. Foundations
2. Pools
3. Queues
4. Stacks
3. Task Scheduling
1. Work stealing
2. Prioritized work stealing in EMBB
4. Conclusion
Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems 2
4. Motivation
Wait-free algorithms
• Strongest possible fault tolerance
• Guarantee progress and an upper bound on execution time
Gains:
+ Progress can be a formal constraint in real-time computing
+ Wait-freedom eliminates the classic concurrency problems: deadlocks, priority inversion, convoying, kill-intolerance
5. Problem statement
State of the art
No suitable wait-free data structures for embedded systems exist:
• They employ mechanisms such as garbage collection
• They are not designed for restricted resources
• No evaluation of latency
Challenges:
- Transforming data structures into wait-free equivalents is non-trivial and usually requires a from-scratch redesign
- Implementations depend on the platform architecture
6. Objectives
1. Review and evaluation of state of the art approaches for
suitability on embedded systems
2. Real-time compliant implementations of wait-free data
structures
3. Definition, implementation and evaluation of suitable
benchmark scenarios for wait-free data structures and
task scheduling algorithms
+ Automated verification derived from semantic definition
8. Progress conditions
Classification of progress
On the Nature of Progress (Herlihy, Shavit 2011)
9. Real-time requirements
Performance priorities in real-time systems:
Guarantees on worst-case runtime behavior
Aim for latency and jitter reduction, even at the expense of throughput
Avoid non-determinism, e.g. in malloc / new (see MISRA)
10. Evaluation methodology
Real-time applications are designed to optimize latency
Related work does not evaluate latency, but only mean or
median throughput
Evaluating worst-case latency is difficult:
• In related work, measurements outside the 97.5% confidence interval are considered outliers and ignored
• These outliers, however, are precisely our data
11. Pools
12. Wait-free data structures:
Pools
Pools
… realize dynamic memory allocation
… while eliminating heap fragmentation
• Fundamental data structure of any concurrent container
• Fixed number of objects in static or automatic memory
• Pools manage concurrent removal and reclamation of
objects
RemoveAny(pool, e): remove and return an arbitrary element e
Add(pool, e): add element e back to the pool
13. Pools:
Related work
Related work
Close to none exists:
• Several lock-free pools, e.g. tree-based
• Wait-free pools: array-based, simple yet inefficient
Why are wait-free pools hard to design?
Common wait-free paradigms require dynamic memory
allocation …
14. Array-based pools
Array-based wait-free pools
• Consists of an array of atomic reservation flags
• Threads traverse the reservation array from the beginning and try to reserve a flag atomically (CAS)
• The index of the successfully toggled flag is the index of the acquired element
• Worst-case complexity: O(n)
15. Compartment pool
Wait-free pool with thread-specific compartments
• An array-based pool with an additional range of elements that can only be acquired by a specific thread
• Threads acquire elements from their private compartment first
16. Wait-free data structures:
Pools - Evaluation
17. Wait-free data structures:
Pools - Evaluation
18. Wait-free data structures:
Pools - Evaluation
19. Queues
20. Queues:
Related work
Related work
Kogan and Petrank presented the first wait-free queue for
multiple enqueuers and dequeuers
Wait-Free Queues With Multiple Enqueuers and Dequeuers (Kogan, Petrank, 2011)
- Implemented in Java
- Relies on garbage collection
- Requires a monotonic counter (phase)
21. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• In the original publication, a new phase value is greater than the phases of all announced operations (including non-pending ones)
22. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• Modification: Help all other non-pending operations first
• Possibly helping operations that are newer than the thread's own operation
23. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• Fairness is maintained: all other threads are guaranteed
to help this thread’s operation before engaging in their own
24. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Memory reclamation
The hazard pointer scheme is typically presented as a solution
Hazard pointers: Safe memory reclamation for lock-free objects (Michael, 2004)
25. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 1: Find upper memory bound for hazard pointers
Step 2: Guard queue nodes using hazard pointers
26. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 2: Guard queue nodes using hazard pointers
Culprit: Guarding is not wait-free
pointer p = node.Next;
hp.GuardPointer(p);
// node.Next may have changed between the read and the guard:
while (p != node.Next) {
  // Release and retry -- unbounded number of retries
  hp.ReleaseGuard(p);
  p = node.Next;
  hp.GuardPointer(p);
}
27. Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 2: Guard queue nodes using hazard pointers
Culprit: Guarding is not wait-free
Fortunately, retry loops can be avoided in the Kogan-
Petrank queue, but the implementation is not trivial
see implementation at
https://github.com/fuchsto/embb/tree/benchmark/
28. Queues - Evaluation
Queue benchmark scenarios
In addition to scenarios for bag semantics
• Buffer latency: elements are enqueued together with the current timestamp; the difference to the timestamp taken at dequeue is the buffer latency
29. Queues - Evaluation
30. Queues - Evaluation
31. Stacks
32. Stacks:
Related work
Related work
Fatourou presented a wait-free “universal” construction that is applicable to stacks
A highly efficient universal construction (Fatourou, 2011)
33. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Principle
• Optimized helping scheme
• Threads apply operations to a local copy of the stack
• Every thread tries to replace the global shared object with
its local copy via CAS
• Only applicable for shared objects with small state
34. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Elimination
• Push and Pop have inverse semantics: Pop(Push(stack, e)) = stack
• A matched concurrent Push/Pop pair can therefore be eliminated: both operations complete immediately, since together they do not alter the object’s state
Significantly improves performance if applicable
35. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Original version is not suitable for real-time applications:
- ABA problem is prevented using tagged pointers
- Thread-local pools with unbounded capacity
- No deallocation in published algorithm
36. Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Modified version of Fatourou’s stack
- Uses hazard pointers for safe reclamation
- Uses compartment pool with limited capacity
- Employs the elimination scheme from the original
publication
37. Stacks:
Evaluation
38. Stacks:
Evaluation
39. Task scheduling
40. Task Scheduling:
Objectives
Task Scheduling
• Intra-process task scheduling with priority queues
• Low-overhead, fine-grained scheduling of thousands of
small tasks
Priorities:
Focus on low latency and jitter reduction (i.e. predictability),
thus regarding maximum throughput as a secondary
benchmark.
41. Task scheduling:
Work stealing
Work stealing
• One worker thread per SMP core, no task migration
• Tasks are passed as function pointers (&func)
• Load balancing via per-worker task queues
• Many flavors of concrete implementations exist
42. Task scheduling:
Work stealing
Work stealing with task priorities
• Work stealing extended with one queue per priority level
44. Conclusion
Revisiting the objective
• Wait-free implementations of pools, queues and stacks are now available for real-time applications
• Benchmark framework and evaluation tools (R) are published as open source
• Reproducible evaluation of real-time performance
• A verification tool chain is on the way
45. Conclusion
Recommendations
• Wait-free data structures can rival the performance of lock-free implementations
• But they are hard to maintain
• Formal wait-freedom is practically not achievable
Employ wait-free data structures for fault tolerance, not as a guarantee for critical deadlines
46. Thank You
Source code (data structures, benchmarks, R scripts):
https://github.com/fuchsto/embb/tree/benchmark/
Official development source base of embb:
https://github.com/siemens/embb/tree/development/
Wiki to this thesis:
http://wiki.coreglit.ch