This document discusses approaches for designing concurrent applications. It compares task-based and actor-based concurrency, traditional locking approaches versus software transactional memory (STM), and data replication versus decentralized data stores. The key points are that actor models may be better for event-driven systems, STM enables composable operations, and decentralized data can improve performance of complex queries over large datasets. It emphasizes testing approaches before assuming performance impacts and having use cases in mind when choosing patterns.
4. Traditional Approaches
Task-based thread pools
(e.g. database connections, server sockets)
Hand-coded locks
(to access shared memory "safely")
Data: sharding or replication
(to increase throughput on data access)
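The traditional approach above can be sketched in a few lines: a task-based thread pool plus a hand-coded lock guarding shared memory. This is a minimal illustration; the names (`task`, `counter`) are ours, not from any particular framework.

```python
# Minimal sketch: task-based thread pool + a hand-coded lock
# protecting shared state. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

counter = 0
counter_lock = Lock()  # hand-coded lock guarding the shared counter

def task(n):
    global counter
    with counter_lock:  # task threads compete for this lock
        counter += n

with ThreadPoolExecutor(max_workers=4) as pool:  # task-based thread pool
    for _ in range(100):
        pool.submit(task, 1)

print(counter)  # 100
```

Without `counter_lock`, the `counter += n` read-modify-write would race; the "safety" here is exactly as good as the hand-coded locking discipline.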
5. Less Traditional Approaches
Actor-based processes
(e.g. message passing like in Erlang)
Software Transactional Memory (STM)
(consistent and safe way to access shared state)
Data: decentralized datastores
(run map/reduce queries on many nodes at once)
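An actor-based process can be approximated with a thread that owns its state privately and a mailbox queue for incoming messages, in the spirit of Erlang-style message passing. This is a hedged sketch, not a real actor runtime; `CounterActor` is an illustrative name.

```python
# Illustrative actor: private state + a mailbox of messages.
# No shared state, no locks; communication is send-only.
import queue
import threading

class CounterActor:
    def __init__(self):
        self.mailbox = queue.Queue()  # buffers incoming messages
        self.count = 0                # private state, never shared
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()  # react to one message at a time
            if msg is None:           # None = stop sentinel
                break
            self.count += msg

    def send(self, msg):              # asynchronous: returns immediately
        self.mailbox.put(msg)

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

actor = CounterActor()
for _ in range(100):
    actor.send(1)
actor.stop()
print(actor.count)  # 100
```

Note there is no lock around `self.count`: only the actor's own thread ever touches it.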
6. Task-based vs Actor-based
Task-based:
task threads access shared state in objects
task threads compete for locks on objects
synchronous operations within a task thread
limited task scheduling (e.g. wait, notify)
Actor-based:
mailboxes buffer incoming messages
actors do not share state, thus not competing for locks
messages are sent asynchronously
actors react to messages sent to them
8. When might actors be better?
the complexity of the task-based model becomes the bottleneck
(debugging race conditions, deadlocks, livelocks,
starvation); depends on your use case
the system is conceptually event-driven, which is easier to
translate into a high-level abstraction in an actor-based model
9. Locks vs STM
Locks:
flexibility: fine- vs coarse-grained choice
pessimistic locking
locking semantics need to be hand-coded
composable operations are not well supported
STM:
analogous to a database transaction log, recording each txn as a log entry
optimistic reading
atomic transactions
supports composable operations
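The optimistic, log-and-retry style of STM can be sketched as a toy compare-and-commit loop. Real STMs (e.g. in Clojure or Haskell) do far more per transaction; `Ref` and `atomically` here are illustrative names, not a real STM API.

```python
# Toy sketch of optimistic, retry-based transactions (STM-style).
# Reads are optimistic; the commit checks that nobody else committed
# in the meantime, and retries the whole function if someone did.
import threading

class Ref:
    _commit_lock = threading.Lock()  # single tiny commit point
    def __init__(self, value):
        self.value, self.version = value, 0

def atomically(ref, fn):
    while True:                                        # retry loop
        seen_version, seen_value = ref.version, ref.value  # optimistic read
        new_value = fn(seen_value)                     # compute outside locks
        with Ref._commit_lock:                         # atomic commit check
            if ref.version == seen_version:
                ref.value, ref.version = new_value, ref.version + 1
                return new_value
        # another txn committed first: discard our work and retry

balance = Ref(100)
threads = [threading.Thread(target=atomically, args=(balance, lambda v: v + 1))
           for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(balance.value)  # 150
```

Note that `fn` may run several times before a commit succeeds: this is why STM requires operations on shared state to be undoable and side-effect-free (see slide 11).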
11. When to use STM?
STM's performance gains show up on larger numbers of
cores/processors (roughly 4 or more)
Hand-coding and debugging locking semantics to prevent
deadlocks and livelocks becomes your application's
bottleneck
Priority inversion often hinders performance
BUT you can't use STM when an operation on shared
state cannot be undone: transactions may retry or roll
back, so operations must be undoable!
12. Replication vs Decentralized
Replication:
can improve throughput
some flexibility: replication strategies for a few use cases
requires full replica(s) of the data set on each node
Decentralized:
improves throughput and the performance of complex queries using map/reduce
flexibility to optimize two of three: Consistency, Availability, Partition tolerance (CAP Theorem)
does not require full replica(s) of the data set
13. When to use decentralized data?
Large data set you want to distribute without
creating/managing your own sharding scheme
Want to optimize two of CAP
Run distributed map/reduce for complex queries
BUT the datastore should satisfy your other needs first.
Usually key-value/bucket lookup, not an RDBMS!
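The map/reduce benefit above comes from running the map step on each node against its local partition, so only small partial results travel to the reducer. A hedged single-process sketch, with "nodes" simulated as plain dicts (all data and names invented for illustration):

```python
# Sketch of a distributed map/reduce query: count values across
# partitions held by different "nodes" (here, just local dicts).
from collections import Counter
from functools import reduce

nodes = [                              # three nodes, one partition each
    {"u1": "apple", "u2": "banana"},
    {"u3": "apple", "u4": "apple"},
    {"u5": "banana"},
]

def map_phase(partition):              # runs locally on each node
    return Counter(partition.values())

def reduce_phase(a, b):                # merges small partial results
    return a + b

totals = reduce(reduce_phase, (map_phase(p) for p in nodes))
print(totals["apple"], totals["banana"])  # 3 2
```

In a real decentralized datastore the map phase executes on many machines at once; only the `Counter`-sized partials cross the network.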
14. Other Approaches...(not in production)
Compiler parallel optimizations
e.g. Haskell sparks
Persistent data structures
to aid concurrency throughput by better API design
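The persistent-data-structure idea can be shown with a minimal immutable linked list: "updates" return a new version that shares structure with the old one, so concurrent readers never need locks. A sketch with invented names (`Node`, `push`):

```python
# Illustrative persistent (immutable) singly linked list.
# push() never mutates: it returns a new version sharing the old tail.
from typing import NamedTuple, Optional

class Node(NamedTuple):
    head: int
    tail: Optional["Node"]

def push(lst, value):              # O(1); the old list is untouched
    return Node(value, lst)

v1 = push(push(None, 1), 2)        # version 1: [2, 1]
v2 = push(v1, 3)                   # version 2: [3, 2, 1]
assert v2.tail is v1               # structural sharing, not a copy
assert v1.head == 2                # old version still valid for readers
```

Because no version is ever mutated, a reader holding `v1` is unaffected by writers producing `v2`, which is the concurrency-throughput win the API design buys.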
15. General Tips
Use SLA metrics/measures to optimize relevant parts of
your concurrent system judiciously
Ensure your applications fit use case(s) for approach
Test your hypothesis by benchmarking
NEVER assume your changes have made the impact you expect.
There is no silver bullet: think, implement and test!