This talk was based on my Master's thesis, which I had completed earlier that year. It gives an overview of how certain dynamic programming recurrences can be computed efficiently in parallel, and of what we want "efficient" to mean here.
The plots in "Performance Examples" show speedup S on the left and efficiency E on the right, both against input size.
Read more over here: http://reitzig.github.io/publications/Reitzig2012
5. Goals
Goal 1: Understand what efficiency means in parallel algorithms.
Goal 2: Characterise dynamic programming recurrences in a suitable way.
Goal 3: Find and implement efficient parallel algorithms for DP.
10. Complexity theory
Classifies problems
Focuses on inherent parallelism
Answers: How many processors do you need to be really fast on inputs of a given size?
But...
...p grows with n – no statement about constant p and growing n!
16. Work and depth
Work W = T^A_1 and depth D = T^A_∞.
Brent's Law: an algorithm A with W/p ≤ T^A_p < W/p + D is possible in a certain setting.
But...
...it has limited applicability, and D can be slippery!
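To make work and depth concrete, here is a minimal fork/join sketch (my own illustration in Go, not part of the talk): summing n numbers by recursive halving has work W = Θ(n) and depth D = Θ(log n), so by Brent's Law a parallel time close to W/p + D is achievable.

```go
// Illustration only: a fork/join parallel sum with W = Θ(n), D = Θ(log n).
package main

import "fmt"

// parSum recursively splits xs and sums the two halves concurrently.
func parSum(xs []int) int {
	if len(xs) <= 1024 { // sequential cutoff keeps spawn overhead in check
		s := 0
		for _, x := range xs {
			s += x
		}
		return s
	}
	mid := len(xs) / 2
	ch := make(chan int)
	go func() { ch <- parSum(xs[:mid]) }() // "spawn"
	right := parSum(xs[mid:])              // work on the other half ourselves
	return <-ch + right                    // "join"
}

func main() {
	xs := make([]int, 1<<20)
	for i := range xs {
		xs[i] = 1
	}
	fmt.Println(parSum(xs)) // prints 1048576
}
```

The cutoff is a practical concession; it changes neither W nor D asymptotically.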
21. Relative runtimes
Speedup S^A_p := T^A_1 / T^A_p.
Efficiency E^A_p := T^B / (p · T^A_p), with B a sequential reference algorithm.
But...
...what are good values?
Clear: S^A_p ∈ [0, p] and E^A_p ∈ [0, 1] – but we certainly cannot always hit the optima!
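A tiny numeric sketch of these definitions (the helper names and timings are hypothetical, not from the talk):

```go
// Hypothetical helpers: speedup and efficiency from measured runtimes.
// t1 = sequential time of A, tp = time of A on p processors,
// tb = time of the sequential reference algorithm B.
package main

import "fmt"

func speedup(t1, tp float64) float64 { return t1 / tp }

func efficiency(tb, tp float64, p int) float64 {
	return tb / (float64(p) * tp)
}

func main() {
	// Assumed numbers: B needs 8s, A needs 10s sequentially and 3s on p = 4.
	fmt.Printf("S_4 = %.2f\n", speedup(10, 3))      // ≈ 3.33 of at most 4
	fmt.Printf("E_4 = %.2f\n", efficiency(8, 3, 4)) // ≈ 0.67 of at most 1
}
```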
23. Proposal: Asymptotic relative runtimes
Definition
S^A_p(∞) := liminf_{n→∞} S^A_p(n); is it = p?
E^A_p(∞) := liminf_{n→∞} E^A_p(n); is it = 1?
Goal
Find parallel algorithms that are asymptotically as scalable and efficient as possible for all p.
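For intuition, here is a small worked example (runtimes assumed by me, not taken from the talk) showing how an additive, p-dependent overhead still allows optimal asymptotic speedup:

```latex
% Worked example with assumed runtimes: suppose algorithm A runs in
% T^A_p(n) = n/p + log p, so that T^A_1(n) = n. (Needs amsmath for \text.)
\[
  S^A_p(n) \;=\; \frac{T^A_1(n)}{T^A_p(n)}
           \;=\; \frac{n}{n/p + \log p}
           \;\longrightarrow\; p
  \quad (n \to \infty),
  \qquad \text{so } S^A_p(\infty) = p .
\]
% For any fixed p, the additive log p term is dwarfed by n/p as n grows,
% so this algorithm is asymptotically optimally scalable in the above sense.
```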
26. Disclaimer
This means:
A good parallel algorithm can utilise any number of processors if the inputs are large enough.
Not:
More processors are always better.
Just as in sequential algorithmics.
29. Afterthoughts
Machine model
Keep it simple: (P)RAM with p processors and spawn/join.
Which quantities to analyse?
Elementary operations, memory accesses, inter-thread communication, ...
Implicit interaction – blocking, communication via memory, ... – is invisible in code!
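To illustrate that last point, here is a Go sketch of my own (not from the talk): the two increments below look identical in the source, yet one is a data race and the other causes inter-core cache traffic on every call. Neither cost shows up in a naive operation count.

```go
// Illustration only: interaction through shared memory is invisible in code.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var plain int64       // unsynchronised shared counter
	var safe atomic.Int64 // synchronised shared counter

	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100_000; i++ {
				plain++     // data race: concurrent read-modify-write, result unpredictable
				safe.Add(1) // correct, but each call is implicit inter-core communication
			}
		}()
	}
	wg.Wait()
	fmt.Println(plain, safe.Load()) // plain is likely < 400000; safe is exactly 400000
}
```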
65. Future Work
Fill gaps in theory (caching and communication).
Generalise theory to more dimensions and interleaved DPs.
Improve and extend implementations.
More experiments (different problems, more diverse machines).
Improve compiler integration (detection, backtracing, result functions).
Integrate with other tools.