Task-based Augmented Merge Trees with Fibonacci Heaps
Charles Gueunet, Kitware & UPMC
Pierre Fortin, UPMC
Julien Jomier, Kitware
Julien Tierny, CNRS-UPMC
Topological analysis
• Topological abstractions
• Segmentation
[Bock et al. 2017]
[Favelier et al. 2016]
Large data sets
● [512³]
● 2.4 GB
● Compute power through parallelism
Related work
• Sequential
• Augmented tree [Carr et al. 2000]
• Monotone path [Chiang et al. 2005]
Related work
• Parallel
• Partitions:
• Load imbalance
• Redundant work
• Monotone path (MP):
• Not augmented
Partitions
● Shared memory: [Pascucci04], [Gueunet16]
● Distributed: [Morozov13], [Landge14]
Monotone path
● [Natarajan15], [Maadasamy12], [Carr16]
Us!
Contributions
• A local algorithm based on Fibonacci heaps
• Task-based parallelism:
• Augmented merge tree
• Augmented contour tree
• Ready-to-use implementation
• Generic input (VTU/VTI, 2D/3D)
• Generic output
• Open source (TTK)
Preliminaries
Mesh
• Generic input
• Piecewise linear scalar field
• Regular grid
• Implicit triangulation (TTK)
Scalars
• Level set
• Sublevel set
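The two notions on this slide can be written with the usual definitions. The symbols are assumptions chosen for illustration: \mathcal{M} is the mesh, f the piecewise linear scalar field, and i an isovalue. The sublevel set is given here with a strict inequality; some authors use \leq instead.

```latex
\begin{align*}
  \text{Level set:}    \quad & f^{-1}(i) = \{\, p \in \mathcal{M} \mid f(p) = i \,\} \\
  \text{Sublevel set:} \quad & f^{-1}_{-\infty}(i) = \{\, p \in \mathcal{M} \mid f(p) < i \,\}
\end{align*}
```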
Merge tree
• Tracks the merging of connected components (CC) of sublevel sets
• 1 arc <-> 1 CC
Contour tree
• Tracks the merging of connected components (CC) of level sets
• 1 arc <-> 1 CC
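To make the merge tree definition concrete, here is a minimal sketch of the classical sequential construction: sweep the vertices by increasing scalar value and track sublevel set components with a union-find. This is the textbook approach, not the local algorithm of this talk. The names `UnionFind` and `joinTreeArcs` are hypothetical, the mesh is reduced to an adjacency list, and scalar values are assumed distinct on a connected mesh.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical minimal union-find (path halving, no rank).
struct UnionFind {
  std::vector<int> parent;
  explicit UnionFind(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  int find(int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }
};

// Returns the arcs (lower critical vertex, upper critical vertex) of the
// join tree. Assumption: distinct scalar values, connected mesh.
std::vector<std::pair<int, int>> joinTreeArcs(
    const std::vector<std::vector<int>>& neighbors,
    const std::vector<double>& f) {
  assert(neighbors.size() == f.size());
  const int n = static_cast<int>(f.size());
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&f](int a, int b) { return f[a] < f[b]; });

  UnionFind uf(n);
  std::vector<bool> done(n, false);
  std::vector<int> head(n);  // head[root]: last critical vertex of the component
  std::vector<std::pair<int, int>> arcs;

  for (int v : order) {
    // Distinct sublevel set components touching v from below.
    std::vector<int> comps;
    for (int u : neighbors[v]) {
      if (!done[u]) continue;
      const int r = uf.find(u);
      if (std::find(comps.begin(), comps.end(), r) == comps.end())
        comps.push_back(r);
    }
    done[v] = true;
    // One component: v is regular and keeps the component head.
    // Zero (minimum) or several (saddle): v becomes the new head.
    head[v] = (comps.size() == 1) ? head[comps[0]] : v;
    if (comps.size() >= 2)
      for (int r : comps) arcs.push_back({head[r], v});  // one arc per merged CC
    for (int r : comps) uf.parent[r] = v;  // v becomes the root of the union
  }
  if (n > 0) {
    const int last = order.back();  // global maximum closes the last arc
    if (head[last] != last) arcs.push_back({head[last], last});
  }
  return arcs;
}
```

On a path of five vertices with scalars 0, 2, 4, 3, 1, the two minima (vertices 0 and 4) each produce one arc ending at vertex 2, where their sublevel set components merge.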
Overview
• One local growth per arc region
Merge tree computation
Sequential
Leaf search
• Extract minima (join tree, JT)
• Start at minima
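A minimal sketch of the leaf search for the join tree, assuming the mesh is given as an adjacency list and that scalar values are distinct. The function name `findMinima` is hypothetical. Each vertex is tested independently against its neighbors, which is why this step is purely local.

```cpp
#include <cassert>
#include <vector>

// A vertex is a minimum (a join tree leaf) when no neighbor has a lower value.
std::vector<int> findMinima(const std::vector<std::vector<int>>& neighbors,
                            const std::vector<double>& f) {
  assert(neighbors.size() == f.size());
  std::vector<int> minima;
  for (int v = 0; v < static_cast<int>(f.size()); ++v) {
    bool isMinimum = true;
    for (int u : neighbors[v]) {
      if (f[u] < f[v]) {
        isMinimum = false;
        break;
      }
    }
    if (isMinimum) minima.push_back(v);
  }
  return minima;
}
```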
Leaf growth
Saddle stopping condition
• Vertex lower than current
• Fibonacci heap
• Constant merge time
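One possible reading of the growth and its stopping condition, as a sketch: the growth repeatedly visits the lowest candidate vertex and pushes its unseen neighbors; if the next candidate is then lower than the vertex just visited, that vertex touches another region from below and is treated as a saddle. For brevity this sketch uses `std::priority_queue` rather than a Fibonacci heap, so it does not offer the constant-time merge mentioned above. The names `Growth` and `growFromMinimum` are hypothetical, and scalar values are assumed distinct.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Growth {
  std::vector<int> arcVertices;  // vertices visited before the saddle
  int saddle;                    // -1 if the growth exhausted the mesh
};

Growth growFromMinimum(const std::vector<std::vector<int>>& neighbors,
                       const std::vector<double>& f, int minimum) {
  assert(minimum >= 0 && minimum < static_cast<int>(f.size()));
  using Entry = std::pair<double, int>;  // (scalar value, vertex id)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  std::vector<bool> seen(f.size(), false);

  heap.push({f[minimum], minimum});
  seen[minimum] = true;
  Growth g{{}, -1};

  while (!heap.empty()) {
    const int v = heap.top().second;  // lowest candidate
    heap.pop();
    for (int u : neighbors[v]) {
      if (!seen[u]) {
        seen[u] = true;
        heap.push({f[u], u});
      }
    }
    // Stopping condition: the next candidate is lower than the current vertex.
    if (!heap.empty() && heap.top().first < f[v]) {
      g.saddle = v;
      break;
    }
    g.arcVertices.push_back(v);
  }
  return g;
}
```

On a path with scalars 0, 2, 4, 3, 1, the growth started at vertex 0 visits vertices 0 and 1, then stops at vertex 2 because the neighbor it exposes (value 3) lies below it.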
Merging two regions
[Figure: merging two regions, steps 1 to 4]
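To illustrate why merging two regions can take constant time, here is a simplified meldable min-heap in the spirit of a Fibonacci heap: each heap is a list of heap-ordered trees, so melding two heaps is a single list splice, and the actual restructuring is deferred to the next extraction. This is a sketch, not the implementation used in the talk: the class name `MeldableHeap` is hypothetical, there is no decrease-key, and the minimum is found by scanning the root list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>
#include <vector>

class MeldableHeap {
  struct Node {
    double key;
    std::list<std::unique_ptr<Node>> children;
  };
  std::list<std::unique_ptr<Node>> roots_;
  std::size_t size_ = 0;

  // Make the larger root a child of the smaller one.
  static std::unique_ptr<Node> link(std::unique_ptr<Node> a,
                                    std::unique_ptr<Node> b) {
    if (b->key < a->key) std::swap(a, b);
    a->children.push_back(std::move(b));
    return a;
  }

  // Link roots of equal degree until all root degrees are distinct.
  void consolidate() {
    std::vector<std::unique_ptr<Node>> byDegree;
    while (!roots_.empty()) {
      std::unique_ptr<Node> cur = std::move(roots_.front());
      roots_.pop_front();
      std::size_t d = cur->children.size();
      while (d < byDegree.size() && byDegree[d]) {
        cur = link(std::move(cur), std::move(byDegree[d]));
        d = cur->children.size();
      }
      if (d >= byDegree.size()) byDegree.resize(d + 1);
      byDegree[d] = std::move(cur);
    }
    for (auto& node : byDegree)
      if (node) roots_.push_back(std::move(node));
  }

 public:
  bool empty() const { return size_ == 0; }
  std::size_t size() const { return size_; }

  void push(double key) {
    roots_.push_back(std::unique_ptr<Node>(new Node{key, {}}));
    ++size_;
  }

  // Constant-time meld: splice the other root list onto ours.
  void merge(MeldableHeap& other) {
    assert(this != &other);
    roots_.splice(roots_.end(), other.roots_);
    size_ += other.size_;
    other.size_ = 0;
  }

  // Remove and return the minimum key.
  double pop() {
    assert(!empty());
    auto best = roots_.begin();
    for (auto it = roots_.begin(); it != roots_.end(); ++it)
      if ((*it)->key < (*best)->key) best = it;
    std::unique_ptr<Node> top = std::move(*best);
    roots_.erase(best);
    roots_.splice(roots_.end(), top->children);  // children become roots
    --size_;
    consolidate();
    return top->key;
  }
};
```

When two growths meet at a saddle, their candidate sets can be combined with one `merge` call, and the surviving growth continues from the merged heap.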
Saddle growth
• New growths:
• On completed saddles
• Until only one growth remains
Trunk
• Monotone path until root
• Vertex to arc: using scalars
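A sketch of the trunk step under simple assumptions: the remaining critical vertices, sorted by scalar value, form the monotone path, and each remaining regular vertex is assigned to the trunk arc whose scalar range contains it. The function name `trunkArcs` and its parameters are hypothetical; arc k is taken to join trunk nodes k and k+1, and every unvisited vertex is assumed to lie above the first trunk node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (vertex, trunk arc index) pairs, ordered by increasing scalar value.
std::vector<std::pair<int, int>> trunkArcs(const std::vector<double>& f,
                                           std::vector<int> trunkNodes,
                                           std::vector<int> unvisited) {
  assert(!trunkNodes.empty());
  auto byScalar = [&f](int a, int b) { return f[a] < f[b]; };
  std::sort(trunkNodes.begin(), trunkNodes.end(), byScalar);  // monotone path
  std::sort(unvisited.begin(), unvisited.end(), byScalar);

  std::vector<std::pair<int, int>> vertexToArc;
  std::size_t arc = 0;
  for (int v : unvisited) {
    // Advance to the arc whose upper node is not below v.
    while (arc + 1 < trunkNodes.size() && f[trunkNodes[arc + 1]] < f[v]) ++arc;
    vertexToArc.push_back({v, static_cast<int>(arc)});
  }
  return vertexToArc;
}
```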
Parallel merge tree computation
Tasks
• Asynchronous work unit
• Dynamic load balancing
• Runtimes:
• OpenMP
• Intel TBB
• Intel Cilk Plus
[R. van der Pas, IWOMP 2009]
Parallel leaf search
• Local operations
• Embarrassingly parallel
Parallel leaf growth
• Use tasks
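A minimal sketch of launching one task per leaf with OpenMP tasks. The function `growLeaf` is a hypothetical stand-in for a real local growth: it only simulates work proportional to its argument. Each task writes to its own slot of the result vector, so no synchronization is needed here. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one local growth: visits `size` vertices.
static long growLeaf(int size) {
  assert(size >= 0);
  long visited = 0;
  for (int s = 0; s < size; ++s) ++visited;
  return visited;
}

std::vector<long> growAllLeaves(const std::vector<int>& leafSizes) {
  std::vector<long> visited(leafSizes.size(), 0);
#pragma omp parallel
#pragma omp single nowait
  {
    // One thread creates the tasks; idle threads pick them up dynamically.
    for (std::size_t i = 0; i < leafSizes.size(); ++i) {
#pragma omp task firstprivate(i)
      visited[i] = growLeaf(leafSizes[i]);
    }
  }  // all tasks are complete at the end of the parallel region
  return visited;
}
```

Because tasks are scheduled dynamically, a thread that finishes a small growth can immediately start another one instead of waiting for the largest growth.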
Parallel saddle stopping condition
• No change
• Local only
Parallel merging regions
[Figure: merging regions in parallel, steps 1 to 4]
Parallel trunk detection
• Atomic counter on active tasks
• Initialized to the number of leaves
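A sketch of how such a counter could work, assuming a growth task calls a helper when it stops at a saddle that another, still active growth will process later. The helper name `growthFinished` is hypothetical. The counter starts at the number of leaves; when it drops to one, a single growth remains and the trunk can start.

```cpp
#include <atomic>
#include <cassert>

// Called by a growth task that terminates. Returns true when exactly one
// growth is left active afterwards, which signals the start of the trunk.
bool growthFinished(std::atomic<int>& activeGrowths) {
  assert(activeGrowths.load() > 1);
  const int remaining = activeGrowths.fetch_sub(1) - 1;  // atomic decrement
  return remaining == 1;
}
```

With three leaves, the first terminating growth leaves two active growths and returns false; the second leaves one and returns true.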
Parallel trunk processing
• Chunks of vertices
• Using scalar values
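A sketch of the chunked variant, assuming the trunk node scalars are already sorted. Each chunk of vertices is processed independently, so every vertex locates its arc by binary search on its scalar value instead of relying on a shared sweep position. The function name `assignTrunkArcs` and the chunk size parameter are hypothetical; an index of -1 means the vertex lies below the first trunk node. Without OpenMP support the pragma is ignored and the loop runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> assignTrunkArcs(const std::vector<double>& trunkScalars,
                                 const std::vector<double>& vertexScalars,
                                 std::size_t chunkSize) {
  assert(chunkSize > 0);
  assert(std::is_sorted(trunkScalars.begin(), trunkScalars.end()));
  const std::size_t n = vertexScalars.size();
  std::vector<int> arcOf(n, -1);
  const long nChunks = static_cast<long>((n + chunkSize - 1) / chunkSize);

#pragma omp parallel for schedule(dynamic)
  for (long c = 0; c < nChunks; ++c) {
    const std::size_t begin = static_cast<std::size_t>(c) * chunkSize;
    const std::size_t end = std::min(begin + chunkSize, n);
    for (std::size_t i = begin; i < end; ++i) {
      // Index of the last trunk node whose scalar is <= the vertex scalar.
      const auto it = std::upper_bound(trunkScalars.begin(), trunkScalars.end(),
                                       vertexScalars[i]);
      arcOf[i] = static_cast<int>(it - trunkScalars.begin()) - 1;
    }
  }
  return arcOf;
}
```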
Parallel contour tree computation
Contour tree computation
• Post-processing
• Arc: Regular vertex list
• Nodes: Report in other tree
• Combination
Results
Intel Xeon E5-2630 v3 CPUs (2.4 GHz, 2x8 cores, 2x16 threads)
64 GB of RAM
C++ (GCC-5.4.0), VTK, OpenMP 4.0 (additional material + TTK)
Merge tree computation
• 512³ regular grids
• Trunk not predictable
• Most time-consuming
• Average speedup: 10 on 16 cores
[Table: join tree and split tree times, in seconds]
Detailed speedups
Sequential comparison
LT: Libtourtre, CF: Contour Forests, FTM: Our implementation. 256³ grid
Time in seconds
Parallel comparison
Time in seconds
LT: Libtourtre, CF: Contour Forests, FTM: Our implementation. 256³ grid
Contour tree computation
• 512³ regular grids
• Average speedup: 7 on 16 cores
Time in seconds
Sequential comparison (CT)
Time in seconds
LT: Libtourtre, CF: Contour Forests, FTM: Our implementation. 256³ grid
Parallel comparison (CT)
LT: Libtourtre, CF: Contour Forests, FTM: Our implementation. 256³ grid
Time in seconds
Limitations
Application
Segmentation
Conclusion
Take home message
• VTK-based implementation
• Generic input
• VTU/VTI
• 2D/3D
• Generic output
• Open source (TTK)
• Ready-to-use
• New algorithm:
• Fast in sequential
• Good parallel performance
• Dynamic load balancing thanks to tasks
• No redundancy
• Augmented tree
Perspective
• Improve arc growth scalability
• Parallel combination
• Improve memory usage
• Study in-situ visualization capabilities
Unpredictable best scheduling:
Sequentially: how to maximize the trunk?
Appendix
FTM tree