Parallel implementation of geodesic distance transform with application in superpixel segmentation


T.Q. Pham, in Proceedings of DICTA 2013, pp.19-26.


Tuan Q. Pham
Canon Information Systems Research Australia (CiSRA)
1 Thomas Holt Drive, North Ryde, NSW 2113, Australia

ABSTRACT

This paper presents a parallel implementation of geodesic distance transform using OpenMP. We show how a sequential chamfer distance algorithm can be executed on parallel processing units with shared memory, such as the multiple cores of a modern CPU. Experimental results show that a speedup of 2.6 times can be achieved on a quad-core machine without loss in accuracy. This work forms part of a C implementation for geodesic superpixel segmentation of natural images.

Index Terms: geodesic distance transform, OpenMP, superpixel segmentation

1. INTRODUCTION

Because pixels in an image are organised in raster order, many image processing algorithms operate in a sequential fashion. This sequential processing is suitable for a single-processor system. However, even Personal Computers (PCs) now have multiple processing cores. In fact, the number of cores on a chip is likely to double every 18 months to sustain Moore’s law [23]. As a result, there is a strong need to parallelise existing image processing algorithms so that they run more efficiently on multi-core hardware.

OpenMP (Open Multi-Processing) is a powerful yet simple-to-use application programming interface that supports many functionalities for parallel programming. OpenMP uses a shared-memory model, in which all threads share a common address space. Each thread can have additional private data under explicit user control. This shared-memory model simplifies programming because it avoids the need to synchronise memory across different processors on a distributed system. The shared-memory model also fits well with the multi-core architecture of modern CPUs.

Parallel programming using OpenMP has attracted significant interest in the image processing community in recent years.
In 2010, the IEEE Signal Processing Society dedicated a whole issue of its flagship publication, the IEEE Signal Processing Magazine, to signal processing on multi-core platforms. In this issue, Slabaugh et al. demonstrated a 2- to 4-times speedup of several popular image processing algorithms on a quad-core machine using OpenMP [25]. The demonstrated algorithms involve either pixel-wise processing (image warping, image normalisation) or small neighbourhood-wise processing (binary morphology, median filtering). All of these algorithms generate the output at each pixel independently of the outputs at other pixels. As a result, they extend naturally to a parallel implementation. This type of data-independent task parallelisation can even be done automatically by a compiler [11]. Parallel implementation of sequential image processing algorithms, however, still requires manual adaptation by an experienced programmer.

In this paper, we present a parallel implementation of Geodesic Distance Transform (GDT) using OpenMP. GDT accepts a greyscale cost image together with a set of seed points. It outputs a distance transform image whose intensity at each pixel is the geodesic distance from that pixel to the nearest seed point. The geodesic distance between two points is the sum of pixel costs along a minimum-cost path connecting these two points. The nearest-seed mapping forms an over-segmentation of the input image [18, 29]. Fast image segmentation is the main reason why a parallel implementation of GDT is desirable [10, 2, 7, 28].

There are two main approaches to GDT estimation: a chamfer distance propagation algorithm [15] and a wavefront propagation algorithm [27]. Both algorithms are sequential in nature, i.e. they are not directly parallelisable. The chamfer algorithm was selected for parallelisation in this paper because of its simple raster-scan access to the image data.

The rest of the paper is organised as follows.
Section 2 provides some background on GDT and the chamfer distance propagation algorithm. Section 3 reviews previous attempts in the literature to parallelise (Euclidean) distance transform. Our proposed parallel implementation of GDT is presented in Section 4. Section 5 evaluates the speed and accuracy of our parallel implementation on different images and different computers. Section 6 presents an application of GDT in superpixel segmentation of images. Section 7 concludes the paper.
[Figure 1: a) cost image f(x, y), showing the source, the destination, the minimum path (cost = 1.7) and the straight path (cost = 11.1); b) geodesic distance transform, showing the seed.]

Fig. 1. Minimum-cost path versus straight path on an uneven cost surface generated by the membrane function in Matlab.

2. BACKGROUND ON GEODESIC DISTANCE TRANSFORM

Geodesic distance or topographical distance [16] is a grey-weighted distance between two points on a greyscale cost surface. The geodesic distance is calculated as the sum of pixel costs along a minimum-cost path joining the two points. An example is illustrated in Figure 1a, where the image intensities f(x, y) represent the cost of traversing each pixel. Two different paths from a source point in the middle of the image to a destination point at the top-right corner are drawn. The minimum-cost path, drawn as a dotted cyan line, is longer than the straight path in magenta but integrates over a smaller total cost (1.7 versus 11.1). The cost image f can be seen as a terrain surface, where the red blob corresponds to a high mountain. Figure 1a illustrates that going across a steep mountain incurs a much higher cost than going around its flat base to reach the other side.

Figure 1b shows the GDT of the image in Figure 1a given one seed point at the centre of the image. The intensity of each pixel represents the geodesic distance from that pixel to the central seed point.

2.1. Chamfer distance propagation algorithm

GDT can be estimated efficiently using chamfer distance propagation [21]. The path between two pixels is approximated by discrete line segments of 1- or √2-pixel length connecting a pixel with one of its eight immediate neighbours. Initially, the distance transform at every pixel is set to infinity except at the locations of the seed points, where the distance transform is zero.
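The initialisation step just described could be written in C as follows. This is a minimal sketch rather than the paper's code: the function name gdt_init, the row-major float buffer d and the seed_idx array of linear pixel indices are assumptions made for illustration, and INFINITY from <math.h> stands in for the infinite starting distance.

```c
#include <math.h>    /* INFINITY */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: initialise a distance map before propagation.
 * d        : distance map with n_pixels entries (row-major)
 * seed_idx : linear indices (y * width + x) of the seed pixels */
void gdt_init(float *d, size_t n_pixels,
              const size_t *seed_idx, size_t n_seeds)
{
    /* Every pixel starts infinitely far from any seed... */
    for (size_t i = 0; i < n_pixels; i++)
        d[i] = INFINITY;

    /* ...except the seeds themselves, which are at distance zero. */
    for (size_t s = 0; s < n_seeds; s++)
        d[seed_idx[s]] = 0.0f;
}
```

Because any finite candidate distance compares smaller than INFINITY, the propagation passes need no special case for pixels that have not been reached yet.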
The distance transform at every pixel is then updated by an iterative distance propagation process. Each iteration comprises two passes over the image. A forward pass scans the image rows from top to bottom, with each row scanned from left to right (Figure 2a). A backward pass scans the image rows from bottom to top, with each row scanned from right to left (Figure 2b).

[Figure 2: a) forward propagation; b) backward propagation.]

Fig. 2. One iteration of distance propagation comprises a forward pass followed by a backward pass.

The forward pass propagates the distance transform of four causal neighbours (shaded grey in Figure 2a) to the current pixel P(x, y) according to equation (1):

\[
d(x, y) = \min \left\{ \begin{array}{l}
d(x-1, y-1) + b\,f(x, y) \\
d(x, y-1) + a\,f(x, y) \\
d(x+1, y-1) + b\,f(x, y) \\
d(x-1, y) + a\,f(x, y)
\end{array} \right\} \tag{1}
\]

where a = 0.9619 ≈ 1 and b = 1.3604 ≈ √2 are optimal chamfer coefficients for a 3×3 neighbourhood [4]. Similarly, the backward pass propagates the distance transform from four anti-causal neighbours (shaded grey in Figure 2b) to the current pixel P(x, y) according to equation (2):

\[
d(x, y) = \min \left\{ \begin{array}{l}
d(x+1, y+1) + b\,f(x, y) \\
d(x, y+1) + a\,f(x, y) \\
d(x-1, y+1) + b\,f(x, y) \\
d(x+1, y) + a\,f(x, y)
\end{array} \right\} \tag{2}
\]

Equations (1) and (2) apply to pixels that have a full set of 8 immediate neighbours. Pixels at the image border need a different treatment because some of their neighbours are out of bounds. These out-of-bound neighbours are ignored in the distance propagation equations (1) and (2).

2.2. Example

An example of GDT given more than one seed point is shown in Figure 3. Figures 3a-b show an input image and its gradient energy, respectively. The gradient energy is used as a non-negative cost image, from which the GDT is computed. Four seed points are shown as circles of different colours in Figure 3b. Figures 3c-d show intermediate distance transforms after a first forward and a first backward pass through the cost image (blue=low distance, red=high distance).
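Going back to equations (1) and (2), the two passes could be sketched in C as follows. This is one possible sequential reading of the equations, not the author's implementation. The names relax, gdt_forward, gdt_backward and gdt_sequential are hypothetical, and the sketch makes two assumptions: the current value d(x, y) is kept as a candidate in the minimum, so that seeds stay at zero and earlier results are never overwritten by larger ones, and iteration stops as soon as a full forward+backward sweep changes no pixel.

```c
#include <stdbool.h>

#define CHAMFER_A 0.9619f  /* axial coefficient a from the text    */
#define CHAMFER_B 1.3604f  /* diagonal coefficient b from the text */

/* Try to shorten d(x, y) via the neighbour at (x+dx, y+dy).
 * Out-of-bound neighbours are ignored, as stated for border pixels. */
static bool relax(float *d, const float *f, int w, int h,
                  int x, int y, int dx, int dy, float weight)
{
    int nx = x + dx, ny = y + dy;
    if (nx < 0 || nx >= w || ny < 0 || ny >= h)
        return false;
    float cand = d[ny * w + nx] + weight * f[y * w + x];
    if (cand < d[y * w + x]) {
        d[y * w + x] = cand;
        return true;
    }
    return false;
}

/* Forward pass, equation (1): top to bottom, left to right. */
bool gdt_forward(float *d, const float *f, int w, int h)
{
    bool changed = false;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            changed |= relax(d, f, w, h, x, y, -1, -1, CHAMFER_B);
            changed |= relax(d, f, w, h, x, y,  0, -1, CHAMFER_A);
            changed |= relax(d, f, w, h, x, y, +1, -1, CHAMFER_B);
            changed |= relax(d, f, w, h, x, y, -1,  0, CHAMFER_A);
        }
    return changed;
}

/* Backward pass, equation (2): bottom to top, right to left. */
bool gdt_backward(float *d, const float *f, int w, int h)
{
    bool changed = false;
    for (int y = h - 1; y >= 0; y--)
        for (int x = w - 1; x >= 0; x--) {
            changed |= relax(d, f, w, h, x, y, +1, +1, CHAMFER_B);
            changed |= relax(d, f, w, h, x, y,  0, +1, CHAMFER_A);
            changed |= relax(d, f, w, h, x, y, -1, +1, CHAMFER_B);
            changed |= relax(d, f, w, h, x, y, +1,  0, CHAMFER_A);
        }
    return changed;
}

/* Iterate until convergence; returns the number of iterations run. */
int gdt_sequential(float *d, const float *f, int w, int h, int max_iter)
{
    int it = 0;
    while (it < max_iter) {
        bool fwd = gdt_forward(d, f, w, h);
        bool bwd = gdt_backward(d, f, w, h);
        it++;
        if (!fwd && !bwd)
            break;
    }
    return it;
}
```

The distance map d is expected to be initialised beforehand (a very large value everywhere, zero at the seeds). On a constant cost image this reduces to an ordinary chamfer distance transform, which is a convenient sanity check.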
In the first forward pass, the top-left region of the distance transform is not updated because these pixels do not have a seed in their causal path. After the first backward pass, the distance transform gradually settles into its final form before converging at the twentieth iteration (which looks very similar to the GDT after 10 iterations in Figure 3e). Many iterations are required because the minimum-cost paths are usually not straight; they require multiple distance propagations from different directions. Fortunately, fewer iterations are required if there are more seeds, because the geodesic paths generally become shorter and hence do not contain many twists and turns.

[Figure 3: (a) input (320×240); (b) gradient energy; (c) intermediate GDT after 1st forward pass; (d) intermediate GDT after 1st backward pass; (e) GDT after 10 iterations; (f) nearest seed label after 1st forward pass; (g) nearest seed label after 1st backward pass; (h) nearest seed label after 10 iterations.]

Fig. 3. Geodesic distance transform and nearest seed label computed from the gradient energy image with 4 seed points.

Fig. 4. Image partitioning strategy for a parallel chamfer distance transform on a distributed system [24] (the distances of shaded pixels are transmitted across processors).

The last row of Figure 3 shows the nearest seed labels corresponding to the intermediate distance transforms in the second row. Each coloured segment corresponds to a set of pixels with a common nearest seed point. Pixels with the same coloured label should be connected because they are connected to the common seed point via some geodesic paths. Fragmentation appears in Figure 3g because this is an intermediate result. After the GDT converges, the segmentation boundaries generally trace out strong edges in the scene (Figure 3h). This leads to the geodesic image segmentation algorithm presented in Section 6.

3. LITERATURE SURVEY ON PARALLEL DISTANCE TRANSFORM

Most previous techniques for parallel distance transform compute the Euclidean Distance Transform (EDT) instead of GDT. EDT accepts a binary image and returns the Euclidean distance from each pixel to the nearest nonzero pixel in the binary image. EDT is a special case of GDT in which the cost image is constant and positive. A squared Euclidean distance r² can be decomposed into two components x² + y², each of which can be estimated independently using a Voronoi diagram of the nonzero pixels in the binary image [6]. A parallel implementation of EDT using OpenMP on a 24-core system achieves an 18-times speedup [14].
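The separable decomposition mentioned above can be written out as a short worked equation. The notation here is assumed for illustration and is not taken from [6]: B is the binary image and g is an intermediate per-row result (taken as +∞ for a row without nonzero pixels).

\[
r^2(x, y) = \min_{(x', y') :\, B(x', y') \neq 0} \left[ (x - x')^2 + (y - y')^2 \right]
          = \min_{y'} \left[ (y - y')^2 + g(x, y') \right],
\qquad
g(x, y') = \min_{x' :\, B(x', y') \neq 0} (x - x')^2 .
\]

The inner minimisation g depends on one image row at a time and the outer minimisation on one image column at a time, so rows and then columns can be handed to different threads without any dependency between them.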
A parallel implementation of the chamfer EDT was presented by Shyu et al. in [24]. This method computes the EDT on a distributed system. As a result, the intermediate results on different processors have to be synchronised using the Message Passing Interface (MPI). Similar to the original chamfer algorithm in [21], Shyu et al.’s implementation requires two passes over the image: a forward pass to propagate the distance transform from causal neighbours, followed by a backward pass to propagate the distance transform from anti-causal neighbours. To parallelise these sequential passes, Shyu et al. partition the input image into bands and assign the distance computation of each band to a processor. At each processor, the image band is further partitioned into parallelograms. The label of each parallelogram in Figure 4 specifies its order of processing (partitions n and n′ are processed concurrently). Due to the propagation of causal information, the parallelogram labelled 3 on the second band must wait for the result of the parallelogram labelled 2 on the first band. The EDT of the last row of parallelogram 2 (shaded grey) must be transmitted to the next processor before parallelogram 3 can be processed. After this first data transmission, processors 1 and 2 can work in parallel on their partitions 3 and 3′, respectively. This process of local distance propagation followed by data transmission repeats for partitions 4 and 4′, and so on.

4. PARALLEL GEODESIC DISTANCE TRANSFORM

This section presents our parallel implementation of GDT using OpenMP. Our implementation is motivated by the parallel implementation of the chamfer distance transform in [24]. Shyu et al.’s implementation, however, targets distributed-memory systems, in which data need to be synchronised across processors by message passing. Using the shared-memory model present in multicore CPUs, we avoid the need to synchronise data. The iterative nature of GDT also allows a simpler image partitioning strategy.
Unlike EDT, GDT requires more than one iteration of forward+backward passes. As a result, the GDT can be propagated from one image band to the next in a subsequent iteration rather than within the current pass as in [24]. Our implementation therefore uses only a band-based image partitioning across different processors. This fits well with the parallel for construct in OpenMP.
Algorithm 1 Parallel chamfer distance transform (the #pragma rows, shaded in the original, are compiler directives that enable parallel computation).

    for (iter = 0; iter < 10; iter++)
    {
        .......
        // Forward propagation
        forwardPropagationFirstRow(...);
        #pragma omp parallel for private(... private variable declarations ...)
        for (i = 1; i < height; i++)
        { fwdProp(...); }

        // Backward propagation
        backwardPropagationLastRow(...);
        #pragma omp parallel for private(... private variable declarations ...)
        for (i = height-2; i >= 0; i--)
        { bwdProp(...); }
    } // End of iterative chamfer distance propagation

Fig. 5. Band-based image partitioning strategy for parallel implementation of geodesic distance transform in OpenMP (shaded pixels are visited in the current propagation iteration).

Figure 5 illustrates our band-based image partitioning strategy for a forward propagation of the GDT. The first image row is processed by the master thread outside any parallel processing block. The first row is treated differently from the rest because pixels on the first row have only one causal neighbour. The remaining image rows are partitioned into non-overlapping bands of equal height (called the chunk size in OpenMP terminology). Each band is processed concurrently by a different thread. If there are more bands than threads, the unprocessed bands are assigned to threads in a round-robin fashion (static scheduling) or to the next available thread (dynamic scheduling).

Pseudo code for the parallel implementation of GDT in OpenMP is given in Algorithm 1. Details of the distance propagation are handled in the functions fwdProp(), forwardPropagationFirstRow(), bwdProp(), and backwardPropagationLastRow().
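As an illustration of the band-based forward propagation, the sketch below shows one way it could be written in C with OpenMP. It is a variant built on stated assumptions, not the author's code: gdt_forward_bands, fwd_row and the halo buffer are hypothetical names, the parallel loop runs over bands rather than over rows with a chunk size, and the row just above each band is copied into halo before any band is updated. That copy makes explicit that distances computed in one band are not seen by the next band until the following pass, and it keeps a thread from reading a row while another thread writes it.

```c
#include <stdlib.h>
#include <string.h>

#define CHAMFER_A 0.9619f  /* axial coefficient a from the text    */
#define CHAMFER_B 1.3604f  /* diagonal coefficient b from the text */

static float min2(float a, float b) { return a < b ? a : b; }

/* Forward update of one row (equation (1)); `up` is the row above,
 * or NULL for the first image row. The current value is kept as a
 * candidate so that seeds stay at zero (an assumption of this sketch). */
static void fwd_row(float *cur, const float *cost, const float *up, int w)
{
    for (int x = 0; x < w; x++) {
        float best = cur[x];
        if (up) {
            if (x > 0)     best = min2(best, up[x - 1] + CHAMFER_B * cost[x]);
            best = min2(best, up[x] + CHAMFER_A * cost[x]);
            if (x + 1 < w) best = min2(best, up[x + 1] + CHAMFER_B * cost[x]);
        }
        if (x > 0) best = min2(best, cur[x - 1] + CHAMFER_A * cost[x]);
        cur[x] = best;
    }
}

/* One forward pass over bands of `chunk` rows. Returns 0 on success. */
int gdt_forward_bands(float *d, const float *f, int w, int h, int chunk)
{
    if (chunk < 1) return -1;
    int n_bands = (h + chunk - 1) / chunk;
    float *halo = malloc((size_t)n_bands * w * sizeof *halo);
    if (!halo) return -1;

    /* Step 1: snapshot the row above each band (d is only read here). */
    #pragma omp parallel for schedule(static)
    for (int b = 1; b < n_bands; b++)
        memcpy(halo + (size_t)b * w,
               d + (size_t)(b * chunk - 1) * w,
               (size_t)w * sizeof *halo);

    /* Step 2: each band is scanned top to bottom by a single thread. */
    #pragma omp parallel for schedule(static)
    for (int b = 0; b < n_bands; b++) {
        int y0 = b * chunk;
        int y1 = y0 + chunk < h ? y0 + chunk : h;
        for (int y = y0; y < y1; y++) {
            const float *up = (y == 0)  ? NULL
                            : (y == y0) ? halo + (size_t)b * w
                            : d + (size_t)(y - 1) * w;
            fwd_row(d + (size_t)y * w, f + (size_t)y * w, up, w);
        }
    }
    free(halo);
    return 0;
}
```

With GCC or Clang the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the function runs sequentially with the same result, which is convenient for testing. A backward pass would mirror this code with the row below each band as its halo.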
The pseudo code in Algorithm 1 differs from a non-parallel implementation of GDT only in the shaded lines, where a compiler directive appears just before a standard for loop in C. This omp parallel for directive tells the master thread to create a team of parallel threads to process the for loop iterations. When the team of threads completes the statements in the for loop, they synchronise and terminate, leaving only the master thread running. This process is known as the fork-join model of parallel execution [5].

One important requirement in parallel programming is that the parallel region must be thread-safe. In other words, each iteration of the for loop should be executable independently, without interaction across different threads (e.g., no data dependencies). In GDT, this means the distance propagation within one band should not wait for the result of the previous band. Thread 2 in Figure 5, for example, should not wait until Thread 1 finishes the computation of band 1. This means the GDT of band 1 is not propagated to band 2 within the current iteration (it will be in the next iteration). To avoid data dependencies and race conditions, variables that are modified within each thread should be declared in the private clause of the parallel for directive.

Because the distances computed by one thread are not used by other threads within the current iteration, it may take longer for the GDT to propagate distances from the top band to the bottom band and vice versa. However, given a dense sampling of seed points, each seed point only has a limited spatial range of influence. In other words, the distance transform at one pixel is never propagated more than a few bands away. The range of influence depends on seed density and chunk size. In general, a few iterations of forward+backward propagation (fewer than 30) are sufficient for most cases.

5. EVALUATION

We compare three different implementations of the chamfer-based geodesic distance transform: non-parallel, parallel using OpenMP with static scheduling (i.e. round-robin assignment of iterations to threads), and parallel using OpenMP with dynamic scheduling (iterations are assigned to the next available thread). Given an input image, the cost image is computed from the gradient energy plus a constant regularisation offset (e.g., the median gradient energy value), and the seeds are selected at local gradient minima. Low-amplitude random noise is added to the cost image to produce evenly distributed local minima even in flat image regions.

5.1. Task scheduling model and chunk size

OpenMP allows two main types of task scheduling: static scheduling, where blocks of iterations are assigned to threads in a round-robin fashion, and dynamic scheduling, where the next block of iterations is assigned to the next available thread. The size of each block, a.k.a. the chunk size, is configurable. For static scheduling, the default chunk size is the number of iterations (i.e. the number of image rows in our case) divided by the number of threads.

To compare different scheduling methods and chunk sizes, we ran GDT on a 1936×1288 cost image (the gradient energy of the image in Figure 9) with 1017 evenly distributed seeds and measured the runtimes. The seeds were selected as local minima of the cost image using non-maximum suppression (NMS) [19] with a suppression radius (i.e. minimum separation distance) of 20 pixels. The GDT converges in 30 to 31 iterations for all runs with a chunk size greater than 10.

[Figure 6: runtime versus chunk size; a) 2.8GHz quad-core (8 threads); b) 2.4GHz dual-core (2 threads).]

Fig. 6. Runtime as a function of chunk size for different parallel implementations of GDT on a 2MP image with 1017 seeds and roughly 30 iterations of distance propagation.

The same experiment was carried out on two different machines: an Intel Xeon 2.8 GHz quad-core processor with 12 GB of RAM and the Microsoft Visual Studio 2010 compiler, and an Intel Core 2 Duo P9400 2.4 GHz dual-core processor with 4 GB of RAM and the Microsoft Visual Studio 2005 compiler. The runtimes on these two machines are plotted in Figure 6 for different chunk sizes, where each data point is averaged over ten repeated runs.

Several conclusions can be drawn from Figure 6. There is little difference in the runtimes of static and dynamic scheduling (the red and blue lines). Both parallel implementations are significantly faster than the non-parallel implementation (green line). The speedup factor of parallel versus non-parallel reaches a maximum of 2.6 times on the quad-core machine and 1.3 times on the dual-core one. This maximum speedup occurs at the default chunk size, which is 1288/8=161 for the quad-core and 1288/2=644 for the dual-core machine (there are eight threads on the quad-core processor due to Intel’s hyper-threading technology). The highest speed gain is also achieved at integer fractions (i.e. 1/2, 1/3, 1/4, ...) of the default chunk size. This is when the total number of iterations (1288 image rows) is evenly distributed amongst all threads. In short, static scheduling with the default chunk size works best for GDT. This default chunk size is therefore used in all subsequent experiments.

5.2. Number of iterations until convergence

We now show that the number of distance propagation iterations depends on the density of seed points. As stated earlier, the seed points are selected as local minima of the cost image using non-maximum suppression. We varied the NMS radius from 5 to 100 pixels, which results in a number of seed points ranging from 14000 down to 30, respectively.

[Figure 7: a) number of GDT iterations; b) speedup on a quad-core CPU.]

Fig. 7. Number of iterations until convergence and speedup factor as a function of number of seed points on a 2MP image.

Figure 7a plots the number of distance propagation iterations versus the number of seed points for the same 2MP image used in the previous experiment. As the seeds get denser, the minimum geodesic paths become shorter. Fewer iterations are therefore required to propagate the GDT. If the seeds are sparsely sampled (e.g. fewer than 1000 seeds for a 2MP image), the parallel implementations require more iterations to complete the GDT than the non-parallel one. The reason for this was given at the end of Section 4. For more than 500 seeds per megapixel, there is no difference in the number of iterations between the parallel and non-parallel implementations.

Because seed density affects the number of iterations, it also affects the speedup factor. Figure 7b plots the speedup factor of the two parallel implementations over the non-parallel one as a function of the number of seeds. As in the previous subsection, the runtimes are averaged over ten identical runs to smooth out sudden glitches caused by the processors being pre-empted by high-priority operating system tasks. The OpenMP implementations on a quad-core machine speed up GDT by a factor between 1.7 and 2.5. The maximum speedup is achieved when there are 500 seeds per megapixel (i.e. roughly one seed for every 50×50 image block). The speedup factor reduces slightly when there are more than 500 seeds per megapixel.

5.3. Runtime for different image sizes

This subsection investigates the runtime and speedup factor of parallel GDT for different image sizes given the same seed selection strategy. Ten images of different sizes ranging from 0.4 to 10 MP were chosen. For each image, the number of seeds is set to a default value equal to the square root of the number of pixels. Adaptive NMS (c_robust = 1) [3] is used on a negated cost image to produce an exact number of seed points. The runtime results are plotted in Figure 8, where the x-axis specifies the square root of the total number of pixels in the image (which is also the number of seed points, or the image width for square images).

[Figure 8: a) runtime; b) speedup factor.]

Fig. 8. Runtime and speedup factor for images of different sizes on a 2.8GHz quad-core machine with 12GB of RAM.

Figure 8a shows that it takes less than half a second to compute the GDT for a 3MP image. For a 10MP image, the runtime increases to 1.5 seconds. The runtime is linearly proportional to the number of pixels in the image (quadratically proportional to the image width, as shown in Figure 8a). However, the runtime depends on image content, as suggested by the two data points around an image width of 1500. Despite having a similar number of pixels, a 1936×1288 image took 0.28 seconds to compute its GDT, while a 1842×1380 image took 0.42 seconds (under static scheduling).

Figure 8b shows the speedup factor of the two parallel implementations over the non-parallel one. Once again, the speedup depends on image content. For 0.5MP images, the speedup factor ranges from 1 to 3 times. As the images get bigger, the speedup factor range shrinks to between 2 and 2.5 times. This variation is due to the different complexity of edges in each image.

6. APPLICATION: SUPERPIXEL SEGMENTATION

A superpixel is a group of connected pixels sharing some common properties such as intensity, colour or texture [20]. A useful superpixel segmentation partitions the image into regularly sized and shaped superpixels (i.e. close to round) that respect scene boundaries. This type of segmentation facilitates edge-preserving image processing because the processing can be done on individual superpixels, which do not include pixels across differently textured regions.

As mentioned earlier, GDT produces a label image, in which each pixel is associated with its nearest seed label (nearest in terms of geodesic distance). Pixels with a common nearest seed are connected; together they form a superpixel. Using the strategy described at the beginning of Section 5, where the cost image is the input image’s gradient energy plus a small offset and the seed points are its local minima, the input image can be segmented into geodesic superpixels.
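The label image can be obtained by carrying a seed label along with every distance update. The C sketch below illustrates the idea under stated assumptions: the names relax_label, gdt_label_pass and gdt_label_run are hypothetical, the current distance is kept as a candidate in the minimum, and the caller is expected to initialise d to a very large value with label -1 everywhere except at the seeds (distance zero, label equal to the seed index). It is a sequential illustration, not the paper's implementation.

```c
#include <stdbool.h>

#define CHAMFER_A 0.9619f  /* axial coefficient a from the text    */
#define CHAMFER_B 1.3604f  /* diagonal coefficient b from the text */

/* Try to improve pixel (x, y) from the neighbour at (x+dx, y+dy).
 * On success the neighbour's seed label is inherited as well. */
static bool relax_label(float *d, int *label, const float *f, int w, int h,
                        int x, int y, int dx, int dy, float weight)
{
    int nx = x + dx, ny = y + dy;
    if (nx < 0 || nx >= w || ny < 0 || ny >= h)
        return false;                       /* ignore out-of-bound neighbours */
    int p = y * w + x, q = ny * w + nx;
    float cand = d[q] + weight * f[p];
    if (cand < d[p]) {
        d[p] = cand;
        label[p] = label[q];
        return true;
    }
    return false;
}

/* One pass over the image: dir = +1 is the forward pass of equation (1),
 * dir = -1 the backward pass of equation (2). */
bool gdt_label_pass(float *d, int *label, const float *f,
                    int w, int h, int dir)
{
    bool changed = false;
    for (int j = 0; j < h; j++) {
        int y = dir > 0 ? j : h - 1 - j;
        for (int i = 0; i < w; i++) {
            int x = dir > 0 ? i : w - 1 - i;
            changed |= relax_label(d, label, f, w, h, x, y, -dir, -dir, CHAMFER_B);
            changed |= relax_label(d, label, f, w, h, x, y,    0, -dir, CHAMFER_A);
            changed |= relax_label(d, label, f, w, h, x, y, +dir, -dir, CHAMFER_B);
            changed |= relax_label(d, label, f, w, h, x, y, -dir,    0, CHAMFER_A);
        }
    }
    return changed;
}

/* Alternate forward and backward passes until nothing changes. */
int gdt_label_run(float *d, int *label, const float *f,
                  int w, int h, int max_iter)
{
    int it = 0;
    while (it < max_iter) {
        bool fwd = gdt_label_pass(d, label, f, w, h, +1);
        bool bwd = gdt_label_pass(d, label, f, w, h, -1);
        it++;
        if (!fwd && !bwd)
            break;
    }
    return it;
}
```

As a small example, on a 6×1 cost image {1, 1, 1, 1, 9, 1} with seeds at both ends, the high-cost pixel is claimed by the right-hand seed and every pixel to its left by the left-hand seed, so the superpixel boundary falls next to the expensive pixel rather than at the geometric midpoint.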
To make the superpixels’ shapes more regular, we move each seed point to its superpixel centroid [8] and rerun the geodesic distance transform. An example of segmentation of a 2MP image into 1000 superpixels using 3 iterations of seed recentroiding, each with 10 iterations of distance propagation, is given in Figure 9. Cyan lines denote the superpixel boundaries, and yellow dots denote the recentroided seed points. The superpixel boundaries closely follow strong edges in the image. Note that these superpixels are not designed to cover every edge in the image, especially edges in highly textured areas. This is because geodesic superpixels are grown from well-separated seed points. They do not shrink to fit the arbitrarily small regions commonly found in fine textures.

Fig. 9. 1000 geodesic superpixels on a 1936×1288 image.

We compared our superpixel segmentation result on a 968×644 image in Figure 10 against eight other segmentation methods:

  • Watershed [16] with shallow region removal using Mathworks’ Image Processing Toolbox (watershed and imhmin) and small region removal using our own Matlab implementation
  • FH, i.e. graph-based segmentation [9], using a C implementation from the authors¹
  • Quickshift [26] using a C implementation from VLFeat²
  • Entropy rate [13] using C/MEX code from the authors³
  • Centroidal Voronoi Tessellation (CVT) [8] using our own Matlab implementation
  • Superpixel lattices [17] using a C/MEX implementation from the authors⁴
  • SLIC superpixels [1] using a command line Windows executable from the authors⁵
  • TurboPixels [12] using a Matlab implementation from the authors⁶

¹ FH: ~pff/segment/
² Quickshift:
³ Entropy rate: mingyliu/ ~
⁴ Superpixel lattices: vis/pvl/index.php?option=com_content&view=article&id=76:superpixel-lattices-code&catid=49:downloads&Itemid=62
⁵ SLIC: RK_SLICSuperpixels/index.html
⁶ TurboPixels: research.html~babalex/

Fig. 10. Results of 9 different superpixel segmentation methods on a 968×644 image (images are ordered as in the table, # denotes number of superpixels returned by the method).

[Figure 11: (a) SLIC superpixels (4.6 seconds); (b) geodesic superpixels (0.64 second); (c) TurboPixels (207 seconds).]

Fig. 11. Comparison of 3 superpixel segmentation methods (runtime was measured on the full 2MP image in Figure 9).

Default parameters were used for all methods, except for:

  • FH: min area for region merging was tuned (=22) to produce a desired number of segments
  • Quickshift: maxdist was tuned (=13) to produce a desired number of segments
  • SLIC: spatial weight = 5 was chosen instead of 10 (default) for better edge-following superpixels

The results in Figure 10 show that only SLIC, TurboPixels and our method produce regular superpixels that follow scene boundaries. Watershed produces a good edge-following segmentation that rivals the recent graph-based and mean-shift techniques. Entropy rate superpixel segmentation produces irregular segments around flat image areas. CVT is regular but does not follow image edges. Superpixel lattices produce a blocky segmentation.

A close-up comparison of the three methods that produce the most edge-following regular superpixels is given in Figure 11. SLIC superpixels follow edges well but have jagged boundaries around textured areas. Our method produces the most regular and edge-following superpixels visually. TurboPixels produces more regular superpixels than SLIC but misses some strong edges. Geodesic superpixel segmentation is also the fastest of the three methods. Using executables from the corresponding authors, ours is roughly one order of magnitude faster than SLIC and more than two orders of magnitude faster than TurboPixels (0.64 seconds versus 4.6 and 207 seconds). This speed advantage is partially due to the parallel GDT implementation on a quad-core machine.
We also evaluate all nine superpixel methods using two measures of superpixel regularity. To measure size regularity, the standard deviation of all superpixels’ areas is used. We normalise the standard deviation by the average superpixel area to yield a unit-free measure. The smaller the normalised standard deviation of superpixel size, the better. To measure shape regularity, we use a modified version of the isoperimetric quotient in [22]. The isoperimetric quotient is inverted so that a smaller measure means a more regular shape. This inverted isoperimetric quotient is computed as the ratio of the superpixel Perimeter over the square root of its Area (P/√A). We average this ratio over all superpixels to obtain a single shape measure per method. The P/√A ratio has a theoretical lower bound of 2√π ≈ 3.54 for a circular segment. However, this lower bound is never achieved since circles by themselves cannot form a 2D tessellation. Known tessellations such as the hexagonal and square grids have an average P/√A ratio of √(8√3) ≈ 3.72 and 4, respectively.

Figure 12 compares the size and shape regularity of the superpixels shown in Figure 10 over the whole image. As expected, CVT produces the smallest area deviation and average ratio. Irregular segmentation methods such as Watershed, FH and QuickShift, on the other hand, produce large values for both measures. Of the three edge-following superpixel methods, SLIC produces the most regular size but least regular shape superpixels, while TurboPixels produces the most regular shape but least regular size superpixels. Our geodesic method achieves a balance between size and shape regularity.

7. CONCLUSION

We have shown that the sequential chamfer algorithm for computing geodesic distance transform can be modified for parallel implementation on multicore processors using OpenMP. The parallel implementations yield an exact GDT using a slightly higher number of iterations than a non-parallel implementation.
However, the overall speed is increased if the parallel implementations are run under a multicore processor. A speedup factor of 1.3 is achieved for a dual-core machine and 2.6 for a quad-core machine. When
  8. 8. Fig. 12. Comparison of superpixel regularity from different methods (smaller is better). applied to a gradient energy image with evenly distributed seeds, GDT can segment an image into regularly sized and shaped superpixels. Our geodesic superpixel segmentation produces regularly edge-following superpixels at a faster speed than many state-of-the-art methods. 8. ACKNOWLEDGMENT The author would like to thank Khanh Doan and Ernest Wan for reviewing an earlier version of this paper. 9. REFERENCES [1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. S¨ sstrunk, “SLIC superpixels compared to state-of-the-art u superpixel methods,” PAMI, 34(11):2274–2282, 2012. [2] X. Bai, and G. Sapiro, “A geodesic framework for fast interactive image and video segmentation and matting,” in Proc. of ICCV, 2007, pp. 510–517. [3] M. Brown, R. Szeliski, and S. Winder, “Multi-image matching using multi-scale oriented patches,” in Proc. of CVPR, 2005, pp. 510–517. [4] M.A. Butt and P. Maragos, “Optimum design of chamfer distance transforms,” IEEE Trans. on Image Processing, 7(10):1477–1484, 1998. [5] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, The MIT Press, 2007. [6] D. Coeurjolly and A. Montanvert, “Optimal separable algorithms to compute the reverse Euclidean distance transformation and discrete medial axis in arbitrary dimension,” PAMI, 29(3):437–448, Mar. 2007. [7] A. Criminisi, T. Sharp, and A. Blake, “GeoS: Geodesic image segmentation,” in Proc. of ECCV, 2008, pp. 99–112. [8] Q. Du, V. Faber, and M. Gunzburger, “Centroidal Voronoi tessellations: Applications and algorithms,” SIAM Review, 41(4):637–676, Dec. 1999. [9] P.F. Felzenszwalb and D.P. Huttenlocher, “Efficient graphbased image segmentation,” IJCV, 59(2):167–181, 2004. [10] L. Grady, “Random walks for image segmentation,” PAMI, 28(11):1768–1783, 2006. 
[11] Intel, “Automatic parallelization with Intel compilers,” in Intel guide for developing multithreaded applications. Intel Corporation, 2011.
[12] A. Levinshtein, A. Stere, K.N. Kutulakos, D.J. Fleet, S.J. Dickinson, and K. Siddiqi, “TurboPixels: Fast superpixels using geometric flows,” PAMI, 31(12):2290–2297, 2009.
[13] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, “Entropy rate superpixel segmentation,” in Proc. of CVPR, 2011, pp. 2097–2104.
[14] D. Man, K. Uda, H. Ueyama, Y. Ito, and K. Nakano, “Implementations of parallel computation of Euclidean distance map in multicore processors and GPUs,” in Proc. of the First Int’l Conf. on Networking and Computing, 2010, ICNC ’10, pp. 120–127.
[15] P. Maragos and M.A. Butt, “Curve evolution, differential morphology, and distance transforms applied to multiscale and eikonal problems,” Fundamenta Informaticae, 41(1-2):91–129, Jan. 2000.
[16] F. Meyer, “Topographic distance and watershed lines,” Signal Processing, 38(1):113–125, July 1994.
[17] A.P. Moore, S. Prince, J. Warrell, U. Mohammed, and G. Jones, “Superpixel lattices,” in Proc. of CVPR, 2008.
[18] G. Peyré, M. Péchaud, R. Keriven, and L.D. Cohen, “Geodesic methods in computer vision and graphics,” Foundations and Trends in Computer Graphics, 5(3-4):197–397, 2010.
[19] T.Q. Pham, “Non-maximum suppression using fewer than two comparisons per pixel,” in Proc. ACIVS, 2010, pp. 438–451.
[20] X. Ren and J. Malik, “Learning a classification model for segmentation,” in Proc. of ICCV, 2003.
[21] A. Rosenfeld and J.L. Pfaltz, “Distance functions on digital pictures,” Pattern Recognition, 1(1):33–61, 1968.
[22] A. Schick, M. Fischer, and R. Stiefelhagen, “Measuring and evaluating the compactness of superpixels,” in Proc. of ICPR, 2012, pp. 930–934.
[23] J. Shalf, J. Bashor, D. Patterson, K. Asanovic, K. Yelick, K. Keutzer, and T. Mattson, “The manycore revolution: Will HPC lead or follow?,” SciDAC Review, 14:40–49, 2009.
[24] S.J. Shyu, T.W. Chou, and T.L. Chia, “Distance transformation in parallel,” J. of Informatics & Electronics, 1(1):43–54, 2006.
[25] G. Slabaugh, R. Boyes, and X. Yang, “Multicore image processing with OpenMP,” Signal Processing Magazine, 27(2):134–138, 2010.
[26] A. Vedaldi and S. Soatto, “Quick shift and kernel methods for mode seeking,” in Proc. of ECCV (4), 2008, pp. 705–718.
[27] B.J. Verwer, P.W. Verbeek, and S.T. Dekker, “An efficient uniform cost algorithm applied to distance transforms,” PAMI, 11(4):425–429, 1989.
[28] P. Wang, G. Zeng, R. Gan, J. Wang, and H. Zha, “Structure-sensitive superpixels via geodesic distance,” IJCV, 103(1):1–21, 2013.
[29] G. Zeng, P. Wang, J. Wang, R. Gan, and H. Zha, “Structure-sensitive superpixels via geodesic distance,” in Proc. of ICCV, 2011.