“Efficient Variable Size Template Matching
Using Fast Normalized Cross Correlation on
           Multicore Processors”
Durgaprasad Gangodkar, Sachin Gupta, Gurbinder Gill,
           Padam Kumar, Ankush Mittal



Department of Electronics and Computer Engineering
   INDIAN INSTITUTE OF TECHNOLOGY
                     Roorkee
                      India
Contents
1. Introduction

2. NVIDIA’s Compute Unified Device Architecture

3. Normalized and Fast Normalized Cross Correlation
4. Parallel Implementation of Fast Normalized Cross
  Correlation

5. Experimental Details and Performance Evaluation

6. Conclusion
1. Introduction
 Template matching is used in image and signal processing
tasks such as image registration, object detection and pattern
matching. Given a source image and a template, the matching
algorithm finds the location of the template within the image
according to a chosen similarity measure.
• Full search (FS) or exhaustive search algorithms consider
  every pixel in the block to find out the best match --
  computationally very expensive.
• Different similarity measures have been proposed. An empirical
  study found that NCC provides the best performance in all image
  categories in the presence of various image distortions [9].
  NCC is also more robust against image variations such as
  illumination changes than the widely used SAD and MAD.
• However, NCC is computationally far more expensive
  than SAD or MAD, which is a significant drawback in
  real-time applications.

• In this paper we propose a parallel implementation
  of full-search template matching that uses NCC as
  the similarity measure. It relies on pre-computed
  sum-tables [10][11], an approach referred to as
  FNCC, and targets high-resolution images on NVIDIA’s
  general-purpose Graphics Processing Units (GP-GPUs).



2. NVIDIA’s Compute Unified Device Architecture
• GP-GPUs have emerged as front runners for low-cost
  high-performance computing (HPC) machines
• The GTX 280 offers a theoretical peak performance of
  around 933 GFLOPS (single precision) and 78 GFLOPS
  (double precision).
• A kernel executes a scalar sequential program on a set of
  parallel threads. The programmer organizes these threads
  into a grid of thread blocks.
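The grid-of-thread-blocks model can be illustrated with a small emulation. This is a minimal sketch in plain Python rather than CUDA: the names `launch` and `scale_kernel` are hypothetical, and the nested loops only stand in for threads that a GPU would run concurrently.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch: every (block, thread) pair runs the kernel once."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    """Scalar, sequential body executed by each emulated thread."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(data):                       # guard threads past the end of the data
        out[i] = data[i] * factor


data = list(range(10))
out = [0] * len(data)
launch(scale_kernel, 3, 4, data, 2, out)    # 3 blocks of 4 threads cover 10 elements
```

Each thread derives a global index from its block and thread coordinates, which is how one scalar program is mapped onto many data elements.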
Challenges:
• Higher global memory latency
• Higher CPU – Device data transfer latency
• Limited availability of registers
• Limited high-speed shared memory
• Thread synchronization and dynamic kernel configuration
Main contributions of this paper:
1. A novel strategy for parallel calculation of sum-tables using
   the prefix-sum algorithm, making optimal use of the GPU's
   high-speed shared memory.
2. Adaptation of the kernel configuration to variable-sized
   templates and efficient use of the shared memory offered by
   CUDA.
3. Exploitation of the asynchronous nature of kernel calls to
   optimally distribute computation between host and device.
4. Data parallelism in the algorithms, dividing
   computationally intensive tasks for parallel and scalable
   execution on multiple cores.

3. Normalized and Fast Normalized Cross Correlation
  • NCC is commonly used as a metric to evaluate the
    similarity (or dissimilarity) between two compared
    images [8][9].
  • A template of size $N_x \times N_y$ is matched with an image of
    size $M_x \times M_y$.
  • The position $(u_{pos}, v_{pos})$ of the template $t$ in image $f$ is
    determined by calculating the NCC value at every step.
  • The basic equation for NCC is given in (1).

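$$\gamma_{u,v} = \frac{\sum_{x,y} \left(f(x,y) - \bar{f}_{u,v}\right)\left(t(x-u,\,y-v) - \bar{t}\right)}{\sqrt{\sum_{x,y} \left(f(x,y) - \bar{f}_{u,v}\right)^2 \, \sum_{x,y} \left(t(x-u,\,y-v) - \bar{t}\right)^2}} \qquad (1)$$

Here $\bar{t}$ is the mean of the template and $\bar{f}_{u,v}$ is the mean of the image region under the template:

$$\bar{f}_{u,v} = \frac{1}{N_x N_y} \sum_{x=u}^{u+N_x-1} \; \sum_{y=v}^{v+N_y-1} f(x,y) \qquad (2)$$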
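Equations (1) and (2) can be evaluated directly at every template position. The following is a minimal sketch of that exhaustive search in plain Python; the function names are hypothetical, the image is assumed to be a list of rows so that f(x, y) is `image[y][x]`, and flat regions with zero variance are assigned a score of 0 by assumption.

```python
import math


def ncc_at(image, template, u, v):
    """Evaluate equation (1) with the template's top-left corner at (u, v)."""
    n_y, n_x = len(template), len(template[0])
    count = n_x * n_y
    # Equation (2): mean of the image region under the template.
    f_mean = sum(image[v + j][u + i] for j in range(n_y) for i in range(n_x)) / count
    t_mean = sum(map(sum, template)) / count
    num = f_var = t_var = 0.0
    for j in range(n_y):
        for i in range(n_x):
            df = image[v + j][u + i] - f_mean
            dt = template[j][i] - t_mean
            num += df * dt
            f_var += df * df
            t_var += dt * dt
    den = math.sqrt(f_var * t_var)
    return num / den if den > 0 else 0.0  # assumption: flat regions score 0


def ncc_full_search(image, template):
    """Return (best score, u, v) over every valid template position."""
    m_y, m_x = len(image), len(image[0])
    n_y, n_x = len(template), len(template[0])
    return max((ncc_at(image, template, u, v), u, v)
               for v in range(m_y - n_y + 1)
               for u in range(m_x - n_x + 1))


# Usage: hide a 2x2 patch at column 3, row 1 of an otherwise empty 6x6 image.
template = [[1, 2], [3, 7]]
image = [[0] * 6 for _ in range(6)]
for j, row in enumerate(template):
    for i, value in enumerate(row):
        image[1 + j][3 + i] = value
score, u, v = ncc_full_search(image, template)
```

Every position recomputes the region mean and energy from scratch, which is the cost that the sum-table approach removes.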


• Direct computation of (1) involves on the order of
  $N_x \times N_y \,(M_x - N_x)(M_y - N_y)$ calculations.
• For example, matching a small 16×16 pixel template
  with a 250×250 pixel image requires more than
  14 million calculations.
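The figure of 14 million follows from substituting the example sizes into the cost expression. A short check, using a hypothetical helper name:

```python
def direct_ncc_cost(m_x, m_y, n_x, n_y):
    """Order-of-magnitude operation count N_x * N_y * (M_x - N_x) * (M_y - N_y)."""
    return n_x * n_y * (m_x - n_x) * (m_y - n_y)


# 16 * 16 * 234 * 234 = 256 * 54756
cost = direct_ncc_cost(250, 250, 16, 16)
print(cost)  # 14017536
```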




Fast Normalized Cross Correlation (FNCC)
• FNCC calculates the denominator of equation (1) using
  pre-computed sum-tables [10][11].
• $s(u,v)$ and $s^2(u,v)$ are the sum-tables over the image
  function and the image energy respectively.
• The sum-tables of the image function and image energy
  are computed recursively.

[Equations (1)-(4) of this slide: recursive definitions of the sum-tables; not preserved in this transcript]
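Since the slide's own equations are not preserved, the sketch below uses the standard summed-area recurrence from the literature, which is assumed (not confirmed) to be what they express. The function names are hypothetical; `s` accumulates the image function and `s2` the image energy, and entries with a negative index are treated as zero.

```python
def at(table, y, x):
    """Table lookup that returns 0 outside the image (negative indices)."""
    return table[y][x] if x >= 0 and y >= 0 else 0


def build_sum_tables(image):
    """Build s (sum of f) and s2 (sum of f squared) with one pass over the image."""
    h, w = len(image), len(image[0])
    s = [[0] * w for _ in range(h)]
    s2 = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            f = image[y][x]
            s[y][x] = f + at(s, y, x - 1) + at(s, y - 1, x) - at(s, y - 1, x - 1)
            s2[y][x] = f * f + at(s2, y, x - 1) + at(s2, y - 1, x) - at(s2, y - 1, x - 1)
    return s, s2


def window_sum(table, u, v, n_x, n_y):
    """Sum over x in [u, u+n_x-1] and y in [v, v+n_y-1] using four lookups."""
    x1, y1 = u + n_x - 1, v + n_y - 1
    return (at(table, y1, x1) - at(table, y1, u - 1)
            - at(table, v - 1, x1) + at(table, v - 1, u - 1))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
s, s2 = build_sum_tables(image)
region = window_sum(s, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
energy = window_sum(s2, 0, 0, 2, 1)  # 1*1 + 2*2
```

With both tables available, the sum and the energy of any template-sized region cost four lookups each, independent of the template size.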
4. Parallel Implementation of Template Matching
• Although FNCC reduces computation time for low-resolution
  images, it still incurs substantial time for high-resolution
  images.
• We adopt a two-stage approach to template matching:
   – In the first stage we parallelize the computation of the
     sum-tables.
   – In the second stage we parallelize the computation of the
     normalized cross correlation, using the sum-tables
     as a lookup.

Computation of Sum-Tables
• The sum-tables are calculated by taking the cumulative sum
  over the image points.
• We use the parallel prefix-sum algorithm for this step.

 [Figure: working of the parallel prefix-sum algorithm]

 In the prefix-sum algorithm, n/2 threads work in parallel to
 calculate the prefix sum in O(log n) time.
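The slide's figure is not preserved, so the exact scan variant is unknown. The sketch below assumes the work-efficient up-sweep/down-sweep scan (often attributed to Blelloch), which fits the description of n/2 threads per step and O(log n) steps. It is a sequential Python emulation with hypothetical function names; each inner `for` loop stands in for one parallel step.

```python
def exclusive_scan(values):
    """Up-sweep/down-sweep prefix sum; each inner loop models one parallel step."""
    n = len(values)
    if n == 0:
        return []
    size = 1
    while size < n:                 # pad to a power of two
        size *= 2
    data = list(values) + [0] * (size - n)

    stride = 1
    while stride < size:            # up-sweep: build partial sums in place
        for i in range(2 * stride - 1, size, 2 * stride):   # at most size/2 "threads"
            data[i] += data[i - stride]
        stride *= 2

    data[size - 1] = 0              # clear the root before the down-sweep
    stride = size // 2
    while stride >= 1:              # down-sweep: push prefixes back down the tree
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data[:n]


def inclusive_scan(values):
    """Cumulative sum as needed for sum-tables: exclusive scan plus the element itself."""
    return [e + v for e, v in zip(exclusive_scan(values), values)]


row = [3, 1, 7, 0, 4, 1, 6, 3]
prefix = inclusive_scan(row)
```

The number of outer iterations is 2·log2(size), and no step touches more than size/2 elements, which is the property the slide refers to.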
• Sum-tables for the template are computed on the host CPU while
  the GPU is busy calculating the sum-tables for the source image,
  exploiting the asynchronous nature of kernel calls. This
  eliminates idling of the host CPU while the device is busy.
• One image row is assigned to each thread block.
• The work of each thread, and the grouping of threads into
  blocks, is decided dynamically from the template size.
• Every thread caches data in shared memory, for template
  images of variable resolution.
• Pipeline: parallel prefix-sum, transpose, parallel prefix-sum,
  transpose, giving the sum-table.
• Device pointers are used across a total of four kernels to
  avoid data transfer latencies.
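The scan, transpose, scan, transpose pipeline can be sketched on the host to show why it yields a two-dimensional sum-table. This is a plain Python illustration with hypothetical names; `itertools.accumulate` stands in for the per-row parallel prefix-sum kernel, and each of the four calls corresponds loosely to one kernel in the pipeline.

```python
from itertools import accumulate


def scan_rows(matrix):
    """Inclusive prefix sum along every row (one row per emulated thread block)."""
    return [list(accumulate(row)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns so the next row scan runs along the other axis."""
    return [list(col) for col in zip(*matrix)]


def sum_table_by_scans(image):
    """Row scan, transpose, row scan, transpose: the result is the 2-D sum-table."""
    return transpose(scan_rows(transpose(scan_rows(image))))


image = [[1, 2, 3],
         [4, 5, 6]]
table = sum_table_by_scans(image)
```

Transposing between the scans means both passes read along rows, so the same row-oriented scan kernel can serve both axes.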
Template matching using FNCC
• For a template of size $N_x \times N_y$ pixels we divide the source
  image into search windows of $2N_x \times 2N_y$ pixels.
• The correlation value is calculated, using the sum-tables
  as a lookup, by moving the template over the referenced
  search window pixel by pixel, covering the entire search
  window.




• Highest Correlation indicates best match
• The task of computing correlation for each search window
  is assigned to a single thread.
• The target image is dynamically divided into search
  windows according to the x and y dimensions of the
  variable sized template such that we get the maximum
  number of threads per block.
• Every thread block dynamically caches data such that the
  shared memory constraint (16 KB per block) is never
  violated.
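The second stage can be sketched as follows, with each call to `search_window` standing in for the work of one thread. This is a sequential Python emulation with hypothetical names, not the authors' kernel. Two assumptions are made because the slides do not specify them: windows start every N+1 pixels, so that each window of 2N pixels holds N+1 template origins and every origin is visited exactly once, and flat regions with zero energy score 0.

```python
import math


def sum_table(img):
    """Summed-area table with an extra zero row and column at index 0."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            s[y + 1][x + 1] = img[y][x] + s[y][x + 1] + s[y + 1][x] - s[y][x]
    return s


def box(s, top, left, h, w):
    """Sum of the h x w region whose top-left pixel is (top, left)."""
    return s[top + h][left + w] - s[top][left + w] - s[top + h][left] + s[top][left]


def search_window(image, s, s2, tz, t_energy, top, left, th, tw):
    """Work of one emulated thread: best (score, row, col) inside one search window."""
    ih, iw = len(image), len(image[0])
    n = th * tw
    best = (-2.0, -1, -1)
    for r in range(top, min(top + th + 1, ih - th + 1)):
        for c in range(left, min(left + tw + 1, iw - tw + 1)):
            # Numerator: sum of f * (t - mean(t)); the mean of f cancels out.
            num = sum(image[r + y][c + x] * tz[y][x]
                      for y in range(th) for x in range(tw))
            # Denominator terms come from the sum-tables via four lookups each.
            f_sum = box(s, r, c, th, tw)
            f_energy = max(box(s2, r, c, th, tw) - f_sum * f_sum / n, 0.0)
            den = math.sqrt(f_energy * t_energy)
            score = num / den if den > 0 else 0.0
            best = max(best, (score, r, c))
    return best


def fncc_match(image, template):
    """Split the image into search windows and keep the best per-window result."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_mean = sum(map(sum, template)) / (th * tw)
    tz = [[v - t_mean for v in row] for row in template]   # zero-mean template
    t_energy = sum(v * v for row in tz for v in row)
    s = sum_table(image)
    s2 = sum_table([[v * v for v in row] for row in image])
    results = [search_window(image, s, s2, tz, t_energy, top, left, th, tw)
               for top in range(0, ih - th + 1, th + 1)
               for left in range(0, iw - tw + 1, tw + 1)]
    return max(results)


# Usage: a 2x3 template hidden at row 3, column 2 of an 8x8 image.
template = [[1, 2, 3], [4, 5, 9]]
image = [[0] * 8 for _ in range(8)]
for y, row in enumerate(template):
    for x, value in enumerate(row):
        image[3 + y][2 + x] = value
score, r, c = fncc_match(image, template)
```

Because each window is processed independently, the list comprehension in `fncc_match` is the part that would map onto parallel threads.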
5. Experimental Details and Performance Evaluation
• Execution time and speedup of the proposed parallel
  implementation of the FNCC algorithm were evaluated on a
  benchmark dataset.
• The sequential code was run on an Intel Xeon 3.2 GHz
  processor with 1 GB of DRAM and 32-bit Windows XP OS.
• The parallel code was run on an NVIDIA GTX 280 with
  1 GB of onboard GDDR3 memory, hosted on an Intel Xeon 3.2 GHz
  processor with 1 GB of DRAM and 32-bit Windows XP OS.



Image size   Template size   Thread   Threads     CUDA execution   Sequential execution   Speedup
(pixels)     (pixels)        blocks   per block   time (s)         time (s)
512x512      32x32           5x8      3x2         0.517            1.372                  2.7
             24x32           8x5      2x5         0.260            1.097                  4.3
             24x16           5x6      6x4         0.047            0.543                  11.6
             16x16           5x6      7x6         0.033            0.406                  12.3
1024x1024    32x32           9x16     3x2         1.311            6.170                  4.8
             24x32           16x9     2x5         0.639            4.773                  7.5
             24x16           10x11    6x4         0.179            2.518                  14.1
             16x16           10x11    7x6         0.121            1.893                  15.6
2048x1080    32x32           10x32    3x2         2.848            13.474                 4.8
             24x32           17x17    2x5         1.261            10.344                 8.3
             24x16           11x22    6x4         0.391            5.551                  14.3
             16x16           10x22    7x6         0.239            4.116                  17.3

• For a frame size of 2048x1080 and a template size of 16x16,
  execution time drops from 4.116 s to 0.239 s (239 ms),
  a speedup of around 17x.
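Speedup here is taken to be the ratio of sequential to parallel execution time. A short check of the quoted case, using a hypothetical helper name:

```python
def speedup(sequential_s, parallel_s):
    """Ratio of sequential to parallel execution time."""
    return sequential_s / parallel_s


# 2048x1080 image, 16x16 template: 4.116 s sequential, 0.239 s on the GPU.
ratio = speedup(4.116, 0.239)
print(round(ratio, 1))
```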
• As the resolution of the image increases, the speedup
  obtained also increases, opening up scope for
  handling high-resolution digital images.



6. Conclusion
• Every thread is assigned an independent task of
  computing the correlation for the template, which eliminates
  inter-thread communication, inter-thread dependencies and
  synchronization.
• Threads are arranged dynamically into blocks and grids
  depending on the size of the template.
• We have also devised an efficient strategy for using the
  faster shared memory to overcome memory access latency.
• The thread configuration scales to low-resolution or
  high-resolution images and to templates of varying size.
• Our future work involves exploring the division of larger
  templates into smaller sub-templates to further exploit the
  computational power of multicore processors.
References
1. Ryan, T. W.: The Prediction of Cross-Correlation Accuracy in Digital Stereo-Pair Images. PhD thesis,
   University of Arizona (1981)
2. Burt, P. J., Yen, C., Xu, X.: Local Correlation Measures for Motion Analysis: A Comparative Study. In:
   IEEE Conf. Pattern Recognition and Image Processing, pp. 269-274. IEEE Press, Las Vegas (1982).
3. Essannouni, L., Ibn-Elhaj, E., Aboutajdine, D.: Fast Cross-Spectral Image Registration Using New
   Robust Correlation. In: Journal of Real-Time Image Processing, vol. 1, no. 2, pp. 123-12. Springer
   (2006)
4. Minoru, M., Kunio, K.: Fast Template Matching Based on Normalized Cross Correlation Using
   Adaptive Block Partitioning and Initial Threshold Estimation. In: IEEE International Symposium on
   Multimedia, pp. 196 – 203. IEEE Press, Taichung, Taiwan (2010)
5. Luo, J., Konofagou, E. E.: A Fast Normalized Cross-Correlation Calculation Method for Motion
   Estimation. In: IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control, vol. 57, no. 6, pp.
   1347 – 1357. (2010)
6. Zhu, S., Ma, K. K.: A New Diamond Search Algorithm for Fast Block Matching Motion Estimation. In:
   IEEE Trans. Image Processing, vol. 9, no. 2, pp. 287–290. (2000)
7. Tham, J. Y., Ranganath, S., Ranganath, M., Kassim, A. A.: A Novel Unrestricted Center-Biased
   Diamond Search Algorithm for Block Motion Estimation. In: IEEE Trans. Circuits Syst. Video
   Technol., vol. 8, no. 4, pp. 369–377. (1998)
8. Zhu, C., Lin, X., Chau, L.: Hexagon-Based Search Pattern for Fast Block Motion Estimation. In: IEEE
   Trans. Circuits Syst. Video Technol., vol. 12, no. 5, pp. 349-355. (2002)
9. Lewis, J. P.: Fast Template Matching. In: Vision Interface 95, Canadian Image Processing and Pattern
   Recognition Society, pp. 120–123. Quebec City, Canada (1995)
10. Briechle, K., Hanebeck, U. D.: Template Matching Using Fast Normalized Cross Correlation. In: SPIE,
    vol. 4387, no. 95. AeroSense Symposium, Orlando, Florida (2001)
11. NVIDIA CUDA Programming Guide, Version 2.2, pp. 10, 27-35, 75-97. (2009)
12. Hii, A. J. H., Hann, C. E., Chase, J. G., Van Houten, E. E. W.: Fast Normalized Cross Correlation for
    Motion Tracking Using Basis Functions. In: Journal of Computer Methods and Programs in
    Biomedicine, vol. 82, no. 2, pp. 144–156. Elsevier (2006)




Thank You

            21

More Related Content

What's hot

Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Simplilearn
 
Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...
Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...
Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...
Shanghai Jiao Tong University(上海交通大学)
 
Image classification using neural network
Image classification using neural networkImage classification using neural network
Image classification using neural network
Bhavyateja Potineni
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual AttentionShow, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Eun Ji Lee
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
Preferred Networks
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
Kenta Oono
 
Digit recognizer by convolutional neural network
Digit recognizer by convolutional neural networkDigit recognizer by convolutional neural network
Digit recognizer by convolutional neural network
Ding Li
 
2D_BLIT_software_Blackness
2D_BLIT_software_Blackness2D_BLIT_software_Blackness
2D_BLIT_software_BlacknessShereef Shehata
 
Fractal Image Compression By Range Block Classification
Fractal Image Compression By Range Block ClassificationFractal Image Compression By Range Block Classification
Fractal Image Compression By Range Block Classification
IRJET Journal
 
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
csandit
 
Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4IAEME Publication
 
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
[Paper] learning video representations from correspondence proposals
[Paper]  learning video representations from correspondence proposals[Paper]  learning video representations from correspondence proposals
[Paper] learning video representations from correspondence proposals
Susang Kim
 
Mapping Parallel Programs into Hierarchical Distributed Computer Systems
Mapping Parallel Programs into Hierarchical Distributed Computer SystemsMapping Parallel Programs into Hierarchical Distributed Computer Systems
Mapping Parallel Programs into Hierarchical Distributed Computer SystemsMikhail Kurnosov
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
Fast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s methodFast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s methodIAEME Publication
 
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 

What's hot (19)

DCT
DCTDCT
DCT
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...
Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...
Background Subtraction Based on Phase and Distance Transform Under Sudden Ill...
 
Image classification using neural network
Image classification using neural networkImage classification using neural network
Image classification using neural network
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
 
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual AttentionShow, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
Digit recognizer by convolutional neural network
Digit recognizer by convolutional neural networkDigit recognizer by convolutional neural network
Digit recognizer by convolutional neural network
 
2D_BLIT_software_Blackness
2D_BLIT_software_Blackness2D_BLIT_software_Blackness
2D_BLIT_software_Blackness
 
Fractal Image Compression By Range Block Classification
Fractal Image Compression By Range Block ClassificationFractal Image Compression By Range Block Classification
Fractal Image Compression By Range Block Classification
 
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
 
Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4
 
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
 
[Paper] learning video representations from correspondence proposals
[Paper]  learning video representations from correspondence proposals[Paper]  learning video representations from correspondence proposals
[Paper] learning video representations from correspondence proposals
 
Mapping Parallel Programs into Hierarchical Distributed Computer Systems
Mapping Parallel Programs into Hierarchical Distributed Computer SystemsMapping Parallel Programs into Hierarchical Distributed Computer Systems
Mapping Parallel Programs into Hierarchical Distributed Computer Systems
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Fast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s methodFast dct algorithm using winograd’s method
Fast dct algorithm using winograd’s method
 
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
 

Viewers also liked

Template matching
Template matchingTemplate matching
Template matching
Hasan Ijaz
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Template Matching - Pattern Recognition
Template Matching - Pattern RecognitionTemplate Matching - Pattern Recognition
Template Matching - Pattern Recognition
Mustafa Salam
 
Introduction to Digital Image Correlation (DIC)
Introduction to Digital Image Correlation (DIC)Introduction to Digital Image Correlation (DIC)
Introduction to Digital Image Correlation (DIC)
Instron
 
Digital Image Correlation Presentation
Digital Image Correlation PresentationDigital Image Correlation Presentation
Digital Image Correlation Presentation
trilionqualitysystems
 
Facial recognition technology by vaibhav
Facial recognition technology by vaibhavFacial recognition technology by vaibhav
Facial recognition technology by vaibhavVaibhav P
 
Face detection using template matching
Face detection using template matchingFace detection using template matching
Face detection using template matchingBrijesh Borad
 
Correlation coefficient
Correlation coefficientCorrelation coefficient
Correlation coefficientCarlo Magno
 
Image proceesing with matlab
Image proceesing with matlabImage proceesing with matlab
Image proceesing with matlabAshutosh Shahi
 
Correlation
CorrelationCorrelation
CorrelationTech_MX
 
Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)
Namrata Saxena
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis ppt
Anil Mishra
 
Introduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABIntroduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLAB
Ray Phan
 

Viewers also liked (17)

Template matching
Template matchingTemplate matching
Template matching
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Template Matching - Pattern Recognition
Template Matching - Pattern RecognitionTemplate Matching - Pattern Recognition
Template Matching - Pattern Recognition
 
Introduction to Digital Image Correlation (DIC)
Introduction to Digital Image Correlation (DIC)Introduction to Digital Image Correlation (DIC)
Introduction to Digital Image Correlation (DIC)
 
Digital Image Correlation Presentation
Digital Image Correlation PresentationDigital Image Correlation Presentation
Digital Image Correlation Presentation
 
Facial recognition technology by vaibhav
Facial recognition technology by vaibhavFacial recognition technology by vaibhav
Facial recognition technology by vaibhav
 
Face detection using template matching
Face detection using template matchingFace detection using template matching
Face detection using template matching
 
Research Paper On Correlation
Research Paper On CorrelationResearch Paper On Correlation
Research Paper On Correlation
 
Correlation coefficient
Correlation coefficientCorrelation coefficient
Correlation coefficient
 
Image proceesing with matlab
Image proceesing with matlabImage proceesing with matlab
Image proceesing with matlab
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysis
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis ppt
 
Correlation ppt...
Correlation ppt...Correlation ppt...
Correlation ppt...
 
Introduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABIntroduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLAB
 

Similar to Efficient Variable Size Template Matching Using Fast Normalized Cross Correlation on Multicore Processors

Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...
IAEME Publication
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
DonghyunKang12
 
From Experimentation to Production: The Future of WebGL
From Experimentation to Production: The Future of WebGLFrom Experimentation to Production: The Future of WebGL
From Experimentation to Production: The Future of WebGL
FITC
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptx
ruvex
 
ImageNet classification with deep convolutional neural networks(2012)
ImageNet classification with deep convolutional neural networks(2012)ImageNet classification with deep convolutional neural networks(2012)
ImageNet classification with deep convolutional neural networks(2012)
WoochulShin10
 
MAtrix Multiplication Parallel.ppsx
MAtrix Multiplication Parallel.ppsxMAtrix Multiplication Parallel.ppsx
MAtrix Multiplication Parallel.ppsx
BharathiLakshmiAAssi
 
matrixmultiplicationparallel.ppsx
matrixmultiplicationparallel.ppsxmatrixmultiplicationparallel.ppsx
matrixmultiplicationparallel.ppsx
Bharathi Lakshmi Pon
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous Platforms
IJMER
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
MarcinJedyk
 
Accelerated Logistic Regression on GPU(s)
Accelerated Logistic Regression on GPU(s)Accelerated Logistic Regression on GPU(s)
Accelerated Logistic Regression on GPU(s)
RAHUL BHOJWANI
 
Introduction to computer vision
Introduction to computer visionIntroduction to computer vision
Introduction to computer vision
Marcin Jedyk
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
Richard Kuo
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
SaadMemon23
 
An35225228
An35225228An35225228
An35225228
IJERA Editor
 
ES_SAA_OG_PF_ECCTD_Pos
ES_SAA_OG_PF_ECCTD_PosES_SAA_OG_PF_ECCTD_Pos
ES_SAA_OG_PF_ECCTD_PosSyed Asad Alam
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusion
Dong-Won Shin
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
supratikmondal6
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]
Dongmin Choi
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
byteLAKE
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
Universitat Politècnica de Catalunya
 

Efficient Variable Size Template Matching Using Fast Normalized Cross Correlation on Multicore Processors

  • 1. “Efficient Variable Size Template Matching Using Fast Normalized Cross Correlation on Multicore Processors”. Durgaprasad Gangodkar, Sachin Gupta, Gurbinder Gill, Padam Kumar, Ankush Mittal. Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, India.
  • 2. Contents
      1. Introduction
      2. NVIDIA’s Compute Unified Device Architecture
      3. Normalized and Fast Normalized Cross Correlation
      4. Parallel Implementation of Fast Normalized Cross Correlation
      5. Experimental Details and Performance Evaluation
      6. Conclusion
  • 3. 1. Introduction
      Template matching has applications in image and signal processing, such as image registration, object detection, and pattern matching. Given a source image and a template, the matching algorithm finds the location of the template within the image according to a specific similarity measure.
      • Full search (FS), or exhaustive search, algorithms consider every pixel in the block to find the best match, which is computationally very expensive.
      • Several measures have been proposed. An empirical study found that normalized cross correlation (NCC) provides the best performance in all image categories in the presence of various image distortions [9]. NCC is also more robust against image variations such as illumination changes than the widely used SAD and MAD measures.
  • 4. • However, NCC is computationally much more expensive than SAD or MAD, which is a significant drawback in real-time applications.
      • In this paper we propose a parallel implementation of full-search template matching with NCC as the measure. It uses pre-computed sum-tables [10][11], an approach referred to as FNCC, and targets high-resolution images on NVIDIA's general-purpose graphics processing units (GP-GPUs).
  • 5. 2. NVIDIA’s Compute Unified Device Architecture
      • GP-GPUs have emerged as front runners for low-cost high-performance computing (HPC) machines.
      • The GTX 280 provides a theoretical peak performance of around 933 GFLOPS in single precision and 78 GFLOPS in double precision.
      • A kernel executes a scalar sequential program on a set of parallel threads. The programmer organizes these threads into a grid of thread blocks.
      Challenges:
      • High global memory latency
      • High latency of data transfers between CPU and device
      • Limited availability of registers
      • Limited high-speed shared memory
      • Thread synchronization and dynamic kernel configuration
  • 6. Main contributions of this paper:
      1. A novel strategy for parallel calculation of sum-tables using a prefix-sum algorithm that makes optimal use of the GPU's high-speed shared memory.
      2. Adaptation of the kernel configuration to variable-sized templates, with efficient use of the shared memory offered by CUDA.
      3. Exploitation of the asynchronous nature of kernel calls to distribute computation optimally between host and device.
      4. Data parallelism in the algorithms, obtained by dividing computationally intensive tasks for parallel and scalable execution on multiple cores.
  • 7. 3. Normalized and Fast Normalized Cross Correlation
      • NCC is commonly used as a metric of the similarity (or dissimilarity) between two compared images [8][9].
      • A template of size $N_x \times N_y$ is matched with an image of size $M_x \times M_y$.
      • The position $(u_{pos}, v_{pos})$ of the template $t$ in the image $f$ is determined by calculating the NCC value at every step.
      • The basic equation for NCC is given in (1):
      $$\gamma_{u,v} = \frac{\sum_{x,y} \left(f(x,y) - \bar{f}_{u,v}\right)\left(t(x-u,\,y-v) - \bar{t}\right)}{\sqrt{\sum_{x,y} \left(f(x,y) - \bar{f}_{u,v}\right)^2 \, \sum_{x,y} \left(t(x-u,\,y-v) - \bar{t}\right)^2}} \qquad (1)$$
  • 8. The mean of the image region under the template is
      $$\bar{f}_{u,v} = \frac{1}{N_x N_y} \sum_{x=u}^{u+N_x-1} \; \sum_{y=v}^{v+N_y-1} f(x,y) \qquad (2)$$
      • Direct computation of (1) involves on the order of $N_x \times N_y \,(M_x - N_x)(M_y - N_y)$ calculations.
      • For example, matching a small 16×16 pixel template with a 250×250 pixel image requires more than 14 million calculations.
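To make equation (1) and the cost estimate concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation. The names `ncc_at` and `direct_cost` are hypothetical, images are assumed to be lists of rows indexed as `image[y][x]`, and the cost function simply evaluates the expression quoted on the slide.

```python
import math


def ncc_at(image, template, u, v):
    """Direct NCC of equation (1) with the template's top-left corner at column u, row v."""
    ny, nx = len(template), len(template[0])
    n = nx * ny
    # Mean of the image region under the template (equation (2)) and template mean.
    f_mean = sum(image[v + j][u + i] for j in range(ny) for i in range(nx)) / n
    t_mean = sum(map(sum, template)) / n
    num = f_var = t_var = 0.0
    for j in range(ny):
        for i in range(nx):
            df = image[v + j][u + i] - f_mean
            dt = template[j][i] - t_mean
            num += df * dt
            f_var += df * df
            t_var += dt * dt
    denom = math.sqrt(f_var * t_var)
    # A flat region or flat template has zero variance; report 0 instead of dividing by zero.
    return num / denom if denom else 0.0


def direct_cost(mx, my, nx, ny):
    """Order-of-magnitude operation count quoted on the slide."""
    return nx * ny * (mx - nx) * (my - ny)


image = [[1, 2, 3, 4],
         [5, 9, 7, 8],
         [9, 3, 6, 2],
         [4, 4, 1, 0]]
template = [[9, 7], [3, 6]]  # cut from image at u=1, v=1
brighter = [[2 * p + 10 for p in row] for row in template]  # same pattern, new gain and offset

print(ncc_at(image, template, 1, 1))
print(ncc_at(image, brighter, 1, 1))
print(direct_cost(250, 250, 16, 16))
```

Both NCC calls give a value of 1 because subtracting the means and dividing by the energies cancels any positive gain and offset, which is the robustness to illumination changes mentioned in the introduction. The cost call evaluates 16 · 16 · 234 · 234, consistent with the "more than 14 million" figure.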
  • 9. Fast Normalized Cross Correlation (FNCC)
      • The denominator of equation (1) is calculated using sum-tables [10][11].
      • $s(u,v)$ and $s^2(u,v)$ are sum-tables over the image function and the image energy respectively.
      • The sum-tables of the image function and the image energy are computed recursively. [Equations (1) to (4) of this slide were not captured in the transcript.]
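Since the recursion equations are missing from the transcript, the sketch below uses the standard sum-table (summed-area table) recursion, s(u,v) = f(u,v) + s(u-1,v) + s(u,v-1) - s(u-1,v-1), and the same form with f² for the energy table. This is assumed to match the slide but cannot be confirmed from the text. The function names and the one-cell zero padding (which removes the boundary cases) are choices made for this illustration.

```python
def build_sum_tables(image):
    """Return padded sum-tables (s, s2) of the image values and of their squares.

    Entry [v][u] holds the sum over all pixels in rows < v and columns < u,
    so row 0 and column 0 are zero padding.
    """
    rows, cols = len(image), len(image[0])
    s = [[0] * (cols + 1) for _ in range(rows + 1)]
    s2 = [[0] * (cols + 1) for _ in range(rows + 1)]
    for v in range(rows):
        for u in range(cols):
            p = image[v][u]
            s[v + 1][u + 1] = p + s[v][u + 1] + s[v + 1][u] - s[v][u]
            s2[v + 1][u + 1] = p * p + s2[v][u + 1] + s2[v + 1][u] - s2[v][u]
    return s, s2


def window_sum(table, u, v, nx, ny):
    """Sum over the nx-by-ny window with top-left corner (column u, row v): four lookups."""
    return table[v + ny][u + nx] - table[v][u + nx] - table[v + ny][u] + table[v][u]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
s, s2 = build_sum_tables(image)
print(window_sum(s, 1, 1, 2, 2))   # sum of 5, 6, 8, 9
print(window_sum(s2, 1, 1, 2, 2))  # sum of their squares
```

Once the tables exist, the sum and the energy of any window cost four lookups each, independent of the template size. That is what removes the per-position $N_x \times N_y$ work from the denominator of (1).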
  • 10. 4. Parallel Implementation of Template Matching
      • Although FNCC reduces computation time for low-resolution images, it still takes substantial time for high-resolution images.
      • We adopt a two-stage approach to template matching:
          – In the first stage we parallelize the computation of the sum-tables.
          – In the second stage we parallelize the computation of the normalized cross correlation, using the sum-tables as a lookup.
  • 11. Computation of Sum-Tables
      • The sum-tables are calculated by taking the cumulative sum over the image points.
      • We use a parallel prefix-sum algorithm, as shown in the figure.
      [Figure: working of the prefix-sum algorithm, in which n/2 threads work in parallel to calculate the prefix sum in O(log n) time.]
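The figure itself is not available in the transcript. As an assumption, the sketch below uses the well-known work-efficient (Blelloch-style) scan, which fits the description of n/2 threads and O(log n) steps. It is a sequential Python simulation: each inner `for` loop stands for one parallel step whose iterations are independent and would each be handled by a separate GPU thread. The function name is hypothetical.

```python
from itertools import accumulate


def simulated_parallel_prefix_sum(values):
    """Inclusive prefix sum computed with an up-sweep and a down-sweep."""
    n = len(values)
    if n == 0:
        return []
    size = 1
    while size < n:              # pad to a power of two with zeros
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): log2(size) steps, at most size/2 independent additions per step.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: turns the partial sums into an exclusive scan.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    # Exclusive to inclusive: add each element's own value back in.
    return [data[i] + values[i] for i in range(n)]


row = [3, 1, 4, 1, 5]
print(simulated_parallel_prefix_sum(row))
print(list(accumulate(row)))  # sequential reference for comparison
```

On a GPU the `data` array would typically sit in shared memory for the duration of both sweeps, with a barrier between steps. The simulation only shows the access pattern and the step count, not the memory hierarchy.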
  • 12. • The sum-tables for the template are computed on the host CPU while the GPU computes the sum-tables for the source image, exploiting the asynchronous nature of kernel calls. This eliminates idling of the host CPU while the device is busy.
      • One image row is assigned to each thread block.
      • The work of each thread and the grouping of threads into blocks are decided dynamically by the template size.
      • Every thread caches data in shared memory, for template images of variable resolution.
      • Pipeline: parallel prefix-sum → transpose → parallel prefix-sum → transpose → sum-table.
      • Device pointers are used across the four kernels in total, which avoids data transfer latencies.
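The four-kernel pipeline (prefix-sum, transpose, prefix-sum, transpose) can be sketched on the CPU as follows. This is an illustration under assumptions, not the authors' kernels: `itertools.accumulate` stands in for the per-row parallel prefix-sum, and the helper names are hypothetical.

```python
from itertools import accumulate


def scan_rows(matrix):
    """Stage 1 and 3: inclusive prefix sum along every row (one row per thread block)."""
    return [list(accumulate(row)) for row in matrix]


def transpose(matrix):
    """Stage 2 and 4: swap rows and columns so the next scan again runs along rows."""
    return [list(column) for column in zip(*matrix)]


def sum_table_by_scans(image):
    """prefix-sum -> transpose -> prefix-sum -> transpose gives the 2-D sum-table."""
    return transpose(scan_rows(transpose(scan_rows(image))))


image = [[1, 2],
         [3, 4]]
print(sum_table_by_scans(image))
```

Transposing between the two scans means both passes read along rows, so one row-scan kernel can be reused for the column direction. Keeping the intermediate results on the device between the four stages is what the slide's remark about device pointers refers to.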
  • 13. Template matching using FNCC
      • For a template of size $N_x \times N_y$ pixels, we divide the source image into search windows of $2N_x \times 2N_y$ pixels.
      • The correlation value is calculated using the sum-tables as a lookup, moving the template over the referenced search window pixel by pixel until the entire search window is covered.
      • The highest correlation indicates the best match.
      • The task of computing the correlation for each search window is assigned to a single thread.
  • 14. • The target image is dynamically divided into search windows according to the x and y dimensions of the variable-sized template, such that we get the maximum number of threads per block.
      • Every thread block dynamically caches data such that the shared-memory constraint (16 KB per block) is never violated.
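The matching stage can be sketched sequentially as below. This is a hedged illustration rather than the authors' CUDA kernel, and several details are assumptions. Each "thread" is given a tile of $N_x \times N_y$ candidate positions, whose pixel footprint is roughly the $2N_x \times 2N_y$ search window of the slide. The numerator is computed directly against the zero-mean template, and only the denominator uses the sum-tables. All function names are hypothetical.

```python
import math
import random


def sum_table(image, power):
    """Zero-padded sum-table of image**power (power=1: function, power=2: energy)."""
    rows, cols = len(image), len(image[0])
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for v in range(rows):
        for u in range(cols):
            table[v + 1][u + 1] = (image[v][u] ** power + table[v][u + 1]
                                   + table[v + 1][u] - table[v][u])
    return table


def box(table, u, v, nx, ny):
    """Window sum from four table lookups."""
    return table[v + ny][u + nx] - table[v][u + nx] - table[v + ny][u] + table[v][u]


def fncc_match(image, template):
    """Return ((u, v), score) of the best match; u is the column, v the row."""
    my, mx = len(image), len(image[0])
    ny, nx = len(template), len(template[0])
    n = nx * ny
    t_mean = sum(map(sum, template)) / n
    t_zero = [[p - t_mean for p in row] for row in template]   # zero-mean template
    t_energy = sum(p * p for row in t_zero for p in row)
    s, s2 = sum_table(image, 1), sum_table(image, 2)

    def score(u, v):
        f_sum = box(s, u, v, nx, ny)
        f_energy = box(s2, u, v, nx, ny) - f_sum * f_sum / n   # sum of (f - mean)^2
        denom = math.sqrt(max(f_energy, 0.0) * t_energy)
        if denom == 0.0:
            return 0.0
        # The zero-mean template sums to 0, so the image mean drops out of the numerator.
        num = sum(image[v + j][u + i] * t_zero[j][i]
                  for j in range(ny) for i in range(nx))
        return num / denom

    best_score, best_pos = -2.0, None
    for wy in range(0, my - ny + 1, ny):          # one (wx, wy) tile per GPU thread
        for wx in range(0, mx - nx + 1, nx):
            for v in range(wy, min(wy + ny, my - ny + 1)):
                for u in range(wx, min(wx + nx, mx - nx + 1)):
                    sc = score(u, v)
                    if sc > best_score:
                        best_score, best_pos = sc, (u, v)
    return best_pos, best_score


rng = random.Random(0)
image = [[rng.randint(0, 255) for _ in range(12)] for _ in range(10)]
template = [row[5:9] for row in image[3:6]]       # 4x3 patch cut at column 5, row 3
position, value = fncc_match(image, template)
print(position, value)
```

Because the tiles are independent, the two outer loops are the part that maps onto threads. In a parallel version each thread would keep its own best score, and a final reduction would pick the overall maximum.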
  • 15. 5. Experimental Details and Performance Evaluation
      • The execution time and speedup of the proposed parallel FNCC implementation were evaluated on a benchmark dataset.
      • The sequential code was run on an Intel Xeon 3.2 GHz processor with 1 GB of DRAM and 32-bit Windows XP.
      • The parallel code was run on an NVIDIA GTX 280 with 1 GB of onboard GDDR3 memory, hosted by an Intel Xeon 3.2 GHz processor with 1 GB of DRAM and 32-bit Windows XP.
  • 16. Execution time and speedup:

      Image size   Template size   Thread   Threads     CUDA execution   Sequential execution   Speedup
      (pixels)     (pixels)        blocks   per block   time (s)         time (s)
      512x512      32x32           5x8      3x2         0.517            1.372                  2.7
      512x512      24x32           8x5      2x5         0.260            1.097                  4.3
      512x512      24x16           5x6      6x4         0.047            0.543                  11.6
      512x512      16x16           5x6      7x6         0.033            0.406                  12.3
      1024x1024    32x32           9x16     3x2         1.311            6.170                  4.8
      1024x1024    24x32           16x9     2x5         0.639            4.773                  7.5
      1024x1024    24x16           10x11    6x4         0.179            2.518                  14.1
      1024x1024    16x16           10x11    7x6         0.121            1.893                  15.6
      2048x1080    32x32           10x32    3x2         2.848            13.474                 4.8
      2048x1080    24x32           17x17    2x5         1.261            10.344                 8.3
      2048x1080    24x16           11x22    6x4         0.391            5.551                  14.3
      2048x1080    16x16           10x22    7x6         0.239            4.116                  17.3

      • For a frame size of 2048x1080 and a template size of 16x16, the execution time drops from 4.116 s to 239 ms, a speedup of around 17x.
  • 17. • As the resolution of the image increases, the speedup obtained also increases, which opens up scope for handling high-resolution digital images.
  • 18. 6. Conclusion
      • Every thread is assigned an independent task of computing the correlation for the template, which eliminates inter-thread communication, inter-thread dependencies, and synchronization.
      • Threads are dynamically arranged into blocks and grids depending on the size of the template.
      • We have also devised an efficient strategy that uses the faster shared memory to overcome memory access latency.
      • The thread configuration scales to low-resolution or high-resolution images and to templates of varying size.
      • Our future work involves dividing larger templates into smaller sub-templates to further exploit the computational power of multicore processors.
  • 19. References
      1. Ryan, T. W.: The Prediction of Cross-Correlation Accuracy in Digital Stereo-Pair Images. PhD thesis, University of Arizona (1981)
      2. Burt, P. J., Yen, C., Xu, X.: Local Correlation Measures for Motion Analysis: A Comparative Study. In: IEEE Conf. Pattern Recognition and Image Processing, pp. 269-274. IEEE Press, Las Vegas (1982)
      3. Essannouni, L., Ibn-Elhaj, E., Aboutajdine, D.: Fast Cross-Spectral Image Registration Using New Robust Correlation. In: Journal of Real-Time Image Processing, vol. 1, no. 2, pp. 123-12. Springer (2006)
      4. Minoru, M., Kunio, K.: Fast Template Matching Based on Normalized Cross Correlation Using Adaptive Block Partitioning and Initial Threshold Estimation. In: IEEE International Symposium on Multimedia, pp. 196-203. IEEE Press, Taichung, Taiwan (2010)
      5. Luo, J., Konofagou, E. E.: A Fast Normalized Cross-Correlation Calculation Method for Motion Estimation. In: IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control, vol. 57, no. 6, pp. 1347-1357 (2010)
      6. Zhu, S., Ma, K. K.: A New Diamond Search Algorithm for Fast Block Matching Motion Estimation. In: IEEE Trans. Image Processing, vol. 9, no. 2, pp. 287-290 (2000)
      7. Tham, J. Y., Ranganath, S., Ranganath, M., Kassim, A. A.: A Novel Unrestricted Center-Biased Diamond Search Algorithm for Block Motion Estimation. In: IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 4, pp. 369-377 (1998)
      8. Zhu, C., Lin, X., Chau, L.: Hexagon-Based Search Pattern for Fast Block Motion Estimation. In: IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 5, pp. 349-355 (2002)
      9. Lewis, J. P.: Fast Template Matching. In: Vision Interface 95, Canadian Image Processing and Pattern Recognition Society, pp. 120-123. Quebec City, Canada (1995)
  • 20. References (continued)
      10. Briechle, K., Hanebeck, U. D.: Template Matching Using Fast Normalized Cross Correlation. In: SPIE, vol. 4387, no. 95. AeroSense Symposium, Orlando, Florida (2001)
      11. NVIDIA CUDA Programming Guide, Version 2.2, pp. 10, 27-35, 75-97 (2009)
      12. Hii, A. J. H., Hann, C. E., Chase, J. G., Van Houten, E. E. W.: Fast Normalized Cross Correlation for Motion Tracking Using Basis Functions. In: Journal of Computer Methods and Programs in Biomedicine, vol. 82, no. 2, pp. 144-156. Elsevier (2006)
  • 21. Thank You