In this presentation we propose a parallel implementation of template matching using full search with NCC as the similarity measure, based on the concept of pre-computed sum-tables, referred to as FNCC, for high-resolution images on NVIDIA's Graphics Processing Units (GP-GPUs).
Efficient Variable Size Template Matching Using Fast Normalized Cross Correlation on Multicore Processors
1. "Efficient Variable Size Template Matching Using Fast Normalized Cross Correlation on Multicore Processors"
Durgaprasad Gangodkar, Sachin Gupta, Gurbinder Gill, Padam Kumar, Ankush Mittal
Department of Electronics and Computer Engineering
Indian Institute of Technology Roorkee, India
2. Contents
1. Introduction
2. NVIDIA's Compute Unified Device Architecture
3. Normalized and Fast Normalized Cross Correlation
4. Parallel Implementation of Fast Normalized Cross Correlation
5. Experimental Details and Performance Evaluation
6. Conclusion
3. 1. Introduction
Template matching has applications in image and signal processing such as image registration, object detection, and pattern matching. Given a source image and a template, the matching algorithm finds the location of the template within the image in terms of a specific similarity measure.
• Full-search (FS) or exhaustive-search algorithms consider every pixel in the block to find the best match, which is computationally very expensive.
• Although different measures have been proposed, an empirical study found that NCC provides the best performance in all image categories in the presence of various image distortions [9]. NCC is also more robust against image variations, such as illumination changes, than the widely used SAD and MAD.
4. • However, NCC is computationally more expensive than SAD or MAD, which is a significant drawback for its real-time application.
• In this paper we propose a parallel implementation of template matching using full search with NCC as the measure, based on the concept of pre-computed sum-tables [10][11], referred to as FNCC, for high-resolution images on NVIDIA's Graphics Processing Units (GP-GPUs).
5. 2. NVIDIA’s Compute Unified Device Architecture
• GP-GPUs have emerged as front runners for low-cost
high-performance computing (HPC) machines
• GTX280 can provide theoretical peak performance of
around 933 GFLOPs (single precision) and 78 GFLOPs
(double precision).
• A kernel executes a scalar sequential program on a set of
parallel threads. The programmer organizes these threads
into a grid of thread blocks.
Challenges:
• Higher global memory latency
• Higher CPU – Device data transfer latency
• Limited availability of registers
• Limited high-speed shared memory
• Thread synchronization and dynamic kernel configuration
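The grid-of-thread-blocks organization can be illustrated with a small CPU simulation. This is a minimal sketch of the CUDA thread-indexing convention only; `run_grid` and the kernel below are our own illustrative names, not CUDA API calls or the paper's kernels:

```python
# CPU sketch of how a CUDA launch maps a grid of thread blocks onto image
# pixels. Block and grid dimensions here are illustrative only.

def run_grid(grid_dim, block_dim, kernel):
    """Invoke `kernel` once per (block, thread) pair, like a CUDA launch."""
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    kernel((bx, by), (tx, ty))

def make_kernel(img, block_dim, width, height):
    def kernel(block_idx, thread_idx):
        # Global pixel coordinates, exactly as a CUDA kernel computes them:
        # x = blockIdx.x * blockDim.x + threadIdx.x, likewise for y.
        x = block_idx[0] * block_dim[0] + thread_idx[0]
        y = block_idx[1] * block_dim[1] + thread_idx[1]
        if x < width and y < height:   # guard against out-of-range threads
            img[y][x] += 1
    return kernel

width, height = 8, 6
img = [[0] * width for _ in range(height)]
block_dim = (4, 4)
grid_dim = ((width + 3) // 4, (height + 3) // 4)  # ceil-divide to cover image
run_grid(grid_dim, block_dim, make_kernel(img, block_dim, width, height))
assert all(v == 1 for row in img for v in row)    # every pixel touched once
```

The guard condition matters: the grid is sized by ceiling division, so boundary blocks contain threads that fall outside the image and must do nothing.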
6. Main contributions of this paper:
1. Novel strategy for parallel calculation of sum-tables using
prefix-sum algorithm that optimally uses high-speed shared
memory of GPU.
2. Adaptation of the kernel configuration to variable sized
templates and efficient use of shared memories offered by
CUDA
3. Exploitation of the asynchronous nature of kernel calls to
optimally distribute computation between host and device.
4. Data parallelism in the algorithms by dividing
computationally intensive tasks for parallel and scalable
execution on the multiple cores.
7. 3. Normalized and Fast Normalized Cross Correlation
• NCC has been commonly used as a metric to evaluate the
similarity (or dissimilarity) measure between two
compared images[8][9].
• A template of size N_x × N_y is matched against an image of
size M_x × M_y.
• The position (u, v) of the template t in the image f is
determined by calculating the NCC value at every step.
• The basic equation for NCC is as given in (1)
\gamma_{u,v} = \frac{\sum_{x,y} \left( f(x,y) - \bar{f}_{u,v} \right) \left( t(x-u, y-v) - \bar{t} \right)}{\sqrt{\sum_{x,y} \left( f(x,y) - \bar{f}_{u,v} \right)^2 \, \sum_{x,y} \left( t(x-u, y-v) - \bar{t} \right)^2}}    (1)
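As a concrete reading of equation (1), this pure-Python sketch evaluates the NCC at a single offset (u, v). The function name and the toy data are ours; a real implementation would vectorize these sums:

```python
# Direct evaluation of equation (1): NCC of template t against image f at
# offset (u, v). f and t are 2D lists indexed [x][y].
import math

def ncc(f, t, u, v):
    nx, ny = len(t), len(t[0])
    win = [f[u + i][v + j] for i in range(nx) for j in range(ny)]
    tem = [t[i][j] for i in range(nx) for j in range(ny)]
    f_mean = sum(win) / (nx * ny)          # mean under the window
    t_mean = sum(tem) / (nx * ny)          # template mean
    num = sum((w - f_mean) * (p - t_mean) for w, p in zip(win, tem))
    den = math.sqrt(sum((w - f_mean) ** 2 for w in win)
                    * sum((p - t_mean) ** 2 for p in tem))
    return num / den if den else 0.0

# A template matched against its own location in the image gives gamma = 1.
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
tpl = [[6, 7], [10, 11]]   # the 2x2 block of img at (u, v) = (1, 1)
print(round(ncc(img, tpl, 1, 1), 6))   # -> 1.0
```

Note that recomputing the window mean and the two energy sums at every offset is exactly the cost that the sum-table (FNCC) formulation removes.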
8. \bar{f}_{u,v} = \frac{1}{N_x N_y} \sum_{x=u}^{u+N_x-1} \sum_{y=v}^{v+N_y-1} f(x,y)    (2)

• Direct computation of (1) requires on the order of
N_x N_y (M_x − N_x)(M_y − N_y) calculations.
• For example, matching a small 16×16-pixel template
against a 250×250-pixel image requires a total of
more than 14 million calculations.
9. Fast Normalized Cross Correlation (FNCC)
• The denominator of equation (1) is calculated using the
concept of sum-tables [10][11].
• s(u, v) and s²(u, v) are sum-tables over the image
function and image energy respectively.
• The sum-tables of image function and image energy
are computed recursively as given below:
s(u,v) = f(u,v) + s(u−1,v) + s(u,v−1) − s(u−1,v−1)    (3)
s²(u,v) = f²(u,v) + s²(u−1,v) + s²(u,v−1) − s²(u−1,v−1)    (4)
with s(u,v) = s²(u,v) = 0 whenever u < 0 or v < 0.
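These recursions can be sketched directly. The pure-Python version below (our own illustrative code, not the paper's kernels) builds both tables and shows how the sum of any window then reduces to four table lookups:

```python
# Summed-area tables per the recursions above: s accumulates f(u,v), s2
# accumulates f(u,v)^2; out-of-range entries are treated as zero.

def sum_tables(f):
    h, w = len(f), len(f[0])
    s  = [[0.0] * w for _ in range(h)]
    s2 = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            up   = s[u - 1][v]     if u > 0 else 0.0
            left = s[u][v - 1]     if v > 0 else 0.0
            diag = s[u - 1][v - 1] if u > 0 and v > 0 else 0.0
            s[u][v] = f[u][v] + up + left - diag
            up2   = s2[u - 1][v]     if u > 0 else 0.0
            left2 = s2[u][v - 1]     if v > 0 else 0.0
            diag2 = s2[u - 1][v - 1] if u > 0 and v > 0 else 0.0
            s2[u][v] = f[u][v] ** 2 + up2 + left2 - diag2
    return s, s2

def window_sum(tab, u, v, nu, nv):
    """Sum over the nu x nv window at (u, v), from four table lookups."""
    a = tab[u + nu - 1][v + nv - 1]
    b = tab[u - 1][v + nv - 1] if u > 0 else 0.0
    c = tab[u + nu - 1][v - 1] if v > 0 else 0.0
    d = tab[u - 1][v - 1]      if u > 0 and v > 0 else 0.0
    return a - b - c + d

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
s, s2 = sum_tables(img)
print(window_sum(s, 1, 1, 2, 2))    # 5+6+8+9 -> 28.0
print(window_sum(s2, 0, 0, 2, 2))   # 1+4+16+25 -> 46.0
```

Once built, the tables make every window's sum and energy an O(1) lookup, independent of the template size.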
10. 4. Parallel Implementation of Template
Matching
• Though FNCC reduces the computational time for
low-resolution images, it still incurs substantial time for
high-resolution images.
• We adopt a two-stage approach for template matching:
– In the first stage we parallelize the computation of the
sum-tables
– In the second stage we parallelize the computation of
normalized cross correlation by utilizing the sum-tables
as a look up.
11. Computation of Sum-Tables
• The sum tables are calculated by taking the cumulative sum
over the image points.
• We make use of parallel prefix-sum algorithm as shown in
figure
The figure illustrates the working of the prefix-sum algorithm,
where n/2 threads work in parallel to compute the prefix sum
in O(log n) time.
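One standard way to realize such a scan with n/2 threads in O(log n) steps is the work-efficient (Blelloch) algorithm, simulated below on the CPU. This is our sketch of the general technique, not the paper's kernel; each iteration of the inner loops is independent, which is what allows the GPU threads to run them in parallel:

```python
# CPU simulation of the work-efficient parallel prefix sum: an up-sweep
# builds partial sums in O(log n) steps, a down-sweep distributes them.
# Input length must be a power of two in this sketch.

def blelloch_scan(a):
    x = list(a)
    n = len(x)
    # Up-sweep (reduce): build a reduction tree in place.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):        # each i is one "thread"
            x[i + 2 * d - 1] += x[i + d - 1]
        d *= 2
    # Down-sweep: turn the reduction tree into an exclusive scan.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] += t
        d //= 2
    # Convert the exclusive scan to the inclusive sums used by sum-tables.
    return [e + v for e, v in zip(x, a)]

print(blelloch_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# -> [3, 4, 11, 11, 15, 16, 22, 25]
```

Scanning every row, transposing, and scanning again yields exactly the 2D sum-table, which is the pipeline described on the next slide.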
12. • Sum-tables for the template are computed on the host CPU
while the GPU computes the sum-tables for the source image,
exploiting the asynchronous nature of kernel calls. This
eliminates idling of the host CPU while the device is busy.
• One image row is assigned to each thread block.
• The task of each thread, and the block configuration grouping
the threads, is decided dynamically by the template size.
• Every thread caches data in shared memory, for template
images of variable resolution.
• The sum-table is produced by a four-step pipeline:
parallel prefix-sum → transpose → parallel prefix-sum →
transpose.
• Device pointers are used across all four kernels to avoid
data-transfer latencies.
13. Template matching using FNCC
• For a template of size N_x × N_y pixels we divide the source
image into search windows of 2N_x × 2N_y pixels.
• The correlation value is calculated, using the sum-tables
as a lookup, by moving the template over the referenced
search window pixel by pixel, covering the entire search
window.
• The highest correlation indicates the best match.
• The task of computing the correlation for each search window
is assigned to a single thread.
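The FNCC idea can be sketched end to end as follows, assuming the sum-tables supply the window mean and energy in the denominator of (1) while the numerator is computed directly. All names are ours, and for simplicity this version scans one whole image sequentially rather than assigning per-thread search windows:

```python
# FNCC sketch: the window statistics in the denominator of (1) come from
# two table lookups instead of a fresh O(Nx*Ny) sum at every position.
import math

def build(f, power):
    """Summed-area table of f**power, 1-based with a zero border."""
    h, w = len(f), len(f[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for u in range(h):
        for v in range(w):
            s[u+1][v+1] = f[u][v] ** power + s[u][v+1] + s[u+1][v] - s[u][v]
    return s

def fncc(f, t):
    """Return (best_u, best_v): position of template t in image f."""
    h, w, nh, nw = len(f), len(f[0]), len(t), len(t[0])
    n = nh * nw
    t_mean = sum(map(sum, t)) / n
    t_energy = sum((p - t_mean) ** 2 for row in t for p in row)
    s, s2 = build(f, 1), build(f, 2)
    look = lambda tab, u, v: (tab[u+nh][v+nw] - tab[u][v+nw]
                              - tab[u+nh][v] + tab[u][v])
    best, best_pos = -2.0, None
    for u in range(h - nh + 1):
        for v in range(w - nw + 1):
            sf = look(s, u, v)                 # window sum of f
            f_mean = sf / n
            f_energy = look(s2, u, v) - sf * sf / n   # sum (f - f_mean)^2
            num = sum((f[u+i][v+j] - f_mean) * (t[i][j] - t_mean)
                      for i in range(nh) for j in range(nw))
            den = math.sqrt(f_energy * t_energy)
            g = num / den if den else 0.0
            if g > best:
                best, best_pos = g, (u, v)
    return best_pos

img = [[0] * 6 for _ in range(6)]
img[2][3], img[3][4] = 9, 9          # plant a distinctive 2x2 pattern
tpl = [[9, 0], [0, 9]]
print(fncc(img, tpl))                # -> (2, 3)
```

Only the numerator still touches every window pixel; the parallel implementation assigns each such window to its own GPU thread.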
14. • The target image is dynamically divided into search
windows according to the x and y dimensions of the
variable sized template such that we get the maximum
number of threads per block.
• Every thread block dynamically caches data such that the
constraint of shared memory (16 KB per block) is never
violated.
15. 5. Experimental Details and Performance
Evaluation
• Execution time and speedup of the proposed parallel
FNCC implementation were evaluated on a benchmark
dataset.
• The sequential code was run on an Intel Xeon 3.2 GHz
processor with 1 GB of DRAM under 32-bit Windows XP.
• The parallel code was run on an NVIDIA GTX 280 with
1 GB of onboard DDR3, hosted by the same Intel Xeon
3.2 GHz machine.
16.
Image Size  Template Size  CUDA Thread  Threads    Parallel Time  Sequential Time  Speedup
(pixels)    (pixels)       Blocks       Per Block  (sec)          (sec)
512x512     32x32          5x8          3x2        0.517          1.372            2.7
512x512     24x32          8x5          2x5        0.260          1.097            4.3
512x512     24x16          5x6          6x4        0.047          0.543            11.6
512x512     16x16          5x6          7x6        0.033          0.406            12.3
1024x1024   32x32          9x16         3x2        1.311          6.170            4.8
1024x1024   24x32          16x9         2x5        0.639          4.773            7.5
1024x1024   24x16          10x11        6x4        0.179          2.518            14.1
1024x1024   16x16          10x11        7x6        0.121          1.893            15.6
2048x1080   32x32          10x32        3x2        2.848          13.474           4.8
2048x1080   24x32          17x17        2x5        1.261          10.344           8.3
2048x1080   24x16          11x22        6x4        0.391          5.551            14.3
2048x1080   16x16          10x22        7x6        0.239          4.116            17.3
• For a frame size of 2048x1080 and template size 16x16 we achieve
a considerable reduction in execution time, from 4.116 sec to
239 ms, yielding a speedup of around 17x.
17. • As the resolution of the image increases, the speedup
obtained also increases, opening up the scope for
handling high-resolution digital images.
18. 6. Conclusion
• Every thread has been assigned an independent task of
computing the correlation for template which eliminates
inter-thread communication, inter-thread dependencies and
synchronization.
• Dynamic arrangement of threads into blocks and grids has
been done depending on the size of the template.
• We have also devised an efficient strategy to make use of the
faster shared memory to overcome memory-access latency.
• Thread configuration is scalable to match low resolution or
high resolution images and varying size template.
• Our future work involves exploring the division of larger
templates into smaller sub-templates to further exploit the
computational power of multicore processors.
19. References
1. Ryan, T. W.: The Prediction of Cross-Correlation Accuracy in Digital Stereo-Pair Images. PhD thesis,
University of Arizona (1981)
2. Burt, P. J., Yen, C., Xu, X.: Local Correlation Measures for Motion Analysis: A Comparative Study. In:
IEEE Conf. Pattern Recognition and Image Processing, pp. 269-274. IEEE Press, Las Vegas (1982).
3. Essannouni, L., Ibn-Elhaj, E., Aboutajdine, D.: Fast Cross-Spectral Image Registration Using New
Robust Correlation. In: Journal of Real-Time Image Processing, vol. 1, no. 2, pp. 123-12. Springer
(2006)
4. Minoru, M., Kunio, K.: Fast Template Matching Based on Normalized Cross Correlation Using
Adaptive Block Partitioning and Initial Threshold Estimation. In: IEEE International Symposium on
Multimedia, pp. 196 – 203. IEEE Press, Taichung, Taiwan (2010)
5. Luo, J., Konofagou, E. E.: A Fast Normalized Cross-Correlation Calculation Method for Motion
Estimation. In: IEEE Trans. on Ultrasonics, Ferroelectrics and Frequency Control, vol. 57, no. 6, pp.
1347 – 1357. (2010)
6. Zhu, S., Ma, K. K.: A New Diamond Search Algorithm for Fast Block Matching Motion Estimation. In:
IEEE Trans. Image Processing, vol. 9, no. 2, pp. 287–290. (2000)
7. Tham, J. Y., Ranganath, S., Ranganath, M., Kassim, A. A.: A Novel Unrestricted Center-Biased
Diamond Search Algorithm for Block Motion Estimation. In: IEEE Trans. Circuits Syst. Video
Technol., vol. 8, no. 4, pp. 369–377. (1998)
8. Zhu, C., Lin, X., Chau, L.: Hexagon-Based Search Pattern for Fast Block Motion Estimation. In: IEEE
Trans. Circuits Syst. Video Technol., vol. 12, no. 5, pp. 349-355. (2002)
9. Lewis, J. P.: Fast Template Matching. In: Vision Interface 95, Canadian Image Processing and Pattern
Recognition Society, pp. 120–123. Quebec City, Canada (1995)
20. 10. Briechle, K., Hanebeck, U. D.: Template Matching Using Fast Normalized Cross Correlation. In: SPIE,
vol. 4387, no. 95. AeroSense Symposium, Orlando, Florida (2001)
11. NVIDIA CUDA Programming Guide, Version 2.2, pp. 10, 27-35, 75-97. (2009)
12. Hii, A. J. H., Hann, C. E., Chase, J. G., Van Houten, E. E. W.: Fast Normalized Cross Correlation for
Motion Tracking Using Basis Functions. In: Journal of Computer Methods and Programs in
Biomedicine, vol. 82, no. 2, pp. 144–156. Elsevier (2006)