The presentation gives a brief introduction to a variety of algorithms that can be used for image recognition. It assesses the different tools and algorithms in terms of their reliability in identifying desktop/web elements. This criterion was chosen because the results of the research will be used for desktop automation.
To learn more about RAX Automation Suite, visit www.raxsuite.com
4. Feature Detection
Feature Matching
Specific patterns which are unique and can be easily compared and tracked.
SIFT - Scale-Invariant Feature Transform
SURF - Speeded-Up Robust Features
ORB - Oriented FAST and Rotated BRIEF
10. How it should work:
- Use OCR on the whole screen
- Find the nth occurrence of the word
- Get the image position of that word
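The three steps above can be sketched roughly as follows. This is only an illustration: the `sample` dictionary is hand-written and merely assumed to mimic the column-style output that an OCR library such as pytesseract can return for a screenshot (parallel lists of words and bounding boxes), and `find_nth_word` is a hypothetical helper name, not part of any tool mentioned in the slides.

```python
from typing import Optional, Tuple


def find_nth_word(ocr: dict, word: str, n: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, width, height) of the n-th (1-based) match, or None."""
    seen = 0
    for i, text in enumerate(ocr["text"]):
        # Case-insensitive comparison, ignoring surrounding whitespace.
        if text.strip().lower() == word.lower():
            seen += 1
            if seen == n:
                return (ocr["left"][i], ocr["top"][i],
                        ocr["width"][i], ocr["height"][i])
    return None


# Hand-written stand-in for the OCR result of a full-screen capture (assumption).
sample = {
    "text":   ["File", "Edit", "Save", "Cancel", "Save"],
    "left":   [10, 60, 200, 300, 200],
    "top":    [5, 5, 120, 120, 400],
    "width":  [30, 30, 40, 55, 40],
    "height": [12, 12, 14, 14, 14],
}

second_save = find_nth_word(sample, "save", 2)
print(second_save)
```

Counting occurrences in reading order lets the user say "the second Save" when the same word appears more than once on the screen.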
11. Assessments:
- We cannot train fast.ai to identify all the icons/text that the user will use for automation.
- Feature matching is good for scene images, but an image of a website, for example, could be full of text, which confuses the features to be matched.
12. Assessments:
- Template Matching is the most reliable option for Citrix automation, especially for matching images of a static graphical user interface.
- OCR is good for web content whose graphical design changes dynamically, which may be hard for template matching to track.
14. Proposal: Template Matching
- Template Matching would be used for content which is unique and has minimal changes to the GUI.
- The user should be able to choose the size of the cropped template. A larger template includes more unique elements.
15. Proposal: OCR
- If template matching still fails, then the user should opt in to OCR.
- OCR should be able to find the nth occurrence of the word and get its position.
- Then it could click anywhere on the screen relative to the text’s position.
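A minimal sketch of the "click relative to the text" idea, assuming the OCR step has already produced a bounding box for the word. The `Box` layout, the `click_point` helper and the sample numbers are all hypothetical; the actual mouse click would be done by whatever automation library is in use and is not shown here.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in screen pixels


def click_point(box: Box, dx: int = 0, dy: int = 0) -> Tuple[int, int]:
    """Return the center of the word's box, shifted by (dx, dy) pixels."""
    left, top, width, height = box
    return (left + width // 2 + dx, top + height // 2 + dy)


# Example: a label found by OCR, with the textbox assumed to be 120 px to its right.
label_box = (200, 400, 40, 14)
target = click_point(label_box, dx=120)
print(target)
```

Using the center of the word as the reference point keeps the offset stable even if the OCR box is a pixel or two wider or narrower between runs.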
16. Suggestion:
- Keyboard keys and shortcut keys would be a great tool for navigating and getting the cursor to the different textboxes/links.
  Ex. Tab, Page Up, Page Down
  Win + Down = Minimize
  etc.
CNN - a type of neural network that assumes the input is an image, so it can extract the specific properties that an image has.
Basically, we will build our own model and train it on the elements that can be seen on the desktop.
Labeled pictures of dogs and cats are fed to the system as training images. The system should be fed a lot of pictures for higher accuracy.
After that, the system can give a percentage indicating how confident it is that an image shows a cat or a dog.
Dot product, Loss function.
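To make the two terms concrete, here is a small pure-Python sketch. It assumes the usual CNN building blocks: a filter response computed as a dot product between a kernel and each image patch, a softmax that turns raw scores into confidence percentages, and cross-entropy as the loss. The tiny image, the kernel and the cat/dog scores are made-up values for illustration only, not data from the research.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a patch-kernel dot product."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Loss is small when the true class gets a high probability."""
    return -math.log(probs[true_index])


image = [[0, 0, 1, 1]] * 4          # dark left half, bright right half
edge_kernel = [[-1, 1], [-1, 1]]    # responds to a dark-to-bright vertical edge
feature_map = conv2d_valid(image, edge_kernel)

probs = softmax([2.0, 0.5])         # hypothetical scores for (cat, dog)
loss = cross_entropy(probs, 0)      # assume the true label is "cat"
print(feature_map, probs, loss)
```

The feature map is non-zero only where the kernel sits on the edge, which is the sense in which a filter "extracts" a property of the image. Training adjusts the kernel values so that the loss goes down.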
Look for the regions in an image which have maximum variation when moved (by a small amount) in any direction.
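One way to picture this note is a Moravec-style score: shift a small window by one pixel in each of the eight directions and keep the smallest change. This is a simplified stand-in chosen for illustration, not the detector that SIFT, SURF or ORB actually use, and the 9x9 test image and the `shift_variation` name are assumptions.

```python
def shift_variation(img, r, c, half=1):
    """Smallest sum of squared differences between the window centered at (r, c)
    and the same window shifted by one pixel in each of 8 directions.
    The caller must keep (r, c) at least half + 1 pixels away from the border."""
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    scores = []
    for dr, dc in shifts:
        ssd = 0
        for i in range(-half, half + 1):
            for j in range(-half, half + 1):
                diff = img[r + i][c + j] - img[r + i + dr][c + j + dc]
                ssd += diff * diff
        scores.append(ssd)
    return min(scores)


# 9x9 image: a bright square in the lower-right part, dark elsewhere.
img = [[1 if r >= 4 and c >= 4 else 0 for c in range(9)] for r in range(9)]

flat = shift_variation(img, 2, 2)    # uniform area
edge = shift_variation(img, 6, 4)    # on the left edge of the square
corner = shift_variation(img, 4, 4)  # on the corner of the square
print(flat, edge, corner)
```

A flat area does not change under any shift, and an edge does not change when shifted along itself, so both score zero. Only the corner changes in every direction, which is why corners make good features.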
SIFT - good when the scale of the image changes
SURF - faster than SIFT
ORB - since SIFT and SURF are both patented, the OpenCV developers created this one, which is said to be faster.
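ORB describes each keypoint with a binary string, and such descriptors are usually compared with the Hamming distance (the number of differing bits). The sketch below shows brute-force matching on that basis. The 8-bit descriptors are invented for readability (real ORB descriptors are much longer), and `match_descriptors` and its `max_distance` threshold are hypothetical names and values.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary descriptors differ."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance=2):
    """For each query descriptor, find the closest train descriptor and keep
    the pair only if the distance is within max_distance."""
    matches = []
    for qi, q in enumerate(query):
        best_ti, best_d = min(((ti, hamming(q, t)) for ti, t in enumerate(train)),
                              key=lambda pair: pair[1])
        if best_d <= max_distance:
            matches.append((qi, best_ti, best_d))
    return matches


# Made-up 8-bit descriptors: query from the template, train from the screenshot.
query = [0b10110010, 0b00001111, 0b11110000]
train = [0b00001110, 0b10110011, 0b01010101]

matches = match_descriptors(query, train)
print(matches)
```

Each tuple is (query index, train index, distance). The third query descriptor has no close counterpart, so it is dropped instead of being forced into a bad match.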
Basically, template matching takes a subimage and finds its position inside a larger image.
The goal is to find the area with the highest match.
The algorithm slides the subimage across the larger image and compares it with the region under it at each position.
Square difference of the pixels
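The sliding comparison with squared differences can be sketched in plain Python as below. It is a toy version of the idea behind the squared-difference mode of OpenCV's template matching, written for small lists of grayscale values. The `screen` and `icon` data and the function name are made up for illustration.

```python
def match_template_sqdiff(image, template):
    """Slide the template over the image and return the (row, col) of the
    top-left corner with the smallest sum of squared pixel differences."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            score = sum((image[r + i][c + j] - template[i][j]) ** 2
                        for i in range(th) for j in range(tw))
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


screen = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 6, 0, 0],
    [0, 0, 0, 0, 0],
]
icon = [[9, 8], [7, 6]]

position, score = match_template_sqdiff(screen, icon)
print(position, score)
```

With squared differences the best match is the lowest score, and a score of 0 means a pixel-perfect match.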
Correlation - connection of two variables
Correlation coefficient - statistical relationship of two variables
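For the correlation coefficient, a rough sketch is to compute the Pearson correlation between the template and each window, so that the score depends on the pattern of the pixels and not on their overall brightness. This resembles the normalized correlation-coefficient mode of OpenCV's template matching, but the code below is a simplified pure-Python illustration with invented data and hypothetical function names.

```python
import math


def corr_coeff(a, b):
    """Pearson correlation of two equal-length pixel lists (range -1 to 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a completely flat window carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def match_template_ccoeff(image, template):
    """Return the top-left (row, col) with the highest correlation coefficient."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best_pos, best_score = None, -2.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = corr_coeff(window, flat_t)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# The icon appears on the screen 10 gray levels brighter than the template.
screen = [
    [10, 10, 10, 10, 10],
    [10, 10, 19, 18, 10],
    [10, 10, 17, 16, 10],
    [10, 10, 10, 10, 10],
]
icon = [[9, 8], [7, 6]]

position, score = match_template_ccoeff(screen, icon)
print(position, score)
```

Here the best match is the highest score. The brighter copy of the icon still scores 1.0, whereas a plain squared-difference comparison would penalize the brightness shift.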