1. Applying image matching algorithms to
video recognition and
autonomous robot navigation
Maxim Kamensky, CEO, Invarivision
Dmitriy Yeremeyev, CTO, Invarivision
EECVC presentation, July 9 2016
2. Image matching algorithms
Feature-based algorithms
● well-explored technology
● can find partially occluded images
● can find rotated images
● work slowly
● recognize only a small number of objects
Template-based algorithms
● work fast
● can store many images
● do not cope well with occluded images
● do not recognize rotated images
[Diagram] Feature-based pipeline: input image → feature extraction → feature vector → classification → object type.
[Diagram] Template-based pipeline: input image → template → search in template database → object type.
Keywords (feature-based): SURF, SIFT, ConvNet, etc.
Keywords (template-based): BiGG
3. AVM - Associative Video Memory
Templates are recognition matrices at several resolutions: 3x3, 7x7, 15x15, 31x31.
[Diagram] Associative tree: a root base branches into bases 1 ... n at level 1, and so on down to bases at level m.
[Diagram] Associative base: a recognition matrix plus associated data; read/write operations match an image and return or store the associated data.
AVM is a template-based image matching algorithm.
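The associative-base idea above can be sketched in a few lines. The multilevel recognition matrices (3x3 ... 31x31) and the coarse-to-fine read follow the slide; the similarity measure, the 0.9 threshold, and all names are illustrative assumptions, not Invarivision's implementation.

```python
import numpy as np

SIZES = (3, 7, 15, 31)  # recognition-matrix resolutions from the slide

def to_matrices(image):
    """Downsample a square grayscale image to each recognition-matrix size."""
    n = image.shape[0]
    mats = []
    for s in SIZES:
        idx = (np.arange(s) * n) // s  # subsample rows/columns
        mats.append(image[np.ix_(idx, idx)].astype(float))
    return mats

class AVMTree:
    """Toy stand-in for an associative base: matrices + associated data."""

    def __init__(self, threshold=0.9):
        self.entries = []  # list of (recognition matrices, associated data)
        self.threshold = threshold

    def write(self, image, data):
        self.entries.append((to_matrices(image), data))

    def read(self, image):
        """Coarse-to-fine match: reject a stored entry at the first
        resolution whose matrices are not similar enough."""
        query = to_matrices(image)
        for mats, data in self.entries:
            if all(self._similar(q, m) for q, m in zip(query, mats)):
                return data
        return None

    def _similar(self, a, b):
        # Normalized correlation as an assumed similarity measure.
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0:
            return False
        return float((a * b).sum() / denom) >= self.threshold
```

Because matching starts at the 3x3 level, most non-matching entries are rejected after comparing only nine values, which is what makes a large template database cheap to search.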
4. Technique of AVM testing
We also tested the AVM algorithm on images from the "Amsterdam Library of Object Images" (ALOI).
The ALOI database has several views of each object, so it allows us to compare how well the algorithm recognizes different
views of the same object against how well it discriminates between different objects. To measure this we perform the steps listed
below.
Separate the database into training and test parts:
● Each object is rotated out-of-plane every 5 degrees;
● Training part: rotations 0, 10, 20, 30, ... 350; 36 views in all;
● Testing part: rotations 5, 15, 25, ... 355; 36 views in all;
● Do this for N objects from the dataset.
Create models from the training part:
● Each model lives in a separate AVM;
● Add the 36 training views to the AVM, with an 80x80 key image size for instance.
So for every object we have a separate AVM with 36 learned object views:
● Match each model against each image of the test part of the database;
● Take the model's maximal similarity response for a test image;
● If a model and a test image show the same object, this is a genuine (same) matching pair;
● If a model and a test image show different objects, this is an impostor (different) matching pair;
● This yields N * 36 genuine matching pairs and N * ((N – 1) * 36) impostor matching pairs.
We draw a special kind of ROC graph called a Detection Error Tradeoff (DET) graph:
● The X axis is the False Acceptance Rate (FAR), i.e. False Positive Rate (FPR);
● The Y axis is the False Rejection Rate (FRR), i.e. False Negative Rate (FNR);
● Both axes are logarithmic.
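The pair bookkeeping and DET axes above reduce to a few lines; function names and the toy scores in the usage note are illustrative:

```python
def matching_pairs(n_objects, views_per_object=36):
    """Genuine and impostor pair counts as defined on the slide:
    N * 36 genuine pairs, N * ((N - 1) * 36) impostor pairs."""
    genuine = n_objects * views_per_object
    impostor = n_objects * (n_objects - 1) * views_per_object
    return genuine, impostor

def det_point(genuine_scores, impostor_scores, threshold):
    """One DET-graph point for a given acceptance threshold:
    FAR = fraction of impostor pairs accepted,
    FRR = fraction of genuine pairs rejected."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr
```

Sweeping the threshold over all observed similarity scores and plotting each (FAR, FRR) point on log-log axes produces the DET graph.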
7. AVM performance
Time performance: average time to process each image (in ms).
Tree capacity: total number of images in the tree. (Measured on an Intel® Xeon® CPU L5630 @ 2.13 GHz.)
8. Object search in an image
Object training (write).
Sliding window (read):
● scan step is ⅛ of the window size;
● window size is scaled up by 25% on each step;
● window position is adjusted by AVM.
Result: object id, x, y, scale.
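The scan schedule above can be sketched as follows; the slide fixes only the ⅛ step and the 25% growth, so the 80-pixel starting window and all names are assumptions:

```python
def sliding_window_scan(image_w, image_h, min_win=80, scale_step=1.25):
    """Enumerate (x, y, window_size) scan positions:
    scan step = 1/8 of the window size, window grown 25% per stage."""
    windows = []
    win = min_win
    while win <= min(image_w, image_h):
        step = max(1, win // 8)  # 1/8 of window size, at least one pixel
        for y in range(0, image_h - win + 1, step):
            for x in range(0, image_w - win + 1, step):
                windows.append((x, y, win))
        win = int(win * scale_step)  # +25% per stage
    return windows
```

Each window would then be read against the AVM tree; on a hit, the window position is refined by AVM before reporting (object id, x, y, scale).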
9. Autonomous navigation of robots in indoor spaces
Navigation module based on AVM technology
allows the robot to orient itself in a space and
navigate precisely to a defined point on the map.
[Diagram] Webcam images are looked up in the AVM search tree; if an image is recognized, the
associated pair (image → X, Y and azimuth) yields the actual position: X, Y coordinates and azimuth.
In our case, visual navigation for the robot is simply a sequence of
images with associated coordinates memorized inside the AVM tree.
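The lookup described above can be sketched with the AVM search tree stood in for by a plain dictionary from a recognized landmark image to its memorized pose; all identifiers here are illustrative, not the AVM Navigator API:

```python
route = {}  # landmark_id -> (x, y, azimuth_deg), the memorized sequence

def memorize(landmark_id, x, y, azimuth):
    """Write: associate a landmark image with map coordinates and azimuth."""
    route[landmark_id] = (x, y, azimuth)

def localize(recognized_id):
    """Read: return the memorized pose if the landmark was recognized,
    else None (the 'Recognized?' branch on the slide)."""
    return route.get(recognized_id)

def heading_error(current_azimuth, target_azimuth):
    """Smallest signed turn in degrees to face the memorized azimuth."""
    return (target_azimuth - current_azimuth + 180) % 360 - 180
```

A navigation loop would localize on each webcam frame and steer by the heading error toward the next memorized pose in the sequence.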
10. Using AVM in robotics
Object tracking. "Follow me" mode.
12. Implementation for RoboRealm - AVM Navigator
AVM Navigator is a module of the RoboRealm system that provides
object recognition and autonomous robot navigation using a single video
camera on the robot as the main sensor for navigation.
Localization error
The localization error is about 0.1 meter (10 centimeters).
16. Automatic searching of video fragments
[Diagram] Film frames are read as images by a cluster of search cores (s-core #1 ... s-core #N, each
holding an AVM search tree); matches are recorded in a database of (film ID, position) entries, and the
MultiTrack assembling module outputs video fragments #1 ... #M, each with film name, position and length.
17. MultiTrack - assembling of duplicates
[Diagram] An unknown (scanned) video is split into fragments #1 ... #4. Each source fragment is matched
to its duplicates found elsewhere (duplicate videos #1.1-#1.3, #2.1-#2.2, #4.1), and the assembled
matches are returned as search results.
18. Distributed system
[Diagram] A customer system talks to Invarivision - ISS over a REST API. A base server (task
management, database, s-core, s-coordinator) coordinates node servers #1 ... #N, each running an
s-core and an s-coordinator.
All these servers can contain applications for
video processing and image recognition.
s-coordinator - an application for coordinating
video processing.
s-core - an application for reading/writing
separate images in the search tree.
20. Scaling of the search system
[Diagram] Network of search cores (s-core #n,m) arranged as a computer cluster: one axis of the grid
scales write speed and capacity, the other scales read speed.
21. Scheme of the video write
[Diagram] Incoming video frames (1, 2, 3, 4, ...) are written across the 3x3 grid of s-cores
(#1,1 ... #3,3).
22. Scheme of the video read
[Diagram] For reading, video frames are searched across the same 3x3 grid of s-cores (#1,1 ... #3,3).
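The write and read schemes above, together with slide 20's scaling claims, suggest sharding frames across one grid axis and replicating along the other. The slides do not spell out the rule, so this is one plausible sketch with illustrative names:

```python
def place_frame(frame_idx, n_cols, m_rows):
    """Assumed write placement: shard by column (so write speed and
    capacity scale with columns), replicate the shard down that
    column's rows (so read speed scales with rows)."""
    col = frame_idx % n_cols
    return [(col, row) for row in range(m_rows)]

def read_targets(n_cols, m_rows, query_idx):
    """Assumed read fan-out: any single row holds a full copy of the
    data, so a query is served by one row chosen round-robin."""
    row = query_idx % m_rows
    return [(col, row) for col in range(n_cols)]
```

Under this layout, adding a column grows capacity and write throughput without touching existing shards, while adding a row lets one more query run in parallel.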
23. Tree splitting
[Diagram] Scaling of capacity: when Tree #1 fills up, it is split into Tree #1.1 and Tree #1.2.
[Diagram] Capacity alignment: as new video is added for searching, an alignment system distributes
images across trees #1 ... #N, rebalancing them at the next stage.
24. Storage capacity
6.2 MB of RAM → 1 hour of video
(with FCD at 12%)
Server with 256 GB RAM → 41,290 hours
of source video available for searching
Using an SSD as swap space:
1.4 TB → 225,806 hours
Processing speed with FCD set to 12% of frames on 1 base server
(dual Xeon E5690, 3.47 GHz) is about 50 video
hours per hour.
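The capacity figures above follow from dividing storage by the 6.2 MB-per-hour footprint; a sketch, assuming the decimal convention (256 GB = 256,000 MB) the slide appears to use:

```python
MB_PER_VIDEO_HOUR = 6.2  # RAM per hour of indexed video at FCD 12% (from the slide)

def searchable_hours(storage_mb):
    """Hours of source video searchable within the given storage budget."""
    return int(storage_mb / MB_PER_VIDEO_HOUR)
```

At the quoted 50 video-hours-per-hour indexing speed, filling the 1.4 TB SSD tier would itself take on the order of 4,500 machine-hours, which is where the distributed write scheme earns its keep.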
25. Interference resistance
[Examples] Distortions tested: worst quality (CRF 51); grayscale; white noise 50% and 100%;
rotation by 5 and 10 degrees; padded corner 5%, 10% and 15%; padded center 5% and 10%;
cropped from center 15%.
26. Test results
Data set                  | Average precision % | Average recall % | Average F-measure %
--------------------------|---------------------|------------------|--------------------
5 degrees rotated*        | 100                 | 93.82            | 96.81
10 degrees rotated*       | 100                 | 18.54            | 31.28
White noise 50%*          | 100                 | 98.09            | 99.03
White noise 100%*         | 100                 | 93.9             | 96.85
Padded center 5%*         | 100                 | 97.04            | 98.5
Padded center 10%*        | 100                 | 48.8             | 65.59
Padded corner 5%*         | 100                 | 97.84            | 98.91
Padded corner 10%*        | 100                 | 89.89            | 94.68
Padded corner 15%*        | 100                 | 41.12            | 58.28
Cropped from center 10%*  | 100                 | 96.54            | 98.24
Cropped from center 15%*  | 100                 | 67.44            | 80.55
Constant Rate Factor 51*  | 100                 | 96.53            | 98.23
Grayscale*                | 100                 | 97.5             | 98.73
For each scanned interval in the video we can define one of the following situations:
● True Positive (TP) — the system found the correct matching original interval
● False Positive (FP) — the system found an incorrect matching original interval
● False Negative (FN) — the system did not find a matching original interval, although one exists
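The TP/FP/FN definitions above determine the table's columns; for example, the first row's F-measure is 2·100·93.82 / (100 + 93.82) ≈ 96.81. The standard formulas:

```python
def precision(tp, fp):
    """Fraction of found intervals that were correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of existing original intervals that were found."""
    return tp / (tp + fn)

def f_measure(p, r):
    """F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)
```

Since precision is 100% on every row, the F-measure differences in the table track recall alone.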
27. Thank you for your attention!
Questions?
Site: Invarivision.com
Email: maxim.kamensky@invarivision.com
Skype: maxim.kamensky
Phone: +380662346738
EECVC presentation, July 9 2016