Introduction
The implemented algorithm
The implemented system
Results
Conclusions
Hardware Architecture for Calculating LBP-Based
Image Region Descriptors
Michał Fularz, Marek Kraft
Poznań University of Technology
Institute of Control and Information Engineering
May 27, 2015
M. Fularz, M. Kraft A Hardware Architecture for Calculating LBP-Based...
Table of contents
1 Introduction
    Table of contents
    Goal and motivation
2 The implemented algorithm
    Non-redundant uniform local binary patterns (NRULBP)
    Region descriptor formation
3 The implemented system
    System architecture
    NRULBP computation
    Local histogram computation
4 Results
    Summary of used resources
    Processing speed
5 Conclusions
Goal and motivation
Goal
To implement an efficient hardware architecture for computing
LBP-based image region descriptors together with the
corresponding occurrence histograms
Motivation
LBP-based descriptors can be used in a wide range of
computer vision applications
Computing the features and their distributions, in the form of
local occurrence histograms, for the whole image is a
compute-intensive task
Sample applications of LBP-based descriptors
Non-redundant uniform local binary patterns (NRULBP)
Descriptor formation
Regular LBP is formed by comparing the gray level of the central
pixel with each pixel of its 8-neighborhood, giving rise to 256
different LBP variants
Over 90% of LBPs have fewer than 3 bit transitions in the (circular)
binary vector; these so-called uniform LBPs (ULBP) yield 59
different variants (58 uniform patterns plus one label shared by all
non-uniform patterns)
Further reduction with non-redundant ULBPs: a ULBP and
its binary negation (e.g. 0b00000110 and 0b11111001) are
considered the same, reducing the number of variants to 30
(29 pattern pairs plus the shared non-uniform label)
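The variant counts quoted above can be checked by enumerating all 256 raw codes. The following is a minimal sketch, not the authors' implementation; the helper names `transitions` and `build_tables`, and the choice of the smaller code as the representative of each negation pair, are assumptions made for illustration.

```python
def transitions(code: int) -> int:
    """Count 0/1 changes between circularly adjacent bits of an 8-bit code."""
    rotated = ((code << 1) | (code >> 7)) & 0xFF  # rotate left by one bit
    return bin(code ^ rotated).count("1")


def build_tables():
    """Return (ulbp_lut, nrulbp_lut): label tables indexed by raw LBP code."""
    uniform = [c for c in range(256) if transitions(c) <= 2]

    # ULBP: one label per uniform pattern, one shared label for all others.
    ulbp_index = {c: i for i, c in enumerate(uniform)}
    other_u = len(uniform)
    ulbp_lut = [ulbp_index.get(c, other_u) for c in range(256)]

    # NRULBP: a uniform pattern and its bitwise negation share one label.
    # Assumption: the smaller of the two codes represents the pair.
    representatives = sorted({min(c, c ^ 0xFF) for c in uniform})
    nr_index = {c: i for i, c in enumerate(representatives)}
    other_nr = len(representatives)
    nrulbp_lut = [
        nr_index[min(c, c ^ 0xFF)] if c in ulbp_index else other_nr
        for c in range(256)
    ]
    return ulbp_lut, nrulbp_lut


ulbp_lut, nrulbp_lut = build_tables()
print(len(set(ulbp_lut)), len(set(nrulbp_lut)))  # 59 30
```

The example pair from the slide, 0b00000110 and 0b11111001, maps to the same NRULBP label in this table.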
Region descriptor formation
LBPs or their variants can be used to construct region descriptors:
histograms of the LBP codes are computed within small cells
the cells form a grid whose dimensions correspond to the
expected dimensions of the described object
the individual cell histograms are concatenated to form the full
descriptor
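The three steps above can be sketched in software as follows. This is an illustrative reference only, assuming the input is a 2-D list of already computed pattern labels; the function names and the row-major order of concatenation are assumptions, not details taken from the presented hardware.

```python
def cell_histogram(labels, top, left, cell_h, cell_w, n_bins):
    """Histogram of the labels inside one cell_h x cell_w cell."""
    hist = [0] * n_bins
    for r in range(top, top + cell_h):
        for c in range(left, left + cell_w):
            hist[labels[r][c]] += 1
    return hist


def region_descriptor(labels, top, left, grid_rows, grid_cols,
                      cell_h, cell_w, n_bins):
    """Concatenate the cell histograms of a grid_rows x grid_cols grid."""
    descriptor = []
    for gr in range(grid_rows):          # assumed order: row by row
        for gc in range(grid_cols):
            descriptor += cell_histogram(
                labels, top + gr * cell_h, left + gc * cell_w,
                cell_h, cell_w, n_bins)
    return descriptor


# Toy example: 4 x 4 label image, 2 x 2 grid of 2 x 2 cells, 3 labels.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 0, 1],
          [2, 2, 1, 0]]
print(region_descriptor(labels, 0, 0, 2, 2, 2, 2, 3))
# [4, 0, 0, 0, 4, 0, 0, 0, 4, 2, 2, 0]
```

With 30 NRULBP labels, a 10 × 5 grid of cells would give a descriptor of 10 · 5 · 30 = 1500 histogram bins under this scheme.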
Block diagram of the coprocessor
The input data are image pixels streamed in progressive scan order
The input pixels are arranged to form a 3 × 3 window
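A common way to form such a window from a pixel stream is to buffer the two previous image lines. The sketch below models that idea in software under stated assumptions: the generator name `windows_3x3` is hypothetical, border pixels are simply skipped, and a single queue of the last 2 · width + 3 pixels stands in for the line buffers and registers a hardware design would use.

```python
from collections import deque


def windows_3x3(pixels, width):
    """Yield (row, col, window) for each interior pixel of a row-major stream.

    `window` is a 3 x 3 list of lists centred on (row, col).
    Assumes width >= 3; border pixels produce no window.
    """
    history = deque(maxlen=2 * width + 3)  # last two lines plus three pixels
    for index, value in enumerate(pixels):
        history.append(value)
        row, col = divmod(index, width)
        if row >= 2 and col >= 2:
            h = list(history)
            window = [h[0:3],
                      h[width:width + 3],
                      h[2 * width:2 * width + 3]]
            yield row - 1, col - 1, window


# 4 x 4 test image with pixel values 0..15 in scan order.
for row, col, window in windows_3x3(range(16), 4):
    print(row, col, window)
```

The first window is produced once the pixel at row 2, column 2 has arrived, which mirrors the latency of roughly two image lines expected from a line-buffer design.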
Block diagram of the coprocessor - NRULBP computation
The raw LBP value is computed from the outputs of a set
of comparators
The NRULBP is obtained from the raw value using a lookup
table
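A software model of these two stages might look like the sketch below. The bit order (clockwise from the top-left neighbor), the use of `>=` in the comparison, and the label numbering in the table are assumptions for illustration; the slides do not specify them.

```python
def raw_lbp(window):
    """Eight comparisons of the neighbors against the centre pixel."""
    center = window[1][1]
    # Assumed bit order: clockwise, starting at the top-left neighbor.
    neighbors = [window[0][0], window[0][1], window[0][2], window[1][2],
                 window[2][2], window[2][1], window[2][0], window[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:              # one comparator per neighbor
            code |= 1 << bit
    return code


def transitions(code):
    """Number of circular 0/1 transitions in an 8-bit code."""
    rotated = ((code << 1) | (code >> 7)) & 0xFF
    return bin(code ^ rotated).count("1")


def build_nrulbp_lut():
    """256-entry table mapping a raw LBP code to one of 30 labels."""
    lut, labels = [], {}
    for code in range(256):
        if transitions(code) > 2:
            lut.append(29)               # shared label, non-uniform patterns
        else:
            key = min(code, code ^ 0xFF)  # pattern and negation share a key
            lut.append(labels.setdefault(key, len(labels)))
    return lut


NRULBP_LUT = build_nrulbp_lut()
window = [[9, 9, 9],
          [1, 5, 1],
          [1, 1, 1]]
code = raw_lbp(window)
print(bin(code), NRULBP_LUT[code])
```

Because the table has only 256 entries of 5 bits each, it maps naturally onto a small ROM or LUT-based memory.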
Block diagram of the coprocessor - local histogram computation
The resulting NRULBP stream is arranged as a 50 × 1 window,
enabling parallel computation of local histograms
The local histogram is updated based on the entry and exit histograms
The approach is fully pipelined and systolic
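One plausible reading of this update rule is the classic sliding-window histogram: instead of recounting the whole cell, the contribution of the data entering the window is added and the contribution of the data leaving it is subtracted. The sketch below shows this in one dimension only; it is a simplified software analogy under that assumption, not a description of the actual pipeline, and the function name is hypothetical.

```python
def sliding_histograms(labels, window, n_bins):
    """Yield the histogram of every length-`window` span of a label list."""
    hist = [0] * n_bins
    for i, entering in enumerate(labels):
        hist[entering] += 1                # sample entering the window
        if i >= window:
            hist[labels[i - window]] -= 1  # sample leaving the window
        if i >= window - 1:
            yield list(hist)               # copy: hist is updated in place


stream = [0, 1, 1, 2, 0]
print(list(sliding_histograms(stream, 3, 3)))
# [[1, 2, 0], [0, 2, 1], [1, 1, 1]]
```

Each step costs one increment and one decrement regardless of the window size, which is what makes the scheme attractive for a pipelined implementation.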
Block diagram of the coprocessor - data arrangement
The final spatial arrangement is ensured by the cell delay blocks
Block diagram of the coprocessor
Descriptor type, cell size and arrangement can be easily changed
Summary of programmable logic resources used for implementation
The designations are: FF – flip-flops, LUT – lookup tables, BRAM –
Block RAM memory blocks. The total amount of resources available
in example target devices is given for reference in the bottom rows
of the table.
                         FF                    LUT                   BRAM
Resource utilization     used    % of avail.   used    % of avail.   used   % of avail.
10 × 5                   13816   12.98         23223   46.63         26     18.57
5 × 5                    6966    6.55          12789   24.04         13     9.29
6 × 3                    6536    6.14          12163   22.86         16     11.07
XC7Z020 (available)      106400                53200                 140
XC7Z045 (available)      437200                218600                545
Processing speed
Tests were performed using a 10 × 5 grid of 5 × 5 cells on all image
pixels. The power consumption is well below 3 W. The clock frequency
is 75 MHz.
Number of frames per second that the proposed hardware
accelerator can process for different image sizes:
resolution     processing time [ms]   frames per second
640 × 480      3.84                   260.4
1280 × 720     11.52                  86.8
1920 × 1080    25.92                  38.6
For comparison, a PC implementation achieves 1.5 FPS on VGA (640 × 480) images.
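The frame rates follow directly from the per-frame processing times, and the same arithmetic gives the speedup over the PC implementation at VGA resolution. The short check below uses only the figures reported above; the variable and function names are illustrative.

```python
def frames_per_second(time_ms):
    """Frame rate corresponding to a per-frame processing time in ms."""
    return 1000.0 / time_ms


# Processing times reported in the table, in milliseconds.
times_ms = {"640 x 480": 3.84, "1280 x 720": 11.52, "1920 x 1080": 25.92}
PC_FPS_VGA = 1.5  # reported PC frame rate at VGA resolution

for resolution, t in times_ms.items():
    print(f"{resolution}: {frames_per_second(t):.1f} FPS")

speedup = frames_per_second(times_ms["640 x 480"]) / PC_FPS_VGA
print(f"speedup over PC at VGA: {speedup:.1f}x")
```

At VGA resolution this works out to a speedup of roughly 174 times over the reported PC figure.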
Conclusions
The system offers high performance at low power in a small
footprint
The architecture can be easily adapted to a range of tasks
requiring different cell sizes or cell grid sizes. Changing the descriptor
type (e.g. to HOG) requires more work, but is also possible
Future work will focus on integrating the implemented
architecture with a coprocessor performing a classification task
Thank you for your attention
The project was financed by the National Science Center under contract decision number
DEC-2011/03/N/ST6/03022, "New concept of the network of smart cameras with enhanced
autonomy for automatic surveillance systems".