Lecture 15 ryuzo okada - vision processors for embedded computer vision

© 2014 Toshiba Corporation
Vision Processors for
Embedded Computer Vision
Ryuzo Okada
Corporate R&D Center, Toshiba Corporation
July 18, 2014

Embedded computer vision
• Cameras are becoming ubiquitous
– Surveillance, automobile, smartphone
– Example: SUBARU Eyesight
http://www.subaru.jp/about/technology/story/eyesight/eyesight01.html

Embedded computer vision: Requirements
• High performance for vision processing
– To provide valuable functions to users
– Real-time processing
• Low power consumption
– To reduce running cost
– At most a few watts, to allow fanless cooling
• Robustness
– Long-term operation: 7-10 years
– Outdoor: -40℃ to 85℃
– Shock-proof
⇒ High performance per watt (e.g. GOPS/W)
A general-purpose CPU is not feasible for embedded computer vision processing
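As a worked instance of the performance-per-watt metric, the title of [Tanabe12] in the references quotes 464 GOPS and 620 GOPS/W. Assuming both figures describe the same operating point of the same LSI, dividing them gives the implied power, which lands close to the "about 0.75 W" quoted later in the optimization slides. The function name below is only illustrative.

```python
def implied_power_watts(gops: float, gops_per_watt: float) -> float:
    """Power implied by a throughput figure and an efficiency figure."""
    return gops / gops_per_watt


# Figures taken from the title of [Tanabe12]; treating them as one
# operating point is an assumption made for this sketch.
power = implied_power_watts(464.0, 620.0)
print(f"implied power: {power:.2f} W")
```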

Contents
• Types of vision processors
• Vision processors for automobiles
– Toshiba’s image recognition LSI, TMPV75 series
• Cloud computing and vision processors
– Surveillance camera
• Future direction and summary

Types of vision processors: (1) Vision chip
Logic-circuit-embedded image sensor
• Silicon retina [Mead89]
– Simulates the neural layers of the retina using analog circuits
– Early vision processing, e.g. smoothing
• Optical Neurochip [Nitta92]
– Implements a neural network with optical circuits
– Alphabet recognition

Types of vision processors: (1) Vision chip
• Programmable Artificial Retina [Bernard93], Near Sensor Image Processing [Astrom96], Sensory Processing Element [Ishii96]
– Consist of photodiodes (PD), each paired with a digital processing element (PE)
– Massively parallel (pixel-parallel) processing realized 1 ms visual servo control
• IVP MAPP [Johansson03], column-parallel vision chip [Nakabo02]
– One PE is assigned to each column of the PD array
Vision chips can provide simple functions, e.g. smoothing and motion estimation.

Types of vision processors: (2) Discrete type
[Figure: processors placed along two axes, flexibility (general purpose vs. special purpose) and efficiency (e.g. GOPS/W, high vs. low)]
• Processors shown: ATOM (Intel), Tegra K1 (NVIDIA), DaVinci (Texas Instruments), SH7766 (RENESAS), TMPV75 (TOSHIBA), EyeQ2 (STMicroelectronics)
• Application areas shown: tablet, mobile PC, smartphone, network camera, automobile

Architecture comparison
[Figure: block-level comparison of six processors, with blocks grouped as CPU, media processor or DSP, SIMD engine, and accelerator]
• TOSHIBA TMPV7506XBG [Tanabe12], [TMPV]
– CPU: MeP 266MHz
– Media processor: 4 MPEs, 266MHz
– Accelerators: Affine Transform, Filter (64 PEs, 180MHz) x2, Histogram, HOG, Matching
• TOSHIBA TMPV7528XBG [TMPV]
– CPU: MeP 266MHz, 2 ARM Cortex-A9 300MHz
– Media processor: 4 MPEs, 266MHz
– Accelerators: Affine Transform, Filter (64 PEs, 180MHz) x2, Histogram, HOG, Matching
• ST Micro EyeQ2 [EyeQ]
– CPU: 2 MIPS34K 332MHz
– 3 VMPs
– Accelerators: Classification, Preprocess Window, Filter (Integral Image), Disparity Finder, Tracker
• RENESAS SH7766 [SH]
– CPU: SH4A 534MHz
– IMP-X2 266MHz (IntegralImage etc.)
– IMR-X 1ch (Affine), IMR-LSX 4ch (Affine)
• NVIDIA Tegra K1 [Tegra]
– CPU: 4 ARM Cortex-A15 2.3GHz
– CUDA, 192 cores
– 2 ISPs
• TI DaVinci TMS320DM814x [DaVinci]
– CPU: ARM Cortex-A8 1GHz
– DSP (C674x+) 750MHz
– Resizer Accelerator (x 1/16 to 8)
Trend: Heterogeneous multicore architecture


TMPV7506XBG Block Diagram
[Figure: block diagram of TMPV7506XBG and its external connections]
• Video input I/F: 4 camera inputs
• Video output I/F: WVGA LCD panel
• Video formats shown: RGB888 / 666 / 565, YCbCr422, BT.656, Y8 – Y12, 8-12bit Bayer
• Media Processing Engine (MPE) #1 to #4: multi-core architecture for multiple (up to 4) applications, e.g. pedestrians, lanes & vehicles, traffic signs
• Accelerators for high performance image processing: Affine Transform, Filter 1, Filter 2, Histogram, Matching, HOG
• 32-bit RISC CPU
• Memory: main memory controller for DDR2-533 SDRAM (16-bit x 2), NOR Flash (16-bit, 2CS), on-chip 2MB RAM
• Peripherals: CAN, GPIO, serial I/F (UART / SPI / I2C), timer, PCM I/F (I2S, speaker), MCU I/F (CAN MCU), input capture / output compare / PWM, PCI Express
• Other external items shown: LED (7-seg), MediaLB/MOST, other ECU

Architecture of TMPV7506XBG
[Figure: chip architecture. A crossbar switch connects the MeP (data cache, instruction cache, DMAC, data RAM, instruction RAM), MPE 0 to MPE 3 (each with data cache, instruction cache, DMAC, IVC2, and data RAM), the L2 cache, the image processing accelerators (HOG, Histogram, Filter x2, Matching, Affine), working RAM, system RAM, system ROM, DDR2 SDRAM controller, NOR Flash/SRAM controller, video input I/F, video output I/F, serial I/F, MCU I/F, CAN, and PCI Express]
• Heterogeneous multi-core architecture
• Multi-level parallelism
– Data = SIMD / Instruction = VLIW / Module = Image Processing Accelerator (IPA) / Thread = Multiple cores
• Thread-level parallelism with 4 MPEs, instruction-level with VLIW, and data-level with SIMD
• Fast image processing using 5 types of IPAs
• Wide-band bus with crossbar switch for parallel processing
• Flexible memory access optimization by internal memories and DMAs

Media Processing Engine (MPE)
[Figure: MPE block diagram. MeP core (instruction cache, instruction decoder, core registers, 32bit ALU, data cache, data RAM) coupled with the IVC2 coprocessor (coprocessor instruction decoder, coprocessor registers, Pipe0 and Pipe1, each with an ALU); the instruction buffer (16/32) holds Instruction-1, Instruction-2, and Instruction-3]
• Media Processing Engine
– 3 instructions/cycle by VLIW (Very Long Instruction Word) technology
• Media embedded Processor (MeP)
– Toshiba's original 32-bit RISC CPU core
– Low power consumption
• Coprocessor for media processing (IVC2)
– 2 instruction pipelines
– Each pipeline can execute a SIMD (Single Instruction Multiple Data) instruction
– A 64-bit register can handle eight 8-bit, four 16-bit, or two 32-bit data simultaneously
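The idea of one 64-bit register holding eight 8-bit values can be sketched in software. The example below packs eight lanes into one integer and adds two such words lane by lane, with wraparound inside each lane and no carry between lanes. It only illustrates the data layout; the actual IVC2 instruction set and its overflow behavior are not described in the slides, so the function names and the wraparound semantics are assumptions.

```python
LANES = 8
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every lane
HIGH = 0x8080808080808080   # top bit of every lane


def pack(values):
    """Pack eight 8-bit values into one 64-bit word, lane 0 in the low byte."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add_u8(a, b):
    """Add all eight lanes at once; each lane wraps modulo 256.

    The low 7 bits are added normally (they cannot overflow the lane),
    then the top bit of each lane is fixed up with XOR so that no carry
    leaks into the neighboring lane.
    """
    return ((a & LOW7) + (b & LOW7)) ^ ((a ^ b) & HIGH)


a = pack([250, 1, 2, 3, 4, 5, 6, 7])
b = pack([10, 1, 1, 1, 1, 1, 1, 1])
print(unpack(simd_add_u8(a, b)))
```

A single integer addition here does the work of eight separate 8-bit additions, which is the source of the data-level parallelism the slide refers to.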

IPA: HOG module
• Function: fast computation of the HOG/CoHOG [Watanabe10] image feature, followed by linear SVM classification
• Interface
– In: gradient orientation image
– Out: classification result / feature vector
• Use case: object (e.g. pedestrian) detection
[Figure: inside the HOG module, the gradient orientation image is converted into a HOG/CoHOG feature vector f, and a linear SVM with parameters w, b evaluates w^T f + b]
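The two steps of the HOG module, feature computation followed by the linear SVM score w^T f + b, can be sketched as below. This is a deliberately reduced version: a single histogram over the whole orientation image, with no cell or block division and no normalization, and with made-up weights. The function names and the 8-bin quantization are assumptions for illustration, not the hardware interface.

```python
def orientation_histogram(orient, bins=8):
    """Count how often each quantized gradient orientation occurs."""
    hist = [0] * bins
    for row in orient:
        for code in row:
            hist[code] += 1
    return hist


def linear_svm_score(w, f, b):
    """Evaluate w^T f + b; a positive score means the positive class."""
    return sum(wi * fi for wi, fi in zip(w, f)) + b


# Tiny gradient orientation image with orientation codes 0..7.
orient = [[0, 1],
          [1, 7]]
f = orientation_histogram(orient)

# Hypothetical classifier parameters, chosen only for the example.
w = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
b = -0.5
print(f, linear_svm_score(w, f, b))
```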

Image feature: HOG and CoHOG
[Figure: HOG is a histogram (frequency) of gradient orientations; CoHOG is a histogram (frequency) of combinations of gradient orientations]

Flexibility of HOG module
The HOG module is flexible: it can compute different types of co-occurrence histograms depending on the input data.
[Figure: an encoded image enters the HOG module, which performs block division and builds co-occurrence histograms for different pairs of pixel positions (pixel combinations) to produce the feature vector]
• Gradient orientation input: CoHOG, which encodes shape
• LBP (subset) input: CoHLBP [Watanabe13], which encodes texture
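A co-occurrence histogram for one pair of pixel positions can be sketched as follows. The routine only sees integer codes, so the same code path serves gradient orientation codes (CoHOG) and LBP codes (CoHLBP), which is the flexibility described above. The function name, the single offset, and the 8-level default are assumptions; block division and the full set of offsets used by the real features are left out.

```python
def cooccurrence_histogram(codes, offset, levels=8):
    """Count pairs (code at p, code at p + offset) over the encoded image.

    codes  : 2D list of integer codes in range(levels)
    offset : (dx, dy) displacement between the two pixels of a pair
    Returns a levels x levels matrix of pair counts.
    """
    dx, dy = offset
    height, width = len(codes), len(codes[0])
    hist = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= x2 < width and 0 <= y2 < height:
                hist[codes[y][x]][codes[y2][x2]] += 1
    return hist


codes = [[0, 1],
         [1, 0]]
hist = cooccurrence_histogram(codes, offset=(1, 0))
print(hist[0][1], hist[1][0])
```

Flattening the matrices for several offsets and concatenating them would give one feature vector of the kind fed to the linear SVM.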

IPA: Histogram module
[Figure: histogram of intensities; intensity conversion using a look-up table]
• Function
– Fast histogram generation by parallel voting
– Data value conversion by a look-up table (LUT)
• Interface
– In: data array (e.g. image, 1D data array)
– Out: histogram / converted data array
• Use case
– Contrast enhancement by histogram equalization
– Vote counting for Hough transform
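Histogram equalization uses both functions of this module: a histogram is built by voting, then a look-up table derived from its cumulative sum converts every pixel. The sketch below is a plain sequential version with illustrative function names; the hardware performs the voting in parallel.

```python
def histogram(pixels, levels=256):
    """Vote each pixel value into its bin."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def equalization_lut(hist):
    """Build a LUT from the cumulative histogram (integer arithmetic)."""
    total = sum(hist)
    levels = len(hist)
    lut = []
    cumulative = 0
    for count in hist:
        cumulative += count
        lut.append((levels - 1) * cumulative // total)
    return lut


def apply_lut(pixels, lut):
    """Convert every data value through the look-up table."""
    return [lut[p] for p in pixels]


# A dark 4-level image: only the two lowest intensities are used.
pixels = [0, 0, 1, 1]
lut = equalization_lut(histogram(pixels, levels=4))
print(lut, apply_lut(pixels, lut))
```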

IPA: Filter module
[Figure: a load/store unit feeding 64 processing elements (PE 1 to PE 64) @ 200MHz]
• Function: load the local image around a reference pixel, execute user-defined operations, and replace the reference pixel value with the result
• Interface
– In: image data array
– Out: converted image
• Use case: various local operations, e.g. Gaussian filter, Sobel filter, median filter, Harris feature point extraction
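The load, operate, replace pattern of the filter module can be sketched with a generic driver that hands a 3x3 neighborhood to a user-defined operation. The driver name, the choice to copy border pixels unchanged, and the two sample operations are assumptions for illustration; the real module runs such operations on 64 PEs in parallel.

```python
def apply_local_op(image, op, radius=1):
    """Replace each interior pixel with op(neighborhood around it)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            window = [image[y + j][x - radius:x + radius + 1]
                      for j in range(-radius, radius + 1)]
            out[y][x] = op(window)
    return out


def sobel_x(win):
    """Horizontal Sobel response on a 3x3 window."""
    right = win[0][2] + 2 * win[1][2] + win[2][2]
    left = win[0][0] + 2 * win[1][0] + win[2][0]
    return right - left


def median3x3(win):
    """Median of the nine values in a 3x3 window."""
    return sorted(v for row in win for v in row)[4]


image = [[0, 0, 10],
         [0, 0, 10],
         [0, 0, 10]]
print(apply_local_op(image, sobel_x)[1][1], apply_local_op(image, median3x3)[1][1])
```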

IPA: Affine module
• Functions: affine transformation, lens distortion correction, arbitrary image deformation
• Inputs: affine transformation parameters, lens distortion parameters, conversion table
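An affine warp is commonly implemented by inverse mapping: for every output pixel, compute where it comes from in the source image and copy that value. The sketch below does this with nearest-neighbor sampling. The parameter layout, the function name, and the sampling method are assumptions, since the slides do not describe how the module does it internally.

```python
def warp_affine(image, inv, fill=0):
    """Warp an image using an inverse affine map.

    inv = (a, b, c, d, e, f) maps an output pixel (x, y) to the source
    position (a*x + b*y + c, d*x + e*y + f). Positions that fall
    outside the source image are filled with `fill`.
    """
    height, width = len(image), len(image[0])
    a, b, c, d, e, f = inv
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx = round(a * x + b * y + c)  # nearest-neighbor sampling
            sy = round(d * x + e * y + f)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


# Shift the image one pixel to the right: output x reads source x - 1.
shifted = warp_affine([[1, 2, 3]], inv=(1, 0, -1, 0, 1, 0))
print(shifted)
```

Replacing the affine formula with a per-pixel table of source coordinates gives the table-driven form needed for lens distortion correction or arbitrary deformation.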

IPA: Matching module
• Template matching by SAD (sum of absolute differences)
– Finds the position that has the minimum SAD value
• 2D search in a local rectangle
– Motion estimation (between time t and time t+1)
• 1D search along an epipolar line
– Stereo disparity estimation (left image, right image → disparity)
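The 1D search for stereo disparity can be sketched on a single image row. The example assumes a rectified stereo pair, so the epipolar line of a left-image pixel is the same row of the right image, and it returns the shift with the minimum SAD. The function names, window size, and search range are illustrative assumptions.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def disparity_1d(left_row, right_row, x, half=1, max_disp=4):
    """Disparity d minimizing SAD between left[x] and right[x - d] windows."""
    template = left_row[x - half:x + half + 1]
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # candidate window would leave the image
        cost = sad(template, right_row[xr - half:xr + half + 1])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# The pattern 9, 5, 1 appears two pixels further left in the right row.
left_row = [0, 0, 0, 0, 9, 5, 1, 0]
right_row = [0, 0, 9, 5, 1, 0, 0, 0]
print(disparity_1d(left_row, right_row, x=5))
```

The 2D search used for motion estimation follows the same pattern, with the candidate loop running over both x and y offsets inside a local rectangle.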

Example of optimization: Back-over Prevention [Okada13]
• Back-over prevention using stereo cameras
– Collision warning when backing up, by obstacle/pedestrian detection
• Uses commercial wide-angle cameras for back-view monitors
– Large lens distortion
[Figure: left image and right image]

Processing flow of Back-over Prevention
• Image input → Undistortion → Depth estimation → Obstacle detection → Pedestrian detection → Warning
• Intermediate results shown: stereo image input, rectified image, disparity map (blue = far), detection result
• Pedestrian detection: image feature is CoHOG, classifier is linear SVM
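For a rectified stereo pair, depth follows from disparity by the standard relation Z = f * B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity in pixels. The slides do not give the camera parameters, so the values in the sketch below are hypothetical and only show why small disparities correspond to far points.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity means the point is at infinity
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 500 px focal length, 0.1 m baseline.
for d in (50, 10, 2):
    print(d, depth_from_disparity(d, focal_px=500.0, baseline_m=0.1))
```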

Example of obstacle/pedestrian detection
A crouching person is detected as an obstacle

Implementation on TMPV7506XBG
• Each procedure is assigned to a suitable IPA or MPE
– Undistortion & rectification: Affine
– Luminance correction: Filter
– Depth estimation: Matching
– Depth → color: Histogram
– Shrink: Affine
– Gradient orientation: Filter
– Obstacle detection: MPE
– Pedestrian detection: HOG
• Stages: image correction, depth estimation, obstacle detection, pattern recognition

Optimization process (1)
⓪ Before optimization: 1120ms
① Use IPAs (sequential procedure): 45ms (x25)
[Figure: time chart of hardware (HW) usage on TMPV7506XBG, including display, compared against the video rate]
Optimization process (2)
② Run independent MPEs/IPAs in parallel: 42ms (x1.1)
③ Optimize memory access: 33ms (x1.3)
– Cache, Data RAM, Work RAM, DMAC
④ Introduce a pipelined procedure: 29ms (x1.1)
– Perform “undistortion”@Affine for the upper half of the image
– When that finishes, start “luminance correction (zero mean)”@Filter for the upper half while performing “undistortion”@Affine for the lower half
• LSI power consumption is about 0.75 W
[Figure: time chart on TMPV7506XBG compared against the video rate]
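The speedup factors on the two optimization slides can be recomputed from the quoted processing times. The sketch below does that and also converts the final time to a frame rate. The 30 frames/s threshold used for "video rate" is an assumption, since the slides mark the video rate only graphically.

```python
# Per-frame processing times in ms, as quoted on the slides.
STEPS = [
    ("before optimization", 1120),
    ("use IPAs (sequential)", 45),
    ("run MPEs/IPAs in parallel", 42),
    ("optimize memory access", 33),
    ("pipelined procedure", 29),
]


def stepwise_speedups(steps):
    """Speedup of each step relative to the step before it."""
    times = [t for _, t in steps]
    return [prev / cur for prev, cur in zip(times, times[1:])]


def frames_per_second(frame_ms):
    """Frame rate that a given per-frame time can sustain."""
    return 1000.0 / frame_ms


speedups = [round(s, 1) for s in stepwise_speedups(STEPS)]
total = STEPS[0][1] / STEPS[-1][1]
print(speedups, round(total, 1), round(frames_per_second(STEPS[-1][1]), 1))
```

Almost all of the gain comes from moving work onto the IPAs; the later steps together contribute roughly another factor of 1.5.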

Future direction of TMPV family
• Improved pedestrian/vehicle detection
– Pattern recognition introducing a color-based image feature
– Multi-class classification
• Obstacle detection using a single camera
– 3D reconstruction (SfM)
• Realized by image processing accelerators
[Figure: the next generation enhances pattern recognition (vehicles, pedestrians) and adds 3D reconstruction (SfM) as a new function]

3D reconstruction (Structure from Motion)
• 3D shape (depth) estimation using a single camera
• Input: multiple images taken from different view angles
• Output of 3D reconstruction: 3D shape (depth) and camera motion

Obstacle detection based on SfM
• Camera motion estimation
– Feature point → feature vector → feature matching → motion estimation
– Output: camera motion R, t
• 3D position estimation (point cloud)
– Multi-view stereo matching using image frames captured at different moments, with a refine step
– Output: 3D point cloud
• Obstacle detection (every few frames)
– Output: obstacle position

Accurate depth information
• Finding point correspondences using
multiple images
⇒ Accurate disparity estimation
• Point correspondences are represented by a
parametric probability distribution [Vogiatzis11]
⇒ Saving memory consumption
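The memory saving of a parametric representation can be sketched with a much simpler model than the one in [Vogiatzis11]: each pixel keeps only a Gaussian depth estimate (mean and variance), and every new measurement is fused into it, so no per-frame history has to be stored. This Gaussian-only update is a simplified stand-in chosen for illustration, not the distribution used in the cited paper.

```python
def fuse_gaussian(mean_a, var_a, mean_b, var_b):
    """Fuse two Gaussian estimates of the same quantity.

    The result is the product of the two Gaussians: the means are
    weighted by the inverse variances, and the variance shrinks.
    """
    var = var_a * var_b / (var_a + var_b)
    mean = (mean_a * var_b + mean_b * var_a) / (var_a + var_b)
    return mean, var


# Start from a vague prior and fold in measurements one at a time.
# Only two numbers per pixel are kept, however many frames arrive.
state = (10.0, 4.0)
for measurement, variance in [(12.0, 4.0), (11.0, 2.0)]:
    state = fuse_gaussian(state[0], state[1], measurement, variance)
print(state)
```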

Example of obstacle detection
Distance 32m
Height 30cm

Pattern recognition
• Improved recognition accuracy using a new image feature,
Heterogeneous co-occurrence feature [Ito10]
– Extension to CoHOG feature
– Combination of 4 types of color-based image features to describe
shape and texture
Example:
color information can tell us
the boundary of the pedestrian


Surveillance camera system
• Another frontier of embedded computer vision
• Current camera system
– Records video streams from cameras, and human observers look them over after something has happened
– Detects changes and motions
[Figure: network cameras → hub → recorder]
[Chart: surveillance camera sales (world), number of cameras (thousands). Source: 2011-2012 Surveillance camera market and business - CMOS, CCD camera series VOL.1 -]

Making camera system intelligent
• What is a suitable system configuration for video analysis using thousands of cameras?
• Cloud?
[Figure: the current camera system (network camera → hub → recorder) with image transfer to a data center; annotated with processing load and communication load] [Pham14]

Intelligent surveillance camera system
Embedded vision processing can solve the problems
[Figure: network camera with a vision processor → hub → recorder and video analysis set-top-box with a vision processor; metadata and images are sent to the data center, which holds a face DB and performs face recognition and human identification; annotated with data size and processing load]

Intelligent camera using TMPV7506XBG
• TMPV7506XBG analyzes captured images in the camera
• Example of application: Multiple object detection
– Four different types of objects are detected simultaneously
• Total power consumption is 5-6 W

Video analysis set-top-box using TMPV7506XBG
• The set-top-box (STB) can analyze up to 4 camera images
[Figure: camera images → video analysis STB → application on cloud (people count, trajectory)]


General trend of processor LSIs
• Accelerators are often used to realize specific applications
• Some of these technologies are then introduced into general-purpose processors to achieve higher efficiency
[Figure: efficiency (e.g. GOPS/W) over time. Technologies from specific applications (3D graphics: GPU, image compression: codec, supercomputer: SIMD) move into general-purpose processors; automotive and wearable (?) are shown as the next applications]
Future direction of vision processors
• Heterogeneous multicore architecture stays dominant
– CPU cores + GPGPU (+ accelerators)
• Many functions will be realized by software after 2020
[Figure: efficiency (e.g. GOPS/W) over time (2005, 2010, 2015, 2020) against the minimum performance required for practical apps; users are limited at first, and a wider application range of CV will open up]

Summary
• Types of vision processors
– Vision chip: logic-circuit-embedded image sensor
– Discrete LSI: heterogeneous multi-core architecture
– Toshiba’s TMPV family: 5 types of image processing accelerators
– Future direction: color-based image feature, multi-class classifier, SfM
• Vision processors will make surveillance cameras intelligent efficiently
– Efficiency is achieved by a good balance between on-site processing and cloud processing
• Future direction
– Progress of LSI technology will widen the CV application range

References
[Mead89] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley.
[Nitta92] Y. Nitta, et al., Proposal of an Optical Neurochip with Internal Analogue Memory and Its Fundamental Characteristics, Japanese Journal of Applied Physics, Pt. 2, Letters 31(8B), L1182-L1184, 1992.
[Bernard93] T. M. Bernard, Y. Zavidovique and F. J. Devos, A Programmable Artificial Retina, IEEE J. Solid-State Circuits, vol. 28, no. 7, pp. 789-798, 1993.
[Astrom96] A. Astrom, J.-E. Eklund and R. Forchheimer, Global Feature Extraction Operations for Near-Sensor Image Processing, IEEE Trans. Image Processing, vol. 5, no. 1, pp. 102-110, 1996.
[Ishii96] I. Ishii, et al., Target Tracking Algorithm for 1ms Visual Feedback System Using Massively Parallel Processing, Proc. IEEE Int. Conf. Robotics and Automation, pp. 2309-2314, 1996.
[Nakabo02] Y. Nakabo, et al., 3D Tracking Using Two High-Speed Vision Systems, Proc. of IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pp. 360-365, 2002.
[Johansson03] R. Johansson, L. Lindgren, J. Melander and B. Moller, A Multi-Resolution 1000 GOPS 4 Gpixels/s Programmable CMOS Image Sensor for Machine Vision, Proc. IEEE Workshop on Charge-Coupled Devices and Advanced Image Sensors, 2003.
[Tanabe12] Y. Tanabe, et al., A 464GOPS 620GOPS/W Heterogeneous Multi-Core SoC for Image-Recognition Applications, ISSCC Dig. Tech. Papers, pp. 15-16, 2012.
[Watanabe10] T. Watanabe, et al., Co-occurrence Histogram of Oriented Gradients for Human Detection, IPSJ Trans. on Computer Vision and Applications, Vol. 2, pp. 39-47, 2010.
[Watanabe13] T. Watanabe and S. Ito, Two co-occurrence histogram features using gradient orientations and local binary patterns for pedestrian detection, Proc. of ACPR, pp. 415-419, 2013.
[Okada13] R. Okada, T. Watanabe, M. Nishiyama, A. Seki, T. Kozakaya, M. Banno, Multiple Object Detection using Image Recognition LSI for Automobiles, Proc. of 20th ITS World Congress, No. 4185, 2013.
[Vogiatzis11] G. Vogiatzis, et al., Video-based, real-time multi-view stereo, Image and Vision Computing, Vol. 29, No. 7, pp. 434-441, 2011.
[Pham14] Pham, et al., DIET: Dynamic Integration of Extended Tracklets for Tracking Multiple Persons, Proc. of ICPR, 2014 (to appear).
[Ito10] S. Ito and S. Kubota, Object Classification Using Heterogeneous Co-occurrence Features, Proc. of ECCV, 2010.
[TMPV] http://www.semicon.toshiba.co.jp/eng/product/assp/automotive/infotain/tmpv7500/index.html
[EyeQ] http://www.mobileye.com/technology/processing-platforms/eyeq2/
[SH] http://hk.renesas.com/applications/automotive/adas/surround/sh7766/index.jsp
[Tegra] http://www.nvidia.com/object/tegra-k1-processor.html
[DaVinci] http://www.tij.co.jp/jp/lit/ds/symlink/tms320dm8148.pdf

Product names (mentioned herein) may be
trademarks of their respective companies.
