Conception of Autonomous Driving of Unmanned Vehicles using Stereo Vision
Youness Lahdili #1, Afandi Ahmad *2
# Embedded Computing System (EmbCos), FKEE,
* Bioelectronic & Instrumentation Res. Lab. (MiNT-SRC),
Universiti Tun Hussein Onn Malaysia
P. O. Box 101, 86400 Batu Pahat, Johor, Malaysia
afandia@uthm.edu.my

Abbes Amira
Computer Science and Engineering Department,
College of Engineering, P. O. Box 2713, Qatar University
abbes.amira@qu.edu.qa
Abstract— The last five years have seen the emergence of unmanned vehicles as a new class of robots. These range from unmanned aerial vehicles (UAVs) and micro aerial vehicles (MAVs), commonly called drones, to remotely operated underwater vehicles (ROVs), inflatable dirigible balloons (airships), driverless cars, and the many amateur and hobbyist gadgets in between, such as the popular quadcopters. They are now more affordable than ever before, and are slowly but surely populating our roads, waters and skies, to the point that they will become ubiquitous elements of traffic and logistics. Yet to this day, these vehicles are guided remotely by on-ground human operators, who are prone to error and constrained by time. In this study, the application of unmanned vehicles is taken to a new level by sparing them the need for a human operator and making them fully autonomous. This is made possible by harnessing the power of two computer vision methods that are essential parts of photogrammetry technology: stereo vision depth and structure from motion (SfM). Our contribution will allow the unmanned vehicle to be auto-aware of the dangers and obstructions that cross its way, without any human intervention.
Keywords— Stereo Vision, Features Detection, UAVs,
Photogrammetry
I. INTRODUCTION
The proposed solution to this problem of navigation autonomy resides in the design of an intelligent trajectory estimation system. This system predicts and traces a safe itinerary, also known as a planned path, clear of all sorts of obstacles. The vehicle then only has to follow the coordinates of this itinerary until it reaches its given destination. The itinerary must continuously adapt itself to real-time changes in the environment, traffic movement, and the occurrence of unexpected obstacles, especially moving ones. Our aim is therefore to make stereo vision depth and SfM coexist on the same System-on-Chip (SoC), and to apply them in a real autonomous navigation situation by testing them in the field. In that way, one method can mitigate the error that the other could have propagated, in the manner of hybrid systems. This implies building a field programmable gate array (FPGA) prototype and mounting it on a binocular mid-range drone for validation and deployment. MATLAB has been selected as our abstraction tool, and Vivado HLS will be the middleware between the hardware description language (HDL) and the FPGA.
II. PROCESS DESCRIPTIONS
A. Overview
Stereo vision refers to all cases in which the same scene is observed by two cameras at different viewing positions. Each camera observes a different projection of the scene, which allows us to perform inference on the scene's geometry. The obvious example of this mechanism is the human visual system. Our eyes are laterally displaced, which is why each observes a slightly different view of the current scene. This allows our brain to infer the depth of the scene in view, a capability commonly referred to as stereopsis. Although it was long believed that we can only sense scene depth for distances up to a few meters, Palmisano et al. [1] recently showed that stereo vision can support our depth perception abilities even at larger distances. Using two cameras, it is possible to mimic the human ability of depth perception through stereo vision. An introduction to this field has been provided by Klette [2].
Depth perception is possible for arbitrary camera configurations, provided the cameras share a sufficiently large common field of view. We assume two idealized pinhole-type cameras C1 and C2 with projection centers O1 and O2, as depicted in Figure 1. The distance between the two projection centers is the baseline distance b. Both cameras observe the same point p, which is projected as p1 in the image plane belonging to camera C1. We are now interested in finding the point p2, which is the projection of the same point p on the image plane of camera C2. In the literature, this task is known as the stereo correspondence problem, and its solution through matching p1 to candidate points in the image plane of C2 is called stereo matching.
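Once p1 and p2 have been matched, and once the images have been rectified (see Section II-B), the standard pinhole relation gives the depth directly: a point with disparity d = u1 − u2 lies at depth Z = f·b/d, where f is the focal length in pixels and b the baseline. The following minimal NumPy sketch shows this conversion; the focal length, baseline and disparity values are illustrative assumptions, not parameters of our prototype.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) into a depth map (metres).

    Assumes rectified pinhole cameras, where Z = f * b / d.
    Pixels with non-positive disparity are marked invalid (NaN).
    """
    depth = np.full(disparity.shape, np.nan, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Illustrative example: f = 500 px, b = 0.11 m (an 11 cm baseline)
d = np.array([[4.0, 8.0], [16.0, 0.0]])
print(disparity_to_depth(d, 500.0, 0.11))  # 4 px -> 500 * 0.11 / 4 = 13.75 m
```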
Fig. 1 Example of the epipolar geometry (labels: epipolar plane, image planes 1 and 2, epipolar lines 1 and 2, epipoles, baseline between O1 and O2, and the projections p1 and p2 of the scene point p)
In order to implement stereo vision depth awareness on unmanned vehicles, we first have to solve this stereo matching problem, which comes down to the question of how to make the FPGA able to tell that two points in two images taken of the same scene belong to the same scene feature.
To achieve this result, we have to go through three main stages. We will elaborate on each of them in the following sections, while pointing out the limitations observed and how we intend to tackle them in our proposed research.
B. Image Rectification
The common approach to stereo vision includes a preliminary image rectification step, during which distortions are corrected. The resulting image after rectification should match the image received from an ideal pinhole camera. To be able to perform such a correction, we first require an accurate model of the image distortions. The distortion model most frequently used for this task today is the one introduced by Brown [3]. Using Brown's distortion model, we are able to calculate the undistorted image location (ũ, ṽ) that corresponds to the image location (u, v) in the distorted image. Existing implementations of the discussed algorithms can be found in the OpenCV library (Itseez [4]) and the MATLAB camera calibration toolbox (Bouguet [5]), and that is how we plan to resolve the question of image rectification.
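As a sketch of how this rectification step could look with the OpenCV library cited above, the snippet below undistorts a frame using Brown-model coefficients. The intrinsic matrix, coefficients and file names are illustrative assumptions; in practice they come out of a calibration step such as cv2.calibrateCamera or Bouguet's toolbox.

```python
import cv2
import numpy as np

# Illustrative intrinsic matrix and Brown-model distortion
# coefficients (k1, k2, p1, p2, k3); real values come from a
# camera calibration, not from this sketch.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.25, 0.08, 0.001, -0.002, 0.0])

img = cv2.imread("left_frame.png")          # hypothetical input frame
undistorted = cv2.undistort(img, K, dist)   # correct to the ideal pinhole model
cv2.imwrite("left_frame_rectified.png", undistorted)
```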
C. Sparse Vision Method
Despite the groundbreaking work of [6]–[11], there is a gap regarding the speed performance of their systems. Our examination of their work revealed that they employed dense stereo matching methods, which search for matching points across the entire input stereo images, thus increasing the computational load of their systems. One way to greatly speed up stereo matching is to not process all pixel locations of the input images. While the commonly used dense approaches find a disparity label for almost all pixels in the reference image (usually the left image), sparse methods like those in [12] and [13] only process a small set of salient image features. An example of the results obtained with a sparse method compared to a dense stereo matching method can be found in Figures 2 (a) and 2 (b).
Fig. 2 (a) Sparse stereo matching results received with the presented method
and (b) dense results received from a belief propagation based algorithm.
Color scale corresponds to the disparity in pixels [13]
The sparse example shown is precisely what we intend to apply in this research: disparity labels are found only for a set of selected corner features. The color displayed for these features corresponds to the magnitude of the found disparity, with blue hues representing small and red hues representing large disparity values. The method used for the dense example is the gradient-based belief propagation algorithm that was employed by Schauwecker and Klette [14] and Schauwecker et al. [15]. The results of this algorithm are dense disparity maps that assign a disparity label to every pixel in the left input image.
Although sparse methods provide much less information than common dense approaches, this information can be sufficient for a set of applications, including the UAV trajectory estimation and obstacle avoidance proposed in our research.
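To make the sparse strategy concrete, the sketch below estimates disparities only at detected corners of the left image, using OpenCV's stock FAST detector and a sum-of-absolute-differences (SAD) search along the corresponding row of a rectified pair. This is a minimal software illustration under these assumptions, not the optimised FPGA design pursued in this research.

```python
import cv2
import numpy as np

def sparse_disparities(left, right, max_disp=64, win=5):
    """Disparity estimation restricted to FAST corners of the left image.

    Assumes a rectified greyscale pair, so each corner's match is
    searched along the same image row (its epipolar line) in the
    right image, scored by the sum of absolute differences (SAD).
    """
    half = win // 2
    detector = cv2.FastFeatureDetector_create(threshold=20)
    keypoints = detector.detect(left)

    matches = []
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if (y < half or y >= left.shape[0] - half or
                x < half + max_disp or x >= left.shape[1] - half):
            continue  # skip corners whose search window leaves the image
        patch = left[y - half:y + half + 1,
                     x - half:x + half + 1].astype(np.int32)
        best_d, best_cost = 0, np.inf
        for d in range(max_disp):
            cand = right[y - half:y + half + 1,
                         x - d - half:x - d + half + 1].astype(np.int32)
            cost = np.abs(patch - cand).sum()
            if cost < best_cost:
                best_cost, best_d = cost, d
        matches.append((x, y, best_d))
    return matches

# Hypothetical rectified input pair
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
print(sparse_disparities(left, right)[:5])
```

Because the cost search runs only at the detected corners, the work grows with the number of features rather than with the image area, which is the source of the speed-up over dense methods.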
D. Feature Detection
In computer vision, a feature detector is an algorithm that selects a set of image points from a given input image. These points are chosen according to detector-specific saliency criteria. A good feature detector is expected to always select the same points when presented with images of the same scene. This should also be the case if the viewing position changes, the camera is rotated, or the illumination conditions vary. How well a feature detector is able to redetect the same points is measured as repeatability, for which different definitions have been postulated by Schmid et al. [16] and Gauglitz et al. [17].
Feature detectors are often used in conjunction with feature
descriptors. These methods aim at providing a robust
identification of the detected image features, which facilitates
their recognition when they are re-observed. In our case,
we are mainly interested in feature detection and less in
feature description. A discussion of many existing methods in
both fields can be found in the extensive survey published by
Tuytelaars and Mikolajczyk [18]. Furthermore, a thorough
evaluation of several of these methods was published by
Gauglitz et al. [17].
Various existing feature detectors extract image corners.
Corners serve well as image features as they can be easily
identified and their position can generally be located with
good accuracy. Furthermore, image corners can still be identified as such if the image is rotated or the scale or scene illumination is changed. Hence, a reliable corner detector can
provide features with high repeatability.
One less recent but still popular method for corner
detection is the Harris detector (Harris and Stephens, 1988). A
computationally less expensive method for detecting image
corners is the Smallest Univalue Segment Assimilating
Nucleus (SUSAN) detector that was proposed by Smith and
Brady [19].
A more advanced method that is similar to the SUSAN
detector is Features from Accelerated Segment Test (FAST).
One of the most influential methods in this category is the Scale Invariant Feature Transform (SIFT) by Lowe [20]. For this method, two Gaussian convolutions with different values for σ are computed for the input image.
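The difference of these two blurred images, the difference of Gaussians (DoG), is what SIFT scans for extrema to locate blob features. A minimal sketch of the operation follows; the σ values and file name are illustrative assumptions, not those of Lowe's implementation.

```python
import cv2

# Difference of Gaussians: blur the image at two nearby scales and
# subtract; extrema of the result indicate blob-like feature locations.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype("float32")
g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)        # first scale
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6 * 1.4)  # second, larger scale
dog = g1 - g2
```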
A more time-efficient blob detector inspired by SIFT is Speeded-Up Robust Features (SURF) by Bay et al. [21]. Instead of using a DoG for detecting feature locations, Bay et al. rely on the determinant of the Hessian matrix, which is known from the Hessian-Laplace detector (Mikolajczyk and Schmid [22]). Both SIFT and SURF exhibit a very high repeatability, as shown by Gauglitz et al. [17]. However, Gauglitz et al. also demonstrated that both methods require significant computation time.
In this research we are going to address this gap as well, by designing a slightly modified architecture of the FAST corner detection algorithm.
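As a reference point for that modification, the sketch below implements the standard FAST segment test in plain Python: a pixel is accepted as a corner if n contiguous pixels on the 16-pixel Bresenham circle around it are all brighter than the centre by more than a threshold t, or all darker. This is a deliberately unoptimised software illustration; our modified FPGA architecture is future work and is not shown.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3
# used by the FAST segment test.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2),
          (1, 3), (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1),
          (-2, -2), (-1, -3)]

def fast_segment_test(img, x, y, t=20, n=12):
    """Return True if (x, y) passes the FAST segment test: n contiguous
    circle pixels all brighter than I(p) + t or all darker than I(p) - t.
    Wrap-around on the ring is handled by doubling the flag list."""
    p = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    for sign in (1, -1):
        flags = [sign * (v - p) > t for v in ring]
        run = 0
        for f in flags + flags:  # doubled list emulates the circular order
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False

def detect_fast(img, t=20, n=12):
    """Exhaustive (unoptimised) FAST corner detection over a 2D
    greyscale array, skipping the 3-pixel border the circle needs."""
    h, w = img.shape
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if fast_segment_test(img, x, y, t, n)]

# Hypothetical usage on a greyscale NumPy array `frame`:
# corners = detect_fast(frame, t=25, n=12)
```

An optimised detector would add the usual high-speed pre-test on circle pixels 1, 5, 9 and 13 before running the full segment test, which is one of the avenues our architecture explores.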
III. MODELLING THE DESIGN
These three key elements of our trajectory estimation system (image rectification, sparse stereo matching and feature detection) will be supplemented by other filters, snippets and SfM modules, namely Simultaneous Localization and Mapping (SLAM), as depicted in Figure 3.
Fig. 3 Processing pipeline of the proposed FPGA-implementation
The overall architecture will be synthesized on reconfigurable hardware, consisting of field programmable gate arrays (FPGAs) [23], [24]. These platforms promise to be adequate building blocks in the construction of sophisticated devices at affordable cost. They offer heavy parallelism capabilities, considerable gate counts, and come in low-power packages [25]–[28].
Based on the limitations of existing work, this project is concerned with an efficient implementation of trajectory estimation for autonomous navigation of unmanned vehicles, with a special interest in aerial ones.
Figure 4 shows the anatomy of the projected overall system to be implemented. Our target is to realize such an architecture and put it into application in different fields, such as aerial imaging, parcel shipping, search & reconnaissance missions, and many more. Moreover, this project will provide a locally-built solution that is neither bound to foreign royalties nor at risk of patent infringement claims.
Fig. 4 System implementation of the processing at the MAV physical level: greyscale cameras (baseline = 11 cm) feed the FPGA SoC, which runs the high-level process (feature detection, stereo matching, local SLAM, EKF sensor fusion and pose estimation); a PIXHAWK Cheetah microprocessor board runs the low-level process (PID controller and low-level flight control software driving the quadrotor's four motor controllers), with the IMU, serial link, I2C bus and USB port providing the interconnections
IV. CONCLUSION
Much progress has been achieved in the computer vision community since interest in sparse stereo matching declined. In particular, new feature detectors have since been published, which can be used in the construction of new sparse stereo methods and then applied to MAV navigation. Given the ever increasing need for highly efficient stereo matching, it seems logical to choose a sparse stereo method. The lack of current sparse stereo algorithms that fulfill these performance requirements while also delivering accurate matching results is the motivation for designing a novel sparse stereo matching system.
ACKNOWLEDGMENT
The authors would like to express their appreciation for the support of the VASYD Lab at FKEE, UTHM, and to MathWorks Inc. for granting free use of their computational software.
REFERENCES
[1] S. Palmisano, B. Gillam, D. G. Govan, R. S. Allison, and J. M. Harris,
"Stereoscopic perception of real depths at large distances," Journal of
Vision, vol. 10, no. 6, pp. 19–19, Jun. 2010.
[2] R. Klette, Concise Computer Vision. London: Springer, 2014.
[3] D. C. Brown, "Decentering distortion of lenses," Photogrammetric Engineering, vol. 32, no. 3, pp. 444–462, 1966.
[4] Itseez, "OpenCV," 2015. [Online]. Available: http://opencv.org.
Accessed: Apr. 2, 2016.
[5] J. Y. Bouguet, "Camera Calibration Toolbox for MATLAB," 2013.
[Online]. Available: http://vision.caltech.edu/. Accessed: Mar. 3, 2016.
[6] M. Achtelik, T. Zhang, K. Kuhnlenz, and M. Buss, "Visual tracking
and control of a quadcopter using a stereo camera system and inertial
sensors," IEEE, 2012, pp. 2863–2869.
[7] D. Pebrianti, F. Kendoul, S. Azrad, W. Wang, and K. Nonami,
"Autonomous hovering and landing of a Quad-rotor micro aerial
vehicle by means of on ground stereo vision system," Journal of
System Design and Dynamics, vol. 4, no. 2, pp. 269–284, 2010.
[8] L. R. García Carrillo, A. E. Dzul López, R. Lozano, and C. Pégard,
"Combining stereo vision and inertial navigation system for a Quad-
Rotor UAV," Journal of Intelligent & Robotic Systems, vol. 65, no. 1-4,
pp. 373–387, Aug. 2011.
[9] T. Tomic et al., "Toward a fully autonomous UAV: Research platform
for indoor and outdoor urban search and rescue," IEEE Robotics &
Automation Magazine, vol. 19, no. 3, pp. 46–56, Sep. 2012.
[10] A. Harmat, I. Sharf, and M. Trentini, "Parallel tracking and mapping
with multiple cameras on an unmanned aerial vehicle," in Intelligent
Robotics and Applications. Springer Science + Business Media, 2012,
pp. 421–432.
[11] M. Nieuwenhuisen, D. Droeschel, J. Schneider, D. Holz, T. Läbe, and
S. Behnke, "Multimodal obstacle detection and collision avoidance for
micro aerial vehicles," IEEE, pp. 12–7.
[12] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, "Vision-based
state estimation for autonomous rotorcraft MAVs in complex
environments," IEEE, 2010, pp. 1758–1764.
[13] K. Schauwecker and A. Zell, "On-board dual-stereo-vision for the
navigation of an autonomous MAV," Journal of Intelligent & Robotic
Systems, vol. 74, no. 1-2, pp. 1–16, Oct. 2013.
[14] K. Schauwecker and R. Klette, "A comparative study of two vertical road modelling techniques," in Computer Vision – ACCV 2010 Workshops. Springer Science + Business Media, 2011, pp. 174–183.
[15] K. Schauwecker, S. Morales, S. Hermann, and R. Klette, "A
comparative study of stereo-matching algorithms for road-modeling in
the presence of windscreen wipers," IEEE, 2009, pp. 12–7.
[16] C. Schmid, R. Mohr, and C. Bauckhage, "Evaluation of interest point detectors," International Journal of Computer Vision, vol. 37, no. 2, pp. 151–172, 2000.
[17] S. Gauglitz, T. Höllerer, and M. Turk, "Evaluation of interest point detectors and feature descriptors for visual tracking," International Journal of Computer Vision, vol. 94, no. 3, pp. 335–360, Mar. 2011.
[18] T. Tuytelaars and K. Mikolajczyk, "Local invariant feature detectors: A
survey," Foundations and Trends® in Computer Graphics and Vision,
vol. 3, no. 3, pp. 177–280, 2007.
[19] S. M. Smith and J. M. Brady, "SUSAN – A new approach to low level image processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.
[20] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the IEEE International Conference on Computer Vision, vol. 2, 1999, pp. 1150–1157.
[21] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in Lecture Notes in Computer Science. Springer Science + Business Media, 2006, pp. 404–417.
[22] K. Mikolajczyk and C. Schmid, "Indexing based on scale invariant interest points," IEEE, 2001.
[23] P. Dang, "VLSI architecture for real-time image and video processing systems," Journal of Real-Time Image Processing, vol. 1, pp. 57–62, 2006.
[24] T. Todman, G. Constantinides, S. Wilton, O. Mencer, W. Luk, and P. Cheung, "Reconfigurable computing: architectures and design methods," Computers and Digital Techniques, IEE Proceedings, vol. 152, no. 2, pp. 193–207, 2005.
[25] A. Ahmad, B. Krill, A. Amira, and H. Rabah, "Efficient architectures for 3D HWT using dynamic partial reconfiguration," Journal of Systems Architecture, vol. 56, no. 8, pp. 305–316, 2010.
[26] A. Ahmad, B. Krill, A. Amira, and H. Rabah, "3D Haar wavelet transform with dynamic partial reconfiguration for 3D medical image compression," in Biomedical Circuits and Systems Conference, 2009. BioCAS 2009. IEEE, 2009, pp. 137–140.
[27] A. Ahmad and A. Amira, "Efficient reconfigurable architectures for 3D medical image compression," in Field-Programmable Technology, 2009. FPT 2009. International Conference on, 2009, pp. 472–474.
[28] B. Krill, A. Ahmad, A. Amira, and H. Rabah, "An efficient FPGA-based dynamic partial reconfiguration design flow and environment for image and signal processing IP cores," Signal Processing: Image Communication, vol. 25, no. 5, pp. 377–387, 2010.