This document summarizes a capstone project that developed a vision-based system for classifying turf and non-turf regions for an autonomous lawn mower. The system extracts color and texture features from blocks of the camera image and then classifies each block as turf or non-turf using k-means clustering or support vector machines (SVM). An SVM classifier with a radial basis function kernel produced the best performance across varied lighting conditions. The project aims to allow the mower to avoid non-turf regions and obstacles in real time using computer vision instead of boundary wires or contact sensors.
The development of multimedia technology has made Content-Based Image Retrieval (CBIR) a prominent area for retrieving images from large databases. The feature vectors of the query image are compared with the feature vectors of the database images to find matching images. It is widely observed that no single algorithm is effective at extracting features from all kinds of natural images. Therefore, an intensive analysis of selected color, texture, and shape extraction techniques is carried out to identify an efficient CBIR technique suited to a particular type of image. Image extraction comprises feature description and feature extraction. In this paper, we propose a feature extraction technique combining the Color Layout Descriptor (CLD), the Gray Level Co-occurrence Matrix (GLCM), and marker-controlled watershed segmentation, which retrieves matching images based on the similarity of color, texture, and shape within the database. For performance analysis, the image retrieval timing of the proposed technique is measured and compared with that of each individual feature.
Automated Colorization of Grayscale Images Using Texture Descriptors (IDES Editor)
The document proposes a novel automated process called ACTD for colorizing grayscale images using texture descriptors without human intervention. It analyzes sample color images to extract coherent texture regions, then uses Gabor filtering for texture-based segmentation. Texture descriptors and color information are computed and stored for each region. These are then used to colorize a new grayscale image based on texture matching. The method combines techniques such as Gabor filtering, fuzzy C-means clustering with a new "Gki factor" for noise tolerance, and content-based image retrieval of texture descriptors. Preliminary results found the approach viable but improvements were needed for scale and rotation invariance.
A version of watershed algorithm for color image segmentation (Habibur Rahman)
The document summarizes a master's thesis presentation on a new watershed algorithm for color image segmentation. The thesis addresses issues with existing watershed algorithms such as over-segmentation and sensitivity to noise. Its contributions include an adaptive masking and thresholding mechanism that overcomes over-segmentation and performs well on noisy images. The thesis is evaluated using five image quality assessment metrics on 20 classes of images, showing that the proposed method performs better and has lower computational complexity than other algorithms. In conclusion, the adaptive watershed algorithm ensures accurate segmentation and is suitable for real-time applications.
1. The document presents an image segmentation algorithm that uses local thresholding in the YCbCr color space.
2. It computes local thresholds for each pixel by calculating the mean and standard deviation of neighboring pixels in a 3x3 mask. The threshold is used to label each pixel as 1 or 0.
3. The algorithm was tested on images with objects indistinct and distinct from the background. It performed well in segmenting objects from the background in both cases. There is potential to improve performance for blurred images.
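The local-thresholding rule in points 1-2 above can be sketched in a few lines. The comparison rule and the weight k on the standard deviation are assumptions, since the summary does not give the exact formula:

```python
import numpy as np

def local_threshold(gray, k=0.5):
    """Label each pixel 1 or 0 against a threshold computed from the
    mean and standard deviation of its 3x3 neighborhood."""
    h, w = gray.shape
    padded = np.pad(gray.astype(float), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 3, x:x + 3]       # 3x3 mask around (y, x)
            t = win.mean() + k * win.std()       # local threshold
            out[y, x] = 1 if gray[y, x] > t else 0
    return out
```

In a YCbCr pipeline the same rule would be applied per channel (or to the luma channel Y alone); the single-channel version above keeps the idea visible.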
The document discusses implementing the watershed algorithm for image segmentation using MATLAB. It begins with an overview of the watershed algorithm which uses region growing to segment images based on gradients. It then discusses image segmentation goals and different categories of segmentation algorithms including discontinuity, similarity, thresholding and region-based approaches. The document outlines steps to be taken which include preprocessing images, implementing the watershed algorithm, and analyzing results to see the difference in segmented images. It expects watershed processing to produce segmented regions from a test image.
Image segmentation involves grouping similar image components, such as pixels, into segments. It has applications in medical imaging, satellite imagery, and video summarization. Common methods include thresholding, k-means clustering, and region-based approaches. Thresholding segments an image based on pixel intensity values, while k-means clustering groups pixels into a specified number of clusters based on color or other feature similarity. Region-based methods grow or merge regions of similar pixels. Watershed segmentation treats an image as a topographic surface and finds boundaries between regions.
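As a concrete illustration of the k-means approach mentioned above, the sketch below clusters pixels by color similarity; the deterministic center initialization and fixed iteration count are simplifying assumptions:

```python
import numpy as np

def kmeans_segment(image, k=2, iters=10):
    """Group pixels into k clusters by color similarity and return a
    label map with the same height and width as the image."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    # Deterministic initialization: evenly spaced pixels as seeds.
    centers = pixels[np.linspace(0, len(pixels) - 1, k, dtype=int)]
    for _ in range(iters):
        # Distance of every pixel to every center, then nearest-center labels.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(h, w)
```

The number of clusters k must be specified in advance, which is exactly the limitation the text notes relative to methods like watershed segmentation.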
1. Image classification involves categorizing all pixels in a remote sensing image based on their spectral patterns. There are two main approaches: supervised classification, where an operator identifies training areas, and unsupervised classification, where software automatically groups pixels into clusters.
2. Accuracy assessment is done by comparing classified data to reference data using an error matrix. The matrix shows errors of omission and commission to calculate overall, user's, and producer's accuracy percentages. The kappa coefficient provides an overall measure of classification accuracy.
3. Classified images undergo post-processing like filtering to improve accuracy before being used as thematic maps, tables, or GIS inputs. Problems in urban areas include similar surface spectra and mixed pixel issues.
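The accuracy measures named in point 2 can be computed directly from the error matrix; the sketch below assumes rows hold the classified labels and columns the reference labels:

```python
import numpy as np

def accuracy_metrics(error_matrix):
    """Overall, user's, and producer's accuracy plus the kappa
    coefficient from an error (confusion) matrix: rows = classified,
    columns = reference."""
    m = np.asarray(error_matrix, dtype=float)
    total = m.sum()
    overall = np.trace(m) / total
    users = np.diag(m) / m.sum(axis=1)       # commission errors lie off the row diagonal
    producers = np.diag(m) / m.sum(axis=0)   # omission errors lie off the column diagonal
    # Kappa compares observed agreement with chance agreement.
    expected = (m.sum(axis=1) * m.sum(axis=0)).sum() / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa
```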
A DIGITAL COLOR IMAGE WATERMARKING SYSTEM USING BLIND SOURCE SEPARATION (csandit)
An attempt is made to implement a digital color image-adaptive watermarking scheme in the spatial domain and in a hybrid domain, i.e., host image in the wavelet domain and watermark in the spatial domain. Blind Source Separation (BSS) is used to extract the watermark. The novelty of the presented scheme lies in determining the mixing matrix for the BSS model using the BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimization technique. The method is based on the smooth and textured portions of the image. Texture analysis is carried out based on the energy content of the image (using the GLCM), which makes the method image-adaptive for embedding a color watermark. Performance evaluation is carried out in the hybrid domain for various color spaces such as YIQ, HSI, and YCbCr, and the feasibility of the optimization algorithm for finding the mixing matrix is also checked for these color spaces. Three ICA (Independent Component Analysis)/BSS algorithms are used in the extraction procedure, through which the watermark can be retrieved efficiently. An effort is made to find the color space best suited to embedding the watermark while satisfying the conditions of imperceptibility and robustness against various attacks.
An evaluation of two popular segmentation algorithms: the mean shift-based segmentation algorithm and a graph-based segmentation scheme. We also consider a hybrid method that combines the two.
Object detection for service robot using range and color features of an image (IJCSEA Journal)
In real-world applications, service robots need to locate and identify objects in a scene. A range sensor provides a robust estimate of depth information, which is useful for accurately locating objects in a scene. On the other hand, color information is an important property for the object recognition task. The objective of this paper is to detect and localize multiple objects within an image using both range and color features. The proposed method uses 3D shape features to generate promising hypotheses within range images and verifies these hypotheses using features obtained from both range and color images.
EFFICIENT IMAGE RETRIEVAL USING REGION BASED IMAGE RETRIEVAL (sipij)
1) The document describes an efficient region-based image retrieval system that uses discrete wavelet transform and k-means clustering. It segments images into regions, each characterized by features like size, mean, and covariance.
2) The system pre-processes images by resizing, converting to HSV color space, performing DWT, and using k-means clustering on DWT coefficients to generate regions. It extracts features for each region and stores them in a database.
3) For retrieval, it pre-processes the query image similarly and calculates similarities between the query regions and database regions based on their features, returning similar images.
This document summarizes a research paper on background subtraction techniques for motion detection in video. It describes a proposed technique that stores and compares past pixel values to the current value to determine if a pixel belongs to the background or foreground. It also discusses using a k-means algorithm and Gaussian mixture model to build a probabilistic background model and classify pixels. The paper evaluates different shadow detection approaches and finds RGB color spaces perform best for segmentation and shadow removal.
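A minimal version of the pixel-comparison idea in this summary is a running-average background model; the learning rate and difference threshold below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05, thresh=30.0):
    """Classify pixels as foreground when they differ from the current
    background estimate by more than `thresh`, then blend the frame
    into the background with learning rate `alpha`."""
    frame = frame.astype(float)
    foreground = np.abs(frame - bg) > thresh
    bg = (1 - alpha) * bg + alpha * frame
    return foreground, bg
```

The Gaussian mixture model mentioned in the summary generalizes this by keeping several weighted mean/variance pairs per pixel instead of a single running average.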
IMAGE SEGMENTATION BY USING THRESHOLDING TECHNIQUES FOR MEDICAL IMAGES (cseij)
Image binarization is the process of separating pixel values into two groups: black as background and white as foreground. Thresholding can be categorized into global thresholding and local thresholding. This paper describes a locally adaptive thresholding technique that removes the background using the local mean and standard deviation. The most common and simplest approach to segmenting an image is thresholding. In this work we present an efficient implementation of thresholding and give a detailed comparison of the Niblack and Sauvola local thresholding algorithms, implemented on medical images. The quality of the segmented image is measured by statistical parameters: the Jaccard Similarity Coefficient and the Peak Signal-to-Noise Ratio (PSNR).
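The two local thresholds compared in the paper have standard closed forms, sketched here for a single window; the default k and R values are common choices from the literature, not necessarily the ones used in the paper:

```python
import numpy as np

def niblack_threshold(window, k=-0.2):
    """Niblack: T = m + k*s, with m and s the local mean and std."""
    return window.mean() + k * window.std()

def sauvola_threshold(window, k=0.5, R=128.0):
    """Sauvola: T = m * (1 + k*(s/R - 1)), with R the dynamic range
    of the standard deviation."""
    m, s = window.mean(), window.std()
    return m * (1 + k * (s / R - 1))
```

Sauvola's normalization by R makes the threshold drop toward m/2 in flat regions, which suppresses the background noise that plain Niblack tends to pick up.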
Marker Controlled Segmentation Technique for Medical application (Rushin Shah)
Medical image segmentation is a very important field for medical science. In medical images, edge detection is important for recognizing human organs such as the brain, heart, or kidney, and it is an essential pre-processing step in medical image segmentation.
Medical images such as CT, MRI, or X-ray visualize various information about internal organs, which is very important for doctors' diagnoses as well as for medical teaching, learning, and research.
It is a tough job to locate the internal organs if the images contain noise or the organs have a rough structure.
The document discusses image segmentation techniques. It describes image segmentation as partitioning a digital image into multiple regions based on characteristics like color or texture. Common applications of image segmentation include industrial inspection, optical character recognition, and medical imaging. The techniques discussed are fixed thresholding, iterative thresholding, and fuzzy c-means clustering. Fuzzy c-means clustering is identified as the most suitable for pest image segmentation based on its lower entropy and normalized mutual information values. Simulated annealing is also proposed to improve upon the limitations of fuzzy c-means clustering.
Digital image classification involves:
1) Sorting pixels into classes based on their spectral values using algorithms like supervised maximum likelihood classification or unsupervised isodata clustering.
2) Analyzing spectral patterns by examining pixels in feature space rather than image space. Distances between pixel vectors in feature spaces define class boundaries.
3) Validating classification results to determine accuracy by comparing them to reference data. Misclassification problems can still occur, and classification techniques continue to improve.
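Point 2 above, with distances between pixel vectors in feature space defining class boundaries, corresponds to a minimum-distance classifier, sketched below; maximum likelihood classification would additionally weight these distances by each class's covariance:

```python
import numpy as np

def minimum_distance_classify(pixels, class_means):
    """Assign each pixel's spectral vector to the class whose mean
    vector is nearest in feature space."""
    # Pairwise Euclidean distances: (n_pixels, n_classes).
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    return d.argmin(axis=1)
```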
NMS and Thresholding Architecture used for FPGA based Canny Edge Detector for... (idescitation)
In this paper, an architecture designed for Non-Maximal Suppression used in the Canny edge detection algorithm is presented in order to reduce memory requirements significantly. The architecture also achieves decreased latency and increased throughput with no loss in edge detection. The new algorithm uses a low-complexity 8-bin non-uniform gradient magnitude histogram to compute block-based hysteresis thresholds that are used by the Canny edge detector. Furthermore, the hardware architecture of the proposed algorithm is presented in this paper, and the architecture is synthesized on the Xilinx Virtex 5 FPGA. The design development is done in VHDL, and simulation results are obtained using ModelSim 6.3 with Xilinx 12.2.
At the end of this lesson, you should be able to:
describe Connected Components and Contours in image segmentation.
discuss region based segmentation method.
discuss Region Growing segmentation technique.
discuss Morphological Watersheds segmentation.
discuss Model Based Segmentation.
discuss Motion Segmentation.
implement connected components, flood fill, watershed, template matching and frame difference techniques.
formulate possible mechanisms to propose segmentation methods to solve problems.
The mean shift procedure is a general nonparametric technique for analyzing complex multimodal feature spaces and delineating arbitrarily shaped clusters. It works by recursively finding the nearest stationary point of the underlying density function, which corresponds to the mode of the density. The mean shift procedure relates to kernel density estimation and robust M-estimators of location. It provides a versatile tool for feature space analysis that can solve many low-level computer vision tasks with few parameters.
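The recursive mode-seeking step described above can be sketched with a flat kernel; the bandwidth, tolerance, and iteration cap are assumptions for illustration:

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth=1.0, iters=100, tol=1e-6):
    """Repeatedly move the estimate to the mean of all points within
    `bandwidth`; the iteration converges to the nearest density mode."""
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        near = points[np.linalg.norm(points - x, axis=1) <= bandwidth]
        new_x = near.mean(axis=0)            # the mean-shift update
        if np.linalg.norm(new_x - x) < tol:  # stationary point reached
            return new_x
        x = new_x
    return x
```

Running this from every pixel's feature vector and grouping points that converge to the same mode yields the clustering used in mean-shift segmentation; the only free parameter is the bandwidth.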
Importance of Mean Shift in Remote Sensing Segmentation (IOSR Journals)
1) Mean shift is a non-parametric clustering technique that can segment remote sensing images into homogeneous regions without prior knowledge of the number of clusters or constraints on cluster shape.
2) The document presents a case study demonstrating mean shift can segment an image containing oil storage tanks into distinct regions faster than level set segmentation.
3) Mean shift is shown to be well-suited for remote sensing image segmentation tasks like forest mapping and land cover classification due to its ability to handle noise, gradients, and texture variations common in real-world images.
A Novel Background Subtraction Algorithm for Dynamic Texture Scenes (IJMER)
The International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
This document summarizes a research paper that proposes a new technique for binarizing images of black/green boards captured with a mobile camera. It begins with an abstract overviewing the binarization of degraded mobile-captured black/green board images to extract text with 92.589% accuracy. It then reviews existing binarization techniques in the literature and describes common global and local thresholding methods. The proposed technique enhances the input image, segments it into 3x3 parts, computes a local threshold for each part using Otsu's method, binarizes the parts, and joins them. Experimental results on a database of 50 mobile-captured board images show the technique achieves better accuracy than other algorithms according to the evaluation metrics.
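The per-part thresholding step in the technique above relies on Otsu's criterion, a minimal version of which is sketched below (assuming 8-bit grayscale input):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold: pick t maximizing the between-class variance
    of the pixels below and above t (8-bit grayscale assumed)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty: split is degenerate
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Applying this function to each of the 3x3 image parts independently, then rejoining the binarized parts, reproduces the local variant the paper describes.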
This document summarizes a research paper that presents a real-time 3D reconstruction method using stereo vision from a driving car. The method extends LSD-SLAM with stereo capabilities to simultaneously track camera pose and reconstruct semi-dense depth maps. It is evaluated on the KITTI dataset and compared to laser scans and traditional stereo methods. Results show the direct SLAM technique generates visually pleasing and globally consistent semi-dense reconstructions in real-time on a single CPU.
At the end of this lesson, you should be able to:
define segmentation.
describe edge-based segmentation.
describe thresholding and its properties.
apply edge detection and thresholding as segmentation techniques.
A DIGITAL COLOR IMAGE WATERMARKING SYSTEM USING BLIND SOURCE SEPARATIONcsandit
An attempt is made to implement a digital color image-adaptive watermarking scheme in
spatial domain and hybrid domain i.e host image in wavelet domain and watermark in spatial
domain. Blind Source Separation (BSS) is used to extract the watermark The novelty of the
presented scheme lies in determining the mixing matrix for BSS model using BFGS (Broyden–
Fletcher–Goldfarb–Shanno) optimization technique. This method is based on the smooth and
textured portions of the image. Texture analysis is carried based on energy content of the
image (using GLCM) which makes the method image adaptive to embed color watermark.
The performance evaluation is carried for hybrid domain of various color spaces like YIQ, HSI
and YCbCr and the feasibility of optimization algorithm for finding mixing matrix is also
checked for these color spaces. Three ICA (Independent Component Analysis)/BSS algorithms
are used in extraction procedure ,through which the watermark can be retrieved efficiently . An
effort is taken to find out the best suited color space to embed the watermark which satisfies the
condition of imperceptibility and robustness against various attacks.
An evaluation of two popular segmentation algorithms, the mean shift-based segmentation algorithm and a graph-based segmentation scheme. We also consider a hybrid method which combines the other two methods.
Object detection for service robot using range and color features of an imageIJCSEA Journal
In real-world applications, service robots need to locate and identify objects in a scene. A range sensor
provides a robust estimate of depth information, which is useful to accurately locate objects in a scene. On
the other hand, color information is an important property for object recognition task. The objective of this
paper is to detect and localize multiple objects within an image using both range and color features. The
proposed method uses 3D shape features to generate promising hypotheses within range images and
verifies these hypotheses by using features obtained from both range and color images.
Object Detection for Service Robot Using Range and Color Features of an ImageIJCSEA Journal
In real-world applications, service robots need to locate and identify objects in a scene. A range sensor provides a robust estimate of depth information, which is useful to accurately locate objects in a scene. On the other hand, color information is an important property for object recognition task. The objective of this paper is to detect and localize multiple objects within an image using both range and color features. The proposed method uses 3D shape features to generate promising hypotheses within range images and verifies these hypotheses by using features obtained from both range and color images.
EFFICIENT IMAGE RETRIEVAL USING REGION BASED IMAGE RETRIEVALsipij
1) The document describes an efficient region-based image retrieval system that uses discrete wavelet transform and k-means clustering. It segments images into regions, each characterized by features like size, mean, and covariance.
2) The system pre-processes images by resizing, converting to HSV color space, performing DWT, and using k-means clustering on DWT coefficients to generate regions. It extracts features for each region and stores them in a database.
3) For retrieval, it pre-processes the query image similarly and calculates similarities between the query regions and database regions based on their features, returning similar images.
This document summarizes a research paper on background subtraction techniques for motion detection in video. It describes a proposed technique that stores and compares past pixel values to the current value to determine if a pixel belongs to the background or foreground. It also discusses using a k-means algorithm and Gaussian mixture model to build a probabilistic background model and classify pixels. The paper evaluates different shadow detection approaches and finds RGB color spaces perform best for segmentation and shadow removal.
IMAGE SEGMENTATION BY USING THRESHOLDING TECHNIQUES FOR MEDICAL IMAGEScseij
Image binarization is the process of separation of pixel values into two groups, black as background and
white as foreground. Thresholding can be categorized into global thresholding and local thresholding. This
paper describes a locally adaptive thresholding technique that removes background by using local mean
and standard deviation. Most common and simplest approach to segment an image is using thresholding.
In this work we present an efficient implementation for threshoding and give a detailed comparison of
Niblack and sauvola local thresholding algorithm. Niblack and sauvola thresholding algorithm is
implemented on medical images. The quality of segmented image is measured by statistical parameters:
Jaccard Similarity Coefficient, Peak Signal to Noise Ratio (PSNR).
Marker Controlled Segmentation Technique for Medical applicationRushin Shah
Medical image segmentation is a very important field for the medical science. In medical images, edge detection is an important work for object recognition of the human organs such as brain, heart or kidney etc. and it is an essential pre-processing step in medical image segmentation.
Medical images such as CT, MRI or X-Ray visualizes the various information’s of internal organs which is very important for doctors diagnoses as well as medical teaching, learning and research.
It is a tough job to locate the internal organs if images contains noise or rough structure of human body organs.
The document discusses image segmentation techniques. It describes image segmentation as partitioning a digital image into multiple regions based on characteristics like color or texture. Common applications of image segmentation include industrial inspection, optical character recognition, and medical imaging. The techniques discussed are fixed thresholding, iterative thresholding, and fuzzy c-means clustering. Fuzzy c-means clustering is identified as the most suitable for pest image segmentation based on its lower entropy and normalized mutual information values. Simulated annealing is also proposed to improve upon the limitations of fuzzy c-means clustering.
Digital image classification involves:
1) Sorting pixels into classes based on their spectral values using algorithms like supervised maximum likelihood classification or unsupervised isodata clustering.
2) Analyzing spectral patterns by examining pixels in feature space rather than image space. Distances between pixel vectors in feature spaces define class boundaries.
3) Validating classification results to determine accuracy by comparing to reference data. Problems can occur and techniques continue improving.
NMS and Thresholding Architecture used for FPGA based Canny Edge Detector for...idescitation
In this paper, an architecture designed for Non-
Maximal Suppression used in Canny edge detection algorithm
is presented in order to reduce memory requirements
significantly. The architecture also achieves decreased latency
and increased throughput with no loss in edge detection. The
new algorithm used has a low-complexity 8-bin non-uniform
gradient magnitude histogram to compute block-based
hysteresis thresholds that are used by the Canny edge detector.
Furthermore, the hardware architecture of the proposed
algorithm is presented in this paper and the architecture is
synthesized on the Xilinx Virtex 5 FPGA. The design
development is done in VHDL and simulated results are
obtained using modelsim 6.3 with Xilinx 12.2.
At the end of this lesson, you should be able to;
describe Connected Components and Contours in image segmentation.
discuss region based segmentation method.
discuss Region Growing segmentation technique.
discuss Morphological Watersheds segmentation.
discuss Model Based Segmentation.
discuss Motion Segmentation.
implement connected components, flood fill, watershed, template matching and frame difference techniques.
formulate possible mechanisms to propose segmentation methods to solve problems.
The mean shift procedure is a general nonparametric technique for analyzing complex multimodal feature spaces and delineating arbitrarily shaped clusters. It works by recursively finding the nearest stationary point of the underlying density function, which corresponds to the mode of the density. The mean shift procedure relates to kernel density estimation and robust M-estimators of location. It provides a versatile tool for feature space analysis that can solve many low-level computer vision tasks with few parameters.
Importance of Mean Shift in Remote Sensing Segmentation (IOSR Journals)
1) Mean shift is a non-parametric clustering technique that can segment remote sensing images into homogeneous regions without prior knowledge of the number of clusters or constraints on cluster shape.
2) The document presents a case study demonstrating mean shift can segment an image containing oil storage tanks into distinct regions faster than level set segmentation.
3) Mean shift is shown to be well-suited for remote sensing image segmentation tasks like forest mapping and land cover classification due to its ability to handle noise, gradients, and texture variations common in real-world images.
A Novel Background Subtraction Algorithm for Dynamic Texture Scenes (IJMER)
This document summarizes a research paper that proposes a new technique for binarizing images of black/green boards captured with a mobile camera, extracting text with 92.589% accuracy. It reviews existing binarization techniques in the literature, covering common global and local thresholding methods. The proposed technique enhances the input image, segments it into 3x3 parts, computes a local threshold for each part using Otsu's method, binarizes the parts, and joins them. Experimental results on a database of 50 mobile-captured board images show the technique achieves better accuracy than other algorithms according to the evaluation metrics.
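The split-threshold-stitch pipeline above can be sketched with a NumPy implementation of Otsu's method applied per part. The 3x3 split matches the paper; the image-enhancement step is omitted, and the toy image is an assumption.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                       # mean of the "dark" class
        m1 = (sum_all - sum0) / (total - w0)  # mean of the "bright" class
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize_in_parts(gray, rows=3, cols=3):
    """Split the image into rows x cols parts, threshold each part locally
    with Otsu's method, then stitch the binary parts back together."""
    out = np.zeros(gray.shape, dtype=bool)
    for r_part in np.array_split(np.arange(gray.shape[0]), rows):
        for c_part in np.array_split(np.arange(gray.shape[1]), cols):
            part = gray[np.ix_(r_part, c_part)]
            out[np.ix_(r_part, c_part)] = part > otsu_threshold(part)
    return out

# Toy bimodal "board" image: alternating dark strokes (50) on a bright field (200)
gray = np.where((np.arange(60)[:, None] + np.arange(60)[None, :]) % 2 == 0, 50, 200)
binary = binarize_in_parts(gray)
```

Local thresholding of this kind tolerates the uneven illumination typical of mobile-captured boards, where a single global threshold fails; note that a part with no contrast at all would still need special handling.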
This document summarizes a research paper that presents a real-time 3D reconstruction method using stereo vision from a driving car. The method extends LSD-SLAM with stereo capabilities to simultaneously track camera pose and reconstruct semi-dense depth maps. It is evaluated on the KITTI dataset and compared to laser scans and traditional stereo methods. Results show the direct SLAM technique generates visually pleasing and globally consistent semi-dense reconstructions in real-time on a single CPU.
At the end of this lesson, you should be able to:
define segmentation.
describe edge-based segmentation.
describe thresholding and its properties.
apply edge detection and thresholding as segmentation techniques.
Unsupervised Building Extraction from High Resolution Satellite Images Irresp... (CSCJournals)
The document discusses an unsupervised method for extracting buildings from high resolution satellite images regardless of rooftop structures. The method first calculates NDVI and chromaticity ratios to segment vegetation and shadows. Rooftops and roads are then detected and eliminated. Principal component analysis and area analysis are performed to accurately extract buildings. The algorithm aims to eliminate inhomogeneities caused by varying building hierarchies by focusing on eliminating non-building regions rather than detecting building regions of interest. The methodology is tested on Quickbird satellite imagery and results indicate it can extract buildings in complex environments irrespective of rooftop shape.
1) The document describes a method for vehicle detection and tracking in videos using inter-frame coding. It extracts moving blobs through background subtraction and clusters pixels into regions.
2) A blob merging module merges nearby blobs that were incorrectly separated due to occlusion. Tracking matches blobs between frames using position, color, and velocity.
3) Experimental results on real-time video show the method can successfully detect and track vehicles under different lighting conditions and levels of noise. The addition of a "Vicinity Factor" improved segmentation accuracy compared to using blob merging thresholds alone.
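The background-subtraction stage of such a pipeline can be sketched as a thresholded absolute difference against a background model. This is a generic sketch; the threshold, update rate, and toy frame are assumptions, and the paper's blob merging and "Vicinity Factor" are not implemented here.

```python
import numpy as np

def motion_mask(frame, background, thresh=25):
    """Pixel-wise absolute difference against a background model,
    thresholded to a binary mask of moving pixels."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return diff > thresh

def update_background(background, frame, alpha=0.05):
    """Running-average background update; alpha controls adaptation speed."""
    return (1 - alpha) * background + alpha * frame

bg = np.full((48, 64), 100.0)      # static background model
frame = bg.copy()
frame[10:20, 10:30] = 220.0        # a bright "vehicle" entering the scene
mask = motion_mask(frame, bg)      # True exactly where the vehicle is
```

The resulting mask would then be passed to connected-component labelling to cluster the moving pixels into blobs for tracking.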
Object Classification of Satellite Images Using Cluster Repulsion Based Kerne... (IOSR Journals)
Abstract: We investigated the classification of satellite images and multispectral remote sensing data, focusing on uncertainty analysis in the produced land-cover maps. We propose an efficient technique for classifying multispectral satellite images into road, building and green areas using a Support Vector Machine (SVM). Classification is carried out in three modules: (a) preprocessing using Gaussian filtering and conversion from RGB to the Lab color space; (b) object segmentation using the proposed cluster-repulsion-based kernel Fuzzy C-Means (FCM); and (c) classification using a one-to-many SVM classifier. The goal of this research is to improve the efficiency of satellite image classification through object-based image analysis. The proposed work is evaluated on satellite images, and its accuracy is compared with FCM-based classification. The results show that the proposed technique achieves better results, reaching accuracies of 79%, 84%, 81% and 97.9% for road, tree, building and vehicle classification respectively.
Keywords: satellite image, FCM clustering, classification, SVM classifier.
This document discusses using convolutional neural networks (CNNs) to classify and segment satellite imagery. It presents a novel approach using a CNN to perform per-pixel classification of multispectral satellite imagery and a digital surface model into five categories (vegetation, ground, roads, buildings, water). The CNN is first pre-trained with unsupervised clustering then fine-tuned for classification and segmentation. Results show the CNN approach outperforms existing methods, achieving 94.49% classification accuracy and improving segmentation by reducing salt-and-pepper effects from per-pixel classification alone.
OBJECT DETECTION FOR SERVICE ROBOT USING RANGE AND COLOR FEATURES OF AN IMAGE (IJCSEA Journal)
This document summarizes an approach for object detection using both range and color image features. The proposed method first generates hypotheses for objects in a range image using a generative model (pLSA) applied to bag-of-visual-words representing 3D shape. It then verifies the hypotheses using an SVM classifier combining 3D shape features from the range image and color appearance features from the corresponding area of the color image. The approach was tested on images containing multiple objects acquired using both a range sensor and color camera.
This document summarizes research comparing three machine learning classification methods - Decision Tree, Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN) - for classifying land use from high and low resolution satellite imagery. The researchers applied each method to classify Pleiades satellite images of Taiwan and Colorado. SVM achieved the highest overall accuracy of 78.6% for high resolution imagery and 83.3% for low resolution imagery. Decision Trees and k-NN were less accurate. The document outlines the methodology, including image preprocessing, parameter selection, accuracy assessment, and findings.
Detection of urban tree canopy from very high resolution imagery using an ob... (IJECEIAES)
Trees that grow within towns, cities and suburban areas collectively make up the urban forest. Urban forests and urban trees affect urban water, pollution and heat. We are now experiencing drastic climatic change as trees are cut down to accommodate growth and an increasing population, leading to the expansion of roads, towers and airports. Individual tree crown detection is necessary to map the forest and support feasible planning for urban areas. In this study, trees in a specific area are detected from WorldView-2 imagery with an object-based image analysis (OBIA) approach; the improved spatial and spectral resolution of the imagery allows urban features to be extracted with better accuracy. The aim of this research is to illustrate how an object-based method can be applied to the available data to accurately identify vegetation, which can be further sub-classified to obtain the area under tree canopy. The result gives the area under tree canopy with an accuracy of 92.43% and a Kappa coefficient of 0.80.
Goal location prediction based on deep learning using RGB-D camera (journalBEEI)
In a navigation system, the desired destination position plays an essential role, since path planning algorithms take the current location, the goal location and a map of the surrounding environment as inputs. The path generated by the path planning algorithm is used to guide a user to the final destination. This paper presents an algorithm based on an RGB-D camera to predict the goal coordinates in a 2D occupancy grid map for a navigation system for visually impaired people. In recent years, deep learning methods have been used in many object detection tasks, so an object detection method based on a convolutional neural network is adopted in the proposed algorithm. The distance between the current position of the sensor and the detected object is measured from the depth data acquired by the RGB-D camera. The detected object coordinates and the depth data are integrated to obtain an accurate goal location in the 2D map. The proposed algorithm has been tested in various real-time scenarios, and the experimental results indicate its effectiveness.
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN (Zihao (Gerald) Zhang)
The document describes a method to automatically detect window regions in 3D point cloud data of indoor environments collected using a backpack sensor system. The method is based on R-CNN and uses MCG to generate region proposals, extracts features from proposals using a CNN, and classifies proposals as windows or non-windows using a random forest. Experiments on a dataset of 400 images achieved an F1 score of 89.79% and mAP of 96.64% for window detection, outperforming an existing method. Adding a small amount of manually labeled data further improved results.
An effective RGB color selection for complex 3D object structure in scene gra... (IJECEIAES)
The goal of the project is to develop a complete, fully detailed 3D interactive model of the human body and its systems, and to allow the user to interact in 3D with all the elements of each system, in order to teach students human anatomy. Some organs that carry a great deal of anatomical detail, such as the brain, lungs, liver and heart, need to be described accurately and in minute detail. These organs need full descriptions of the medical information required to learn how to operate on them, and should allow the user to add careful and precise markings indicating the operative landmarks at the surgery location. Attaching so many different items of information is challenging when the target area is very detailed and overlaps with other medical information related to the area, and existing tagging methods did not provide sufficient locations to attach the information to. Our solution combines a variety of tagging methods, marking regions by selecting RGB color areas drawn in the texture of the complex 3D object structure. It then relies on those RGB color codes to tag IDs and create relational tables that store the related information about specific areas of the anatomy. With this method of marking, the entire set of (R, G, B) color values can be used to identify a set of anatomic regions, which also makes it possible to define multiple overlapping regions.
Object extraction using edge, motion and saliency information from videos (eSAT Journals)
Abstract: Object detection is the process of finding instances of objects of a certain class, which is useful in the analysis of video or images. A number of algorithms have been developed for object detection, which plays a significant role in many areas of computer vision such as video surveillance and image retrieval. This paper presents an efficient algorithm for moving object extraction from videos using edge, motion and saliency information. The methodology includes four stages: frame generation, pre-processing, foreground generation and integration of cues. Foreground generation includes edge detection using the Sobel edge detection algorithm, motion detection using a pixel-based absolute difference algorithm, and motion saliency detection. A Conditional Random Field (CRF) is applied to integrate the cues, yielding better spatial information for the segmented object. Keywords: object detection, saliency information, Sobel edge detection, CRF.
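The Sobel edge detection step used for foreground generation can be sketched directly in NumPy. The kernels are the standard Sobel operators; the threshold and toy image are assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d(img, kernel):
    """Naive 'valid' 2-D filtering (correlation; Sobel responses only flip sign
    under true convolution, which does not affect the magnitude)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def sobel_edges(img, thresh=100.0):
    """Binary edge map from the Sobel gradient magnitude."""
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_Y)
    return np.hypot(gx, gy) > thresh

img = np.zeros((20, 20))
img[:, 10:] = 255.0                # vertical step edge between columns 9 and 10
edges = sobel_edges(img)
```

In the pipeline described above, this edge cue would be combined with the frame-difference motion cue and the saliency cue via the CRF.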
This document presents a methodology for real-time object tracking using a webcam. It combines Prewitt edge detection for object detection and Kalman filtering for tracking. Prewitt edge detection is used to detect the edges of the moving object in each video frame. Then, Kalman filtering is used to track the detected object across subsequent frames by predicting its location. Experiments show the approach can efficiently track objects under deformation, occlusion, and can track multiple objects simultaneously. The combination of Prewitt edge detection and Kalman filtering provides an effective method for real-time object tracking.
Automatic Building detection for satellite Images using IGV and DSM (Amit Raikar)
This document presents a method for automatic building detection from satellite images using internal gray variance (IGV) and digital surface model (DSM). The proposed method aims to detect low-rising buildings and buildings with partially bright and partially dark rooftops more accurately than existing methods. The key steps include image enhancement, IGV feature extraction, seed point detection using the enhanced image and IGV, clustering using DSM data, binarization, thinning, shadow detection, and segmentation. Results on test satellite images show the method achieves higher detection percentages and lower branch factors than an existing method.
The document summarizes a research paper on using CNN and LSTM models for land use and land cover change detection from remote sensing images. It presents the methodology which involves preprocessing two QuickBird satellite images from Russia from 2005 and 2010. Features are extracted from the segmented images using Gabor, Zernike moments and local features. A CNN model is used for classification and identifies changes such as increased buildings and conversion from forest to land. Performance is evaluated using accuracy, sensitivity, specificity and precision metrics which are over 95% according to the results. The paper concludes the CNN model effectively identifies temporal changes in land use from the satellite images.
This document describes a new method for 2D and 3D object detection and classification for autonomous vehicles using LiDAR and camera (CCD) sensors. It proposes generating object proposals from LiDAR point cloud data by filtering points, projecting them to 2D, and segmenting edges. The proposals are then classified using R-FCN neural network on the CCD image. Class labels from R-FCN are mapped back to edge points to determine the 3D orientation and expand the bounding box for occluded regions. Evaluation on KITTI dataset shows it achieves accurate and fast object detection compared to previous methods.
This document describes a new method for object detection in autonomous vehicles using both LiDAR and image data. The method generates 2D object proposals from LiDAR point cloud data by filtering and grouping edge points. These proposals are then classified using an R-FCN neural network. The class labels are mapped back to the 3D LiDAR points to determine the full 3D bounding box orientations of detected objects. This allows for spatial information and handling of occluded regions, improving over methods using image data alone. The method is evaluated on the KITTI dataset and shown to achieve accurate and fast object detection suitable for autonomous driving applications.
This document provides an overview of image processing techniques for traffic applications. It discusses automatic lane finding using color-based and texture-based segmentation as well as feature-driven approaches. Object detection methods like thresholding, edge detection using Canny operator, and background differencing are also covered. Additionally, the document proposes an emergency vehicle detection system using red beacon detection and frequency analysis to override normal traffic light patterns when an emergency vehicle is detected.
Building Extraction from Satellite Images (IOSR Journals)
This document summarizes a method for detecting buildings from satellite images. It begins with edge detection using Canny edge detection to extract lines from the image. These lines are then input into the Hough Transform to generate line segments. Hypothesis generation searches for rectangular structures using these line segments. Hypothesis verification is performed using color segmentation of the HSV image space and shadow detection. The final output is the extracted buildings from the urban areas of the satellite image.
Vision-Based Turf Classification for an Autonomous Mower
MEng in Robotics and Autonomous Vehicles, Capstone Project Report
Prasanna Kumar Sivakumar
School of Integrative Systems and Design
University of Michigan, Ann Arbor
spkumar@umich.edu
ABSTRACT
This report presents a vision-based turf classification method for an autonomous lawn mower. The task of segmenting the scene into turf and non-turf regions is divided into two phases: a data extraction phase and a turf classification phase. During the data extraction phase, the scene in front of the mower is divided into a grid of uniformly spaced blocks, and color and texture information is extracted from each block. The color of a block is represented by its average hue and its texture by a Histogram of Oriented Gradients (HOG) descriptor. In the turf classification phase, the scene is segmented by classifying each block as turf or non-turf. Turf classification using k-means clustering and Support Vector Machines (SVM) is presented, with a detailed study of the effect of various parameter choices on classifier performance.
I. INTRODUCTION
This report presents a prototype lawn mowing robot capable of detecting non-turf regions and obstacles in the scene using an off-the-shelf vision sensor. The objective for the lawn mowing robot is to maintain turf health and the aesthetics of the lawn autonomously by performing complete coverage of a desired area. To safely achieve this goal, the system must determine and maintain containment inside the permitted area and also avoid undesired regions and obstacles that may appear in its path. While a wire carrying an RF signal can be used to mark the boundaries of the lawn [1], it would be difficult to manually demarcate all undesired regions inside the boundary and impossible to know the obstacles beforehand.
There have been several research prototypes as well as manufactured products developed for robotic lawn mowing [1]-[4]. The manufactured products available in the market use a contact sensor to detect a non-turf object¹ after collision, or run over it if the height of the object is less than the position of the sensor. None
¹ From this point onwards, the term "non-turf object" will be used to refer to both non-turf regions and obstacles.
Figure 1: Autonomous lawn mower with monocular camera used in this work
of the platforms are equipped with sensors to detect objects beforehand and avoid them. Vision sensors are an attractive solution to this problem for their potential to eliminate the time-consuming installation of boundary wires and to perform sensing functions beyond obstacle detection, such as determining and diagnosing turf problems. A method for classifying a scene into turf and non-turf regions using an off-the-shelf vision sensor is presented in this report. In this work, a robotic mower from John Deere, shown in Figure 1, is used.
The task of classifying a scene into turf and non-turf regions is divided into two phases: data extraction and turf classification. In the data extraction phase, the color and texture information in the image captured by the vision sensor is extracted by treating the image as a dense grid of uniformly spaced blocks. The average hue of a block is used to represent its color, and a Histogram of Oriented Gradients (HOG) descriptor to represent its texture. In the classification phase, the extracted data is passed to a classifier to label each block as turf or non-turf. Classification using unsupervised k-means clustering and supervised Support Vector Machines (SVM) is presented. The performance of the two classifiers is compared, and it is shown that an SVM classifier with a Radial Basis Function (RBF)
kernel produces better performance under varied lighting conditions. A detailed study of the effect of various implementation choices on classifier performance is also presented.
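The unsupervised branch of the comparison can be sketched with a minimal k-means implementation on synthetic block descriptors. The synthetic 10-D data below is an assumption standing in for the real extracted features; the supervised SVM branch would in practice use a library implementation (e.g., LIBSVM) and is not shown.

```python
import numpy as np

def kmeans(X, k=2, iters=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and centroid
    re-estimation until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # fancy indexing copies
    labels = np.full(len(X), -1, dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                        # converged
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two synthetic clusters in 10-D, standing in for turf / non-turf block descriptors
rng = np.random.default_rng(2)
turf = rng.normal(0.0, 0.1, (50, 10))
non_turf = rng.normal(1.0, 0.1, (50, 10))
X = np.vstack([turf, non_turf])
labels, _ = kmeans(X, k=2)
```

With k = 2 the clustering recovers the turf/non-turf split when the two groups are well separated in feature space; the report's point is that this unsupervised split degrades under varied lighting, where the supervised SVM with an RBF kernel holds up better.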
A. Related work
Autonomous lawn mowing robots have been receiving attention from the mobile robotics community in recent years. Containment inside the boundary and accurate positioning on the lawn have so far been the most explored aspects of robotic mowing. Yang et al. [6] present a vision-based localization method to estimate the position and ensure boundary containment using an omni-directional camera. Smith et al. [4] present a novel positioning technique using ultrasonic beacons. Batavia et al. [5] present an obstacle detection technique for a robotic mower using color and 2-D range data. While their system performs fairly well, even detecting objects at a curvature of 15 cm, it relies on an expensive laser range finder.
Segmentation of a scene into turf and non-turf regions for a robotic mower is the central theme of this report, so only scene segmentation methods applicable to real-time mobile robotics are mentioned in this section. There is extensive literature on color-based scene segmentation; see [7] for a survey. Bruce et al. [8] present a novel threshold-based fast image segmentation technique using connected component analysis for a RoboCup application [9]. Their algorithm operates at 30 Hz with fairly high accuracy for a well-structured indoor scene under constant lighting; however, they do not discuss its performance in unstructured outdoor conditions. Browning et al. [10] discuss a real-time, color-based, illumination-invariant scene segmentation technique for an outdoor soccer robot, but they assume that the scene is made up of distinctly colored objects.
There is also extensive literature on edge-based object detection techniques from the past decade; see [11] for a survey. The method presented in this report inherits features from the sliding-window object detector work of Dalal et al. [12], which introduced the Histogram of Oriented Gradients descriptor. Tu et al. [13] describe an approach for identifying regions in the scene, but their approach has been shown to be effective only on text and faces. Sudderth et al. [14] relate scenes, objects and parts in a single hierarchical framework, but do not provide an exact segmentation of the image. Gould et al. [15] present a hierarchical region-based approach to image segmentation; however, their method cannot segment the same object under different backgrounds and often leaves it segmented into multiple dissimilar pieces.
B. Organization
The rest of the report is organized as follows. Section
II provides an overview of the data extraction process.
Section III provides an overview of turf classification.
Section IV analyses the performance of each
classifier and the effects of different parameters on
real–time classification. Section V provides implementation
details. The report is finally concluded in section VI with
plans for future work in section VII.
II. OVERVIEW OF DATA EXTRACTION
This section gives an overview of the data extraction
phase which is summarized in figure 2. This method
is based on evaluating local histograms of image gra-
dient orientations and hue level in a dense grid. The
basic idea is that two attributes differentiate turf regions
from non-turf objects: color and texture. Grass is
predominantly green, and a patch of grass has dense
edge information that most non-grass objects lack.
This is implemented by dividing the image into small
NxN spatial regions (“blocks”), for each block accumu-
lating a 9–D HOG descriptor which is a histogram of
gradient orientations, and a 1–D average hue number.
The combined 10–D entry forms the representation of
the block.
The following two subsections describe the steps
involved in the data extraction process in detail.
A. Image pre-processing
The input image captured by the camera is a 640x480
RGB image. In order to increase the classification speed
(images classified per second) without losing texture
information, the image is downsampled to a fraction of
the original size. A Gaussian blur is applied before down-
sampling to remove high frequency edges and prevent
Moiré patterns from forming in the downsampled image.
The lawn mowing robot is typically used outdoors, often
under bright (mid–day) or dark (late–evening)
lighting conditions, which decreases the global contrast
of the captured image. In order to compensate for this
effect, a contrast normalization scheme is applied to the
image, which effectively spreads out the most frequent
intensity values. Contrast normalization is carried out
after Gaussian smoothing and before downsampling. The
resultant image is then downsampled to a smaller size.
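The report does not specify the exact contrast normalization scheme beyond "spreading out the most frequent intensity value", so as one plausible sketch, here is plain histogram equalization in pure Python on a flat list of 8-bit intensities:

```python
def equalize(intensities, levels=256):
    """Histogram equalization: spreads out the most frequent
    intensity values to restore global contrast.
    `intensities` is a flat list of integers in [0, levels)."""
    hist = [0] * levels
    for v in intensities:
        hist[v] += 1
    # cumulative distribution function of the intensity histogram
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)   # first occupied level
    n = len(intensities)
    scale = (levels - 1) / max(n - cdf_min, 1)
    return [round((cdf[v] - cdf_min) * scale) for v in intensities]
```

On a low-contrast patch spanning only levels 100–102, the output stretches to the full 0–255 range, which is the effect described above.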
B. Data Collection
The feature information is extracted by sliding an
NxN window across the image. Each window location
is transformed to greyscale and HSV color spaces.
From the grayscale block, sum of edge magnitudes at
Figure 2: Overview of turf classification chain. The input image is first downsampled to half of its original size
and tiled with a grid of uniformly spaced blocks in which average Hue and HOG vectors are extracted. The
concatenated vectors are fed to a classifier (kmeans/SVM) for classification into turf and non–turf.
9 different orientations (HOG) is calculated, and from
the HSV block the average hue across the window is
calculated; the two are concatenated to get a 10–D
representation of the block.
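The per-block descriptor described above can be sketched in pure Python as follows. Central-difference gradients and nine unsigned-orientation bins of 20° each are assumptions; the report does not state the exact gradient operator or binning:

```python
import math

def block_descriptor(gray, hue):
    """10-D descriptor for one NxN block: a 9-bin histogram of
    gradient orientations (weighted by edge magnitude) plus the
    average hue. `gray` and `hue` are NxN lists of lists."""
    n = len(gray)
    hist = [0.0] * 9
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # horizontal gradient
            gy = gray[y + 1][x] - gray[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            # unsigned orientation in [0, 180), split into 9 bins of 20 degrees
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(ang // 20), 8)] += mag    # accumulate edge magnitude
    avg_hue = sum(sum(row) for row in hue) / (n * n)
    return hist + [avg_hue]
```

A block with a strong vertical edge produces a descriptor whose HOG part concentrates in the bin for horizontal gradients while the last entry carries the mean hue.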
III. OVERVIEW OF CLASSIFICATION
A. Dataset
The training dataset consists of images of different types
of turf under different lighting conditions and images of
most common non–turf regions and obstacles as shown
in figure 3. The training image dataset is collected such
that a training image representative of grass contains
only grass and no trace of non–grass and similarly
for non–grass. These training images were collected by
driving the robot manually on different turfs under dif-
ferent lighting conditions and manually cropping regions
that are expected to be best representatives of the two
classes. It is important to note that, only the bottom–
half of an image frame was considered for training.
There are two reasons behind this choice. One, due
to saturation, especially under bright lighting condition,
both the color and the texture information are lost in the
part of the image near horizon and samples collected
from those regions might be misleading and two, since
the autonomous lawn mower is a slow mowing robot, it
takes a longer time to reach the area that are near horizon
in the current image frame. A total of 182 images of
different sizes were collected and for a blocksize of
8x8 148,820 labeled samples were extracted without any
overlap.
B. Dimensionality reduction and scaling
Principal Component Analysis (PCA) is performed
on the training data to identify the n most useful edge
orientations. It is important to note that PCA is applied
to the 9–D HOG data and not to the 10–D concatenated
data. The reason behind this choice is that using a 9–D vector
to represent texture and a single number to represent color
skews the data away from color and might increase the
false positive rate. For example, a dead grass patch, which
has the texture of live grass but not its green color,
might be classified as grass. In order to prevent this
from happening, the number of dimensions of the texture
data is reduced from 9 to 3. Section IV presents
a detailed analysis of effect of number of Principal
Components on False Positive Rate and Classification
Rate (images/second). The resultant data is normalized
to zero mean and unit variance.
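As an illustrative stand-in for the full PCA used here (which keeps the top 3 components), the leading principal component can be found by power iteration on the covariance matrix; this pure-Python sketch is for intuition only, not the report's implementation:

```python
def pca_first_component(samples, iters=200):
    """Power iteration for the first principal component of
    mean-centered data. `samples` is a list of equal-length
    feature vectors; returns a unit-length direction vector."""
    d, n = len(samples[0]), len(samples)
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    X = [[s[j] - means[j] for j in range(d)] for s in samples]
    # sample covariance matrix of the centered data
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
          for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]   # renormalize each iteration
    return v
```

For data whose variance lies almost entirely along one feature axis, the returned direction aligns with that axis, which is exactly the property PCA exploits to pick the most useful HOG orientations.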
C. k–means clustering
k–means clustering is a natural unsupervised learning
candidate for this problem. There is no training
involved in k–means clustering; only the result of PCA
is used, for dimensionality reduction. k-means clustering
(k=2 in this case) iteratively groups the samples into two
classes. The problem with this is that, even when
the entire scene is turf, k–means clustering still splits
the samples (blocks) obtained from the image
into two classes, thus increasing the false negative rate.
In order to solve this problem, calibration and comparison
steps are introduced before and after clustering,
respectively.
Calibration is performed at the beginning of every mow
cycle. During the calibration, the lawn mower is posi-
tioned such that the entire scene in front of it is turf and
one frame of image is grabbed. Samples (dimensionality
reduced) obtained from this image are clustered into two
groups and the corresponding cluster centers are saved.
Since both the clusters contain only turf samples, the
mean of the two centers is assumed to be representative
of the turf cluster center for the current mow cycle. The
resulting value is used as initial turf cluster center for the
first frame of the current mow cycle and from then on,
(a) (b) (c) (d) (e) (f) (g) (h)
Figure 3: Sample images from the training dataset. (a to d) Sample images of turf under different lighting conditions
(e to h) sample images of most common non-grass objects. During data extraction for training, an NxN window
sliding through each of the images from a to d will be labeled ‘1’ at every location and ‘-1’ when sliding through
each of the images from e to h.
the cluster centers of a frame are used as initial estimates
for the subsequent frame. This reduces the number of
iterations considerably.
Comparison is carried out after clustering a frame. If both
the cluster centers are close to the grass cluster center of
the previous frame then the clusters are dissolved and
all the samples are labeled ‘1’ and if both the cluster
centers are far away from the grass cluster center of
the previous frame, then all the samples are labeled ‘-
1’. If only one of the centers is close, the clusters are
preserved. Pseudocode of the comparison step is given
in algorithm 1.
Algorithm 1 Pseudocode for comparison step in k–means clustering
Input: cluster centers of the current frame Cg, Cng; clusters Clustg, Clustng; turf center of the previous frame Cgprev; threshold ε
1: if l2norm(Cg − Cgprev) < ε and l2norm(Cng − Cgprev) < ε then
2:   Clustg ← Clustg ∪ Clustng
3:   Clustng ← ∅
4: else if l2norm(Cg − Cgprev) > ε and l2norm(Cng − Cgprev) > ε then
5:   Clustng ← Clustng ∪ Clustg
6:   Clustg ← ∅
7: end if
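The comparison step can be sketched directly in Python (math.dist is the L2 norm; when exactly one center is close to the previous turf center, neither branch fires and the two clusters are kept, as in the pseudocode):

```python
import math

def compare_step(Cg, Cng, Cg_prev, clust_g, clust_ng, eps):
    """Comparison step after clustering a frame: if both cluster
    centers lie within eps of the previous frame's turf center,
    the whole frame is labeled turf; if both lie beyond eps, the
    whole frame is labeled non-turf; otherwise both clusters are
    kept as they are. Returns (turf_samples, nonturf_samples)."""
    dg = math.dist(Cg, Cg_prev)
    dng = math.dist(Cng, Cg_prev)
    if dg < eps and dng < eps:        # entire frame is turf
        return clust_g + clust_ng, []
    if dg > eps and dng > eps:        # entire frame is non-turf
        return [], clust_ng + clust_g
    return clust_g, clust_ng          # mixed frame: keep both clusters
```

The threshold eps plays the role of the threshold input in the pseudocode; its value is a tuning parameter not specified in the report.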
IV. OVERVIEW OF RESULTS
A. Datasets and Evaluation methodology
Testing dataset: The classifiers were tested on six
different datasets. Each dataset was collected by driving
the robot under a different lighting condition and consists
of 50 fully labeled images. Most of the labeled images
contain both turf and non-turf regions. Figure
4 shows a sample image from each dataset. Five of
the six datasets were collected on outdoor turf under
varied natural lighting conditions and one was collected
on artificial turf under incandescent lights. The images
were manually labeled using an interactive labeling tool
in MATLAB.
Evaluation Methodology: Performance of a classifier
is quantified by plotting Receiver Operating Characteristic
(ROC) curves [16]. ROC curves plot the False Positive
Rate, FPR = FalsePos / (FalsePos + TrueNeg), against
the True Positive Rate, TPR = TruePos / (TruePos + FalseNeg).
Lower values of FPR are better. Another evaluation metric
is the Classification Rate, CR, which is the number of
320x240 resolution images classified per second
(images/second or Hz). Higher classification rates are better
but, since a robotic lawn mower is a slow moving robot
(see footnote 2), a classification rate of 5Hz is considered good.
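The two ROC quantities reduce to a few lines of Python over the entries of a confusion matrix:

```python
def roc_point(tp, fp, tn, fn):
    """One ROC-curve point from confusion-matrix counts:
    TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr
```

Sweeping the classifier's decision threshold and calling this at each setting traces out the ROC curves plotted in the figures below.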
B. k–means clustering vs Support Vector Machines
Classifier                  FPR      CR (Hz)
k–means                     0.2594   26.6746
SVM (RBF kernel, σ = 1)     0.1036   8.1748
Table I: False positive rates and classification rates for
the two classifiers
k–means clustering produces 15% more false positives
than SVM but classifies up to 3 times more frames per
second. Since SVM with an RBF kernel classifies at a CR
above the expected 5Hz and has better precision
than k–means clustering, the SVM classifier was chosen for
implementation on the mower. The rest of the section presents
the effects of different parameter choices for the SVM
classifier.
C. Effects of n Principal components (nPC)
PCA is applied to the 9–D HOG vector to identify the n most
useful HOG orientations, and the resulting transformed
n–D vector is concatenated with the 1–D color vector and
used for training. The bar graph in figure 5 shows FPR
Footnote 2: the maximum ground speed of the robot is 13 cm/sec.
(a) (b) (c) (d) (e) (f)
Figure 4: Sample image from each of the six datasets. a to e were collected outdoor on natural turf and f was
collected on astroturf, indoor, under artificial lighting
Figure 5: FPR and classification time (1/CR) for num-
ber of Principal Components (nPC). FPR decreases from
nPC=1 to 3 and increases afterwards
and the time taken to classify a frame (1/CR)
for an increasing number of principal components.
Classification time increases with the number of components.
The FPR decreases from nPC = 1 to 3 and increases
afterwards. This trend with FPR shows the need for a
proper balance between color and texture information to
handle false positives. When nPC is small (nPC < 3),
the training data is skewed towards color, and any non–turf
object which has the green color of lawn but not the
texture (e.g. a green lawn spread) might be classified as
turf, thus increasing FPR. Similarly, when nPC is large
(nPC > 5), the training data is skewed towards texture,
and objects like a dead grass patch, which has the texture
of live grass, might be classified as turf. The right balance
between color and texture is reached at nPC = 3, which
has the lowest FPR of 0.1036.
D. Effects of kernel choices
The top plot of figure 6 shows the performance of the SVM
classifier for various kernel types. The ROC curves were
plotted for nPC = 3. The RBF kernel, defined in
equation 1, has the lowest FPR (0.1036) and
performs significantly better than the other
Figure 6: (top) ROC curves for various kernel types and
(bottom) ROC curves for various values of σ for RBF
kernel
kernel types.
K(x, x′) = exp(−‖x − x′‖² / (2σ²))  (1)
The bottom plot of figure 6 shows the performance of the RBF
kernel SVM for various values of the free parameter σ.
Although the differences are not large, FPR decreases from
σ = 0.25 to σ = 1 and increases after that; σ = 1 gives the
lowest FPR of 0.1036.
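Equation 1 translates directly into code; this pure-Python sketch evaluates the RBF kernel for two feature vectors of equal length:

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel of equation 1:
    K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The kernel equals 1 for identical inputs and decays toward 0 as the squared distance grows, with σ controlling how quickly; this is the free parameter swept in the bottom plot of figure 6.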
E. Effects of block sizes
The SVM classifier with an RBF kernel (σ = 1 and nPC =
3) was trained and tested for various block sizes without
Figure 7: ROC curves for various block sizes used in
data–extraction
overlap. The plot in figure 7 shows the performance of the SVM
classifier for different block sizes. A block size of 8x8 gives
the lowest FPR of 0.1036.
F. Effect of feature standardization
During training and testing, the feature data extracted
from the images are standardized to zero mean and unit
variance as defined in equation 2, where xij is the jth feature
of the ith sample and x̄j and σxj are the mean and standard
deviation of feature j. Transforming the data to this scale
improves the classifier performance significantly.
xij ← (xij − x̄j) / σxj  (2)
For nPC = 3, an unstandardized sample looks like
< 0.2385, 103.3826, 987.4378, 678.0125 >, for which an SVM
classifier with RBF kernel (σ = 1) has FPR = 0.3216.
After standardizing using equation 2, the same sample
looks like < 0.1543, 0.0178, 0.7621, 0.1675 > and the
same classifier has FPR = 0.1036.
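Equation 2 can be sketched as follows; the population standard deviation (dividing by n) is an assumption, since the report does not say whether n or n−1 is used:

```python
def standardize(samples):
    """Zero-mean, unit-variance scaling of equation 2, applied
    per feature across the sample set. `samples` is a list of
    equal-length feature vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    sds = [(sum((s[j] - means[j]) ** 2 for s in samples) / n) ** 0.5
           for j in range(d)]
    # subtract the per-feature mean and divide by the per-feature SD
    return [[(s[j] - means[j]) / sds[j] for j in range(d)]
            for s in samples]
```

In practice the means and SDs are computed once on the training set and reused at test time, so that training and test features share the same scale.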
V. IMPLEMENTATION
The vision hardware is a Logitech video camera [21].
It has a 1.3 Megapixel CMOS sensor
and captures VGA standard images at a resolution
of 640x480. The camera is mounted on the front of
the robot using a 3D–printed plastic mount which
allows lateral motion and pan–tilt. The computer used
in training and implementation runs a 2GHz Intel
Pentium processor with 4GB of RAM. OpenCV [22]
was used for data extraction and libSVM
[23] was used to train the SVM classifier.
VI. CONCLUSION
A vision–based turf classification method for an au-
tonomous lawn mower was presented in this report. A
two–phase scene segmentation technique, using color
and texture information, was proposed. During the first,
data–extraction phase, color and texture information
from the image captured by the vision sensor is extracted
by treating the image as a dense grid of uniformly spaced
blocks. During the second, turf classification phase, data
extracted from each block is classified as turf or non–
turf. Performance of two different classifiers, k–means
clustering and Support Vector Machines was presented
with a detailed parameter study for SVM classifier.
A notable finding of the study is the need for a proper
balance between the number of color and texture features.
By performing Principal Component Analysis (PCA) on the
texture data extracted from the training dataset, it was found
that using too many or too few texture features
results in higher False Positive Rates, and that
using 3 texture features provides the right balance
between texture and color. The parameter study on the
SVM classifier showed that a RBF kernel with σ = 1
results in the best performance.
VII. FUTURE WORK
Although the current SVM classifier is reasonably
accurate and efficient, processing 320x240 resolution
images at 8Hz, there is still room for improvement and
optimization. One omission of this work
is that the Infrared (IR) filter was not removed from the camera.
Since chlorophyll in live grass reflects radiation in the near
IR region of the spectrum [17], removing the IR filter would
greatly improve detection of live grass. Using Local
Binary Patterns [18] in combination with the HOG descriptor
for feature representation could improve classification
accuracy; LBP has been found to be a powerful feature for
texture classification [19]. Using auto–encoders with
Convolutional Neural Networks [20] can improve the
accuracy and classification time greatly.
ACKNOWLEDGMENTS
This work was funded by the Multi-Disciplinary Design
Program (MDP) of the University of Michigan and
supported by John Deere. In particular, I thank and
acknowledge Mr. David Johnson of John Deere Advanced
R&D for his support and valuable inputs, Prof. Matthew
Johnson-Roberson for technical guidance throughout the
project and Mr. Daniel Kline for guiding the completion
of the project from a product development standpoint.
REFERENCES
[1] ConsumerSearch, “Best robotic mowers”, [Online]. Available:
http://www.consumersearch.com/robotic-lawnmowers/best-
robotic-mowers
[2] R. W. Hicks II and E. L. Hall, “Survey of robot lawn mowers”,
in Proc. SPIE Intelligent Robots and Computer Vision XIX: Al-
gorithms, Techniques, and Active Vision, Oct. 2000, pp. 262-269.
[Online]. Available: http://dx.doi.org/10.1117/12.403770
[3] H. Sahin and L. Guvenc, “Household robotics: autonomous de-
vices for vacuuming and lawn mowing [applications of control]”,
IEEE Control Systems, vol. 27, no. 2, pp. 2096, 2007.
[4] A. Smith, H. Chang, and E. Blanchard, “An outdoor high-
accuracy local positioning system for an autonomous robotic
golf greens mower”, in Proc. IEEE International Conference on
Robotics and Automation, May 2012, pp. 2633-2639.
[5] P. Batavia and S. Singh, “Obstacle Detection in Smooth High-
Curvature Terrain”, Proc. IEEE International Conference on
Robotics and Automation, May 2002, pp. 1283-1291.
[6] J. Yang, s. Chung, S. Hutchinson, D. Johnson, and M. Kise,
“Vision-Based Localization and Mapping for an Autonomous
Mower”, in Proc. IEEE International Conference on Intelligent
Robots and Systems, November 2013, pp. 2018-2026.
[7] W. Skarbek and A. Koschan, “Colour Image Segmentation, A
Survey”, Online Library, Technical University of Berlin, Oct.
1994. [Online]. Available: http://iristown.engr.utk.edu/koschan/paper/coseg.pdf.
[8] J. Bruce, T. Balch and M. Veloso, “Fast and Inexpensive Color
Image Segmentation for Interactive Robots”, in Proc. IEEE
International Conference on Intelligent Robots and Systems,
November 2000, pp. 1018-1026.
[9] H. Kitano, Y. Kuniyoshi, I. Noda, M. Asada, H. Matsubara, and
E. Osawa. RoboCup: A challenge problem for AI. AI Magazine,
18(1), pages 73-85, 1997.
[10] B. Browning and D. Govindaraju, “Fast, Robust Techniques
for colored object Detection in Variable Lighting Conditions”,
in Proc. IEEE International Conference on Robotics and Au-
tomation, May 2004, pp. 1663-1669.
[11] D. M. Gavrila, “The visual analysis of human movement: A survey”,
Computer Vision and Image Understanding, vol. 73, pp. 82-98, 1999.
[12] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for
Human Detection”, In Proc. Computer Society Conference on
Computer Vision and Pattern Recognition, June 2005, pp.
1063-1071.
[13] Z. Tu, X. Chen, A. L. Yuille and S.-C. Zhu, “Image parsing:
Unifying segmentation, detection, and recognition”. In Proc.
International Conference on Computer Vision, May 2003, pp.
928-934.
[14] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky, “De-
scribing visual scenes using transformed objects and parts”,
International Journal on Computer Vision, Vol. 87, pp. 423-431,
2007.
[15] S. Gould, T. Gao and D. Koller, “Region-based Segmentation
and Object Detection”, In Proc. International Conference on
Neural Information Processing Systems, December 2009, pp.
1111-1120.
[16] T. Fawcett, “An Introduction to ROC Analysis”. Pattern Recog-
nition Letters 27 (8): 861-874. doi:10.1016/j.patrec.2005.10.010
[17] Science Mission Directorate, “Reflected Near-
Infrared Waves”, [Online] Mission:Science, Na-
tional Aeronautics and Space Administration.
http://missionscience.nasa.gov/ems/08_nearinfraredwaves.html.
[18] D.C. He and L. Wang (1990), “Texture Unit, Texture Spectrum,
And Texture Analysis”, Geoscience and Remote Sensing, IEEE
Transactions on, vol. 28, pp. 509-512.
[19] X. Wang, T. Han and S. Yan, “An HOG-LBP Human Detector
with Partial Occlusion Handling”, In Proc. International Confer-
ence on Computer Vision, May 2009, pp. 456-464.
[20] Bengio, Y, “Learning Deep Architectures for AI”, Foundations
and Trends in Machine Learning, vol.06, December 2009.
[21] Logitech Webcam C110, [online]
http://support.logitech.com/product/8112.
[22] G. Bradski, “OpenCV, Open Source Computer Vision Library”,
Dr. Dobb’s Journal of Software Tools, Volume 4, 2008.
[23] C. Chang and C. Lin, “LIBSVM: Library for support vec-
tor machines”, ACM Transactions on Intelligent Systems and
Technology, Volume 2, Issue 3, pp. 27:1–27:27, 2011. Software
available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.