Design of a Robot Guide/Usher
Project 10
Drexel University | 3141 Chestnut Street, Philadelphia, PA 19104
Adviser
M. Ani Hsieh
Design Team
Tyler Aaron
John Burchmore
Ian DeOrio
Jeff Gao
Drexel University
Mechanical Engineering and Mechanics Senior Design
MEM 493
Spring 14
Abstract
Several Mechanical Engineering and Mechanics (MEM) research labs are located within the Science
Center, but access is restricted only to ID card holders. As a direct result, visitors and a few lab members
require the assistance of an ID card holder to gain access to the building every visit. While the option of
employing a dedicated ID card usher is plausible, it just isn’t very feasible and the job itself would be
very menial. Autonomous robots with the ability to identify and guide human visitors would solve this
issue. Designed as a case study with broader implications, this project will develop a robotic guide/usher
to perform this task.
Table of Contents
Abstract ......................................................................................................................................................... 2
List of Figures ................................................................................................................................................ 6
List of Tables ................................................................................................................................................. 8
Introduction .................................................................................................................................................. 9
Stakeholders and Needs ............................................................................................................................. 10
Problem Statement ..................................................................................................................................... 11
Methodology ............................................................................................................................................... 12
Background ............................................................................................................................................. 12
Perception Subsystem: Asus Xtion Pro ................................................................................................... 13
Mobility Subsystem: iRobot Create ........................................................................................................ 15
Manipulation Subsystem: Keycard ......................................................................................................... 16
Control Architecture Subsystem............................................................................................................. 17
Mapping ............................................................................................................................................. 17
Background .................................................................................................................................... 17
Process ........................................................................................................................................... 18
Localization ......................................................................................................................................... 22
Theory ............................................................................................................................................ 22
Background ................................................................................................................................ 22
Edge/Corner Detection .............................................................................................................. 23
Blob Detection ........................................................................................................................... 24
Translational (XYZ) ......................................................................................................................... 25
Background ................................................................................................................................ 25
Object Detection ....................................................................................................................... 25
Implementation ......................................................................................................................... 27
Bearing ........................................................................................................................................... 29
Overview .................................................................................................................................... 29
Flat Wall Assumption ................................................................................................................. 29
Template Method ...................................................................................................................... 30
Template Method - Processing ................................................................................................. 31
Obstacle Detection ............................................................................................................................. 36
Process ........................................................................................................................................... 36
Obstacle Avoidance ............................................................................................................................ 39
Theory ............................................................................................................................................ 39
Execution........................................................................................................................................ 40
Elevator Navigation ............................................................................................................................ 42
Process ........................................................................................................................................... 42
Exit Elevator ................................................................................................................................... 45
Subject Identification ......................................................................................................................... 46
Process Explanation ....................................................................................................................... 46
QR Code System ............................................................................................................................. 47
Code Decryption ............................................................................................................................ 48
Design Overview ..................................................................................................................................... 49
Codes and Standards .......................................................................................................................... 49
Model Concept ................................................................................................................................... 50
Design............................................................................................................................................. 50
Physics ............................................................................................................................................ 52
Parts List ......................................................................................................................................... 54
Simulation / Experimental Validation Plan ................................................................................................. 55
Description of the Test Bed .................................................................................................................... 55
Bearing Localization ........................................................................................................................... 55
Translational XYZ Localization ............................................................................................................ 56
Subject Identification ......................................................................................................................... 56
Elevator .............................................................................................................................................. 57
Object Avoidance/Detection .............................................................................................................. 57
Validation Plan ........................................................................................................................................ 57
Bearing ............................................................................................................................................... 57
Translational XYZ ................................................................................................................................ 58
Subject Identification ......................................................................................................................... 59
Elevator .............................................................................................................................................. 60
Object Avoidance/Detection .............................................................................................................. 60
Validation Data ....................................................................................................................................... 60
Rotational Localization ....................................................................................................................... 60
Translational XYZ ................................................................................................................................ 61
Guest Recognition .............................................................................................................................. 62
Elevator .............................................................................................................................................. 63
Obstacle Avoidance ............................................................................................................................ 63
Context and Impact ..................................................................................................................................... 65
Economic Analysis .................................................................................................................................. 65
Environmental Analysis .......................................................................................................................... 66
Social Impact Analysis ............................................................................................................................. 66
Ethical Analysis ....................................................................................................................................... 67
Discussion/Future Work ............................................................................................................................. 68
List of Figures
Figure 1 - OpenNI SDK Architecture [3] ...................................................................................................... 13
Figure 2 - Occupancy Grid with obstacles [8] ............................................................................................. 18
Figure 3 - Occupancy grid mapping code .................................................................................................... 19
Figure 4 - Sample hallway map ................................................................................................................... 19
Figure 5 - SetLoc code ................................................................................................................................. 20
Figure 6 - SetGoal code ............................................................................................................................... 20
Figure 7 - AdjustLoc code ............................................................................................................................ 21
Figure 8 - FindDel code ............................................................................................................................... 21
Figure 9 – Microsoft Kinect Subcomponents .............................................................................................. 22
Figure 10 – ASUS Xtion Pro Specifications under OpenNI .......................................................................... 22
Figure 11 - Harris Feature Extraction .......................................................................................................... 23
Figure 12 - SURF Feature Extraction ........................................................................................................... 24
Figure 13 - Sample Code ............................................................................................................................. 25
Figure 14 - Matched Points (Including Outliers) ......................................................................................... 26
Figure 15 - Matched Points (Excluding Outliers)......................................................................................... 26
Figure 16. Localization "checkpoints" ......................................................................................................... 27
Figure 17 - Overlaid images of a template versus current ....................................................................... 28
Figure 18 - Flat wall assumption ................................................................................................................. 29
Figure 19 - Unknown pose .......................................................................................................................... 30
Figure 20 - Example of a template captured by the Xtion Pro.................................................................... 31
Figure 21 - Code application of the functions ............................................................................................. 32
Figure 22 - Example of Harris features extracted from an image ............................................................... 32
Figure 23 - Code application of the functions ............................................................................................. 33
Figure 24 - Result of identifying the Harris features and finding matching features in the images ........... 34
Figure 25 - Visual representation of the relative pixel displacements used to find the function .............. 34
Figure 26 - Mapped Depth Map of a Hallway ............................................................................................. 36
Figure 27 - Respective Image Capture of Depth Capture ........................................................................... 36
Figure 28 - Ground Plane Extraction ........................................................................................................... 37
Figure 29 - Obstacle Extraction ................................................................................................................... 38
Figure 30 - Colored Map with Obstacles ..................................................................................................... 38
Figure 31 - Example of valid and invalid path from grid pathing [16] ........................................................ 39
Figure 32 - Vector field analysis and path of movement from Potential Field method ............................. 40
Figure 33 - Image Acquisition ..................................................................................................................... 42
Figure 34 - Elevator Panel Template ........................................................................................................... 43
Figure 35 - Template Matching with the Captured Image .......................................................................... 43
Figure 36 - Cropping Matrix ........................................................................................................................ 44
Figure 37 - First Floor Crop ......................................................................................................................... 44
Figure 38 - Black and White Conversion ..................................................................................................... 44
Figure 39 - NNZ Function ............................................................................................................................ 44
Figure 40 - Standard QR Code ..................................................................................................................... 46
Figure 41 - (a) 10 character code vs (b) 100 character code ...................................................................... 47
Figure 42 – a) 3D Creo Model of Robot System b) Final Proof of Concept Design ................................... 50
Figure 43 – a) Creo Model of Housing b) Physical Model of Housing ...................................................... 51
Figure 44 – a) Close up View of Web Camera b) Close up View of Xtion Pro and platform ..................... 52
Figure 45 – a) Creo Model Close up of housing spacers b) Physical Model Close up of Housing Spacers
.................................................................................................................................................................... 52
Figure 46 - iRobot Create Coordinate System ............................................................................................ 53
Figure 47 - Printed Degree Chart for Validation ......................................................................................... 58
Figure 48 - Robot Localization Testing ........................................................................................................ 59
Figure 49 - Object Avoidance Example ....................................................................................................... 60
Figure 50 - Outline of facial recognition system [24] .................................................................................. 70
Figure 51 - Rotating Elevator Bank Views ................................................................................................... 74
Figure 52 - Flow Chart – Elevator to Lobby and Back ................................................................................. 88
Figure 53 - Flow Chart - Elevator to Room .................................................................................................. 89
Figure 54 - Fall Term Gantt Chart ................................................................................................................ 90
Figure 55 – Winter Term Gantt Chart ......................................................................................................... 91
Figure 56 - Detail Model Drawing ............................................................................................................... 92
List of Tables
Table 1 - Project Needs ............................................................................................................................... 10
Table 2 - Database Example ........................................................................................................................ 47
Table 3 - Mass Properties of Model Assembly ............................................................................................ 53
Table 4 - Final Parts List .............................................................................................................................. 54
Table 5 - Overall Rotational Localization Data ............................................................................................ 61
Table 6 - Overall Z Depth localization Data ................................................................................................. 61
Table 7 - Overall X Displacement Localization Data .................................................................................... 62
Table 8 - Overall Guest Recognition Data ................................................................................................... 63
Table 9 - Overall Object Avoidance Data .................................................................................................... 63
Table 10 - Parts List for Assembly ............................................................................................................... 80
Table 11 - Budget Table .............................................................................................................................. 81
Table 12 - Decision Matrix .......................................................................................................................... 82
Table 13 - Project Needs (1 of 2) ................................................................................................................. 82
Table 14 - Project Needs (2 of 2) ................................................................................................................. 83
Table 15 - Specifications and Metrics (1 of 2) ............................................................................................. 85
Table 16 - Specifications and Metrics (2 of 2) ............................................................................................. 86
Table 17 - Event Breakdown ....................................................................................................................... 86
Table 18 - Complete Z Depth Localization Trial Data .................................................................................. 93
Table 19 - Complete X Displacement Trial Data (1/2)................................................................................. 94
Table 20: Complete X Displacement Trial Data (2/2) .................................................................................. 94
Table 21 - Complete X Displacement Trial Data (2/2)................................................................................. 95
Table 22 - Complete Rotational Localization Trial Data (1/4) ..................................................................... 96
Table 23 - Complete Rotational Localization Trial Data (2/4) ..................................................................... 97
Table 24 - Complete Rotational Localization Trial Data (3/4) ..................................................................... 98
Table 25 - Complete Rotational Localization Trial Data (4/4) ..................................................................... 99
Table 26 - Complete Obstacle Avoidance Trial Data ................................................................................... 99
Table 27 - Complete QR Code Reader Trial Data ...................................................................................... 100
Introduction
Many movies about the future feature robots living, working, and interacting with humans on a daily basis. Often the programming and design of these robots have reached near-complete autonomy, as seen in the movie "I, Robot". Autonomous mobile robots are developed to perform tasks with little to no help from humans. This means that a robot must be able to complete the tasks given to it without becoming stuck, either physically or in a programming loop. The use of these robots has the potential to simplify and improve the quality of life for everyone on the planet.
These types of robots are not just a far-off dream. Work in robotics is progressing to the point that robots play active roles in today's workforce. Robots are being used in the manufacturing, healthcare, and service industries, with plans for application in military and space programs [1]. Although the robots of today aren't what one would expect when the term "robot" is used, the tasks performed are similar.
There have been a growing number of instances over the years where robots have been used to interact with humans. The Smithsonian Museum of American History installed the MINERVA mobile robot, which acted as a tour guide for certain exhibits. MINERVA was designed to interact with groups of guests and provide information on the current exhibits [2]. California hospitals are purchasing RP-VITA telepresence robots so that doctors can check on patients from home. The doctor connects to the robot's monitor and can then have it drive automatically to rooms or take manual control to get a closer view. Amazon's shipping warehouses utilize Kiva robots to transport shelves and boxes at the request of a human worker. The human stays at a command terminal, giving prompts to the robots while they perform the tasks. All of these robots demonstrate the continuing development of the robotics field.
A problem with some robots currently on the market is the minimal human interaction while they carry out their tasks. The Kiva robots work very well in factory settings; however, human interaction is limited to a command terminal. The telepresence robots are good for long-distance communication between doctors and patients, but many accounts say that patients took some time to get used to their doctor being just a screen in front of them. Additionally, a nurse or assistant practitioner has to follow these robots in case extra interaction with the patients is needed. The MINERVA tour guide showed that robots could adapt and display a general understanding of simple emotion in a group. This robot did work well, but its outward design was functional rather than aesthetically pleasing.
Now that the technology allowing robots to perform simple tasks is maturing, development on the human-interaction side is needed to make them more compatible with normal society. Responding to voice commands instead of computer prompts could be the precursor to simple speech. The design of an autonomous robotic guide for a building can clearly exemplify the steps necessary for successful human-robot interaction.
Stakeholders and Needs
Needs are created from stakeholder input and are used to guide the project in the correct direction. The
current stakeholders include Science Center MEM Lab members, visitors, and building management; however, as the project progresses, more stakeholders may become involved. Table 1 below shows four
of the overarching needs relevant to the project.
Table 1 - Project Needs
Design Category      Need
S1   Safety          Potential for injuries during normal operation mitigated
Op1  Operation       Minimize need for human input
P1   Programming     Robust guest detection
P2   Programming     Modular software components
The needs listed above are all important because they are key driving factors that shape the design. Safety is of the utmost concern whenever humans interact with robots, which is addressed by S1. The goal of the project is to create an autonomous system that minimizes the need for superfluous human input, which is represented by Op1. Guest verification (P1) is a need expressed directly by the stakeholders. P2 is a need expressed by team members to introduce modularity and increase operating efficiency. A full list of needs can be found in Table 13 and Table 14 in Appendix A.
Problem Statement
In certain secure buildings, workers must take their minds off of their work to retrieve and sign in guests as they arrive. This wastes precious time and effort that the workers could be putting toward finishing projects. Also, many secure buildings now have multiple security checkpoints that further slow the process of picking up guests. If this task could be given to a robot instead, company workers could continue working until the guests are delivered to the designated area.
The core mission of this senior design project is to design and build a proof of concept for a fully
autonomous robotic guide/usher. The robot will have to be able to work its way through hallways,
elevators, and lobbies while avoiding stationary obstacles and variably moving objects. Elevator controls
will have to be operable through some means. The robot will also have to be able to recognize the appropriate guests to be delivered to the specified destination.
groups of guests waiting to be picked up for different departments in said building. Once delivery of
guests occurs, the robot should move to a designated wait location for its next command.
Methodology
Background
The robotic guide/usher in question will be an autonomous robot composed of perception, mobility,
manipulation, and control architecture systems. The perception subsystem is used to distinguish and
extract information from within the robot’s “visual” range. The mobility subsystem controls the robot’s
locomotion in terms of heading and velocity. The manipulation subsystem in this particular case consists
of the keycard system. The control architecture subsystem serves as the central management and control console for mobility and guest verification. These four subsystems in
combination with each other form the structure that the autonomous robot relies on to complete the
tasks assigned to it.
Perception Subsystem: Asus Xtion Pro
Modern day machine vision systems employ multiple vision sensors working concurrently to capture
depth, image, and audio for post-processing and general interpretation. The most basic suite of sensors
utilized to enable vision for autonomous systems typically includes a depth camera and an RGB (Red/Green/Blue) camera. Depth cameras come mainly in two variants.
depth cameras are based on utilizing the phase difference between the emitted and reflected IR signal
in order to calculate the depth of a targeted object. Structured pattern depth cameras emit a structured
IR pattern onto target objects and then utilize triangulation algorithms to calculate depth. Standard RGB
cameras are used to provide color vision of the environment. Industrial grade depth cameras are often
very expensive and cost anywhere from $3000 to $9000. RGB CMOS color cameras are an established
and mature technology and therefore inexpensive. The integration of the Xtion Pro sensor to the system
provides a cheap and established solution as a complete vision package. The Asus Xtion Pro is a motion-sensing input device for use with Windows PCs and is built on the same hardware as the Microsoft Kinect. From a developmental standpoint, utilizing the Asus Xtion Pro as the system's vision suite is extremely cost-efficient because it contains an RGB camera, a depth sensor, and a multi-array microphone within one package.
The integration of the Asus Xtion Pro is a complex, multilayered task that involves a multitude of techniques and modern engineering tools. MATLAB is the desired Integrated
Development Environment (IDE) because of staff and team familiarity. For integration of the Xtion Pro
vision system, several software components need to be put in place to facilitate accurate data capture.
Figure 1 - OpenNI SDK Architecture [3]
In Figure 1 above, the Xtion Pro sensor is represented by the "Depth Physical Acquisition" block. This step of the Software Development Kit (SDK) architecture constitutes the physical sensor's ability to capture raw data. Next, the PrimeSense SoC operates the underlying hardware, performing functions such as dedicated depth-acquisition calculations, matching of the depth and RGB images, down-sampling, and various other operations. Then, the OpenNI framework takes over as the most popular open-source SDK for the development of 3D-sensing middleware libraries and applications. From there, a C++ wrapper was found that allows the use of MATLAB as the primary IDE.
The proper method for creating a data processing system for the Xtion Pro involves numerous steps. A Kinect MATLAB C++ wrapper was developed by Dirk-Jan Kroon of Focal Machine Vision en Optical Systems on January 31, 2011 [4]. This particular MATLAB C++ wrapper is utilized alongside OpenNI
2.2.0, NITE 2.2.0, Microsoft Kinect SDK v1.7, and Microsoft Visual C++ compiler to create the functional
data processing system.
Mobility Subsystem: iRobot Create
An important part of a mobile autonomous robot is the set of components that allow for locomotion. This motion system changes the system's velocity and trajectory via closed-loop control. Two specific types of motion systems need to be considered: limb-based locomotion and wheel-based locomotion are the two most common.
The iRobot Corporation, the makers of the iRobot Roomba, created the iRobot Create Programmable
Robot as a mobile robot platform for educators, students, and developers as a part of their contribution
and commitment to the Science, Technology, Engineering, and Math (STEM) education program.
Utilizing the iRobot Create as the motion system is the most straightforward, cost-efficient solution. The
iRobot Create features three wheels, a designated cargo bay, 6-32 mounting cavities, an omnidirectional
IR receiver, sensors, a rechargeable 3000mAh battery, and a serial port for communication. By using this
platform, the overall focus of the project can be turned toward developing new functionalities between
the iRobot Create and the Asus Xtion Pro without having to worry about mechanical robustness or low-level control.
The iRobot Create uses an Open Interface (OI) comprised of an electronic interface as well as a software
interface for programmatically controlling the Create’s behavior and data collection capabilities. The
Create communicates at 57600 baud via a numeric command system. For example, the command code
that commands the Create to drive in reverse at a velocity of -200mm/s while turning at a radius of
500mm is [137] [255] [56] [1] [244] [5]. It becomes clear that this command code structure is unintuitive
and unwieldy. In order to resolve and simplify this issue, MATLAB can be used to better facilitate
communication between a laptop and the iRobot Create via a RS-232 to USB converter. MATLAB is the
desired Integrated Development Environment (IDE) due to staff and team familiarity. Therefore,
MATLAB will be used to issue commands to alter the robot’s heading and velocity. The MATLAB Toolbox
for the iRobot Create (MTIC) replaces the native low-level numerical drive commands embedded within
the iRobot Create with high level MATLAB command functions that act as a “wrapper” between MATLAB
and the Create [6].
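To make the two levels of control concrete, the sketch below first sends the raw Open Interface byte sequence quoted above over a serial link and then notes the equivalent high-level wrapper call. This is a minimal illustration rather than the team's actual script; the port name, pause length, and the wrapper call in the final comment are assumptions.

% Minimal sketch: issuing the raw Drive command over the Create's serial OI.
s = serial('COM4', 'BaudRate', 57600);   % port name is an assumption
fopen(s);
fwrite(s, [128 132]);                    % OI "Start" and "Full" mode opcodes
fwrite(s, [137 255 56 1 244]);           % Drive: -200 mm/s at a 500 mm radius
pause(2);                                % let the robot move briefly
fwrite(s, [137 0 0 0 0]);                % Drive with zero velocity = stop
fclose(s); delete(s);
% With MTIC, the same motion reduces to a single wrapper call such as
% SetFwdVelRadiusRoomba(serPort, -0.2, 0.5) (name and units per the MTIC docs).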
Manipulation Subsystem: Keycard
The Science Center building is a secure facility. Therefore, in order to operate the elevators, a keycard
must be swiped over a reader to grant access. A human being will always be accompanying the robot
when it enters the elevator, eliminating the need for a complex swiping mechanism. Instead, the robot
will utilize a heavy duty retractable key reel system. When entering the elevator, the robot will prompt
the human via an audio cue, to swipe and press the corresponding elevator button. The human
accompanying the robot will be able to pull the card to the proper location to swipe, then release the
card, which will retract back into its starting location. In order to prevent possible theft, the reel will use
heavy-duty Kevlar wire fixed to the robot. This design will be the fastest method of swiping and will prevent delays for any other people in the elevator.
Control Architecture Subsystem
Mapping
Background
Mapping is a fundamental issue for a mobile robot moving around in space. A map is relied on for
localization, path planning, and obstacle avoidance. A robot needs a map when performing localization
in order to give itself a location on that map. Otherwise, the robot could be at any location in an infinite
space. According to DeSouza and Kak's article on vision-based robot navigation systems, indoor vision-based navigation systems can be categorized into three broad groups: map-based navigation, map-building-based navigation, and map-less navigation. Map-based navigation systems rely on predefined geometric or element models of the environment on which the navigation system can depend. Map-building navigation systems utilize onboard sensors to actively generate a model of the environment for use in active navigation. Map-less navigation systems are typically more sophisticated systems based on recognizing objects found in the environment in order to orient themselves and continue along their path.
Several challenges arise when mapping. To start, the space of all possible maps is infinitely large. Even when using a simple grid, there can be a multitude of different variables, which can lead to very large and complex maps. Noise in the robot's sensors also poses a problem: as the robot navigates, errors accumulate, which can make it difficult to obtain an accurate position. Also, according to the Probabilistic Robotics textbook [7], when different places look very similar, for instance in an office hallway, it can be very difficult to differentiate between places that have already been traveled at different points in time, otherwise known as perceptual ambiguity.
Challenges aside, depending on the type of navigation system chosen, the complexity and sophistication
of sensors needed varies. 3-dimensional mapping can be more complicated and is not always needed
depending on the application. Instead a 2-dimensional map or occupancy grid can be created. “The basic
idea of the occupancy grids is to represent the map as a field of random variables, arranged in an evenly
spaced grid [7]“. The random variables are binary and show if each specific location is occupied or
empty.
The main equation that makes up occupancy grid mapping calculates the posterior over maps with
available data, as seen in Equation 1.
p(m | z_{1:t}, x_{1:t})
Equation 1
Here, m is the overall map, z_{1:t} is the set of all measurements taken up to time t, and x_{1:t} is the set of all the robot's poses up to time t. The set of all poses makes up the actual path the robot takes.
The occupancy grid map, m, is broken up into a finite number of grid cells, m_i, as seen in Equation 2.
m = Σ_i m_i
Equation 2
Each grid cell has a probability of occupation assigned to it: p(m_i = 1) or p(m_i = 0). A value of 1 corresponds
to occupied, and 0 corresponds to free space. When the robot knows its pose and location on the
occupancy grid, it is able to navigate due to the fact that it knows where obstacles are since they are
marked as a 1. Figure 2 shows a basic occupancy grid with a robot located at the red point, the white
points are empty space, and the black spaces are walls or obstacles.
Figure 2 - Occupancy Grid with obstacles [8]
The obstacles on the occupancy grid are considered static only: they are assumed stationary for the particular iteration in which they are used. Dynamic obstacles are considered to fall within this classification as well; however, the robustness of the algorithm in calculating the optimal path around these obstacles needs further investigation.
Process
In order to implement the 2D occupancy grid in MATLAB, a simple 2D array is needed. As an example, a simple straight hallway can be mapped. The empty space is made up of a uint8 array of 0's, while the walls are made up of 1's. The map, m, comprises both the space and the
walls. Example code for a simple map can be seen in Figure 3.
Figure 3 - Occupancy grid mapping code
In this basic code, an empty hallway is mapped using the basis of a 2D occupancy grid as described
previously. This hallway is about 5.5 feet (170cm) by about 10 feet (310cm). Here the walls are 15cm
thick, a value chosen simply to give the walls some thickness. The uint8 variable type is used to
minimize memory usage while running the code. This is important since a much larger occupancy grid
will have many more data points. A basic example of what the occupancy grid of zeros and ones would
look like for a straight hallway can be seen in Figure 4.
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Figure 4 - Sample hallway map
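For reference, a minimal sketch of how the occupancy grid in Figure 3 can be built is shown below. The 1 cm cell size is an assumption based on the centimeter indexing used by the map functions described next, and the exact wall placement is illustrative.

% Sketch of the occupancy-grid construction in Figure 3 (1 cm cells).
m = zeros(310, 170, 'uint8');    % ~310 cm x 170 cm hallway, 0 = free space
m(:, 1:15)       = 1;            % left wall, ~15 cm thick
m(:, end-14:end) = 1;            % right wall, ~15 cm thick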
Now that the basic example of the occupancy grid is defined, further steps can be taken to convert a
fully 3D space into a functional 2D occupancy grid. This occupancy grid can be used to keep track of
goals, positions, and obstacles. In order to clean up the syntax for map operations, several functions
must be defined.
The first function is the setLoc.m function. The code for this is shown in Figure 5.
Figure 5 - SetLoc code
The SetLoc function accepts three inputs and outputs the modified m variable. The first input is the
initial map variable, the second input is the value of the y location in centimeters, and finally the last
input is the value of the x location in centimeters. Next, the function locates the last known x and y position of the robot and resets that location to 0. Finally, it uses the two input location variables to
set the new position variable before updating the m output variable. This function allows for efficient
and clean redefinition of the robot’s position on the map.
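A minimal sketch of what setLoc.m might look like is shown below. The marker value used for the robot's position (2 here) is an assumption; the report only specifies that the goal is marked with a 3.

function m = setLoc(m, y, x)
% setLoc - clears the previous robot marker and writes the new one.
m(m == 2) = 0;     % reset the last known position to 0 (marker 2 assumed)
m(y, x)   = 2;     % mark the new position (indices in centimeters)
end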
The next function is the setGoal.m function. The code for this particular function is shown below.
Figure 6 - SetGoal code
The SetGoal function accepts three inputs and outputs the modified m variable. The first input is the
initial map variable, the second input is the value of the y goal location in centimeters, and finally the
last input is the value of the x goal location in centimeters. Similar to the SetLoc function, the function’s
next step is to locate the previous goal location and to reset this location’s occupancy grid value to 0.
Finally, it redefines the new goal position's occupancy grid value to 3, which designates the goal.
The next function is the AdjustLoc function. The code for this particular function is shown below.
Figure 7 - AdjustLoc code
The AdjustLoc function again accepts three inputs and outputs the modified m variable. The first input is
the initial map variable to be adjusted, the second input is the centimeter value of adjustment of the y
position, and the third input is the centimeter value of adjustment of the x position. Similar to the
SetLoc function, the function's next step is to locate the previous robot location and then to reset the
previous location’s occupancy grid value to 0. Finally, it calculates the new position by summing the old
position with the adjustments specified in the inputs. This allows for adjustment of the position of the
robot due to localization algorithms that will be defined later on.
The last function for performing operations on the map is the FindDel.m function. The code for this
particular function is shown below.
Figure 8 - FindDel code
The FindDel function simply calculates the distance in Y and X between the goal and the current position
of the robot. It then converts this distance from centimeters to millimeters so that the input to the
Roomba is simplified.
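A sketch of the FindDel.m logic, under the same assumed marker values (robot = 2, goal = 3), is shown below.

function [delY, delX] = findDel(m)
% findDel - distance from the robot to the goal, converted from cm to mm.
[yR, xR] = find(m == 2, 1);    % robot cell (assumed marker 2)
[yG, xG] = find(m == 3, 1);    % goal cell (marker 3)
delY = (yG - yR) * 10;         % centimeters to millimeters
delX = (xG - xR) * 10;
end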
Localization
Theory
Background
“Mobile robot localization is the problem of determining the pose of a robot relative to a given map of
the environment [9]". Otherwise known as position tracking or position estimation, determining the position and pose of a robot is a fundamental problem that needs to be solved before
any level of autonomy can be claimed. When given a map of the surrounding environment, the robot
must be able to traverse the environment to the desired goal. However, without accurate and timely
position data, the robot can quickly accumulate error, leading to undesirable actions.
However, there are many steps before successful self-localization can be accomplished. Because the
Asus Xtion Pro has been selected as the platform with which data about the surrounding environment will be collected, its parameters and specifications immediately define the values the system has to work with.
Figure 9 – Microsoft Kinect Subcomponents
Figure 10 – ASUS Xtion Pro Specifications under OpenNI
Utilizing the IR Depth Sensor, the IR Emitter, and the Color Sensor along with their corresponding
components, the data captured by the various sensors can be used. It is important to mention that both
position and pose need to be approximated using the information derived from the sensors. As such, the
next two sections will detail the various techniques used to calculate this information with respect to
the global coordinate system. However, before position and pose can be solved for, the basic algorithms
that facilitate this capability need to be detailed first. It is important to note that the ASUS Xtion Pro is
the developer’s version of the Microsoft Kinect. It is a compact version with the exact same hardware.
Edge/Corner Detection
One of the feature extraction methods utilized is the Harris & Stephens corner detection method. This algorithm is a further improvement on Moravec's corner detection algorithm, introduced in 1977. Moravec's algorithm operates by detecting the intensity contrast from
pixel to pixel within a black and white image. By taking the sum of squared differences (SSD) of the
intensity from pixel to pixel, patches with a large SSD denote regions of interest whereas patches with a
small SSD denote regions of minimal interest. Harris & Stephens take this concept a step further by
taking the differential of the intensity score with respect to direction into account. The general form of
this weighted SSD approach can be found in Equation 3, where an image region I over the area (u, v) is shifted by (x, y), w(u, v) is a weighting window, and the weighted SSD is denoted by S(x, y).
S(x, y) = Σ_u Σ_v w(u, v) [I(u + x, v + y) − I(u, v)]²
Equation 3
The Harris & Stephens method's implementation is built into MATLAB's Computer Vision Toolbox, and the results are shown in Figure 11.
Figure 11 - Harris Feature Extraction
Blob Detection
Blob detection methods are algorithms that are used in computer vision systems to detect and
subsequently describe local features in images. These algorithms are sophisticated ways of detecting points of interest, which are then filtered for stability and reliability based on their various properties. A robust local feature detector named SURF was first presented by Herbert Bay in 2006. SURF (Speeded-Up Robust Features) is a feature extraction algorithm that was inspired by SIFT (Scale-Invariant Feature Transform). Although SURF is the algorithm that was ultimately utilized, it is important to describe the concept SIFT uses to extract important features from an image. SIFT is an algorithm published by David Lowe. Lowe's method is based on the principle that
distinct features can be found in high-contrast regions of an image, known as edges. Beyond edge detection, however, the SIFT method also accounts for the relative positions of
these detected points of interest. The SURF method’s implementation is built into MATLAB’s Computer
Vision Toolbox and the results are shown in Figure 12. [10] [11] [12] [13] [9]
Figure 12 - SURF Feature Extraction
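For reference, a minimal example of SURF extraction with the Computer Vision Toolbox is sketched below; the image file name is illustrative.

I = rgb2gray(imread('hallway.png'));   % load and convert a test image
points = detectSURFFeatures(I);        % detect SURF interest points
imshow(I); hold on;
plot(points.selectStrongest(50));      % overlay the 50 strongest features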
Translational (XYZ)
Background
Self-localization is the act of determining the position and pose of a mobile body relative to a reference
map. Self-localization is a very important component in almost every function of an autonomous robot.
As previously mentioned, both the position and the pose of a mobile body are important pieces of
information to track. For organizational clarity, this section will detail the methods by which the problem of self-localization with respect to position is solved.
By utilizing the Asus Xtion Pro, measurements of the local environment can be made and used toward
solving this problem. However, pure measurements without any distinctive properties do not serve
much use when a translation or rotation relative to the map is performed. As such, effort must be taken to extract distinctive features of the surrounding environment. A multitude of computer vision systems rely on corner detection or feature extraction algorithms to detect various points of interest that can be associated with the environment rather than just an individual image.
Object Detection
Either of the aforementioned feature extraction methods can be used to quickly ascertain the similarity of two images based on their features. Once the strongest feature points from each image are gathered,
they are compared with each other using the following code.
Figure 13 - Sample Code
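The script in Figure 13 is not reproduced in this text; the sketch below shows what the matching step might look like with the Computer Vision Toolbox, assuming grayscale images named template and scene are already loaded.

ptsT = detectSURFFeatures(template);
ptsS = detectSURFFeatures(scene);
[featT, validT] = extractFeatures(template, ptsT);   % descriptors + valid points
[featS, validS] = extractFeatures(scene, ptsS);
pairs    = matchFeatures(featT, featS);              % indices of matching pairs
matchedT = validT(pairs(:, 1));
matchedS = validS(pairs(:, 2));
showMatchedFeatures(template, scene, matchedT, matchedS, 'montage');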
The strongest feature points from each image are matched for similarity. The feature points with matching descriptors can then be plotted; an example of this can be seen in Figure 14.
Figure 14 - Matched Points (Including Outliers)
The next step is to eliminate any erroneous data points caused by misalignment or skewing of the object by fitting a geometric transformation that relates the matched points. The final result is shown below.
Figure 15 - Matched Points (Excluding Outliers)
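Continuing the earlier sketch, the outlier rejection can be performed by fitting a geometric transformation to the matched pairs; points that do not agree with the fit are discarded, as in Figure 15.

[tform, inlierT, inlierS] = estimateGeometricTransform(matchedT, matchedS, 'affine');
showMatchedFeatures(template, scene, inlierT, inlierS, 'montage');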
Once the erroneous data points have been eliminated, the pixel location of the object in the scene can
be approximated and, finally, data can be taken. By using OpenNI, the exact pixel location can be used to calculate the x, y, and z coordinates of the object. The OpenNI framework utilizes two different coordinate systems to specify depth. The first coordinate system is called Depth. This is the native coordinate system, in which the X and Y values represent the pixel location relative to the frame of the camera and the Z value represents the depth between the camera plane and the object. Using a few functions, OpenNI allows the Depth coordinate system to be quickly translated to the Real World coordinate system, in which the x, y, and z values form the more familiar 3D Cartesian coordinate system: the camera lens is the origin, and the x, y, and z values represent the distance along each dimension between the object and the origin. Because these XYZ real-world coordinates are relative to the camera's location, the relative location of the camera with respect to the global map can be calculated. If the object's placement in the global map is previously given, the raw difference in X and Z can be used to calculate the actual location of the camera compared with the approximate location. [10] [11] [13]
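The specific OpenNI conversion functions are not listed here. As an illustration of the underlying idea only, a pinhole-model approximation is sketched below, with the 58° x 45° field of view taken from the sensor's published specifications; the team's code performs this conversion through the OpenNI wrapper instead.

function [X, Y, Z] = depthToWorld(u, v, depthMap)
% Approximate Depth-to-Real-World conversion for a 640 x 480 depth frame.
[rows, cols] = size(depthMap);
fx = (cols/2) / tand(58/2);    % horizontal focal length in pixels
fy = (rows/2) / tand(45/2);    % vertical focal length in pixels
Z = double(depthMap(v, u));    % depth (mm) at pixel (u, v)
X = (u - cols/2) * Z / fx;     % mm to the right of the optical axis
Y = (v - rows/2) * Z / fy;     % mm below the optical axis
end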
Implementation
This method is not infallible by any means; the object matching utilizing a template requires a very high
resolution camera. While the Xtion Pro has a decent RGB camera, the effective range of the RGB camera
is about one meter. Therefore, when selecting landmarks for localization, care must be taken to select
large landmarks with distinctive features that are unique to that specific landmark. For example, a door would prove inadequate because it would typically not have distinctive features when compared with another door in the environment.
Because of the aforementioned difficulties, QR codes were used alongside the concept of localization "checkpoints" to perform positional localization with the desired degree of accuracy.
Figure 16. Localization "checkpoints"
These QR Codes serve two different purposes. By utilizing unique QR Codes at every checkpoint, the
difficulty of distinguishing different checkpoints from one another is eliminated. The next goal of using
QR Codes is to provide a distinct object from which measurements can be made.
The localization file used to obtain a fix on the relative position of the robot with respect to the map is listed in the Appendix. This localization subroutine enables the robot to obtain an accurate x
and y dimension reading with respect to the map. Because the QR codes are strategically placed ninety
degrees to the ideal path toward the goal, the x position of the robot can be calculated and corrected
simply by measuring the depth of the wall that the QR code is posted on. If an x position of 80cm is
desired, the localization subroutine will simply iterate the logic until a depth of 80cm is recorded. The next objective of the localization subroutine is to obtain the y position of the robot with respect to the
map. Before navigation can be conducted, the system must have a template image associated with that specific QR code and localization checkpoint to use as a reference.
Figure 17 - Overlaid images of a template versus current
The image above is an example of utilizing the template image to compare the current position of the
robot with the ideal position of the robot. The red circles represent feature points detected by the SURF
method. These are representative of the feature points that will be tracked. Next, the green plus marks
are those same feature points but at a different location. The translation can be calculated directly. This
allows for calculation of the y position by comparing the template image and the current image and
leveraging the QR code as a point of reference. Templates are created using the code listed in the Appendix.
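As an illustration of that translation calculation, the sketch below estimates the y offset from the horizontal pixel displacement of the matched feature points. The names featTemplate, validTemplate, featCurrent, and validCurrent are assumed to come from an extraction step like the one sketched earlier, and the depth-based scale factor is an assumption about how pixels are converted to millimeters.

pairs = matchFeatures(featTemplate, featCurrent);
mT    = validTemplate(pairs(:, 1));
mC    = validCurrent(pairs(:, 2));
dPix  = mean(mC.Location(:, 1) - mT.Location(:, 1));   % mean horizontal shift
mmPerPix = wallDepth / fx;      % scale from the depth reading at the wall
yOffset  = dPix * mmPerPix;     % distance to correct along the hallway (mm)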
Bearing
Overview
The localization subroutine used to determine the pose of the robot has undergone a multitude of
changes over the course of the project. Two main methods of determining pose emerged from the development phase. The first method utilizes template matching to recover the exact viewing angle at which the template was created. The second method uses simple trigonometry along with the fact that each localization checkpoint directly faces a large, flat wall.
Flat Wall Assumption
The simplest and most consistent method of determining pose involves assuming that every localization
checkpoint can be placed within a section of the wall that has a wide and flat surface.
Figure 18 - Flat wall assumption
The green block represents the wall, the white space represents the open space, the blue grid
represents the map grid, the red circle represents the robot, the arrow indicates our current
heading/pose, and finally the dotted line represents the plane from which the Xtion Pro takes
measurements. Figure 18 assumes that our heading is exactly facing the wall and that the measurement plane (the dotted line) and the wall are exactly parallel. However, this assumption cannot be made without some initial calculations and measurements.
Let the pose scenario for the robot be as presented in the figure below. The measurement plane is skewed by an unknown angle, and the pose of the robot is now unknown. As such, the correction angle is also unknown. The correction angle can, however, easily be solved for with some simple trigonometry.
Figure 19 - Unknown pose
Two depth measurements are taken from the measurement plane directly to the green wall. If the robot's measurement plane is truly parallel to the wall, both measurements will be equal. Otherwise, the difference between the two measurements can be utilized to calculate the correction angle needed to adjust the pose to the desired state.
θ = tan⁻¹(ΔY / ΔX)
Equation 4. Correction Angle
The code listed in the Appendix uses this exact methodology to calculate the correction angle when needed.
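As a small illustration (variable names are hypothetical), the correction angle from Equation 4 can be computed from two depth readings taken a known lateral distance apart on the wall:

dX    = xRight - xLeft;     % lateral separation of the two samples (mm)
dY    = zRight - zLeft;     % difference between the two depth readings (mm)
theta = atand(dY / dX);     % correction angle in degrees (Equation 4)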
Template Method
In order to properly localize itself, the system must have a template image to use as a reference. This
reference image should be a photo taken from a position where the robot would be ideally lined up.
Whether it is aimed at a specific landmark or just down a hallway, it needs to be an image of its
destination orientation. To optimize the likelihood of success, an image with some visual activity should be used, activity meaning items hung on walls or doors; something that is not just a plain hallway. Figure 20 below shows a strong example of a template image.
Figure 20 - Example of a template captured by the Xtion Pro
While technically an image from any camera could be used, provided it shoots at the same resolution,
the RGB camera on the Xtion should ideally be used because the comparison images will be captured by
it. To facilitate the creation of a template image, the script shown in Appendix – QRCodeDetection.m is
used to initialize the camera, snap a picture, and store the output image in the working folder. This
function allows templates to be created easily, which is useful both for testing and in case the
environment in which the robot operates changes.
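A minimal capture sketch is shown below, assuming the Xtion's RGB stream appears to the Image Acquisition Toolbox as an ordinary video device; the adaptor name, device ID, format string, and file name are all placeholders.

% Minimal template-capture sketch. Adaptor name, device ID, format, and
% output file name are placeholders for whatever the RGB camera enumerates as.
vid = videoinput('winvideo', 1, 'RGB24_640x480');
set(vid, 'FramesPerTrigger', 1);

start(vid);
template = getsnapshot(vid);         % grab one 640 x 480 RGB frame
stop(vid);
delete(vid);

imwrite(template, 'template.png');   % store the template for later runs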
Template Method - Processing
For the system to orient itself, it compares the image the camera is currently seeing with the
precaptured template image. The script analyzes the two images, finds a correlation between them, and
then rotates the system accordingly. If the script finds no relation between the two pictures, it rotates
58° (the field of view of the camera) to give itself an entirely new image to try. Once it finds a relation
between the two, the script repeats itself until the system is lined up with the template.
Thanks to MATLAB's Computer Vision Toolbox, the script is fairly straightforward. It relies heavily on
the toolbox's built-in corner point detection functions. While the toolbox offers a variety of feature
detectors, including detectBRISKFeatures, detectFASTFeatures, and detectSURFFeatures, the method
chosen for this project was detectHarrisFeatures, which uses the Harris-Stephens algorithm. This method
was chosen because of its consistently strong performance in testing.
The detectHarrisFeatures() function takes a 2D grayscale image as input and outputs a cornerPoints
object, 'I', which contains information about the detected feature points. Not all of the points found by
the Harris-Stephens algorithm are usable, however, and further processing needs to be done to find
viable points. The extractFeatures() function is then used; it takes the image as well as the detected
cornerPoints as inputs and finds valid identification points. The function takes each point (from 'I') and
its location on the image and examines the pixels around the selected point. It then outputs both a set
of 'features' and 'validPoints'. The 'features' object consists of descriptors, the information used to set
each point apart from the others. 'validPoints' is an object that houses the locations of those relevant
points; it also removes points that are too close to the edge of the image, for which proper descriptors
cannot be found [14]. Figure 21 below shows how the code is executed, while Figure 22 shows the
results of using these functions to identify Harris features.
Figure 21 - Code application of the functions
Figure 22 - Example of Harris features extracted from an image
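A minimal sketch of the detection and extraction step is shown below; it assumes the current frame is already available as an image file and is illustrative rather than the project's exact code.

% Minimal sketch: detect Harris corners and extract descriptors for one frame.
frame = imread('current.png');            % placeholder source image
gray  = rgb2gray(frame);                  % the detector expects grayscale

I = detectHarrisFeatures(gray);           % cornerPoints object of candidates

% Keep only points with usable descriptors (points near the edge are dropped)
[features, validPoints] = extractFeatures(gray, I);

% Optional: overlay the strongest corners on the frame, as in Figure 22
imshow(frame); hold on;
plot(validPoints.selectStrongest(50));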
This process is run for both the template image and the comparison image, which underscores the
importance of using the same camera for both. Different cameras, even at the same resolution, can
capture an image with different coloring, making the feature matching process more difficult. With the
two images processed, they must then be compared. Using the built-in function matchFeatures(), the
two sets of features can be compared to determine whether or not they represent the same feature
viewed from a different angle. The output is an index of point pairs whose features are most likely to
correspond between the two input feature sets [15]. Plugging the locations of those specific points back
into the validPoints objects produces an object consisting solely of the coordinates of matched pairs.
Figure 23 shows the application of the functions in the code, while Figure 24 shows the result of
comparing the Harris features in the two images. From the image, it is easy to see the translation of the
different points from template to comparison, represented by the lines connecting the two datasets.
This translation is crucial to the script because it is the key to determining how far the robot must rotate
to properly line up.
Figure 23 - Code application of the functions
Figure 24 - Result of identifying the Harris features and finding matching features in the images
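Continuing the sketch above, the matching step can be illustrated as follows; the file names are placeholders, and the plot simply reproduces the kind of overlay shown in Figure 24.

% Minimal sketch: match Harris descriptors between the template and the
% current comparison image and recover the matched point coordinates.
grayT = rgb2gray(imread('template.png'));    % placeholder file names
grayC = rgb2gray(imread('current.png'));

[featT, validT] = extractFeatures(grayT, detectHarrisFeatures(grayT));
[featC, validC] = extractFeatures(grayC, detectHarrisFeatures(grayC));

idx = matchFeatures(featT, featC);           % index pairs of likely matches

matchedT = validT.Location(idx(:,1), :);     % [x y] coordinates, template
matchedC = validC.Location(idx(:,2), :);     % [x y] coordinates, comparison

% Visualize the correspondences, similar to Figure 24
showMatchedFeatures(grayT, grayC, matchedT, matchedC);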
The pixel difference between a specific pair of matched points is proportional to the angle the robot
must rotate to line up. Because there are usually some discrepancies among the different pixel
displacements, an average, 'd', is taken over all of them to get a better overall displacement. A multiplier
is created using the ratio of the average displacement to the horizontal resolution. That ratio is
multiplied by the sensor's field of view to get the necessary turn angle, represented by θ in Equation 5.
Figure 25 - Visual representation of the relative pixel displacements used to find the turn angle
θ = (d / xres) × FOV = (d / 640) × 58
Equation 5
The robot is then turned the calculated θ degrees to try to line up in the correct direction. The entire
process is set in a loop that repeats until the robot is as close to lined up as possible, using the average
displacement between the valid points. The code repeats itself, capturing a new comparison image,
finding features, and comparing them, until the average difference between points is below a chosen
threshold. That threshold is 20 pixels, which corresponds to an angle of 1.82 degrees. Most tests with
the upper bound set to 20 resulted in an average displacement much smaller than that after the final
rotation. Testing also showed that when the threshold was set lower, the system would try to force
itself into the bounds and wind up oscillating until it was no longer remotely lined up. The complete
code used to perform the comparison can be found in Appendix – CreateTemplate.m at the end of the
paper.
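A hedged sketch of that alignment loop is given below. The camera object is assumed to exist already, and turnRobotDegrees() is a purely hypothetical stand-in for the project's iRobot Create turn command; the thresholds and names mirror the description above but are otherwise illustrative.

% Alignment loop sketch: compare the live frame to the template and turn
% until the average horizontal displacement is under the 20 pixel threshold.
xres = 640; fov = 58; threshold = 20;        % camera and loop parameters
grayT = rgb2gray(imread('template.png'));    % placeholder template file
[featT, validT] = extractFeatures(grayT, detectHarrisFeatures(grayT));

avgDisp = Inf;
while abs(avgDisp) > threshold
    grayC = rgb2gray(getsnapshot(vid));      % vid: existing videoinput object
    [featC, validC] = extractFeatures(grayC, detectHarrisFeatures(grayC));
    idx = matchFeatures(featT, featC);

    if isempty(idx)
        turnRobotDegrees(fov);               % no overlap: look at a new view
        continue;
    end

    % Average horizontal displacement (pixels) between matched point pairs
    dx = validC.Location(idx(:,2), 1) - validT.Location(idx(:,1), 1);
    avgDisp = mean(dx);

    theta = (avgDisp / xres) * fov;          % Equation 5: turn angle (deg)
    turnRobotDegrees(theta);
end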
Obstacle Detection
Obstacle detection is extremely important because without it, the system cannot be considered
autonomous. It allows the system to differentiate, on the fly, between locations that are clear and
travelable and those that are obstructed and need to be avoided. Once obstacles are detected, they are
transformed onto the map, allowing the robot to move safely and freely.
Process
Before any scripts can be run, a depth image is captured of the system's current field of view. The
image is stored as a 3D matrix that holds real-world XYZ coordinate data for each pixel of the camera's
resolution, i.e. a 640 x 480 x 3 matrix. The XYZ data represents the length, height, and depth for each
pixel, in millimeters, hence the 3D matrix. Figure 26 below shows a graphical representation of the
depth map, in which every pixel that is not black has XYZ data. Figure 27 provides the corresponding
visual image of the depth capture for clarification. It is important to note that the smooth, reflective tile
floor caused issues with depth collection, shown by the nearby black pixels.
Figure 26 - Mapped Depth Map of a Hallway
Figure 27 - Respective Image Capture of Depth Capture
Once the depth image is captured, a ground plane extraction can be run on it. Due to the low
resolution of the camera, only the bottom 100 pixels of the image were analyzed; anything above those
pixels was too inaccurate and caused issues with the extraction.
The ground plane extraction runs on the principle that each step taken down a hallway takes you
farther from where you started, which means that each point on the floor should have a point behind it
that is farther away. To apply this, the depth portion (Z) of the depth map is analyzed and each cell is
examined. If a cell's vertical neighbor holds a larger value, meaning it is farther from the camera, the cell
is not considered an obstacle. Any cell that is not considered an obstacle has its location saved to a new
matrix. Plotting that matrix on top of the original image produces Figure 28 below.
Figure 28 - Ground Plane Extraction
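A minimal sketch of that vertical-neighbor test is shown below; it assumes the depth capture is already in memory as a 480 x 640 x 3 matrix named xyz, which is an illustrative name rather than the project's.

% Ground-plane extraction sketch. xyz: 480 x 640 x 3 depth capture (X, Y, Z
% in mm). Only the bottom 100 rows of the Z channel are examined.
Z = xyz(:, :, 3);
[rows, cols] = size(Z);
ground = false(rows, cols);              % logical mask of floor pixels

for r = rows-99 : rows                   % bottom 100 rows of the image
    for c = 1:cols
        % Floor pixel: has depth data, and the pixel one row up (physically
        % farther down the hallway) reports a larger depth.
        if Z(r, c) > 0 && Z(r-1, c) > Z(r, c)
            ground(r, c) = true;
        end
    end
end

% Invert the selection within the analyzed band: what is not floor is
% treated as a candidate obstacle.
band = rows-99 : rows;
obstacleMask = false(rows, cols);
obstacleMask(band, :) = ~ground(band, :);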
The lack of pixels in certain areas is again due to the combination of a low resolution camera and the
material the floor is made of. However, it is still easy to see in the figure where the ground is clear,
marked by the green pixels. The pixel selection is then inverted, so that the unselected pixels become
selected and vice versa. This is done in order to highlight and detect what is an obstacle.
In order to increase the green pixel density, the data is smoothed using MATLAB's built-in smoothing
function. Further manual smoothing is then performed. Any column of the image that has only one pixel
is removed, as it does not provide enough data. Each column that has two or more green pixels is filled
with green from the column's minimum value to its maximum; this manually removes all of the small
missing pixel areas. Lastly, if a green column has no green columns on either side of it, it is also removed
because it is most likely inaccurate data. With the ground plane extracted, all of the opposite pixels are
selected and made into an "obstacle" array. Figure 29 below shows the results of the obstacle
extraction. Green means that the area is obstructed, while no marking means the path is clear.
Figure 29 - Obstacle Extraction
Each marked pixel's location (on the image) is recorded in the obstacle array. Because the obstacle
array was created to overlay the image, the marked pixel locations are directly related to XYZ data. The
real-world location for each pixel is then extracted from the depth image and written to the already
existing map. Because the map is a top-down, two-dimensional map, the length values (X) are used as
the horizontal distances and the depth values (Z) are used as the vertical distances. All the values are
written relative to the origin, which is wherever in the map the robot currently is. Writing values to the
map consists of assigning the specific cell, determined from the coordinates, a value of 4, indicating that
there is an obstacle at that location. Figure 30 below shows a zoomed-out matrix image of the map after
the obstacles have been added.
Figure 30 - Colored Map with Obstacles
Because of the zoom, the values of zero are not shown. Anything visible has a nonzero value; the black
lines are the walls, red marks the detected walls, and green indicates the trashcan. The origin of the
figure, where the points are measured from, is the bottom center of the image. The full code can be
found in Appendix – ObstacleDetection.m.
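A minimal sketch of that map-writing step is shown below; the map variable, grid cell size, and origin indices follow the description above, but their names and values are illustrative.

% Write detected obstacle pixels into the 2D occupancy map.
% obstacleMask and xyz come from the extraction step; map is the existing
% occupancy grid, cellSize the grid resolution in mm, and (originRow,
% originCol) the robot's current cell. All of these are assumed to be given.
cellSize = 50;                               % mm per grid cell (illustrative)
[px, py] = find(obstacleMask);               % image rows/cols marked as obstacles

for k = 1:numel(px)
    X = xyz(px(k), py(k), 1);                % lateral offset from the robot (mm)
    Z = xyz(px(k), py(k), 3);                % depth from the robot (mm)
    if Z <= 0, continue; end                 % skip pixels with no depth data

    row = originRow - round(Z / cellSize);   % forward maps to "up" in the grid
    col = originCol + round(X / cellSize);   % right maps to "right" in the grid
    if row >= 1 && row <= size(map, 1) && col >= 1 && col <= size(map, 2)
        map(row, col) = 4;                   % mark the cell as an obstacle
    end
end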
Obstacle Avoidance
Theory
Robotic pathing, or motion planning, is the ability of a robot to compute its desired movement through
a known area by breaking the movement down into smaller steps. The path is calculated such that the
robot could complete travel to its end destination in one large motion. The starting position and heading
of the robot are assumed to be known, from either specific placement of the robot or localization
algorithms. A map is imported into the system showing the end destination of the intended motion
along with all known obstacles within the map area. A specific pathing algorithm is then chosen, using
the map as the basis of the path calculation.
Grid-based search pathing is similar in functionality to the occupancy grid, in that the map is broken up
into a grid whose points are marked as either free or occupied. The robot is allowed to move to adjacent
grid points as long as the cell has a value of free associated with it. Constant checking by a collision
avoidance program is needed for this type of pathing to make sure the grid points along the planned
path are truly free. One consideration with this type of pathing is the size of the grid overlaid on the
map; an adequate resolution has to be chosen depending on how much of the area of motion is free.
Coarser grids, with cells representing a larger area, offer quicker computation time but sacrifice
precision: a cell that is deemed free might actually be occupied over 40 percent of its area, which could
lead to collisions in narrow sections of the map. The other side of the coin is a fine grid with more
precise cell values, where more time is needed to complete the path before operation can start.
Figure 31 - Example of valid and invalid path from grid pathing [16]
The potential fields method uses pseudo-charges to navigate to the destination. The end destination of
the path is given a positive, or opposite, charge that the robot is attracted to, while obstacles along the
robot's motion toward the goal are assigned negative, or like, charges. The path is then determined as
the trajectory vector resulting from the addition of all the charges. The potential field method has the
advantage that the trajectory is a simple and quick calculation. The problem lies in becoming trapped in
a local minimum of the potential field and being unable to find a path. Local minima can be planned and
adjusted for by a number of methods: adding a small amount of noise to the system will bump the robot
out of the minimum and back into usable calculation, and obstacles can also be given a tangential field
to move the robot out of the minimum so the path can be recalculated [17].
Figure 32 - Vector field analysis and path of movement from Potential Field method
Execution
As seen in the Obstacle Detection section, several subroutines allow the robot to add unforeseen
obstacles within its path onto the map, where they are represented by nonzero values. As previously
described in the Theory section of Obstacle Avoidance, there are a multitude of techniques that can be
used to dynamically calculate a path that avoids these obstacles. The method chosen for this project is
the potential field method. Adopting behaviors from nature, obstacles and goals are modeled with
pseudo-charges; the contribution of each pseudo-charge is calculated, and the resulting velocity vector
is used to drive the robot.
As described in Michael A. Goodrich's tutorial on implementing the potential fields method, the
behavior of goals and obstacles within the robot's operating environment is defined with a few
parameters. For 2D navigation, (xg, yg) defines the position of the goal and (x, y) defines the position of
the robot. The variable r defines the radius of the goal. The direct distance d and the angle θ between
the goal and the robot are calculated. By using a set of if statements, various behaviors can be defined
with respect to the operating boundaries of the robot and its interaction with the goal.
Δx = Δy = 0, if d < r
Δx = α(d − r) cos(θ), Δy = α(d − r) sin(θ), if r ≤ d ≤ s + r
Δx = α cos(θ), Δy = α sin(θ), if d > s + r
where α is the scaling factor of the goal-seeking behavior, d is the distance between the robot and the
goal, r is the radius of the goal, s is the extent of the goal's influence field, and θ is the angle between
the goal and the robot. The first equation is used if the distance between the goal and the robot is less
than the radius of the goal; it defines the behavior of the robot when the goal has been reached. The
second equation defines the behavior of the robot when the distance to the goal is at least the radius of
the goal but less than the influence field of the goal plus the radius of the goal. This condition is met
when the robot is nearing the goal, and the equation scales the velocity vector that the robot
experiences until the goal is reached. Finally, the last equation defines the behavior of the robot when it
is far from the goal.
The obstacles are defined in the opposite manner. Both the distance and the angle between the robot
and the obstacle are calculated and used in the following equations, which are likewise contained within
a set of if statements that define the behavior of the robot in various scenarios involving the obstacles.
Δx = −cos(θ) · ∞, Δy = −sin(θ) · ∞, if d < r
Δx = −β(s + r − d) cos(θ), Δy = −β(s + r − d) sin(θ), if r ≤ d ≤ s + r
Δx = Δy = 0, if d > s + r
where β is the scaling factor of the obstacle-avoiding behavior, d is the distance between the robot and
the obstacle, r is the radius of the obstacle, s is the extent of the obstacle's influence field, and θ is the
angle between the obstacle and the robot. The first equation is used when the robot is within the radius
of the obstacle; in that case the robot's behavior is to move away from the obstacle with a very large
velocity vector. Again, the second equation scales the velocity vector of the robot as it approaches the
obstacle, and finally the robot experiences no contribution to its velocity vector if it is far from the
obstacle.
The overall velocity vector is the summation of the contributions of the attractive and repulsive forces
exerted by the goals and obstacles within the robot's operating environment. The code that implements
this exact method can be found in Appendix – PotFieldDrive.m.
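The summation can be sketched as follows. This is a minimal illustration of the case structure described above, not the project's PotFieldDrive.m; the parameter packing ([x y r s scale] per goal or obstacle) and all values are assumptions made for the example.

% potentialFieldStep.m - minimal potential-field sketch.
% robot: [x y]; goal: [x y r s alpha]; obstacles: N-by-5 rows of [x y r s beta].
function v = potentialFieldStep(robot, goal, obstacles)
    v = goalForce(robot, goal);
    for k = 1:size(obstacles, 1)
        v = v + obstacleForce(robot, obstacles(k, :));
    end
end

function dv = goalForce(robot, g)
    d  = hypot(g(1) - robot(1), g(2) - robot(2));
    th = atan2(g(2) - robot(2), g(1) - robot(1));
    r = g(3); s = g(4); alpha = g(5);
    if d < r
        dv = [0 0];                                    % goal reached
    elseif d <= s + r
        dv = alpha * (d - r) * [cos(th) sin(th)];      % nearing the goal: scaled
    else
        dv = alpha * [cos(th) sin(th)];                % far from the goal
    end
end

function dv = obstacleForce(robot, o)
    d  = hypot(o(1) - robot(1), o(2) - robot(2));
    th = atan2(o(2) - robot(2), o(1) - robot(1));
    r = o(3); s = o(4); beta = o(5);
    if d < r
        dv = -1e6 * [cos(th) sin(th)];                 % inside the obstacle: large push away
    elseif d <= s + r
        dv = -beta * (s + r - d) * [cos(th) sin(th)];  % approaching: scaled repulsion
    else
        dv = [0 0];                                    % outside the influence field
    end
end

For example, v = potentialFieldStep([0 0], [5 5 0.5 3 1], [2 2 0.5 1.5 2]) returns the velocity vector the robot would be driven along for one goal and one obstacle.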
Elevator Navigation
The elevator navigation component uses a modified and extended template matching system. The logic
behind the system is that if it can tell it is in front of the panel, it can discern between the different
buttons and therefore monitor which are pressed and which are not. The system first scans the current
panel and determines whether or not the desired floor button is pressed. If the button is not pressed, it
prompts the user to press it and continues scanning until the button is detected as pressed. The exact
detection process is detailed in the following section. With the button verified as pressed, the system
then checks repeatedly until it sees that the button is no longer pressed. This signifies that the correct
floor has been reached, and the robot proceeds to exit the elevator.
Process
The first step in the process is capturing a template image. This is done only once and can be applied to
all of the elevators that use the same panel. The image is captured using the Image Acquisition Toolbox
with the webcam selected. Figure 33 below shows how the call is made to the webcam. The color space
is set to grayscale to simplify the template matching process, and the frames per trigger is set to one so
the program captures only one image each time it is called.
Figure 33 - Image Acquisition
The template is created and cropped to feature only the panel itself, as shown in Figure 34. This
template image is stored and used for every run; it does not change.
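A minimal sketch of that acquisition call is shown below; the adaptor name, device ID, and crop rectangle are placeholders for whatever the webcam and panel actually require.

% Webcam acquisition sketch for the panel template.
cam = videoinput('winvideo', 2);
set(cam, 'ReturnedColorspace', 'grayscale');   % grayscale simplifies matching
set(cam, 'FramesPerTrigger', 1);               % one frame per trigger

start(cam);
panel = getsnapshot(cam);
stop(cam);

% Crop to just the panel region and save the reusable template.
template = imcrop(panel, [120 60 200 320]);    % [x y width height], illustrative
imwrite(template, 'elevator_template.png');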
Figure 34 - Elevator Panel Template
With the template captured, the process is able to run. With the robot in place facing the panel, it
begins to capture new images. Each image is matched against the template image to find similar points,
using the same method as the localization. However, in order to analyze the current image, a geometric
transformation is performed, which automatically scales and crops the captured image to match the
template [18]. Figure 35 shows MATLAB matching the two images.
Figure 35 - Template Matching with the Captured Image
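A hedged sketch of that alignment is shown below: Harris features are matched between the live image and the template, a geometric transform is estimated, and the live image is warped into the template's frame. The file names, detector choice, and transform type are assumptions made for illustration.

% Align the live panel image to the template so the fixed button rectangles
% can be reused on it.
template = imread('elevator_template.png');    % placeholder file name
live     = getsnapshot(cam);                   % cam: existing webcam object

[fT, vT] = extractFeatures(template, detectHarrisFeatures(template));
[fL, vL] = extractFeatures(live,     detectHarrisFeatures(live));
idx = matchFeatures(fT, fL);

% Estimate an affine transform from the live image to the template (RANSAC)
tform = estimateGeometricTransform(vL(idx(:,2)), vT(idx(:,1)), 'affine');

% Re-project the live image so it lines up with the template
aligned = imwarp(live, tform, 'OutputView', imref2d(size(template)));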
The alignment is performed so that the individual buttons can be analyzed. The image is sectioned off
into different rectangles, one for each button, by creating a matrix of coordinates holding the pixel
locations of rectangles that encompass each button. This allows the function to look up a given floor and
know how to crop the image accordingly. The cropping matrix for this elevator panel is shown in Figure
36 and only works on this template.
Figure 36 - Cropping Matrix
Setting the desired floor to one, so that the system gets off at the lobby, causes the program to crop
the image as shown in Figure 37.
Figure 37 - First Floor Crop
With the image cropped correctly, the actual button needs to be analyzed to determine whether or not
it is pressed. To do this, the image is converted to strictly black and white. The illuminated button
creates a large number of white pixels, as shown in Figure 38, which are nonexistent when the button is
not pressed.
Figure 38 - Black and White Conversion
With the white pixels showing, it is easy to count them and use that number to determine whether or
not the button is pressed. This is done with the number-of-nonzeros (nnz) function built into MATLAB,
as demonstrated in Figure 39. The nnz function analyzes a matrix, in this case the black and white
image, and counts the number of elements that are non-zero, in this case white pixels [19]. If that
number is large enough, the program concludes that the button is pressed and continues running.
Figure 39 - NNZ Function
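A minimal sketch of that crop-and-count test is shown below; it assumes the aligned panel image from the matching step, and the button rectangle, binarization level, and pixel threshold are illustrative values.

% Button-press test on the aligned panel image.
buttonRect  = [30 250 45 45];          % rectangle around the floor 1 button
whiteThresh = 300;                     % minimum lit pixels to call it "pressed"

button = imcrop(aligned, buttonRect);  % isolate the one button
bw = im2bw(button, 0.85);              % high threshold keeps only the lit pixels

isPressed = nnz(bw) > whiteThresh;     % count the white pixels and compare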
This process is nested within itself; after running and determining that the button is pressed, it repeats,
using the same process to determine when the button is no longer pressed, which signals that it is time
to exit the elevator. The full code can be found in Appendix – ButtonDetection.m.
Exit Elevator
Part of navigating the elevator is being able to physically leave. The system is set to drive out using
prewritten commands. Originally, depth measurements were going to be used; however, other
passengers in the elevator threw off the ability to find the walls, so the only way to move consistently
was with scripted steps. The full code can be found in the Appendix.
Subject Identification
The use of a QR code to determine a visitor's identity is the current choice for verification. QR codes
were chosen for multiple reasons. Because a QR code is essentially a barcode, the codes are very simple
to generate, especially with the help of various internet sites. Now that most people carry smartphones
that can decode the codes with an app, they are becoming an increasingly popular way to share
information. Figure 40 below shows an example of a standard code that is used.
Figure 40 - Standard QR Code
A code, in this case a static ID, can be input, and a high resolution image of the code can quickly be
created and emailed to a client. The QR code was also chosen for ease of use, mainly with respect to the
client. With better-known methods such as facial recognition, a database of client faces would need to
be created; not only would this be extremely tedious, it could make clients who do not want their
photograph taken feel awkward. Because it is a barcode, a QR code can easily be decoded into a string
that can be stored in a simple database, which is much easier than having to sort through a gallery of
photographs. The last reason it was chosen was the availability of free software that is compatible with
MATLAB. The current working code for the QR scanning can be found in Appendix – QRCodeDetection.m.
Process Explanation
The system is designed to be as user friendly as possible. Due to the lack of a video display, audio cues
are used to guide the visitor through the process. Once the robot reaches the lobby, it begins to scan
for a QR code. The visitor, having been told about the process before the visit, displays the code on their
phone (or a piece of paper) in front of the camera. The system then scans and either confirms who they
are or rejects the code. If it accepts, it prompts the user with an audio cue, greeting them by name and
asking them to follow it to the elevator. If it rejects, it prompts the user to call and schedule an
appointment, or to call for assistance.
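A minimal sketch of the accept/reject decision is shown below. The decoding itself is delegated to the free third-party QR library mentioned above, so decodeQR() is a hypothetical stand-in; the name/code pairs mirror the database example in Table 2, and printed messages stand in for the audio cues.

% Accept/reject decision after scanning for a QR code.
names = {'Tyler Aaron', 'John Burchmore', 'Ian DeOrio', 'Jeff Gao'};
codes = {'13689482',    '2432158',        '59824',      '209'};

frame = getsnapshot(cam);              % cam: existing webcam object
id = decodeQR(frame);                  % hypothetical decoder: string, or '' if none

match = find(strcmp(id, codes), 1);
if ~isempty(match)
    fprintf('Welcome, %s. Please follow me to the elevator.\n', names{match});
else
    fprintf('Code not recognized. Please call to schedule an appointment.\n');
end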
QR Code System
The versatility of the QR code allows unique codes to be generated easily. This project takes advantage
of that and allows someone to send out unique codes to all of their expected visitors. These unique
codes not only allow for personalization, but also add a layer of security: the same code, while it would
scan for anyone, could easily raise a flag when scanned twice. The process of setting up a code is kept
very simple so that the entire ordeal is no more difficult than sending out a normal email. If the system
took a long time to set up, the time savings would be lost, just as if someone had to go greet the visitor
in person, defeating the purpose of the project.
First a code is created, whether randomly or by design, and assigned to someone's name, as shown in
Table 2 below. The codes can normally be up to 160 characters long, but as shown in Figure 41, the
longer the code, the more complex the image becomes. While some cameras can accommodate the
denser image, the RGB camera on the Xtion Pro cannot do so consistently, so shorter 10 character codes
are used. The spreadsheet, which could be hosted on a website, in a private folder on a service such as
Dropbox, or, for this proof of concept, directly on the laptop, holds these codes and names.
Table 2 - Database Example
Name            Code
Tyler Aaron     13689482
John Burchmore  2432158
Ian DeOrio      59824
Jeff Gao        209
Figure 41 - (a) 10 character code vs (b) 100 character code
Design Overview
Codes and Standards
The field of robotics is new and emerging, and standards for it are still being put in place. Two very
relevant ISO standards are listed below.
“ISO/FDIS 13482: Safety requirements for personal care robots
ISO/AWI 18646-1: Performance criteria and related test methods for service robot -- Part 1: Wheeled
mobile servant robot”
It is important to note that these standards are still in the early stages of development and are
constantly being updated and modified. Just like the project, they are a work in progress, and new
standards and codes will be adopted as they are instituted.
Model Concept
Design
Figure 42a shows a Creo Parametric 3D model of the planned design for the robot, and Figure 42b
shows the completed assembly. The goal of the design was to be simple yet secure. The model features
a 1:1 scale iRobot Create1 and Xtion Pro2. The aluminum support bar3 can also be changed to a different
size if needed. The assembled robot design changed slightly so the height of the Xtion Pro could be
adjusted if needed. A web camera was also placed on top of the aluminum support bar for QR code
scanning and elevator detection.
Figure 42 – a) 3D Creo Model of Robot System b) Final Proof of Concept Design
The current housing, which can be seen in Figure 43, is designed to fit most small laptops/netbooks, but
can be modified if the laptop to be used changes. The housing is round not only to match the Create
aesthetically, but also to minimize sharp corners poking out from the system. The open design was
chosen because the reduction in materials keeps the weight at a minimum, and because fabrication of
the system is much easier without having to create a round wall. Each peg is screwed into the top and
bottom plates. Four smaller pegs screw into the iRobot base and the bottom acrylic piece. The gap
between the pegs is roughly six inches, not enough to remove a laptop; because of this, the back peg can
be easily removed so the laptop can be slid in, and the peg replaced.
1 Courtesy of Keith Frankie at http://www.uberthin.com/hosted/create/
2 Courtesy of GrabCAD user Kai Franke at https://grabcad.com/library/asus-xtion-pro-live
3 Courtesy of 80/20 Inc. at http://www.3dcontentcentral.com
Figure 43 – a) Creo Model of Housing b) Physical Model of Housing
The laptop housing is made of two 0.25” thick clear acrylic pieces with a diameter of about 12”. The top
and bottom pieces are essentially the same, aside from the opening on the bottom piece for easy access
to the iRobot Create's buttons. The six pegs are made from 2” long threaded aluminum standoffs, which
allows the pegs to be screwed on between the acrylic pieces. On the top piece of the acrylic housing,
the aluminum t-slotted extrusion extends about 36” and is attached using two 90-degree brackets. On
the t-slotted frame another 90-degree bracket is used to attach the 0.25” thick acrylic Xtion platform,
which allows the height of the Xtion camera to be varied. A close-up of the Xtion and its platform can be
seen in Figure 44. The web camera on top of the aluminum extrusion is attached with a secured
90-degree bracket to allow for easy removal. The aluminum extrusion is also used for cord management.
Figure 44 – a) Close up View of Web Camera b) Close up View of Xtion Pro and platform
Between the iRobot Create and the bottom side of the laptop housing are four small nylon standoffs to
allow for some spacing for cords and modularity, which can be seen in Figure 45.
Figure 45 – a) Creo Model Close up of housing spacers b) Physical Model Close up of Housing Spacers
Physics
In order to calculate the center of gravity, a couple of assumptions need to be made. To start, the Asus
Xtion Pro is assumed to be a uniform solid piece. In reality it is not completely solid, and its actual center
of gravity could differ depending on how the internals are arranged. For ease of calculation, since its
total weight and dimensions are known, the volume, and in turn the density, of the camera can be
calculated for the “solid uniform” piece. A similar assumption needs to be made for the iRobot Create:
since the exact weights and positions of all of its internals are unknown, the robot is also assumed to be
a uniform solid piece. Models of these two parts were created in PTC Creo and placed in the assembly in
order to do the overall center of gravity calculation.
The known densities and calculated mass properties of all the materials used in the model can be seen
in Table 3. PTC Creo uses these densities to calculate the volume and mass of each component. After
running an analysis on the mass properties of the entire assembly, Creo outputs the x, y, and z
coordinates of the center of mass of each component, as well as an overall center of gravity.
Table 3 - Mass Properties of Model Assembly
Part               Density (lb/in^3)   Mass (lb)   Center of Gravity X, Y, Z (w/ respect to Create_Robot_Axis)
iRobot Create      0.0353              8.8000      0.0000, 1.0831, 0.8225
Xtion Pro          0.0249              0.4784      0.0000, 18.6422, 0.0000
Aluminum Support   0.0979              1.3553      0.0000, 11.4080, 0.0002
Xtion Platform     0.0434              0.1049      0.0000, 17.5326, 0.0000
Laptop Housing     0.0434              2.4994      0.0000, 4.1820, -0.0074
Base Spacer 1      0.0434              0.0010      4.3884, 2.6580, -0.1022
Base Spacer 2      0.0434              0.0010      4.3884, 2.6580, -2.8522
Base Spacer 3      0.0434              0.0010      -4.3884, 2.6580, -0.1022
Base Spacer 4      0.0434              0.0010      -4.3884, 2.6580, -2.8522
The center of gravity coordinates listed above are with respect to the main axis of the iRobot Create
model. That axis is located at (0, 0, 0), directly in the bottom center of the Create model, as seen in
Figure 46.
Figure 46 - iRobot Create Coordinate System
According to the Creo analysis, the calculated center of gravity for the entire assembly is located at
(X, Y, Z) = (0.0000, 3.4898, 0.5448). These numbers make sense when looking at the model. With the
x-axis passing through the absolute center and all components centered on the model, the x-coordinate
should be zero. The y-coordinate is expected to be just slightly above the Create, due to the extra weight
above it and because the majority of the weight is in the Create itself. The z-coordinate should be
slightly biased toward the front of the robot, since more weight is located there.
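As a quick sanity check on the Creo output, the overall y-coordinate can be recomputed by hand as the mass-weighted average of the component y-coordinates from Table 3; the short MATLAB fragment below uses only the table values (coordinates assumed to be in inches).

% Mass-weighted average of the component y-coordinates from Table 3.
m = [8.8000 0.4784 1.3553 0.1049 2.4994 0.0010 0.0010 0.0010 0.0010];    % lb
y = [1.0831 18.6422 11.4080 17.5326 4.1820 2.6580 2.6580 2.6580 2.6580]; % in (assumed)

yCG = sum(m .* y) / sum(m);   % about 3.49, matching Creo's reported 3.4898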
Parts List
All of the major components and raw materials used in the design of the robot are readily available to
consumers. The iRobot Create, Asus Xtion Pro, and Acer Ferrari laptop are the three major components
that are physically off-the-shelf products, and the Creative Live web camera is also readily available off
the shelf. The laptop housing and Xtion support materials required fabrication in the machine shop to
the desired specifications. The acrylic pieces were all modeled in AutoCAD and laser cut to the required
specifications. A complete list of all parts and materials can be seen in Table 4.
Table 4 - Final Parts List
Simulation / Experimental Validation Plan
Description of the Test Bed
Bearing Localization
The Xtion Pro takes a template photo about 80 cm from the wall in the SAS lab; this is the goal
orientation of the system. For the template photo, the robot is aligned with the wall such that the Xtion
camera plane is parallel to the wall. This original orientation of the robot corresponds to a heading of
zero degrees. With the template taken, MATLAB is then used to turn the robot to a random angle in the
positive and/or negative direction; because the field of view of the camera is 58 degrees, that bounds
the random number. Using the function shown in the Localization-Bearing Template Method section,
the robot then captures a new image to compare to the template image. Feature detection is used to
determine any differences between the template and new images. The displacement of matched
features is then related to an angle, and the robot turns that angle back toward the zero degree
heading. After the turn has been made, a new comparison image is taken and the comparison process is
repeated. If the measured pixel displacements between features are small enough, as explained in the
validation section of the report, the process ends. If the resulting displacement is too large, the
displacement is again converted into an angle and the robot turned. The process is repeated until the
system reaches the zero degree heading, within a certain level of uncertainty.
Using the flat wall assumption program described in the Localization-Bearing section of this report,
another method of determining the pose of the robot is performed. The robot is set at a distance of
80 cm from the wall, with an orientation such that the plane of the Xtion camera is parallel to the wall.
This validation only works efficiently if the robot is facing a wall without any major protrusions. The
robot is then commanded to turn a random number of degrees in the positive and/or negative direction.
As described earlier, the program takes measurements at two different points, one on either side of the
robot, and then uses trigonometry to calculate the angle to turn in order to make those measurements
equal. The program runs in a loop until the measurements are determined to be equal, and the
validation is over. The corrected angle is output in MATLAB, where it can be compared to the initial
angle of offset.