Transcript of "Monocular simultaneous localization and generalized object mapping with undelayed initialization"
國立臺灣大學電機資訊學院資訊工程學系 碩士論文
Department of Computer Science and Information Engineering
College of Electrical Engineering and Computer Science
National Taiwan University
Master Thesis

以單一攝影機完成同步定位、地圖建置與物體追蹤之非延遲初始化演算法
Monocular Simultaneous Localization and Generalized Object Mapping with Undelayed Initialization

蕭辰翰 Chen-Han Hsiao
指導教授：王傑智 博士 Advisor: Chieh-Chih Wang, Ph.D.
中華民國 99 年 7 月 July, 2010
摘要 (ABSTRACT IN CHINESE)

Many Kalman-filter-based studies have demonstrated the feasibility of simultaneous localization and mapping (SLAM) with a single camera. However, few have examined the feasibility of SLAM in dynamic environments. To build maps of both static and moving objects in dynamic environments, we propose a Kalman-filter-based framework and a new parametrization that integrates moving objects. With this parametrization, our algorithm estimates static and moving objects in the environment simultaneously, achieving SLAM with generalized objects. The parametrization inherits the advantages of the inverse depth parametrization, such as estimation over a large range of depths and better linearity. Existing work on SLAM in dynamic environments requires several measurements to confirm that an object is stationary and therefore initializes objects with a delay, whereas our parametrization allows undelayed initialization, so our algorithm exploits every measurement and obtains better estimates. We also propose a low-cost algorithm for classifying static and moving objects. Simulations show the accuracy of our algorithm, and real-world experiments show that it performs SLAM with generalized objects successfully in a dynamic indoor environment.
MONOCULAR SIMULTANEOUS LOCALIZATION AND GENERALIZED OBJECT MAPPING WITH UNDELAYED INITIALIZATION
Chen-Han Hsiao
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan
July 2010
Submitted in partial fulfilment of the requirements for the degree of Master of Science
Advisor: Chieh-Chih Wang
Thesis Committee: Chieh-Chih Wang (Chair), Li-Chen Fu, Yung-Yu Chuang, Han-Pang Huang, Ta-Te Lin
© Chen-Han Hsiao, 2010
ABSTRACT

Recent works have shown the feasibility of the extended Kalman filtering (EKF) approach to simultaneous localization and mapping (SLAM) with a single camera. However, few approaches have addressed the insufficiency of SLAM in dealing with dynamic environments. To accomplish SLAM in dynamic environments, we propose a unified framework based on a new parametrization for both static and non-static point features. With the new parametrization, the algorithm is able to integrate moving features and thus achieve monocular SLAM with generalized objects. The new parametrization inherits the good properties of the inverse depth parametrization, such as the ability to handle a large range of depths and better linearity. In addition, the new parametrization allows undelayed feature initialization. Contrary to existing SLAM algorithms with delayed initialization, which spend several measurements on classification, our SLAM with generalized objects and undelayed initialization utilizes every measurement of the point features for filtering and obtains a better estimate of the environment. A low-computation classification algorithm to distinguish static and moving features is also presented. Simulations show the high accuracy of our classification algorithm and feature estimates. We also demonstrate the success of our algorithm on a real image sequence captured in an indoor environment.
LIST OF FIGURES

2.1 r^WC denotes the camera position, and q^WC denotes the quaternion defining the orientation of the camera. The moving object is coded with the dynamic inverse depth parametrization.
3.1 Velocity convergence of 3 target features under the observable condition
3.2 Velocity convergence of 3 target features under the unobservable condition
4.1 Effect of different classification thresholds on the classification result
4.2 Convergence of our SLAM algorithm shown with boxplots. The lower quartile, median, and upper quartile of each box show the distribution of the estimation error over all objects in each observed frame. (a) The estimation error of the camera increases while exploring (frames 1 to 450) and decreases during loop closing (frames 450 to 600); (b) the estimation error of the static features decreases with the number of observed frames; (c) the estimation error of the moving features decreases with the number of observed frames.
4.3 The NTU PAL7 robot. Real-data experiment platform of monocular SLAM with generalized objects.
4.4 The basement of the CSIE department at NTU. Green/grey dots show the map built using the laser scanner.
4.5 The image sequence collected in the basement and the corresponding monocular SLAMMOT results. Figures 4.5(a), 4.5(c), 4.5(e), 4.5(g), 4.5(i), 4.5(k), 4.5(m), and 4.5(o) show the results of feature extraction and association. Figures 4.5(b), 4.5(d), 4.5(f), 4.5(h), 4.5(j), 4.5(l), 4.5(n), and 4.5(p) show the monocular SLAM with generalized objects results, in which black and grey triangles and lines indicate the camera poses and trajectories from monocular SLAM with generalized objects and LIDAR-based SLAMMOT. Gray points show
the occupancy grid map from LIDAR-based SLAM. All the estimates of the visual features lie inside the reasonable cube.
4.6 The result of the SLAM part of monocular SLAM with generalized objects. The definitions of the symbols are the same as in Figure 4.5. There are 107 stationary features in the state vector of monocular SLAM with generalized objects.
LIST OF TABLES

4.1 Total classification result of 50 Monte Carlo simulations in the observable condition with the threshold t_s = 1.6
4.2 Total classification result of the real experiment
CHAPTER 1
INTRODUCTION

1.1. INTRODUCTION

Recently, SLAM using a monocular or stereo camera as the only sensor has been proven feasible and has thus become popular in robotics (Davison et al., 2007; Lemaire et al., 2007). To overcome the weakness of the XYZ encoding in Davison et al.'s approach, Montiel et al. proposed an inverse depth parametrization (Montiel et al., 2006; Civera et al., 2008). Montiel et al.'s approach shows a better Gaussian property for the EKF algorithm, and its undelayed initialization procedure increases the speed of convergence. The inverse depth parametrization also makes it feasible to estimate features at potentially infinite depth. However, the inverse depth parametrization is only defined for positive depth: the inverse depth of a feature may converge to a negative value and thereby cause a catastrophic failure (Parsley & Julier, 2008).

Several attempts have been made to solve the SLAM problem in dynamic environments. Sola discussed the observability issue of bearing-only tracking and proposed using two cameras to solve SLAM and moving object tracking, with some heuristics for detecting moving objects in specific scenarios (Sola, 2007). Wangsiripitak and Murray (Wangsiripitak & Murray,
2009) presented an approach to recover the geometry of known 3D moving objects and avoid the effect of wrongly deleting occluded features. In their approach, manual operations such as deleting features on non-static objects are needed. Migliore et al. (Migliore et al., 2009) demonstrated a monocular SLAMMOT system with a separate SLAM filter and a moving object tracking filter; its classification of moving objects is based on uncertain projective geometry (Hartley & Zisserman, 2004). Vidal-Calleja et al. analyzed the observability issue of bearing-only SLAM systems and identified the motions that maximize the number of observable states (Vidal-Calleja et al., 2007). However, current SLAMMOT approaches decouple the tracking part from the SLAM part. Moreover, the classification between moving and static objects takes several steps: existing approaches adopt delayed initialization, so the SLAM estimate cannot benefit from the observations made during the classification steps.

In this thesis, we propose a framework for monocular simultaneous localization and generalized object mapping. This research presents an undelayed initialization approach and a better classification method. Both simulations and a real experiment are demonstrated and evaluated. The static-environment assumption is not needed in our approach.

In Chapter 2, we define the state vector for EKF SLAM with generalized objects, in particular the proposed parametrization for landmarks in dynamic environments. The undelayed initialization method is also illustrated. Chapter 3 details our classification algorithm for distinguishing static and moving features; the observability issue for classification is also discussed. In Chapter 4, both the simulation
and real experimental results are provided to show the performance of our approach.
CHAPTER 2
STATE VECTOR DEFINITION IN SLAM WITH GENERALIZED OBJECTS

2.1. STATE VECTOR DEFINITION IN SLAM WITH GENERALIZED OBJECTS

2.1.1. State Vector Definition

To build a feature-based map, we apply the extended Kalman filter (EKF) based simultaneous localization and mapping (SLAM) algorithm. Following standard EKF SLAM, we maintain a state vector containing the pose of the camera and the locations of the features:

    χ_k = ( x_k^⊤, o_k^{1⊤}, o_k^{2⊤}, …, o_k^{n⊤} )^⊤    (2.1)

The variable x_k is composed of the camera position r^W, the quaternion q^W defining the orientation, the velocity v^W, and the angular velocity ω^C:

    x_k = ( r^W, q^W, v^W, ω^C )^⊤    (2.2)

The constant velocity and constant angular velocity motion model derived from Montiel's approach is applied in our monocular system (Montiel et al., 2006). For a generalized object o_k^i, the encoding parametrization
could be the inverse depth parametrization for static features, or the dynamic inverse depth parametrization, composed of position and velocity, for both static and moving features. The proposed parametrization is introduced in Section 2.1.2.

2.1.2. Dynamic Inverse Depth Parametrization

Landmarks in dynamic environments may not be stationary, so a parametrization containing only a position is not enough for non-static landmarks. To represent a dynamic environment, we propose the dynamic inverse depth parametrization, which combines the inverse depth parametrization with 3-axis velocities to model each landmark. Each landmark is coded with a 9-dimension state vector:

    o_k^i = ( ō_k^{i⊤}, v_k^{i⊤} )^⊤ = ( x_k, y_k, z_k, θ_k, φ_k, ρ_k, v_k^x, v_k^y, v_k^z )^⊤    (2.3)

ō_k^i is the 3D location of the i-th landmark in the inverse depth parametrization, and v_k^i = ( v_k^x, v_k^y, v_k^z )^⊤ denotes the 3-axis velocities in the world coordinate system. The 3D location of the feature in XYZ coordinates is:

    ( X_i, Y_i, Z_i )^⊤ = loc(o_k^i) = ( x_k, y_k, z_k )^⊤ + (1/ρ_k) G(θ_k, φ_k)    (2.4)

In the prediction stage of the EKF algorithm, the features are predicted by applying the constant velocity model as illustrated in Figure 2.1. The predicted state of the features can be calculated in closed form:
    o_{k+1}^i = loc(o_k^i) + v_k^i · Δt
              = r_i + (1/ρ_k) G_k + v_k^i · Δt
              = r_i + (1/ρ_{k+1}) G_{k+1}    (2.5)

where G_k and G_{k+1} are the directional vectors.

Figure 2.1. r^WC denotes the camera position, and q^WC denotes the quaternion defining the orientation of the camera. The moving object is coded with the dynamic inverse depth parametrization.

In the update stage of the EKF algorithm, the measurement model of the features is also derived from Montiel et al.'s approach. In our approach, each feature is coded either with the inverse depth parametrization or with the
dynamic inverse depth parametrization. The position of every feature is represented by the inverse depth parametrization, so we follow the measurement model proposed in Montiel et al.'s approach.

2.1.3. Undelayed Feature Initialization

Since the dynamic inverse depth parametrization is an extension of the inverse depth parametrization, the initial values for the position of a new feature can be calculated from r̂^WC, q̂^WC, h = ( u, v )^⊤, and ρ_0 as in Montiel et al.'s approach. The initial velocity v_0 is set to 0, and its standard deviation σ_v is designed to cover the 95% acceptance region [−|v|_max, |v|_max], so:

    σ_v = |v|_max / 2    (2.6)

The initial state of an observed feature is

    y(r̂^WC, q̂^WC, h, ρ_0, v_0) = ( x̂_i, ŷ_i, ẑ_i, θ̂_i, φ̂_i, ρ̂_i, v_0^⊤ )^⊤    (2.7)

After adding the feature to the state vector, the state covariance P̂_{k|k} becomes

    P̂_{k|k}^new = J · diag( P̂_{k|k}, R_j, σ_ρ², σ_v² I ) · J^⊤

    J = [ I                0
          ∂y/∂r^WC  ∂y/∂q^WC  0 ⋯ 0  ∂y/∂h  ∂y/∂ρ  ∂y/∂v ]

By using the dynamic inverse depth parametrization, we are able to add a new feature to the state vector at the first observed frame. Through undelayed feature initialization, our monocular system uses every measurement of the point features to estimate both the camera pose and the feature locations, and obtains better estimates.
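The closed-form location, prediction, and velocity-prior equations of this chapter can be sketched in a few lines of code. This is an illustrative sketch, not the thesis implementation: the angle convention of the directional vector G(θ, φ) follows the common inverse depth convention (an assumption), and the anchor point (x, y, z) is kept fixed during prediction while the ray is re-encoded.

```python
import numpy as np

def g_vec(theta, phi):
    """Directional vector G(theta, phi); azimuth/elevation convention assumed."""
    return np.array([np.cos(phi) * np.sin(theta),
                     -np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def loc(o):
    """Eq. (2.4): XYZ location of a feature
    o = (x, y, z, theta, phi, rho, vx, vy, vz)."""
    return o[:3] + (1.0 / o[5]) * g_vec(o[3], o[4])

def predict_feature(o, dt):
    """Eq. (2.5): constant-velocity prediction. The anchor (x, y, z) is kept
    fixed and the ray (theta, phi, rho) is re-encoded so that it points at
    the predicted location loc(o) + v * dt."""
    ray = loc(o) + o[6:9] * dt - o[:3]       # (1/rho_{k+1}) * G_{k+1}
    depth = np.linalg.norm(ray)
    g = ray / depth
    o_new = o.copy()
    o_new[3:6] = [np.arctan2(g[0], g[2]), -np.arcsin(g[1]), 1.0 / depth]
    return o_new

def init_velocity_prior(v_max):
    """Eq. (2.6): undelayed initialization of the velocity part,
    v0 = 0 with sigma_v = |v|_max / 2."""
    return np.zeros(3), v_max / 2.0
```

For a static feature (v = 0) the prediction leaves the encoded location unchanged; for a moving feature, the re-encoded ray points at the translated location, matching Eq. (2.5).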
CHAPTER 3
STATIC AND MOVING OBJECT CLASSIFICATION

3.1. STATIC AND MOVING OBJECT CLASSIFICATION

Retaining stationary features in the map is needed for better estimation and for loop closing, so classification is necessary. In this chapter we propose a classification method of low computational cost based on the estimated velocity states of the features.

3.1.1. Velocity Convergence

We ran simulation experiments with the dynamic inverse depth parametrization and the undelayed feature initialization technique discussed in Section 2.1 to verify the velocity convergence of features in dynamic environments. The environment contained 40 static landmarks and 2 moving landmarks. 39 static landmarks were added to the state vector as known features using the inverse depth parametrization. One static landmark (target 1) and two moving landmarks (target 2 and target 3) were initialized at the first observed frame with the dynamic inverse depth parametrization and added to the state vector. The camera trajectory was designed as a helix. We checked the velocity distributions of these 3 landmarks (target 1, target 2
and target 3) coded in the dynamic inverse depth parametrization after 150 EKF steps. Figure 3.1 shows the velocity convergence.

In this simulation example, we found that the velocity distributions of the features converged, providing useful information for classifying the type of each feature. We therefore developed a classification algorithm based on the estimated velocity distributions.

3.1.2. Score Functions for Classification

3.1.2.1. Score function for classifying static objects. To classify features as static or moving, we define a score function mapping a velocity distribution to a score value, and use the score to decide whether a feature is static or moving. Given a 3-dimensional velocity distribution X = N(µ, Σ), the score function is defined as:

    C_s(X) = f_X(0) = 1 / ( (2π)^{3/2} |Σ|^{1/2} ) · exp( −(1/2) (0 − µ)^⊤ Σ^{−1} (0 − µ) )    (3.1)

f_X is the probability density function of the Gaussian distribution X. That is, the score function evaluates the probability density of the velocity distribution at (0, 0, 0)^⊤, revealing the relative likelihood of the velocity taking the value (0, 0, 0)^⊤.

For a static feature o_k^i, the velocity v_k^i is expected to converge close to (0, 0, 0)^⊤. The score thus increases and eventually exceeds the threshold t_s, which lets the monocular system classify the object as static.

3.1.2.2. Score function for classifying moving objects. To classify each object as either static or moving, we further define a score function for classifying moving objects. Given a 3-dimensional velocity distribution X = N(µ, Σ), the score function is defined as:

    C_m(X) = D_X(0) = √( (0 − µ)^⊤ Σ^{−1} (0 − µ) )    (3.2)
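Both score functions can be written directly from Eqs. (3.1) and (3.2); a minimal sketch, assuming the velocity estimate is available as a mean vector and covariance matrix extracted from the filter:

```python
import numpy as np

def score_static(mu, Sigma):
    """C_s(X) = f_X(0) of Eq. (3.1): Gaussian density of the velocity
    distribution X = N(mu, Sigma) evaluated at zero velocity."""
    d = len(mu)
    maha2 = mu @ np.linalg.solve(Sigma, mu)   # (0-mu)^T Sigma^-1 (0-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * maha2) / norm)

def score_moving(mu, Sigma):
    """C_m(X) = D_X(0) of Eq. (3.2): Mahalanobis distance of zero
    velocity under X = N(mu, Sigma)."""
    return float(np.sqrt(mu @ np.linalg.solve(Sigma, mu)))
```

A velocity estimate tightly concentrated at zero yields a large C_s and a small C_m; an estimate far from zero does the opposite.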
(a) Target 1 (static object marked with a green circle) under the observable condition. Ground-truth velocity of the target: v = (0, 0, 0)
(b) Target 2 (moving object marked with a green circle) under the observable condition. Ground-truth velocity of the target: v = (1, 0, 0)
(c) Target 3 (moving object marked with a green circle) under the observable condition. Ground-truth velocity of the target: v = (0, 0, 0.5)

Figure 3.1. Velocity convergence of 3 target features under the observable condition
D_X is the Mahalanobis distance function under the distribution X. Since the Mahalanobis distance can be used to detect outliers, we can check whether the point (0, 0, 0)^⊤ is an outlier of the distribution.

For a moving feature o_k^i, the velocity v_k^i is expected to converge away from (0, 0, 0)^⊤. The score thus increases and eventually exceeds the threshold t_m, which lets the monocular system classify the object as moving.

3.1.3. Classification State

With the dynamic inverse depth parametrization and the proposed classification algorithm, SLAM with generalized objects can be implemented as follows. Each feature is initialized at the first observed frame with the dynamic inverse depth parametrization and labeled as unknown. In each of the following observed frames, we examine the estimated distribution of the feature using the two score functions.

3.1.3.1. From unknown state to static state. If the score value C_s(X) of an unknown-state feature exceeds the threshold t_s at a certain frame, we immediately classify the feature as a static object and label it static. The velocity distribution of the feature is also adjusted to satisfy the property of a static object: the velocity is set to (0, 0, 0)^⊤ and the corresponding covariance is set to 0. After the feature is classified as static, we assume the feature is static and make no prediction for it at the prediction stage, which keeps the velocity of the object fixed at (0, 0, 0)^⊤. The transition can be expressed as a function:

    f( x_k, y_k, z_k, θ_k, φ_k, ρ_k, v_k^x, v_k^y, v_k^z ) = ( x_k, y_k, z_k, θ_k, φ_k, ρ_k, 0, 0, 0 )    (3.3)
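The per-frame decision rule and the transition of Eq. (3.3) can be sketched as follows. The scores C_s and C_m are assumed to be computed from Eqs. (3.1) and (3.2); idx, the offset of the 9-dimension feature block in the full state, and the precedence of the two tests are assumptions of this sketch:

```python
import numpy as np

def classify(label, c_s, c_m, ts, tm):
    """Per-frame decision rule: an unknown feature becomes static once
    C_s > ts and moving once C_m > tm; once labeled, it stays labeled.
    (Which test takes precedence if both fire is an assumption.)"""
    if label != "unknown":
        return label
    if c_s > ts:
        return "static"
    if c_m > tm:
        return "moving"
    return "unknown"

def to_static(chi, P, idx):
    """Eq. (3.3): zero the velocity sub-state of the 9-dim feature block
    at offset idx in the full state chi, and zero the corresponding rows
    and columns of the covariance P, so the feature contributes exactly
    like a 6-dim inverse depth feature afterwards."""
    chi, P = chi.copy(), P.copy()
    vs = slice(idx + 6, idx + 9)   # velocity components of the feature
    chi[vs] = 0.0
    P[vs, :] = 0.0
    P[:, vs] = 0.0
    return chi, P
```

Zeroing the velocity rows and columns is what makes the covariance sparser after the transition, as discussed below.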
In fact, a feature coded in the 9-dimension dynamic inverse depth parametrization with zero velocity contributes to the estimation process exactly as a static feature coded in the 6-dimension inverse depth parametrization does. Moreover, the transition from the unknown state to the static state makes the covariance matrix sparse and thus reduces the computational cost of SLAM with generalized objects: the computational complexity for a static feature coded in the dynamic inverse depth parametrization with zero velocity is the same as that for a static feature coded in the inverse depth parametrization.

3.1.3.2. From unknown state to moving state. If the score value C_m(X) of an unknown-state feature exceeds the threshold t_m at a certain frame, we immediately classify the feature as a moving object and label it moving. Since the feature has been initialized with the dynamic inverse depth parametrization, both the position and the velocity are already being estimated, so there is no need to adjust the distribution or the motion model.

Finally, the state vector χ_k = ( x_k^⊤, o_k^{1⊤}, o_k^{2⊤}, …, o_k^{n⊤} )^⊤ is composed of three types of features (unknown, static, moving). A different motion model is applied to each type: unknown and moving features use the motion model with acceleration noise, while static features use the stationary assumption. Thus a generalized object mapping approach is achieved.

3.1.4. Issues in Unobservable Situations

3.1.4.1. Non-converged velocity distribution under unobservable situations. Under unobservable situations, the monocular system cannot accurately estimate the location of a moving feature. Thus, the system also
cannot accurately estimate the velocity of the feature. We checked the effect of unobservable situations on the proposed classification algorithm in simulation. The simulated scenario is the same as that of Figure 3.1: the same moving objects follow the same motion patterns, except that the camera moves at a constant speed. Note that under such a camera trajectory, the projection of target 1 (static object) is identical to that of target 3 (moving object). This matches the observability issue, i.e., the inability of a monocular system to find a unique trajectory of an object under the constant-velocity assumption on moving objects. We checked the velocity distributions of the 3 landmarks coded in the dynamic inverse depth parametrization after 150 EKF steps.

From Figure 3.2, we find that the velocity distributions of these three target objects do not converge. The three velocity distributions cover large areas, so we cannot be certain of the velocities of these objects.

3.1.4.2. Ambiguity between a static object and a parallel-moving object. The location and velocity distributions of target 1 and target 3 are the same. Although target 1 is static and target 3 is moving, we cannot distinguish their states from the velocity distributions. In fact, the projection points of target 1 and target 3 coincide during these 150 frames, so the estimates of the two objects must be the same. It is therefore impossible to classify target 1 and target 3 as static or moving in a monocular system. This ambiguity extends to any static object, since under unobservable situations we can find a corresponding moving object whose projections coincide with those of the static object. Note that such a corresponding moving object must move parallel to the camera. From the
(a) Target 1 (static object marked with a green circle) under the unobservable condition. Ground-truth velocity of the target: v = (0, 0, 0)
(b) Target 2 (moving object marked with a green circle) under the unobservable condition. Ground-truth velocity of the target: v = (1, 0, 0)
(c) Target 3 (moving object marked with a green circle) under the unobservable condition. Ground-truth velocity of the target: v = (0, 0, 0.5)

Figure 3.2. Velocity convergence of 3 target features under the unobservable condition
simulation, we can see the inability to classify under unobservable situations.

3.1.4.3. Non-parallel-moving objects. However, the velocity distribution of target 2 reveals another fact: the 95% confidence region of the velocity estimate does not cover the origin (0, 0, 0)^⊤. The zero velocity was filtered out by our SLAM with generalized objects algorithm, so this moving object is classified as moving even under unobservable situations. In fact, no static object can have the same projections as a non-parallel-moving object, which means there is no ambiguity between static objects and non-parallel-moving objects. We can thus find the possible range of the velocity distribution by filtering and classify non-parallel-moving objects as moving even under unobservable situations.

3.1.4.4. Assumptions for solving the unobservability issue. Although classification is impossible in unobservable situations in general, especially because of the ambiguity between a static object and a parallel-moving object with identical projections in a monocular system, assumptions may help. For example, in an environment where few moving objects move parallel to the camera, the ambiguity between static objects and parallel-moving objects is resolved, and classification can be done with our algorithm.
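The static versus parallel-moving ambiguity can be verified numerically: for a camera translating at constant velocity w, a hypothetical point that starts halfway along the ray to a static landmark and moves at w/2 produces exactly the same bearings at every frame (all numbers below are illustrative):

```python
import numpy as np

# Camera at c0 translating with constant velocity w (the unobservable case).
w = np.array([1.0, 0.0, 0.0])
c0 = np.zeros(3)
p_static = np.array([4.0, 2.0, 10.0])        # a static landmark

same_bearing = True
for k in range(150):
    c = c0 + w * k                            # camera position at frame k
    # A moving point that starts halfway along the ray to p_static and
    # travels at w/2: its offset from the camera is always (p_static - c)/2.
    p_moving = (p_static + c0) / 2.0 + (w / 2.0) * k
    d_static, d_moving = p_static - c, p_moving - c
    same_bearing &= bool(np.allclose(np.cross(d_static, d_moving), 0.0))
print(same_bearing)  # True: the two points are indistinguishable by bearing
```

Any scalar fraction along the ray works the same way, which is why every static point has a parallel-moving counterpart under such a trajectory.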
CHAPTER 4
EXPERIMENTAL RESULTS

4.1. EXPERIMENTAL RESULTS

4.1.1. Simulation

4.1.1.1. Effect of different thresholds on the classification result. To evaluate the effect of different thresholds on classification, simulation experiments were conducted and the classification accuracy was compared across thresholds. Since an estimated feature can be in one of three states (unknown, static, and moving), the wrongly-classified error and the misclassified error are defined as follows:

Wrongly classified error: a feature is added with the unknown state. If the feature is finally classified as a different type than it should be (e.g., a static feature classified as moving, or a moving feature classified as static), we say the feature is wrongly classified.

Misclassified error: a feature is added with the unknown state. If the feature is wrongly classified, or is never classified as either static or moving, we say the feature is misclassified.

The simulated scenarios are shown in Figures 4.1(a) and 4.1(b). In Figure 4.1(a), the camera moved with a non-constant speed on the circle
to avoid the unobservable situation. In Figure 4.1(b), the camera moved with a constant speed on four connected lines to test the performance under an unobservable situation. 300 static landmarks and 288 moving landmarks were randomly located in a 3D cube with a width of 30 meters in each scenario. 50 Monte Carlo simulations of each scenario were run and evaluated.

(a) Observable scenario
(b) Unobservable scenario
(c) Misclassified ratio in the observable scenario
(d) Misclassified ratio in the unobservable scenario

Figure 4.1. Effect of different classification thresholds on the classification result

Instead of using a ROC curve to present the performance, the relations between the threshold and the classified ratios are shown directly in Figures 4.1(c)
and 4.1(d). In both scenarios, the misclassified ratio of static features increases with the threshold t_s, while the misclassified ratio of moving features decreases. For example, the misclassified ratio of static features increases from 0 to 0.5 as t_s increases from 1 to 3.5 under observable situations, and from 0.1 to 1 over the same range under unobservable situations. Meanwhile, the misclassified ratio of moving features decreases from 0.03 to 0.01 as t_s increases from 1 to 3.5, under both observable and unobservable situations.

This finding matches our expectation that a larger threshold t_s results in fewer features being classified as static. Thus, with a larger t_s, the misclassified ratio of static features increases and the misclassified ratio of moving features decreases. The trade-off between these two ratios should be weighed according to the intended use of the monocular system.

Furthermore, the classification performance is better under observable situations. Comparing Figures 4.1(c) and 4.1(d), the misclassified ratio of static features is smaller under observable situations than under unobservable situations. The results match the discussion in Section 3.1; an observable situation is thus necessary for better classification performance.

However, note that only a small portion of the misclassified features are wrongly classified. That means the classification algorithm does not provide incorrect information: it retains the unknown state
of a feature when the information is insufficient, while still letting unknown-state features contribute to the monocular system.

By choosing the threshold t_s = 1.6, we obtain good classification performance. Table 4.1 shows the classification result with t_s = 1.6 in the observable condition. We therefore use t_s = 1.6 in the following experiments for checking the convergence of our SLAM algorithm and in the real experiment.

                   classified state
ground truth     static   moving   unknown
Static             9480       16         1
Moving               71     6734        30

Table 4.1. Total classification result of 50 Monte Carlo simulations in the observable condition with the threshold t_s = 1.6

4.1.1.2. Convergence of our SLAM algorithm. We further checked the convergence of our SLAM algorithm in the observable scenario. The estimation errors of the camera, the static features, and the moving features over 50 Monte Carlo simulations are shown with boxplots in Figure 4.2. The estimation error of the camera increases while the robot explores the environment from frame 1 to frame 450. The camera starts to close the loop at frame 450, and the error then decreases; the mean of the errors in Figure 4.2(a) reveals this finding. During the SLAM procedure, the estimation errors do not diverge. As shown in Figures 4.2(b) and 4.2(c), the estimation errors of both static and moving features decrease as the number of estimated frames increases.
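As a worked check on Table 4.1, the error ratios defined in Section 4.1.1.1 can be recomputed from the table counts (rows are ground truth, columns are final labels):

```python
# Counts from Table 4.1 (50 Monte Carlo runs, observable scenario, ts = 1.6).
static_row = {"static": 9480, "moving": 16, "unknown": 1}
moving_row = {"static": 71, "moving": 6734, "unknown": 30}

def error_ratios(row, true_label):
    """Wrongly-classified ratio (ended up as the opposite type) and
    misclassified ratio (wrongly classified or left unknown)."""
    total = sum(row.values())
    wrong = sum(v for k, v in row.items() if k not in (true_label, "unknown"))
    mis = wrong + row["unknown"]
    return wrong / total, mis / total

print(error_ratios(static_row, "static"))  # about (0.0017, 0.0018)
print(error_ratios(moving_row, "moving"))  # about (0.0104, 0.0148)
```

Both error ratios stay near or below one percent, consistent with the accuracy claimed for t_s = 1.6.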
(a) Estimation error of the camera
(b) Estimation error of the static objects
(c) Estimation error of the moving objects

Figure 4.2. Convergence of our SLAM algorithm shown with boxplots. The lower quartile, median, and upper quartile of each box show the distribution of the estimation error over all objects in each observed frame. (a) The estimation error of the camera increases while exploring (frames 1 to 450) and decreases during loop closing (frames 450 to 600); (b) the estimation error of the static features decreases with the number of observed frames; (c) the estimation error of the moving features decreases with the number of observed frames.
4.1.2. Real Experiments

Figure 4.3. The NTU PAL7 robot. Real-data experiment platform of monocular SLAM with generalized objects.

A real experiment on a 1793-frame loop-closing image sequence was run and evaluated. Figure 4.3 shows the robotic platform, NTU-PAL7, on which a Point Grey Dragonfly2 wide-angle camera was used to collect image data at 13 frames per second, and a SICK LMS-100 laser scanner was used for ground truthing. The field of view of the camera is 79.48 degrees, and the resolution of the images is 640 × 480. The experiment was conducted in the Department of Computer Science and Information Engineering (CSIE), National Taiwan University (NTU). Figure 4.4 shows the basement (15.2 × 11.3 meters) in which we verified the overall performance of SLAM with generalized objects, including loop closing, classification, and tracking. During the experiment, a person moved around in the environment and appeared 3 times in front of the camera.

In the experiment, there are 107 static features and 12 moving features. Each time the moving person appeared, 4 features on the person were generated and initialized. Table 4.2 shows the performance of our classification
Figure 4.4. The basement of the CSIE department at NTU. Green/grey dots show the map built using the laser scanner.

algorithm. None of the features is wrongly classified: all 107 static features in the environment are correctly classified as static, and all 12 moving features are correctly classified as moving.

                   classified state
ground truth     static   moving   unknown
Static              107        0         0
Moving                0       12         0

Table 4.2. Total classification result of the real experiment
The estimation over time is shown in Figure 4.5. We estimated the trajectory of the camera, built the map, and estimated the moving object successfully using our proposed algorithm. Figure 4.5 (f), (j), and (n) show the estimation of the moving person. Clearly, the algorithm did not fail while the image sequence was captured from a dynamic environment.

(a) Frame 10. In the beginning of our SLAM with generalized objects algorithm, each feature is assumed static and added into the state vector. The ellipses show the projected 2σ bounds of the features.

(b) Top view: Frame 10. Squares indicate the stationary features and blue shadows indicate the 95% acceptance regions of the estimates. The possible locations of the features have not converged.

(c) Frame 220. The robot starts to explore the environment.

(d) Top view: Frame 220. The possible locations of the features start to converge.
(e) Frame 330. The person appears in front of the camera for the first time. 4 features are located and initialized in the state vector. Red ellipses show the projected 2σ bounds of the moving features. Note that some static features are occluded by the person. Cyan ellipses show the projected 2σ bounds of the non-associated features.

(f) Top view: Frame 330. Red shadows indicate the 95% acceptance regions of the moving features.

(g) Frame 730

(h) Top view: Frame 730
(i) Frame 950. The person appears in front of the camera for the second time. As the robot explores the environment, some new features are added as unknown features. Green ellipses show the projected 2σ bounds of those newly initialized features with unknown state.

(j) Top view: Frame 950. Green shadows indicate the 95% acceptance regions of the newly initialized features with unknown state.

(k) Frame 1260

(l) Top view: Frame 1260
(m) Frame 1350. The person appears in front of the camera for the third time.

(n) Top view: Frame 1350

(o) Frame 1560

(p) Top view: Frame 1560

Figure 4.5. The image sequence collected in the basement and the corresponding monocular SLAMMOT results. Figures 4.5(a), 4.5(c), 4.5(e), 4.5(g), 4.5(i), 4.5(k), 4.5(m), and 4.5(o) show the results of feature extraction and association. Figures 4.5(b), 4.5(d), 4.5(f), 4.5(h), 4.5(j), 4.5(l), 4.5(n), and 4.5(p) show the monocular SLAM with generalized objects results, in which black and grey triangles and lines indicate the camera poses and trajectories from monocular SLAM with generalized objects and from LIDAR-based SLAMMOT. Gray points show the occupancy grid map from LIDAR-based SLAM. All the estimates of visual features lie inside a reasonable cube.
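The captions above repeatedly refer to the projected 2σ bounds of a feature, drawn as an ellipse in the image. A minimal sketch of how such an ellipse can be derived from a feature's projected 2 × 2 image covariance via eigen-decomposition; the names are illustrative, not from the thesis:

```python
import numpy as np

def two_sigma_ellipse(cov_2x2):
    """Axes and orientation of the 2-sigma uncertainty ellipse.

    cov_2x2: covariance of a feature's projection in the image [px^2].
    Returns (half_axes, angle_rad): the semi-axis lengths at 2 sigma
    (major axis first) and the orientation of the major axis.
    """
    vals, vecs = np.linalg.eigh(cov_2x2)   # eigenvalues in ascending order
    half_axes = 2.0 * np.sqrt(vals[::-1])  # 2-sigma, major axis first
    major = vecs[:, 1]                     # eigenvector of the largest eigenvalue
    return half_axes, float(np.arctan2(major[1], major[0]))

cov = np.array([[9.0, 0.0],
                [0.0, 4.0]])               # axis-aligned toy covariance
axes, theta = two_sigma_ellipse(cov)       # semi-axes 6.0 and 4.0 pixels
```

For the 95% acceptance regions in the top views, the same decomposition would be applied to the 2D ground-plane marginal of each feature's covariance with the corresponding chi-square scale.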
The final map is shown in Figure 4.6 in both top view and side view. Compared with the gray map built with the laser-SLAMMOT algorithm, the map built with our monocular SLAM with generalized objects algorithm is close to the true locations in the environment. All the estimated features are located within a reasonable cube; no feature estimate falls outside the reasonable range.
(a) Top view: Frame 1690

(b) Side view: Frame 1690

Figure 4.6. The result of the SLAM part of monocular SLAM with generalized objects. The definitions of symbols are the same as in Figure 4.5. There are 107 stationary features in the state vector of monocular SLAM with generalized objects.
CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1. CONCLUSION AND FUTURE WORK

We have illustrated the procedures of our proposed algorithm, including the state vector definition, the motion model for moving objects, feature initialization, and the classification algorithm. The proposed dynamic inverse depth parametrization achieves undelayed feature initialization, and it is able to encode both static and moving features. The algorithm benefits from the undelayed initialization and thus has a better estimation of the camera and of both static and moving features. The parametrization also provides a way to track a moving feature: both its location and its velocity are estimated inside the state vector. Further, the low-computational-cost classification algorithm makes real-time SLAM in dynamic environments feasible.

For dealing with more dynamic environments, applying the Constant Acceleration Model would be a possible solution for objects with higher-degree motion patterns. We plan to investigate the tracking performance for moving objects with more complicated motion patterns in the future. Also, approaches for dealing with move-stop-move objects will be a
further research interest. In addition, SLAM with generalized objects using a stereo camera could be further studied.
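As a rough illustration of the Constant Acceleration Model mentioned above, the one-axis state transition for a [position, velocity, acceleration] state could look like the following sketch. This is an assumption about one possible design, not the motion model implemented in the thesis:

```python
import numpy as np

def ca_transition(dt):
    """Constant-acceleration state transition for one axis: [p, v, a].

    A possible extension of a constant-velocity motion model; in a full
    EKF it would be applied per axis in the prediction step, with process
    noise driving the acceleration term.
    """
    return np.array([[1.0, dt, 0.5 * dt * dt],
                     [0.0, 1.0, dt],
                     [0.0, 0.0, 1.0]])

x = np.array([0.0, 1.0, 2.0])    # position 0 m, 1 m/s, 2 m/s^2
x_next = ca_transition(0.5) @ x  # predicted state after 0.5 s
```

Compared with a constant-velocity model, the extra acceleration state lets the filter follow maneuvering objects at the cost of a larger state vector and more process-noise tuning.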
Document Log: Manuscript Version 1 — 19 July 2010. Typeset by AMS-LaTeX — 19 August 2010.

Chen-Han Hsiao
The Robot Perception and Learning Lab., Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Da-an District, Taipei City 106, Taiwan
Tel.: (+886) 2-3366-4888 ext. 407
E-mail address: r97922120@ntu.edu.tw