A. Representation in Monocular SLAM with GO

Following the EKF-based SLAM framework, the state vector χ in SLAM with GO consists of the pose and velocity of the camera/robot and the locations and velocities of the generalized objects:

    χ = (x_k⊤, o_k^{1⊤}, o_k^{2⊤}, ..., o_k^{n⊤})⊤    (1)

where x_k consists of the camera position r^W in the world coordinate system, the quaternion q^W defining the camera orientation in the world coordinate system, the camera velocity v^W in the world coordinate system, and the camera angular velocity ω^C in the camera coordinate system:

    x_k = (r^{W⊤}, q^{W⊤}, v^{W⊤}, ω^{C⊤})⊤    (2)

o^i denotes the i-th generalized object, which can be stationary or moving in the framework of SLAM with GO. The existing parametrization for only stationary objects is insufficient to represent generalized objects. In [10], the inverse depth parametrization is augmented with 3-axis speeds in the world coordinate system to represent generalized objects. Each generalized object is coded with a 9-dimension state vector:

    o_k^i = (ō_k^{i⊤}, v_k^{i⊤})⊤    (3)

    ō_k = (x_k, y_k, z_k, θ_k, φ_k, ρ_k)⊤    (4)

where ō_k is the 3D location of a generalized object presented using the inverse depth parametrization; (x_k, y_k, z_k) is the camera location when this object is first observed.

    v_k = (v_k^x, v_k^y, v_k^z)⊤    (5)

where v_k denotes the 3-axis velocity of the generalized object in the world coordinate system.

The 3D location of a generalized object w.r.t. the world coordinate system can be computed as:

    loc(o_k^i) = (x_k, y_k, z_k)⊤ + (1/ρ_k) G(θ_k, φ_k)    (6)

where the direction vector G(θ, φ) defines the direction of the ray and ρ is the inverse depth between the feature and the camera optical center.

B. EKF-based SLAM with GO

In the prediction stage of the EKF algorithm, the constant velocity and constant angular velocity motion model [3] is applied for predicting the camera pose at the next time step, as the camera is the only sensor used to accomplish SLAM with GO in this work. For generalized objects, the constant velocity model is applied, and the location of a generalized object at the next frame can be calculated in closed form:

    loc(o_{k+1}^i) = loc(o_k^i) + v_k^i · Δt
                   = r^i + (1/ρ_k) G_k + v_k^i · Δt
                   = r^i + (1/ρ_{k+1}) G_{k+1}    (7)

where G_k and G_{k+1} are the directional vectors.

In the observation update stage of the EKF, generalized objects are transformed to the camera coordinate system and then projected onto the camera image plane. Let R_k^c be the rotation matrix defined by the camera orientation q_k^c. The points are transformed to the camera coordinate system by:

    h^{o_i} = (h_x^{o_i}, h_y^{o_i}, h_z^{o_i})⊤ = h(o_k^i, x_k)    (8)
            = R_k^c (r_k^i + (1/ρ_k) G_k(θ_k, φ_k) − r_k^W)    (9)

The predicted measurements on the image plane are:

    z_k^{o_i} = (u, v)⊤ = Proj(h_k^{o_i})
              = (u_0 − (f/d_x)(h_x^{o_i}/h_z^{o_i}), v_0 − (f/d_y)(h_y^{o_i}/h_z^{o_i}))⊤    (10)

where Proj is the projection function, (u_0, v_0) is the camera center in pixels, f is the focal length, and d_x and d_y represent the pixel size. The monocular SLAM with GO state vector is then updated by the EKF algorithm.

C. Undelayed Initialization

The location initialization of a new generalized object is the same as the approach in [3]. To initialize the velocity of a generalized object, v_0 is set to 0 in this work. The covariance value of the velocity estimate σ_v is designed to cover its 95% acceptance region [−|v|_max, |v|_max]:

    σ_v = |v|_max / 2    (11)

|v|_max is set to 3 m/sec in this work. A generalized object is augmented to the state vector of SLAM with GO x_k straightforwardly, and the new covariance of the SLAM with GO state is updated accordingly.

By using the modified inverse depth parametrization, a new generalized object is augmented to the state vector at the first observed frame. Through this undelayed initialization, the proposed monocular SLAM with GO system uses all measurements to estimate both the camera pose and the locations of generalized objects.

D. Classification in SLAM with GO

In SLAM with GO, generalized objects are represented instead of representing stationary and moving objects directly. From a theoretical point of view, it could be unnecessary to have a stationary and moving object classification module in SLAM with GO.
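The parametrization and measurement model of Section II-A/B can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the ray convention for G(θ, φ) follows the usual inverse-depth formulation in [4], and the intrinsics (f, d_x, d_y, u_0, v_0) are placeholder values, not the calibration used in the experiments.

```python
import numpy as np

def ray_direction(theta, phi):
    # Unit ray G(theta, phi); the azimuth/elevation convention of the
    # inverse depth parametrization in [4] is assumed here.
    return np.array([np.cos(phi) * np.sin(theta),
                     -np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def object_location(o):
    # Eq. (6): world location of an inverse-depth point.
    # o = (x, y, z, theta, phi, rho), with (x, y, z) the camera position
    # at first observation and rho the inverse depth along the ray.
    x, y, z, theta, phi, rho = o
    return np.array([x, y, z]) + ray_direction(theta, phi) / rho

def project(o, r_w, R_c, f=500.0, dx=1.0, dy=1.0, u0=320.0, v0=240.0):
    # Eqs. (8)-(10): transform the point into the camera frame and
    # project with a pinhole model.  R_c rotates world coordinates into
    # the camera frame; r_w is the current camera position.
    h = R_c @ (object_location(o) - r_w)
    u = u0 - (f / dx) * h[0] / h[2]
    v = v0 - (f / dy) * h[1] / h[2]
    return np.array([u, v])
```

For instance, a point first seen from the origin along the optical axis with inverse depth ρ = 0.5 lies at (0, 0, 2) in the world frame and projects to the image center for an identity camera pose.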
However, from a practical point of view, stationary objects could further improve the accuracy and convergence of the state estimates under the static object model. After classification, generalized objects can easily be simplified to stationary objects to reduce the size of the state vector, or can be maintained as moving objects using the same parametrization of generalized objects. In addition, stationary objects could contribute more to loop detection than moving objects in most cases. Stationary and moving object classification therefore still plays a key role in SLAM with GO.

III. STATIONARY AND MOVING OBJECT CLASSIFICATION USING VELOCITY ESTIMATES FROM MONOCULAR SLAM WITH GO

Stationary and moving object classification from a moving camera is a daunting task. In this section, the proposed classification approach using velocity estimates from monocular SLAM with GO is described in detail.

A. Velocity Convergency

To use velocity estimates from SLAM with GO to classify stationary and moving objects, it should be demonstrated that velocity estimates converge and are sufficient to differentiate stationary and moving objects. A simulation using the proposed monocular SLAM with GO is discussed to show the feasibility of the proposed classification algorithm. In this simulated scene, there were 40 static landmarks and 2 moving landmarks. 39 static landmarks were added to the state vector using the inverse depth parametrization as known features. One static landmark (Target 1) and two moving landmarks (Target 2 and Target 3) were initialized as generalized objects at the first observed frame using the modified inverse depth parametrization and were added to the state vector. The camera trajectory was designed as a helix. Fig. 1 shows the velocity estimates of Targets 1, 2 and 3 after 150 EKF updates.

Fig. 1. Velocity convergency of 3 targets under an observable condition. The estimates using the modified inverse depth parametrization converge to the true values. (a) Target 1, a static object; its true velocity is (0, 0, 0). (b) Target 2, a moving object; its true velocity is (1, 0, 0). (c) Target 3, a moving object; its true velocity is (0, 0, 0.5).

In this simulation, the estimates using the modified inverse depth parametrization converge to the true values in terms of both locations and velocities, whether the target is moving or stationary. The simulation result shows that moving object classification using velocity estimates from SLAM with GO should be feasible.

B. Thresholding Classification

As velocity estimates directly show the motion properties of generalized objects, two simple score functions are defined here for thresholding classification.

1) Score function for detecting static objects: Given the 3-dimension velocity distribution X = N(μ, Σ) of a generalized object, the score function is defined as:

    C_s(X) = f_X(0) = (1 / ((2π)^{3/2} |Σ|^{1/2})) e^{−(1/2)(0−μ)⊤ Σ⁻¹ (0−μ)}    (12)

where f_X is the probability density function of the Gaussian distribution X. This score function calculates the probability density function value of the velocity distribution at (0, 0, 0)⊤. The score reveals the relative likelihood of the velocity variable occurring at (0, 0, 0)⊤. If a generalized object o_k^i is static, its velocity v_k^i is expected to converge close to (0, 0, 0)⊤. This score would thus increase and exceed a threshold t_s.

2) Score function for detecting moving objects: Given a 3-dimension velocity distribution X = N(μ, Σ) of a generalized object, the score function is defined as:

    C_m(X) = D_X²(0) = (0 − μ)⊤ Σ⁻¹ (0 − μ)    (13)

where D_X is the Mahalanobis distance function under distribution X. The Mahalanobis distance is often used for data association and outlier detection. For a moving feature o_k^i, its velocity v_k^i is expected to converge away from (0, 0, 0)⊤. The score would thus increase and exceed a threshold t_m if the generalized object is moving.

C. Classification States in SLAM with GO

There are three classification states in SLAM with GO: unknown, stationary and moving. Each new feature is initialized at the first observed frame using the modified inverse depth parametrization and the classification state is set to unknown.
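The two score functions above are direct evaluations of a Gaussian at the origin, so they can be written compactly with numpy. A minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def c_static(mu, Sigma):
    # Eq. (12): Gaussian density of the 3D velocity estimate N(mu, Sigma)
    # evaluated at zero velocity.  Large values support "stationary".
    mu = np.asarray(mu, dtype=float)
    d = len(mu)
    diff = -mu  # (0 - mu)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def c_moving(mu, Sigma):
    # Eq. (13): squared Mahalanobis distance of zero velocity under
    # N(mu, Sigma).  Large values support "moving".
    diff = -np.asarray(mu, dtype=float)
    return diff @ np.linalg.solve(Sigma, diff)
```

A tight velocity estimate near zero yields a large C_s, while a velocity estimate far from zero relative to its covariance yields a large C_m, matching the intended transitions to the stationary and moving states.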
In each of the following observed frames, the two score functions are computed based on the estimated velocity distribution to determine whether the feature is stationary or moving.

1) From Unknown to Stationary: If the score value C_s(X) of a new feature exceeds the predetermined threshold t_s at a certain frame, this new feature is immediately classified as stationary. The velocity distribution of this feature is adjusted to satisfy the property of a static object: the velocity is set to (0, 0, 0)⊤ and the corresponding covariance is also set to 0. There is no motion prediction at the prediction stage, which ensures the velocity of this object stays fixed at (0, 0, 0)⊤.

2) From Unknown to Moving: If the score value C_m(X) of a new feature exceeds the predetermined threshold t_m at a certain frame, this feature is immediately classified as moving. As the feature has been initialized with the modified inverse depth parametrization, and both the position and the velocity are already being estimated, there is no need to adjust the state distribution or the motion model.

D. Simulation Results

The proposed classification approach is first evaluated using Monte Carlo simulations with perfect ground truth in this section. The effects of different thresholds and moving object speeds on classification are shown and discussed.

1) Effect of Different Thresholds under Observable and Unobservable Conditions: While the observability issues of SLAM and bearings-only tracking are well understood, SLAM with GO also inherits these observability issues. In other words, velocity estimates of generalized or moving objects may not converge in unobservable conditions. Two scenarios, one under an observable condition and the other under an unobservable condition, were designed as depicted in Fig. 2(a) and Fig. 2(b). In Fig. 2(a), the camera moved at a non-constant speed on a circle to avoid unobservable situations. In Fig. 2(b), the camera moved at a constant speed on four connected lines to test the classification performance under an unobservable situation. 300 static landmarks and 288 moving landmarks with different speeds were randomly located in a 3D cube with a width of 30 meters in each scenario. Each scenario has 50 Monte Carlo simulations.

Fig. 2. The simulation scenarios to evaluate the effects of different thresholds on the classification results. Grey dots denote stationary objects and black dots with lines denote moving objects and their trajectories. (a) Observable scenario. (b) Unobservable scenario.

As there are three possible states (unknown, static and moving), the wrongly classified error and the misclassified error are accordingly defined:

Wrongly Classified Error: A feature is wrongly classified if it is classified as a different type than it should be. For instance, a static feature is classified as moving, or a moving feature is classified as stationary.

Misclassified Error: A feature is misclassified if it is wrongly classified or is not classified as either static or moving.

a) Effects of t_s: Fig. 3(a) and Fig. 3(b) show the effects of t_s under an observable and an unobservable condition, respectively. In both scenarios, the misclassified ratio of stationary features increases when the threshold t_s increases, while the misclassified ratio of moving features decreases. These results satisfy the expectation that a larger threshold t_s results in fewer features classified as static: the misclassified ratio of static features increases and the misclassified ratio of moving features decreases. The trade-off between these two misclassified ratios could be considered according to the usage of the monocular SLAM with GO system.

Fig. 3. Effects of different t_s on the classification results. t_m is fixed at 3.5830. (a) Misclassified ratio in the observable scenario. (b) Misclassified ratio in the unobservable scenario.

Furthermore, comparing Fig. 3(a) and Fig. 3(b), the classification performance is better under observable situations than under unobservable situations. Observable situations could be critical to achieving better classification performance with the proposed approach.

b) Effects of t_m: Fig. 4(a) and Fig. 4(b) show the effects of t_m. In both scenarios, the misclassified ratio of static features decreases when the threshold t_m increases, while the misclassified ratio of moving features increases. This finding satisfies our expectation that a larger threshold t_m results in fewer features classified as moving: the misclassified ratio of static features decreases and the misclassified ratio of moving features increases. However, it should be noted that only a small portion of the misclassified features are caused by wrong classification. This means that the proposed classification algorithm rarely produces incorrect labels; the failures are more about insufficient data for classification.
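The unknown → stationary/moving transitions described above amount to a small state machine around the two score functions. The following sketch uses the thresholds from the simulations (t_s = 1.6, t_m = 3.5830); the function and variable names are ours, and the order in which the two tests are checked is a design choice not fixed by the paper.

```python
import numpy as np

TS, TM = 1.6, 3.5830  # thresholds used in the paper's simulations

def classify(mu, Sigma, state="unknown"):
    # One thresholding step for a feature with velocity estimate N(mu, Sigma).
    # The density and Mahalanobis scores are Eq. (12) and Eq. (13),
    # inlined here so the sketch is self-contained.
    diff = -np.asarray(mu, dtype=float)
    m2 = diff @ np.linalg.solve(Sigma, diff)          # C_m: squared Mahalanobis
    dens = np.exp(-0.5 * m2) / (
        (2.0 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigma)))  # C_s: pdf at 0
    if state != "unknown":
        return state                # decisions are final once made
    if dens > TS:
        return "stationary"         # caller then zeroes the velocity
                                    # mean and covariance (Sec. III-C.1)
    if m2 > TM:
        return "moving"             # state and motion model stay unchanged
    return "unknown"                # insufficient evidence so far
```

A tight estimate near zero velocity maps to "stationary", a confident non-zero estimate to "moving", and a broad, inconclusive estimate stays "unknown", which is exactly why the number of frames needed for a decision is data-driven.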
Fig. 4. Effects of different t_m on the classification results. t_s is fixed at 1.6. (a) Misclassified ratio in the observable scenario. (b) Misclassified ratio in the unobservable scenario.

2) Effects of Speed Variation of Moving Objects: In this simulation, the effects of speed variation of moving objects are evaluated. Fig. 5(a) shows that the classification error decreases when the speeds of moving objects increase, under the setting of t_s = 1.6 and t_m = 3.5830. Regarding stationary objects, the error ratio is near 0, which means that almost all stationary objects are correctly detected.

Fig. 5(b) shows the number of frames needed for classification. Recall that the number of frames for classification is data-driven and not fixed in the proposed approach. The result shows that the number of frames needed decreases when the speeds of moving objects increase.

Fig. 5. Effects of speed variation of moving objects on classification. (a) Classification performance on moving objects with different speeds. (b) Number of frames needed for classification.

These two findings satisfy the expectation that moving objects at higher speeds can be detected within fewer frames and with more accurate classification results.

3) Convergency of our SLAM algorithm: The convergency of the proposed monocular SLAM with GO algorithm in the observable scenario is checked. The estimate errors of the camera, static features and moving features over 50 Monte Carlo simulations are shown in Fig. 6. The errors of the camera pose estimates increase while the robot is exploring the environment from Frame 1 to Frame 450. The camera starts to close the loop at Frame 450 and the errors then decrease, as depicted in Fig. 6(a). Fig. 6(b) and Fig. 6(c) show that the estimate errors of static features and moving features decrease as the number of frames increases.

Fig. 6. Convergency of the proposed SLAM with GO algorithm shown with boxplots. The lower quartile, median, and upper quartile values of each box show the distribution of the estimate errors of all the objects in each observed frame. (a) The estimate errors of the camera pose (exploring: Frame 1 to Frame 450; revisiting: Frame 450 to Frame 600). (b) The estimate errors of the static objects. (c) The estimate errors of the moving objects.

IV. CLASSIFICATION FAILURE IN UNOBSERVABLE SITUATIONS

In this section, we discuss the failures of the proposed stationary and moving object classification in unobservable situations.

A. Unobservable Situations

As shown in the previous section, the proposed classification approach could be less reliable in unobservable situations than in observable situations. A simulation was designed to analyze the effects of unobservable situations. The scenario here is the same as the scenario in Fig. 1 except that the camera moves at a constant velocity. Fig. 7 shows the velocity distributions of the 3 targets using the modified inverse depth parametrization after 150 EKF steps.

The results show that the estimate uncertainties of these targets could be too large to reliably determine whether the targets are static or moving. In addition, the observations of Target 1 are the same as those of Target 3 under this camera trajectory, even though Target 1 is static and Target 3 is moving. It is impossible to correctly classify Target 1 and Target 3 using the observations from a camera moving at a constant velocity. This means that in unobservable situations we can find a corresponding moving object whose observations are the same as those of a specific stationary object. Note that such a corresponding moving object must move parallel to the camera and at some specific speed.

However, the velocity distribution of Target 2 reveals another fact. The 95% confidence region of the velocity estimate of Target 2 does not cover the origin point (0, 0, 0)⊤. This means that Target 2 can be correctly classified as moving using the proposed approach even in unobservable situations.
Fig. 7. Velocity convergency of 3 targets in an unobservable condition. (a) Target 1, a static object; its true velocity is (0, 0, 0). (b) Target 2, a moving object; its true velocity is (1, 0, 0). (c) Target 3, a moving object; its true velocity is (0, 0, 0.5).

We argue that no static object can have the same projection as a non-parallel moving object. Therefore, it is feasible to determine thresholds that correctly classify non-parallel moving objects as moving even under unobservable situations.

V. EXPERIMENTAL RESULTS

Fig. 8 shows the robotic platform, NTU-PAL7, on which a Point Grey Dragonfly2 wide-angle camera was used to collect image data at 13 frames per second, and a SICK LMS-100 laser scanner was used for ground truthing. The field of view of the camera is 79.48 degrees. The resolution of the images is 640 × 480. The experiment was conducted in the basement of our department. 1793 images were collected for evaluating the overall performance of monocular SLAM with GO, such as loop closing, classification and tracking. In this data set, a person moves around and appears 3 times in front of the camera.

Fig. 8. The NTU-PAL7 robot.

There are 107 static features and 12 moving features. When the person appeared, 4 features on the person were generated and initialized. Table I shows the performance of the proposed classification algorithm. None of the features is wrongly classified: all 107 static features in the environment are classified correctly as static, and all 12 moving features are classified correctly as moving.

TABLE I
TOTAL CLASSIFICATION RESULT OF THE REAL EXPERIMENT

    Classification state    static    moving    unknown
    Static features            107         0          0
    Moving features              0        12          0

Fig. 9 shows some of the input images and the corresponding SLAM with GO results. At the beginning of the experiment, a checkerboard with known size was used for estimating the scale. It is demonstrated that the camera poses are properly estimated, the 3D feature-based map is constructed using the proposed monocular SLAM with GO algorithm, and moving features are correctly detected using the proposed velocity estimate-based classification approach.

VI. CONCLUSION AND FUTURE WORK

We proposed a simple yet effective static and moving object classification method using the velocity estimates obtained directly from monocular SLAM with GO. The promising results of Monte Carlo simulations and real experiments have demonstrated the feasibility of the proposed approach. The modified inverse depth parametrization and the proposed classification method achieve undelayed initialization in monocular SLAM with GO. We also showed the interesting issues of classification in unobservable situations.

The constant acceleration model and other more advanced motion models should be applied for tracking moving objects with high-degree motion patterns. Evaluating the tracking performance of moving objects with more complicated motion patterns using SLAM with GO is of our interest. In addition, solutions to move-stop-move maneuvers should be developed.

REFERENCES

[1] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, June 2007.
[2] T. Lemaire, C. Berger, I.-K. Jung, and S. Lacroix, "Vision-based SLAM: Stereo and monocular approaches," International Journal of Computer Vision, vol. 74, no. 3, pp. 343–364, September 2007.
[3] J. M. M. Montiel, J. Civera, and A. J. Davison, "Unified inverse depth parametrization for monocular SLAM," in Robotics: Science and Systems, Philadelphia, USA, August 2006.
Fig. 9. (a)-(d) show examples of input images and the results of feature extraction and data association: (a) Frame 10. (b) Frame 330, where the person appears for the first time; 4 features are located and initialized in the state vector. (c) Frame 950, where the person appears for the second time. (d) Frame 1350, where the person appears for the third time. (e)-(h) are the top views of the corresponding SLAM with GO results at Frames 10, 330, 950 and 1350. In (a)-(d), blue squares are static features, red dots are moving features, green dots are newly initialized features with unknown states, and cyan dots are non-associated features. Ellipses show the projected 2σ bounds of the features. In (e)-(h), black and grey triangles and lines indicate the camera poses and trajectories from monocular SLAM with GO and LIDAR-based SLAM with DATMO. Grey points show the occupancy grid map from the SLAM part of LIDAR-based SLAM with DATMO. All the estimates of visual features are inside the reasonable cube. Squares indicate the stationary features and blue shadows indicate the 95% acceptance regions of the estimates.

[4] J. Civera, A. J. Davison, and J. M. M. Montiel, "Inverse depth parametrization for monocular SLAM," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 932–945, October 2008.
[5] C.-C. Wang, C. Thorpe, S. Thrun, M. Hebert, and H. Durrant-Whyte, "Simultaneous localization, mapping and moving object tracking," The International Journal of Robotics Research, vol. 26, no. 9, pp. 889–916, September 2007.
[6] J. Sola, "Towards visual localization, mapping and moving objects tracking by a mobile robot: a geometric and probabilistic approach," Ph.D. dissertation, Institut National Polytechnique de Toulouse, February 2007. [Online]. Available: http://homepages.laas.fr/jsola/JoanSola/eng/JoanSola.html
[7] S. Wangsiripitak and D. W. Murray, "Avoiding moving outliers in visual SLAM by tracking moving objects," in IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 2009, pp. 375–380.
[8] D. Migliore, R. Rigamonti, D. Marzorati, M. Matteucci, and D. G. Sorrenti, "Use a single camera for simultaneous localization and mapping with mobile object tracking in dynamic environments," in ICRA Workshop on Safe Navigation in Open and Dynamic Environments: Application to Autonomous Vehicles, 2009.
[9] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, March 2004.
[10] K.-H. Lin and C.-C. Wang, "Stereo-based simultaneous localization, mapping and moving object tracking," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, October 2010.