IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012                                        ...
900                                                         IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. ...
902                                              IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, ...
904                                                 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. ...
906                                                       IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13...
908                                                         IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. ...
910                                                        IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 1...
Real time computer vision dgps-aided inertial navigation system for lane-level vehicle navigation
Real time computer vision dgps-aided inertial navigation system for lane-level vehicle navigation
Upcoming SlideShare
Loading in …5

Real time computer vision dgps-aided inertial navigation system for lane-level vehicle navigation


Published on

services on......
If ur intrested in these project please feel free to contact us@09640648777,Mallikarjun.V

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Real time computer vision dgps-aided inertial navigation system for lane-level vehicle navigation

  1. 1. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012 899 Real-Time Computer Vision/DGPS-Aided Inertial Navigation System for Lane-Level Vehicle Navigation Anh Vu, Student Member, IEEE, Arvind Ramanandan, Anning Chen, Jay A. Farrell, Fellow, IEEE, and Matthew Barth, Senior Member, IEEE Abstract—Many intelligent transportation system (ITS) appli- I. I NTRODUCTIONcations will increasingly rely on lane-level vehicle positioning thatrequires high accuracy, bandwidth, availability, and integrity.Lane-level positioning methods must reliably work in real timein a wide range of environments, spanning rural to urban areas. V EHICLE positioning is becoming a critical component in many intelligent transportation system (ITS) [1] and advanced vehicle control and safety system (AVCSS) [2] appli-Traditional positioning sensors such as the Global Navigation cations where a number of parameters related to the vehicle’sSatellite Systems may have poor performance in dense urbanareas, where obstacles block satellite signals. This paper presents a state such as its position, velocity, acceleration, orientation,sensor fusion technique that uses computer vision and differential and angular rates are required by applications related to topicspseudorange Global Positioning System (DGPS) measurements to such as vehicle control, vehicle navigation, safety systems, andaid an inertial navigation system (INS) in challenging environ- traffic management. These anticipated applications (particu-ments where GPS signals are limited and/or unreliable. To sup- larly applications that are related to vehicle control and safetyplement limited DGPS measurements, this method uses mappedlandmarks that were measured through a priori observations (e.g., systems) continuously require lane-level position informationtraffic light location data), taking advantage of existing infrastruc- (high accuracy) at a high sample rate in diverse driving environ-ture that is abundant within suburban/urban environments. For ments (availability) [3]. It is possible to achieve these criteriaexample, traffic lights are easily detected by color vision sensors by adding more infrastructure and onboard sensors; however,in both day and night conditions. A tightly coupled estimation the cost of the solution should also be taken into account if itprocess is employed to use observables from satellite signals andknown feature observables from a camera to correct an INS that is will be widely adopted. Traditional low-cost approaches thatformulated as an extended Kalman filter. A traffic light detection utilize an inertial navigation system (INS), which integratesmethod is also outlined, where the projected feature uncertainty high-rate inertial measurements from a microelectromechani-ellipse is utilized to perform data association between a predicted cal systems (MEMS)-based inertial measurement unit (IMU)feature and a set of detected features. Real-time experimental or wheel encoder measurements, can provide continuity andresults from real-world settings are presented to validate theproposed localization method. availability in most driving scenarios. However, their accuracy deteriorates over time, unless periodically calibrated by other Index Terms—Advance vehicle control systems, advance vehicle sensors. Other low-cost approaches use a single low-bandwidthsafety systems, aided navigation, feature detection, image/video aiding, inertial navigation, intelligent transportation systems sensor such as a Global Positioning System (GPS) or feature-(ITS), land transportation, localization, sensor fusion. based approaches combined with a map (e.g., vision sensors, lidar, or radar). Such single-sensor approaches may provide high-accuracy position estimates but are not continuously avail- able and may not provide the full vehicle state. Because these low-cost sensors are complementary, information from these Manuscript received February 25, 2011; revised July 12, 2011 and sensors can be combined to provide the levels of accuracy,October 17, 2011; accepted December 28, 2011. Date of publication March 9,2012; date of current version May 30, 2012. This work was supported in part continuity, and availability required by many ITS applications,by the U.S. Department of Transportation Federal Highway Administration particularly applications that are related to AVCSS applications.(FHWA) through the Exploratory Advanced Research Program under Grant Various sensor fusion approaches between GPS and INSDTFH61-09-C-00018, the University of California through the MulticampusResearch and Program Initiatives (MRPI) Program under Award 142787, and have been explored in the literature [13]–[18], which lookthe California Department of Transportation (Caltrans) under Grant 65A0261. at both tightly coupled and loosely coupled GPS-aided INSThe contents of this paper reflect the views of the authors, which do not reflect systems for vehicle positioning applications. Although thesethe official views of FHWA, University of California MRPI, and Caltrans. TheAssociate Editor for this paper was P. Grisleri. solutions have been demonstrated and many approaches have This paper has supplementary downloadable material available at been proven to work in real time, their performance can degrade, provided by the authors. in suburban/urban areas where the GPS receiver does not have The authors are with the Department of Electrical Engineering, Univer-sity of California, Riverside, CA 92521 USA (e-mail:; a sufficient diversity of satellite signals. Sensor fusion;;; barth@ee. a vision sensor and INS, on the other hand, has the to perform better in dense urban areas, because there are an Color versions of one or more of the figures in this paper are available onlineat abundance of structured features that can be surveyed and Digital Object Identifier 10.1109/TITS.2012.2187641 detected by computer vision algorithms. Many of the proposed 1524-9050/$31.00 © 2012 IEEE
  2. 2. 900 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012 that are intended to be highly visible to approaching vehicles but not from other directions. In addition, most traffic lights currently use light-emitting diodes (LEDs), which, in the future, could be switched at high rates to communicate information (e.g., ID, location, and timing) to the vehicle. This approach is an elegant solution to the positioning problem, which fuses multiple low-cost sensors to allow for a positioning system with high degrees of accuracy, continuity, and availability. In ad- dition, this solution takes advantage of existing infrastructure, which can be used by many vehicles already equipped with this set of onboard sensors. This paper is organized as follows. Section II provides a brief review of the GPS-aided INS and vision-aided INS literature. Section III describes the sensor setup, defines the reference frames, and describes the camera parameters and calibration procedure. Section IV describes the estimation, including the high-rate sensor propagation and low-rate sensor measure- ments. Section V describes the traffic light detection. Section VI presents the experimental results. Concluding remarks are pro- vided in Section VII.Fig. 1. Bird’s-eye-view illustration of the vehicle localization in an urbansetting where GPS pseudorange measurements (TOA) are used to correct the II. BACKGROUNDlongitudinal position, and traffic light feature measurements (AOA) are used tocorrect the lateral position. The blue circle represents the sensor setup with a The focus of this paper is on using measurements of knownGPS receiver, an IMU, and a forward-looking camera. landmarks to aid the INS. There are two similar approaches: 1) structure-from-motion (SfM) and 2) simultaneous localiza-vision-aided INS solutions [19]–[24] and, to some extent, GPS tion and mapping (SLAM). SfM and SLAM represent twocombined with vision [25] have been demonstrated, but they alternative modes of estimation using the same suite of sensorshave a number of problems that are associated with real-time that derive most of their information from vision sensors. SfMperformance, operate only in the daytime, and are not targeted is typically a batch method whose main objective is to optimizeat land vehicle navigation applications (for example, most and reconstruct a rover’s trajectory and scene structure from aapproaches are geared toward spacecraft, aircraft navigation, or sequence of sensor observations (for example, see [4] and [5]).augmented reality). It requires that features are extracted and their correspondence In this paper, we propose a tightly coupled localization (data association) is known before the estimation occurs. Themethod using information from a differential pseudorange estimation optimizes the entire sequence of data; therefore, theGlobal Positioning System (DGPS) receiver and known image computation time does not scale well as the number of obser-features of traffic lights from a forward-looking camera to aid vations increases. Due to these constraints, SfM has largelyan INS. In addition, we present a new method of processing im- been used as a post processing tool. The framework used byages for detecting traffic lights in complex scenes. The localiza- SLAM is similar to the proposed localization method, excepttion method uses satellite pseudorange time-of-arrival (TOA) that, in SLAM, the landmarks’ positions are augmented to themeasurements, Doppler measurements between satellites and state vector and jointly estimated along with the vehicle statethe GPS antenna, and previously mapped visual landmarks [6]–[11]. SLAM does not need previously mapped landmarks,on an image taken by a camera that measures the angle-of- but real-time processing is limited as the state vector grows. Inarrival (AOA) to correct the INS. This method has the potential our application, we are more concerned with real-time perfor-for working in challenging dense urban areas, as illustrated mance and the three criteria of high accuracy, continuity, andin Fig. 1. availability; therefore, the following sections will focus on re- In a typical urban scenario, satellite signals along the ve- lated research on landmark-aided INS, particularly applicationshicle’s longitudinal direction are not obstructed by buildings using GPS and computer vision to aid the INS. Although bothand trees, and their pseudorange measurements can be used to vision and GPS-aided methods are reviewed in this section,reduce the uncertainty in the vehicle’s state in the longitudinal more emphasis will be placed on vision-aided INS, becausedirection. Mapped traffic lights, on the other hand, can be de- GPS-aided INS is very well documented in [13]–[18].tected by a forward-looking camera, and the AOA informationcan be used to correct the vehicle’s state along the lateral A. GPS-Aided INSdirection. Moreover, the combinations of these measurementshelp improve the state estimate in the vertical direction. Traffic Sensor fusion between GPS and INS has been around forlights are particularly useful. They have a known well-defined more than a decade, and it has extensively been studied andshape and light colors that are easy to detect and identify in both refined over time [13]–[18]. An extended Kalman filter (EKF)day and night. They are mounted on fixed structures at locations is often implemented. To correct the INS estimate whenever
  3. 3. VU et al.: REAL-TIME COMPUTER VISION/DGPS-AIDED INS FOR LANE-LEVEL VEHICLE NAVIGATION 901measurements from GPS are available, other Bayesian esti- less than 2 m. Attitude errors are consistently less than 5◦ . Real-mation frameworks have also been suggested, such as the time results are also reported, where the PMF provides absoluteunscented Kalman filter (UKF) [17] and the particle filter (PF) position aiding to the EKF at 7 Hz.[18]. Estimated position and orientation are computed in a Durrie et al. [20] proposed a vision-based navigation solutionglobal coordinate reference frame such as Earth-centered Earth- for unmanned aircraft operations on airfield surfaces in GPS-fixed (ECEF) or a fixed local tangent frame. These fusion denied environments. It uses a PF to estimate the rover’stechniques are divided into the following two types: 1) loosely absolute position by extracting edges in the image and thencoupled and 2) tightly coupled. comparing these with predicted edges from a lane-marking 1) Loosely Coupled GPS-Aided INS: For loosely coupled database. The absolute position computed from the PF is thensensor integration, the GPS receiver computes the antenna’s used to correct the INS, which is implemented in the form ofposition and velocity in an absolute reference frame, and the an EKF. Results of the sensor fusion are reported to run inmeasurements used by the correction framework include the real time and compared with a GPS output. The GPS-aidedprocessed absolute position and velocity of the GPS receiver. INS position output is used as a reference. The vision-basedWhen the number of visible GPS satellites is less than four, the navigation integrated with INS is reported to run in real timecorrection step cannot be carried out, because the GPS receiver and with position error consistently below 15 m.needs to track at least four satellites to compute the absolute Cappelle et al. [21] presented a GPS/vision-aided INS thatposition and velocity. This solution is considered less robust makes use of a 3-D cartographical model database for vehiclecompared to the tightly coupled solution, because it requires the applications. This method assumes that the vehicle moves on aline-of-sight to at least four satellites. This approach also yields planar surface that coincides with a tangent plane. The authorssuboptimal state estimation, because the GPS receiver does not estimate the position, velocity, yaw, and accelerometer biasesprovide the cross correlation between the estimated positions. for a 2-D motion model. The sensor measurements used to 2) Tightly Coupled GPS-Aided INS: In the tightly coupled correct the INS is an absolute position and attitude vectorarrangement, raw GPS measurements (pseudorange, Doppler obtained by image processing. The authors reported a maxi-shift, and carrier phase) from the receiver, for each satellite, mum error of 3 m when the vision-aided method is applied inare used to correct the INS. The correction step is useful experimentation.whenever the receiver tracks at least two satellites. Although the 2) Tightly Coupled Image Feature-Aided INS: In tightlyposition is not observable when the number of tracked satellites coupled approaches, image features that correspond to mappedis less than four, the GPS pseudorange, Doppler, and carrier- landmarks are extracted in the image plane. The residualphase (if available) measurements will provide information between the detected and the predicted feature locations to-to correct certain portions of the state vector whenever such gether with the feature measurement function are utilized inmeasurement information is available. The solution computed a Bayesian framework (typically in the form of an EKF) tofrom the tightly coupled framework provides a comparable or correct the INS. Shuang et al. [22] implemented a simulationbetter result than the loosely coupled framework. that demonstrates this approach for space vehicle landing. Simulated results show that the final-landing three-sigma un- certainty is less than 5 m when at least three landmarks areB. Vision-Aided INS available. Trawny et al. [23] demonstrated a system similar to As the processing capability of low-cost computers in- [22], showing results from data sets that they have collected oncreases, feature-based aiding through vision sensors is becom- both spacecraft and a rotorcraft. The approach ran offline oning more attractive, because many computer vision operations data recorded during spacecraft and rotorcraft landings usingcan now be processed in real time (for example, vehicle man- craters and fiducial markers as mapped landmarks. It is reportedufacturers currently use computer vision for lane-departure that the system can run in real time. The squared Mahalanobiswarning). Similar to GPS-aided INS, vision-aided INS can be distance test is employed to find feature correspondence andimplemented in a Bayesian framework and can be separated to reject outliers. Their results show that position standardinto the following two groups: 1) loosely coupled and 2) tightly deviations are below 5 m upon landing.coupled. Miller and Campbell [24] presented a PF approach for aug- 1) Loosely Coupled Image Feature-Aided INS: Estimation menting visual features, GPS, and inertial navigation to esti-approaches that belong in this category often use an imaging mate a vehicle’s position and attitude on a 2-D plane. Mappedsensor to compute the absolute position of the rover and then features include lane markings and stop lines, which are inte-use the computed absolute position as a measurement to correct grated into the PF. The results are compared to a carrier-phasethe INS. Conte and Doherty [19] applied a vision-aided INS differential pseudorange Global Positioning System (CDGPS)method based on visual odometry and georeferenced image solution with 10-cm accuracy, where the vision/GPS-aided INSmatching for aerial vehicles when there is GPS outage. Their positional accuracy is consistently less than 3.5 m and ran invision-aided component uses a point mass filter (PMF) to fuse real time on their test bed vehicle.visual odometry measurements together with the georeferenced Another related work by Bai and Wang [25] does not utilizeimage matching and an altimeter sensor to provide absolute inertial measurements while using multiple PFs to fuse videoposition aiding to an EKF. The authors compared the vision- and GPS sensors, as well as using a Geographical Informa-aided INS approach using the GPS-aided INS approach as a tion System (GIS) as a priori knowledge for navigation andreference, and they obtain position errors that are consistently augmented-reality applications. Although the visual feature
  4. 4. 902 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012tracking results from line features show good potential for thismethod, the paper lacks quantitative results to show the perfor-mance of the proposed method for navigation applications (e.g.,accuracy, availability, continuity, and whether this method canrun in real time). III. S ENSOR S ETUP, F RAME D EFINITION , AND C ALIBRATION This section describes our sensor setup, frame definitions,and calibration of camera parameters. For the rest of this paper,{A} denotes a frame in the A-coordinate system (A can be Fig. 2. Sensors are rigidly mounted on a platform that is mounted on top of the vehicle. The small black box in the middle contains a 6-DoF IMU and aa sensor, a rigid object, or a point attached to the earth). GPS receiver. A color camera with an optical axis nominally aligned with theA R denotes a 3 × 3 rotation matrix from {A} to {B}. TheB IMU’s u-axis is mounted on one of the corners of the plate.symbol C T AB denotes the 3 × 1 vector described in the {C}-coordinates pointing from {A} to {B}. [a×] represents a 3 × 3skew-symmetric matrix representation of a 3 × 1 vector a. Theangular rate vector A ω BC that represents the rate of rotationof {C} with respect to {B}, represented in {A}, may alsobe represented as the 3 × 3 skew-symmetric matrix A ΩBC =[A ω BC ×].A. Sensor Setup The sensor suite on the rover consists of the followingthree sensors: 1) a six-degree-of-freedom (6-DoF) MEMS IMU(accelerometers and gyros measure accelerations and rotationrates along the three orthogonal axes u, v, and w); 2) a GPSreceiver that can output pseudorange, Doppler, and carrier-phase measurements; and 3) a color camera that can be hard-ware/software triggered. This set of sensors is shown in Fig. 2.A body frame {B} is defined at the GPS antenna’s phase center,where the orthogonal u-, v-, and w-axes coincide with the threeorthogonal axes on the IMU. The IMU is placed close to theantenna’s phase center so that the inertial measurements from Fig. 3. Relations between frames are illustrated. Quantities in black arethe IMU can be interpreted as the measured inertial readings at surveyed and known, quantities in blue can be calibrated offline, and quantities in red are the rover’s pose, which will be estimated online.the origin of {B}, where the lever arm effect is ignored. Thecamera is a standard digital camera that communicates overIEEE 1394a or USB 2.0 interfaces. The color camera can be optical axis. The camera’s internal parameters, as showntriggered by an external transistor–transistor logic signal, or it in Fig. 3, are f (focal length), [cx , cy ] (principal point),can be set up to be software triggered using a function from the and [k1 , k2 , p1 , p2 ] (the radial and tangential distortionapplication programming interface. coefficients). • A camera calibration object that could be a 3-D object or a planar pattern, with frame {G} attached to the object,B. Frame Definitions where identifiable features on the object are defined with respect to {G}’s origin. The following frames are illustrated in Fig. 3. • A navigation frame denoted by {N } is defined to coincide C. Camera Calibration with a local tangent frame, with the axes pointing along the north, east, and down (NED) directions. A camera model is necessary to project mapped landmarks • The body frame {B} is defined at the GPS antenna’s phase onto the image. This model is a function of the camera inter- center, which maps the IMU’s u-, v-, and w-axes along the nal and external parameters, which can be calibrated offline. XB -, YB -, and ZB -axes. This section describes the camera parameters, and it discusses • The camera frame is centered at the focal point and the calibration of various quantities to allow vision aiding of is denoted by {C}, where ZC represents the camera’s the INS.
  5. 5. VU et al.: REAL-TIME COMPUTER VISION/DGPS-AIDED INS FOR LANE-LEVEL VEHICLE NAVIGATION 903 1) Camera Model and Internal-Parameter Calibration: The parameters relate the measurements from the camera to thecamera measurement model is body frames, which allow camera measurements to correct ⎡C ⎤ ⎡N ⎤ quantities in the state vector. X X C P = ⎣ C Y ⎦ = C R ⎣ N Y ⎦ + C T CN N C N IV. E STIMATION P ROCESS Z Z = C RN P + C T CN N (1) This section describes the real-time estimation, which uses C the kinematic model, the error-state model, EKF update, and X = C X/C Z measurement models (e.g., pseudorange (TOA), Doppler mea- C Y = C Y /C Z (2) surements from the GPS, and AOA measurements from a camera). In addition, analysis of the measurements are given r2 = (C X )2 + (C Y )2 (3) in each of the measurement sections to illustrate how these C C 2 X = X (1 + k1 r + k2 r ) 4 measurements can be combined to resolve the state estimation problem in an urban scenario. In this approach, during the + 2p1 C X C Y + p2 r2 + 2(C X )2 real-time estimation, C IN T , C R, C T CB , and N P (a mapped B C Y = C Y (1 + k1 r2 + k2 r4 ) landmark point such as a traffic light) are fixed and known. The main reference for this section is [15]. + 2p2 C X C Y + p1 r2 + 2(C Y )2 (4) The rover’s state vector is defined as u = fx · C X + cx xr = [N pT , N v T , ΘT ]T (8) v = fy · Y + cy C (5) where N p = [n, e, d]T , N v = [vn , ve , vd ]T , and the orientationwhere the landmark positions in the navigation frame P N of the rover is described by the Euler angles Θ = [φ, θ, ψ]Tare known. Vision aiding will use (1)–(5) and will require (roll, pitch, and yaw). The augmented state vector is x =knowledge of the extrinsic parameters C R and C T CN and N [N pT , N v T , N R, bT , bT ]T , where {ba , bg } are the accelerom- B a gthe intrinsic parameters, i.e., focal length (fx , fy ), principal eter and gyro biases. The accelerometer and gyro measurementspoint (cx , cy ), and four distortion coefficients (k1 , k2 , p1 , p2 ). are denoted as B a and B ω iB , respectively, and they are mod-The intrinsic parameters are denoted as C IN T throughout this eled aspaper. B a = B aiB − B g + ba + v a Note that B C ω iB = B ω iB + bg + v g NR = C RB R B N (6) where the subscript i represents the inertial frame, C T CN = C T CB − C RB RN T N B (7) B N {v a , v g } are the measurement noise with distributionswhere B R and N T N B are provided by the INS. C R and C T CB {N (0, σa I), N (0, σg I)}. B g = B RN g, where N g = [0, 0, g]T 2 2 N N Bare constants due to the rigid mounting shown in Fig. 2. is the gravity, and {B aiB , B ω iB } are the true acceleration Internal-parameter calibration can be done offline using a and rotation rate vectors relative to the inertial frame. In thenumber of calibration methods such as the approaches de- following sections, variables that are denoted by a hat on topscribed in [26] and [27]. These methods use a checkerboard represent estimated quantities, and variables that are denotedpattern of known dimensions to calibrate the focal length with a tilde represent measurements.measured in pixels, the principal point in pixels, and the fourdistortion coefficients. A. Kinematic and Mechanization Equations 2) Body-to-Camera Calibration: The transformation be- Ctween the body and the camera frames (B R and C T CB ) needs ˙ The kinematic equations x = f (x, u) are given byto be determined to perform sensor fusion. Good initial esti- Nmates are available; however, errors in C R and C T CB , which p = Nv ˙ (9) Bare not modeled, can have a profound effect on the feature N v = N RB aiB − 2N ΩiE N v ˙ B (10)measurements and the measurement residual when they are N ˙used to correct the INS. Mirzaei and Roumeliotis [28] solve this BR = N R(B ΩiB B − ΩiE ) B (11)problem online by parameterizing the rotation and translation where u = [B aT , B ω T ]T , and the frame {E} denotes the iB iBparameters between the body and the camera frames and then ECEF frame. These equations are derived for a tangent frame inaugmenting them to the state vector. An EKF is utilized to [15], where B aiB is the body frame’s true acceleration vector,compute these parameters as the system progresses. Because and B ω iB is the body frame’s true angular rate vector. Thethese quantities change very slowly over a long time period, rotation matrix N R is defined as Bif at all, other researchers choose to estimate them offline (for ⎡ ⎤example, see [29] and [30]). We have opted to use the latter cψ cθ (−sψ cφ + cψ sθ sφ ) (sψ sφ + cψ sθ cφ )approach and compute these quantities offline as a one-time N BR = ⎣ sψ cθ (cψ cφ + sψ sθ sφ ) (−cψ sφ + sψ sθ cφ ) ⎦process. As shown in the following section, these estimated −sθ cθ sφ cθ cφ
  6. 6. 904 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012where sx , cx , and tx represent the sine, cosine, and tangent The software integrates the error-state covariance matrix ˙of x. The mechanization equation x = f (x, u) that was im- using a discrete-time version of the continuous-time system,plemented by the INS is given by (12)–(14), shown below. which is given byThese equations are integrated to propagate the state estimatex between measurement times denoted by tk , i.e., P k = ΦP k−1 ΦT + Qd. (21) N ˙ p = Nv (12) The transition matrix Φ = eF T is expanded using a Taylor expansion, and it is approximated as a first-order function N ˙ v = N RB aiB − 2N ΩiE N v B (13) Φ ≈ I + F T , where the high-order terms have been dropped, N ˙ because they are small compared to the first-order term. T BR = N R(B ΩiB − B ΩiE ) B (14) is the time difference between the IMU measurements. The process noise discrete-time covariance matrix is approximatedwhere as Qd ≈ GQGT T , because F T 1. Using this error-state model, the error between the computed and the actual rotation B aiB = B a + B RN g − ba N (15) matrices can be represented using small-angle transformations, B ω iB = B ω iB − bg . (16) and they are defined as B NR = B R (I − [ρ×]) N (22) NB. Error-State Model BR = (I + [ρ×]) N R. B (23) The error-state vector is defined as The Euler angles Θ = [φ, θ, ψ]T are computed from the rotation matrix N R as B T δx = δpT , δv T , ρT , δbT , δbT a g (17) θ = atan2 −r31 , (r11 )2 + (r21 )2 (24)where δx = x − x. The biases in the accelerometer and gyro ψ = atan2(r21 /cθ , r11 /cθ ), and (25)measurements are modeled as random walk functions as r32 r33 φ = atan2 , (26) ˙ δ ba = v ba , v ba ∼ N 0, σba I 2 cθ cθ ˙ N δ bg = v bg , v bg ∼ N 0, σbg I . 2 where rij is an element in BR at the ith row and the jth column. The time propagation of the error state is C. EKF Update δ x = f (x, u) − f (x, u). ˙ The INS is corrected whenever there is an available measure- To the first order, the dynamic error model is ment from either the GPS or a known feature that was detected in the image stream. Mapped landmarks’ measurements are ˙ δ x = F δx + Gω predicted usingwhere y t− = h x t − k k (27) ω = vT , vT , vT , vT T . (18) δy k = y(tk ) − y t− k (28) a g ba bg The symbol ρ is a 3 × 1 vector that represents the attitude where h(x(t− )) is the measurement function that predicts the kerror in the navigation frame. The matrices F and G are observations from vision or GPS using the current estimatedderived as state of the INS. The predicted and the sensor measurements form a residual in (28), which can be modeled as ⎡ ⎤ 0 I 0 0 0 ∂N g δy k = H k δxk + η (29) ⎢ ∂N p −2 ΩiE − B R( a − ba )× −B R 0 ⎥ N N B N ⎢ N ⎥ ⎢ p ⎥F =⎢ N ⎥ by Taylor series expansion about xk , where H k = ⎢ 0 0 [ ω N E ×] N 0 −B R⎥ ⎣ 0 0 0 0 0 ⎦ ∂h/∂x|x=x(t− ) , and η is the sensor measurement noise. k 0 0 0 0 0 The innovation (residual) covariance is computed to be Sk in (30), shown below, where P − is the covariance matrix before k (19) applying correction at tk , and Rk is the covariance matrix of ⎡ ⎤ 0 0 0 0 the sensor measurement noise. The Kalman gain at time k is ⎢−N R 0 0 0⎥ computed in (31), shown below ⎢ B ⎥G =⎢ 0 −N R ⎢ 0 0⎥ . ⎥ (20) ⎣ 0 B 0 I 0⎦ S k = H k P − H T + Rk k k (30) 0 0 0 I K k = P − H T S −1 . k k k (31)
  7. 7. VU et al.: REAL-TIME COMPUTER VISION/DGPS-AIDED INS FOR LANE-LEVEL VEHICLE NAVIGATION 905 As such, corrections to portions of the state (position,velocity, accelerometer biases, and gyro biases) and thecovariance are then carried out by computing (32) and (33),and N R and Θ, shown below, are computed using (23)–(26) B x+ = x− + K k δy k k k (32) P + = (I − K k H k )P − . k k (33) The following two sections focus on defining the measure-ment functions h(x(t− )) for GPS TOA and camera AOA kmeasurements.D. GPS Measurements This section summarizes the GPS observables [13]–[15].These observables are categorized in the TOA class of measure-ments. The measurement equations for each satellite vehicle aredescribed, and the pseudorange measurement is illustrated tocorrect the rover’s longitudinal position in an urban setting. Low-cost GPS receivers typically provide L1 measurements(at f1 = 1575.42 MHz, λ1 = c/f1 ), which provide a StandardPositioning Service (SPS) and have a total of three types of Fig. 4. Bird’s eye view illustrating the Kalman filter update using a TOA measurement from the GPS satellite to correct the vehicle’s uncertainty alongobservables (pseudorange, Doppler, and carrier phase) for each the line-of-sight direction. P − , which is shown in red (dashed), indicates the kof the GPS satellites with the line-of-sight to the GPS receiver’s uncertainty ellipse before the correction has taken place, and P + , which is kantenna. L2 (at f2 = 1227.6 MHz, λ2 = c/f2 ) signals are shown in blue (solid), indicates the uncertainty ellipse after the correction hasalso transmitted from the GPS satellites; however, they are taken place.encrypted and are not available to consumers, unless advancedmore costly GPS receivers are utilized. The pseudorange and [15]. Consequently, the largest contributions to the error comeDoppler are modeled as in (36) and (37), shown below, where from multipath and measurement noise.the superscript i represents the ith satellite, and 1T = [(p − The linearized measurement equation of the double differ- irpi )/ p − pi ]T is defined as a unit vector pointing from enced pseudorange and Doppler observables are computed by sv svthe ith satellite vehicle to the receiving antenna. Although the taking the partial derivative of the measurement equations withDoppler measurement is related to the average velocity over a respect to the full state vector. The result istime interval, it is treated herein as the instantaneous velocity N T N T 1ir − 10r 0 0 0 0between the satellite vehicle and the antenna’s phase center at Hk = N T N T (38)the indicated measurement time, i.e., 0 1ir− 10r 0 0 0 ρ(p, pi ) = p − pi + cΔtr sv sv (34) which is computed between the highest elevation satellite (assigned as the zeroth satellite vehicle), the ith satellite, and f2 i the rover. In an urban area where a vehicle travels on a road χi = ρ1 I + Tr + E i i (35) f1 r (see Fig. 1), buildings, trees, and other objects often block the line-of-sight to the satellites on either side or both sides ρi = ρ p, pi + cΔti + χi + Mρ1 + ηρ1 (36) ¯1 sv ρ1 i i of the vehicle. On the other hand, the portion of the sky along the longitudinal direction of the road is typically free λ1 Dr = 1T v − v i + cΔtr − cΔti + ηD1 . (37) i ir sv ˙ ˙ i of obstruction. Satellites with line-of-sight along this direction can be locked onto by GPS receivers, thus allowing the rover The GPS receiver antenna’s phase center is represented by to correct its position in the longitudinal direction of the road.p and v (position and velocity of the body frame, which can This concept is illustrated in Fig. 4, where the three-sigmabe represented in either the ECEF or the tangent frame) is a uncertainty ellipse is depicted to have been reduced along theportion of the state vector, where pi and v i represent the sv sv line-of-sight from the satellite to the rover (10r ). The time ofposition and velocity of the ith satellite vehicle, respectively, applicability of the GPS measurement correction is indicatedc represents the speed of light, Δtr is the receiver’s clock error, by a 1-PPS signal provided by the GPS receiver.Δti is the satellite’s clock error, χi represents the atmospheric ρ1errors (Ir and Tr ) and the ephemeris error (E i ) coming from i i ithe computed satellite position, Mρ1 is the multipath error, and E. Camera Feature Measurements{ηρ1 , ηD1 } are the measurement noises. The common-mode i i Mapped landmarks in the navigation frame are projectederror {Δti , χi } can be removed by differential GPS methods. ρ1 onto the image plane using a camera projection modelThe receiver clock error is removed using double differencing described by (1)–(5). For each of the visual landmarks, (1) is
  8. 8. 906 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012reexpressed as (39), shown below, which only includes quan-tities known and available to the navigation system. The trans-formation of a mapped landmark N T N L from the navigationframe to the camera frame is shown in (39), which is equivalentto (1). The rest of the transformations follow the same meansas (2)–(5), where the final measurement h is given in pixelcoordinates in (40), shown below, which is equivalent to (5). C P = CR B B N N R( T N L − N p) − B T BC (39) N u h p, B R; N T N L , C R, B T BC = N B . (40) v The linearized measurement of an observed visual landmarkon the image frame is then computed by taking the partialderivative of the measurement h with respect to the full statevector. The linearized measurement that takes the form of thematrix H k is H k = F DC C BR B NR −I 0 [N T BL ×] 0 0 (41) Fig. 5. Bird’s eye view illustrating the Kalman filter update using an AOA measurement from a traffic light to correct the vehicle’s uncertainty along thewhere directions perpendicular to the feature line-of-sight. P − , which is shown in red k (dashed), indicates the uncertainty ellipse before the correction has taken place, N T BL = (N T N L − N p), (42) + and P k , which is shown in blue (solid), indicates the uncertainty ellipse after the correction has taken place. 1 −C X CZ 0 C Z2 C= −C Y . (43) 0 CZ 1 C Z2 has been demonstrated in [36]–[38]. Traffic light detection based on color in either the red–green–blue (RGB) or the In (41), F is the partial derivative of (5) with respect to hue–saturation–value (HSV) color space tends to have high[C X , C Y ]T , D is the partial derivative of (4) with respect detection rates; however, they also tend to return a large numberto [C X , C Y ]T , C is the partial derivative of (2) with respect of false positives. On the other hand, methods that incorporateto C P = [C X, C Y, C Z]T , and the rest of the terms represent shape are less prone to false positives in the day time, are notthe partial derivative of C P with respect to the full state vector. as robust at night (i.e., most of these methods assume three A deeper look at C shows that the row vectors are perpen- dominant circular edges, and some also model the rectangulardicular to C P . This condition means that AOA features can backboard, which are hard to observe at night time due to theonly correct position errors along directions perpendicular to low dynamic range in cameras), and tend to have lower overallthe line-of-sight to the feature. The physical explanation for detection rates. The following section describes the processingthis condition is that a single camera does not provide depth used herein for traffic light detection and feature association.information. This concept is depicted in Fig. 5, where the Special attention is given to the processing time, detectionthree-sigma uncertainty ellipse is reduced along the vehicle’s rate, and reliability of detection (i.e., false association betweenlateral direction due to an observation of a traffic light in projected and detected features).front of the vehicle. In addition, camera measurements aresensitive to attitude error due to the [N T BL ×] term. This factis useful in GPS, vision, and IMU systems for the estimation of A. Feature Detectionattitude, because attitude is not always observable in GPS-aided Most modern traffic lights use an array of LEDs sittingINS [32]. The impact is clear when the vehicle is stationary, behind a lens that outputs a designated wavelength, dependingbecause yaw is corrected when there is an observable visual on the LEDs and the lens. Light output from a red light will alsolandmark as opposed to drifting when the system processes include an orange hue, and some green lights will also emit blueonly satellite measurements. This case will be demonstrated hues to help individuals with color the experimental results. Within this system, where the INS To identify candidate light regions from a color camera thatprovides continuity and lane-level accuracy for approximately represents each pixel on an image as RGB components, a rule-10 s, the camera is software triggered at 1 Hz, at times selected based approach is applied to each pixel on the be 0.5 s out of phase with the 1-PPS signal from the GPS Let I(u, v) : I 2 → I 3 be a pixel on the image I, where thereceiver. output vector [IR , IG , IB ] represents the RGB components of the pixel at location (u, v). Each of the output components is 8 b, with integer values between 0 and 255. The rule that was V. T RAFFIC L IGHT D ETECTION set to find the candidate pixels are presented in Algorithm 1, Traffic light detection using color has been demonstrated where the input image is the M × N × 3 matrix I, whichin [33]–[35], whereas detection utilizing both color and shape has been smoothed using a 5 × 5 Gaussian kernel. A set of
  9. 9. VU et al.: REAL-TIME COMPUTER VISION/DGPS-AIDED INS FOR LANE-LEVEL VEHICLE NAVIGATION 907thresholds tf 1 and tf 2 are selected, where tf 2 > tf 1 (chosen value components in the range [0–100], and RGB color valuesto be 150 and 500, respectively). The lower threshold tf 1 is in the range [0–255] refer to pixels at the center of each of theselected to suppress low-intensity values. The upper threshold lights. The arrows on the sliders indicate the value in each colortf 2 is selected to remove saturated pixels. This approach ef- space, whereas the color gradient on the sliders indicate howfectively filters out the sky and glare from the sun, bouncing the color values would change according to the color gradientoff reflective objects. Following this initial threshold filtering, if one of the sliders is perturbed and all other sliders are keptonly unsaturated light sources or bright-colored objects remain constant. These images illustrate the fact that it is hard toto be considered. {tY 1 , tY 2 , tR , tG } are threshold values that classify the lights based on the hue values in the HSV colorrepresent ratios, and they are preconfigured as {3, 2, 2, 2} space as done in other methods. As shown in this case, the hue(chosen heuristically) in our system before the experiment. This values in the middle of the yellow lights in the first image aremethod processes directly on a RGB image, eliminating the reported as dark orange, and the hue values in the middle ofneed for conversion to the HSV color space, which is utilized the red lights in the second image are reported as light most of the aforementioned traffic light detection methods. Traditional traffic light detection methods based on the hue values of the HSV space would fail in these cases. The images in the second row of Fig. 6 show the detection and classification Algorithm 1: Traffic Light Pixel Screening results for traffic lights when the lights have correctly been identified using the traffic light feature detection (the labels in blue identify their colors, and the white circle around the input I, 3-channels (RGB) of dimension M × N light represents the detected light candidate) of Algorithm 1 output O, 3-channels (RGB) of dimension M × N and the subsequent processing steps to extract the candidate for I(u, v) in I regions. Red vehicles and red clothing worn by a pedestrian are intensity = IR + IG + IB also detected, as shown in the middle and the third images on if (intensity > tf 1 and intensity < tf 2 ) IR IG the second row of Fig. 6. These false positives are eliminated if ( IB > tY 1 and IB > tY 2 ) at the data association step using a priori mapped landmarks O(u, v) = [1, 1, 0] (yellow) IR IR combined with quantities computed by the EKF. elif ( IB > tR and IG > tR ) O(u, v) = [1, 0, 0] (red) B. Feature Association elif ( IB > tG and IG > tG ) or ( IB > tG and IG IR IR IG Known mapped traffic light positions and their radii are IR > tG ) O(u, v) = [0, 1, 0] (green) stored in a list, which are projected onto the image using (40) else O(u, v) = [0, 0, 0] (black) (note that the a priori determination of traffic light positions else O(u, v) = [0, 0, 0] (black) is outside the scope of this paper and is a topic of a separate paper). Based on this list, only features that are projected onto the image or lie in a close proximity to the area of the image are selected. Although some of the projected features are inside the The next step involves a morphological closing operation field of view, not all of these features will be visible in the image(see [39]), which is a dilation operation, followed by an erosion due to the perspective effect on objects with different distanceoperation using a 7 × 7 square kernel of a disc structure. to the focal point; therefore, the magnitude of the vector fromContinuous nonzero regions of pixels with eight-neighbor con- the camera’s focal point to the feature is used to remove allnectivity on the image are extracted as connected components, features that are farther than a set distance (removes ambiguityand a minimum enclosing circle that maximizes the number when multiple mapped landmarks exist along a light-of-sight).of pixels along the circle’s perimeter is fitted onto each of the This distance threshold should be set based on the camera’scandidate regions. The center and the radius of the minimum resolution and its focal length. In addition, the radii of the trafficenclosing circles are collected. The centroid that was computed lights are utilized to further reject outliers from being associatedby the minimum enclosing circle fitting tends to be more to the projected traffic lights by comparing the projected radii torobust compared with directly computing the centroid from the the candidate feature radii. Let {y i }nc be the set of detected k i=1connected component or the centroid from an ellipse fitting. features at time tk that match color c. For feature fc of color cOne of the cases where this step is necessary is when the that maps to the image at location y fc , define the measurement kdetected light region appears as a crescent shape due to the residual as δy i = y i − y fc . Finally, the squared Mahalanobis k k klimited dynamic range of the camera (Algorithm 1 filters out distancesaturated pixels). Candidate regions with radii larger than a Tlower bound threshold and smaller than an upper bound thresh- dist = δy i S −1 δy i k k k (44)old are selected, and the rest are discarded. These thresholdsare set to 2 and 15 pixels (ad hoc), respectively, for our system is used to establish correspondence between the remainingbut would likely be different, depending on the resolution of the predicted and detected features. This process is critical, becausecharge-coupled device (CCD) and the focal length of the lens. incorrect data association could result in the catastrophic failure The top row of Fig. 6 shows sample images of yellow and of the state estimation approach. Currently, with the aforemen-red traffic lights taken by our sensor setup, where the HSV tioned method, we have not experienced any incorrect datawith the hue component in the range [0–360], saturation and associations.
  10. 10. 908 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012Fig. 6. Top row shows the HSV with range values [0–360] for hue, [0–100] for saturation and value, and RGB values in the [0–255] range from the center ofthe yellow (left image) and red (right image) traffic lights. The hue values of yellow lights (two examples where the lights are dark orange in the top left image)range from dark orange to yellow. The right image on the first row shows red lights, where the hue values from the center of the lights indicate that the lights areyellow instead of red due to saturation. Although examples for green lights are not shown here, bright green lights can also cause saturation on part of the imagedlight source. The bottom row shows sample traffic light detection results. Correctly identified features are indicated by blue labels (pointed to by green arrows)and false positives from red cars and pedestrian wearing red clothing (pointed to by orange arrows). When LED traffic lights are modulated, either for communi- P8600 runs the localization program, where both the GPS andcation of data, an internal controller, or an AC power source, the image measurements are processed at 1 Hz each, and theythe light’s image intensity flickers. This case was evident in, are 0.5 s out of phase.but does not affect, our experiments. A traffic light internal Additional results in video format are available atstate estimator similar to [40] based on available observations tying together with a tracking procedure could yield more vis/. These videos include real-world traffic light detectiondetections and facilitate traffic light tracking. Their method samples along several segments of roadways in a suburban areautilizes a hidden Markov model to predict the internal state and videos that show the localization results in a parking lotof the traffic light based on a series of observations. Such a with traffic light(s) in several weather conditions (day, evening,prediction was not required in our approach. and in the rain during the day), including one simulating an urban canyon setting (additional descriptions of the videos are VI. V EHICLE E XPERIMENTS available on the webpage). One of the parking lot experiments is discussed in the following sections. This section presents the experimental results, which validatethe concepts presented in previous sections. The aluminum A. Description of a Parking Lot Experimentplate shown in Fig. 2 is mounted on top of a passenger vehiclewhere the u-axis points toward the front of the vehicle. The The following two scenarios are presented following thisaverage calibrated focal length (over fx and fy ) of the camera section: 1) a scenario that uses DGPS to aid the INS and 2) theis 6.267 mm, with the CCD having square pixels of 6 μm. The other scenario that uses vision or DGPS to aid the INS. Theseimages that were returned by the camera contain three channels scenarios are compared with CDGPS-aided INS to quantifywith 640 columns by 480 rows. The shutter speed is set to the performance of the proposed vision/DGPS-aided INS. The0.25 ms so that the returned color image is not saturated in the traffic lights were surveyed using a CDGPS, where the positiondaytime although it is still enough to pick up light sources at is averaged over time (typically 10 min).night. The IMU returns six measurements (accelerations and At startup, position and velocity are directly observable fromrotational rates along the three orthogonal axes of {B}) at a the GPS. Roll and pitch are initialized while stationary by usingrate that exceeds 100 Hz. A computer with an Intel Core 2 Duo the accelerometers and the method described in [15, Sec. 10.3].
  11. 11. VU et al.: REAL-TIME COMPUTER VISION/DGPS-AIDED INS FOR LANE-LEVEL VEHICLE NAVIGATION 909 While the system runs in real time, the raw data from the sensors are also recorded onto a hard drive so that it is possible to rerun the raw data to analyze the system’s performance using various aiding scenarios, as discussed in the following sections. The image plane measurement noise Rk in pixels is set to the identity matrix in the experiments. The estimation in these experiments utilizes around 50% of the central processing unit (CPU) resources on both CPU cores. Although the image processing time described in Section V varies depending on the traffic light detection algorithm and the number of features projected onto the image, the typical image processing time from the time that the camera is triggered to the time that the visual feature measurements correct the INS is 200 ms. The localization is implemented as a multithreaded program; therefore, the IMU and GPS data processing is not interrupted by the processing done on the image. Note that the INS state and covariance are saved at the time that the camera is triggered; therefore, it can be used for all INS-aided calculations. B. DGPS-Aided INSFig. 7. Vehicle’s NED position measured using CDGPS-aided INS during theparking lot experiment is used as the ground truth to compare different low-cost This section analyzes the prediction of the pixel feature loca-aiding methods. The blue asterisk denotes the position of the traffic lights with tions using (40) and the INS/GPS state estimate during the timedifferent markers along the route, indicating different events that occur during when both GPS and visual features are available. In the firstthe experiment. The data between the vertical dashes indicate the time that thereis observation of the traffic lights. scenario, image aiding is not enabled, and only pseudorange is utilized as the GPS measurement and not the carrier-phase mea- surement. The residuals based on (28) for the traffic light pre-Subsequent GPS measurements while stationary will cause diction on the image are plotted in Fig. 8, first column, showingthe roll and pitch estimation errors to converge to a subspace large residuals in both the image’s columns and rows during thisdefined in [31]. While stationary, yaw is not observable. time. Note that these residuals, when not used by the Kalman When the vehicle begins to maneuver, yaw is initialized filter, are neither zero mean nor white. In addition, note thatto the heading of the GPS velocity vector, which is derived the residual in the x-direction, which is sensitive to north andfrom GPS Doppler measurements. From this point forward in yaw errors, is larger and growing. The error in the y-direction,time, the INS error state, including attitude and instrument bias which is sensitive to down and pitch errors, is smaller anderrors, are calibrated by the EKF in real time. bounded. This case is because the yaw error is unobservable The vehicle’s north and east positions are shown in the and drifts during the time that the vehicle is stationary.first plot of Fig. 7. The second plot of Fig. 7 shows the The expected position of the traffic light and the one-, two-,down component versus time from the start of the experiment and three-sigma error ellipses (relative to the projected mean)(approximately 10 s from the system startup) to the end of the are shown as the green, yellow, and red ellipses in Fig. 10. Theexperiment (approximately 140 s after startup). The vehicle top three rows with blue plots in Fig. 9 are the vehicle positionstarts out by moving in the east direction as pseudorange, and EKF computed position error standard deviation (the uncer-Doppler, and visual feature aiding are all enabled. When the tainty associated with the DGPS-aided INS using pseudorange).vehicle’s yaw direction is resolved down to the subdegree level, These plots show that DGPS-aided INS alone cannot resolvethe Doppler measurement is turned off at 30 s, leaving only the position down to the lane level. Although the error frompseudorange and visual features on to correct the INS whenever the DGPS-aided INS does not exceed 1.2 m within this timethese measurements are available. Around this time, the vehicle period, the standard deviation plots show that the bound on themakes a right turn heading south past a number of vehicles position uncertainty does not guarantee a level of confidence,that occlude the view of the traffic light. After clearing the where we could use the technique for lane-level navigation. Inocclusions, the vehicle turns left heading east toward the traffic addition, the error in the down component is less accurate thanlights. The segment when the vehicle has observation of the the north or east components. This phenomenon is common intraffic light is indicated by the diamond-shaped markers around GPS positioning [14], [15]. Another aspect with regard to usingthe north–east position and in between the vertical dashed lines only pseudorange from DGPS to correct the INS is that the yawin the down component plot in Fig. 7. In addition, the vehicle is unobservable while the vehicle is nonaccelerating. This factstops and remains stationary for almost 1 min at a red light is shown in the fourth row of Fig. 9, where the yaw angle fromduring the time that it has an observation of the traffic lights. DGPS-aided solution drifts and the standard deviation growsThe vehicle begins to move southward after the light turns green during the time that the vehicle stops at the red light for bothand then heads west until the experiment concludes. pseudorange and carrier aiding.
  12. 12. 910 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 13, NO. 2, JUNE 2012Fig. 8. First column shows the residual from the traffic light from the DGPS-aided (pseudorange) INS method alone (note that image aiding is turned off). Thesecond column shows residual from traffic light from the vision/DGPS-aided (visual feature + pseudorange) INS method, where the frequency histograms of thefeature residual (more than 77 s) are shown in the third column (mean and standard deviation are given in pixels).Fig. 9. First three rows of the first column show vehicle position errors between DGPS and vision/DGPS aiding, where CDPGS aiding is used as the groundtruth. Error in each of the components is calculated using the absolute difference. The first three rows of the second column show the position standard deviationcomputed by the EKF when each of CDGPS, DGPS, and Vision/DGPS are used to aid the INS. The fourth row shows a comparison of the vehicle’s yaw angle andtheir standard deviation computed by the EKF during the time that the traffic lights are observable when CDGPS, DGPS, and vision/DGPS are utilized to correctthe INS. This segment occupies the same time frame between the two vertical dashed lines shown in the first three rows.C. Vision/DGPS-Aided INS of Fig. 10, the processed images of the scene show a traffic As shown in the second and third columns of Fig. 8, when light, the projected light locations on the image (blue O), andvision aiding is enabled, the vision residual is reduced and is their uncertainty ellipses, together with the detected light can-white. In addition, as shown in the middle and right images didate locations (white circles with blue +’s at the centroids).
  13. 13. VU et al.: REAL-TIME COMPUTER VISION/DGPS-AIDED INS FOR LANE-LEVEL VEHICLE NAVIGATION 911Fig. 10. Vision/DGPS-aided INS and tracking of a traffic signal light mounted on a pole. The images show the prediction of the traffic lights using the meanand covariance matrix when the images are taken. The projected features and their uncertainty ellipses with the green, yellow, and red ellipses represent the one-,two-, and three-sigma uncertainty of the location where the traffic lights are expected to appear on the image. Feature association is shown using a white line thatconnects the center of the detected feature to the center of the predicted feature. Vision aiding is enabled at the center and in the right images, where the lightsare detected, which helps reduce the uncertainty in directions perpendicular to the feature’s line-of-sight. The images are taken at approximately 30, 50, and 118 safter the system is started.The green, yellow, and red ellipses indicate the regions that mance in rural areas, are inconsistent in suburban areas, and arewere expected to contain the traffic lights with probabilities of often unsuccessful in most urban areas. Although some of theapproximately 68%, 95%, and 99.7%, respectively. The center research in the vision-aided INS would rectify the availabilityimage shows the vehicle moving eastward toward a red traffic problem in urban areas, most of this research has focused onlight at 50 s. The region where the red light is expected (shown spacecraft/aircraft applications. There has been no significantinside the projected red ellipse) is smaller than the same region effort that addresses this issue in the surface vehicle domain.inside the red ellipse with vision aiding turned off (first image Even with the vision-aided INS methods, only a few methodsin Fig. 10) due to the corrected errors. This case is directly [19], [20], [23], [24] are reported to perform in real time,related to the visual measurement matrix H k based on (41). and none of these approaches have been reported to work inFurthermore, the first three rows of red plots in Fig. 9 show challenging dynamic environments (with many objects in thethat error in the north and down directions are corrected and scene and with a high degree of occlusion).the uncertainty in these two directions are reduced, whereas This paper has addressed the issue of vehicle state estima-the error in the east direction is similar to the DGPS case. tion in urban environments using a real-time tightly coupledThis case is because the east direction aligns with the camera vision/DGPS-aided INS. It combines DGPS measurementsz-axis (depth), which is unobservable, as discussed in the last together with visual features such as traffic lights to obtainparagraph in Section IV. The third image in Fig. 10 shows the better position and attitude estimation in urban environments.visual feature tracking and aiding at 118 s. The vehicle moves Measurement equations are derived, which show that DGPSfrom stationary, whereas the traffic lights are still within the measurements along the longitudinal direction allow the vehiclefield of view. The projected light features stay on top of the to correct its longitudinal position errors, whereas the AOAdetected traffic lights. The uncertainty levels along the north features from a forward-looking camera resolves position errorsand down directions drop down to the submeter-level accuracy along directions perpendicular to the feature line-of-sight. Ex-using the vision/DGPS-aided INS. This condition is achieved perimental results have shown that the combination of DGPSby having an open sky and one visual feature. and a single visual feature measurement at 1 Hz is sufficient During the time that the vehicle is stationary and a visual to achieve position accuracy, which is typically less than 1 mfeature is available to provide correction, the yaw error is in directions perpendicular to the visual feature’s line-of-sight,observable, as shown in the bottom row of Fig. 9. As shown achieving lane-level accuracy while also stabilizing the yaw standard deviation plot, the uncertainty in yaw islower than both DGPS- and CDGPS-aided INS solutions when B. Future Workthe visual feature is used to correct the INS. This paper has presented theoretical work that supports our proposed localization method and presented results in a fairly VII. C ONCLUSION controlled environment. Our future work will address issues such as the mapping of visual features, using additional fea-A. Summary tures (e.g., traffic signs and manmade structures), improving Real-time vehicle positioning with submeter accuracy, high the traffic light detection routine by addressing the modula-availability, and high bandwidth is becoming a prerequisite by tion of the lights, and real-time localization in dense urbanmany ITS applications, particularly control- and safety-related environments. Although a forward-looking camera was usedapplications and traffic management applications. Some of the in the aforementioned experiments, an omnidirectional visionongoing research for addressing this issue takes advantage of sensor can provide a 360◦ view of the surroundings, whichthe availability of the Global Navigation Satellite Systems by can track mapped visual features over longer periods. High-using satellite measurements to aid an INS. Results from these resolution catadioptric omnidirectional vision sensors can bemethods depend on an open sky and have shown good perfor- incorporated in the future to provide improved tracking of