Recognition of road markings from street-level panoramic images for automated map generation.

This paper presents a road-marking recognition pipeline operating on street-level panoramic images. On a large dataset of 84,387 images, the full processing pipeline achieves detection rates of 85%, 92% and 80% for crosswalks, block- and give-way markings, respectively, with a positioning error smaller than 0.6m. This shows that the presented system is performing sufficiently well for generating road-marking maps.

  1. 1. Disclaimer: The Department of Electrical Engineering of the Eindhoven University of Technology accepts no responsibility for the contents of M.Sc. theses or practical training reports. Department of Electrical Engineering, Den Dolech 2, 5612 AZ Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. Series title: Master graduation paper, Electrical Engineering. Title: Recognition of road markings from street-level panoramic images for automated map generation. Author: T. Woudsma. Commissioned by Professor: Prof. dr. ir. P.H.N. de With. Group / Chair: SPS. Date of final presentation: May 22, 2015. Internal supervisors: Prof. dr. ir. P.H.N. de With, Ir. L. Hazelhoff. Report number:
  2. 2. Recognition of road markings from street-level panoramic images for automated map generation
Thomas Woudsma, Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands; Cyclomedia Technology B.V., The Netherlands. Email:
Abstract: Road-marking maps created by the automated recognition of road markings in images can be used for the automated inspection of markings, by autonomous vehicles, or in navigation systems. This paper presents a road-marking recognition pipeline operating on street-level panoramic images. First, all images in a geographical region of interest are processed individually. This single-image marking recognition stage consists of Inverse Perspective Mapping, segmentation, contour classification, context inference and marking model evaluation. Second, the single-image detections are merged in the multi-view positioning stage, which uses connectivity-based clustering. The single-image stage detects 88%-97% of the pedestrian crossings, block, give-way and stripe markings in a city environment, with ground-truth deviations below 0.5 m. Context inference significantly improves both the detection performance and the positioning accuracy. On a large dataset of 84,387 images, the full processing pipeline achieves detection rates of 85%, 92% and 80% for crosswalks, block and give-way markings, respectively, with a positioning error smaller than 0.6 m. This shows that the presented system performs sufficiently well for generating road-marking maps. Closer analysis of the missed detections reveals that the common causes are marking damage and a large capture range.
I. INTRODUCTION
Road markings provide extensive information on traffic situations and are therefore vital for traffic safety. Among others, they include lane markings, give-way triangles, pedestrian crossings, stop lines and arrows. Databases with the position and type of the road markings can be used for various applications.
For instance, this data can be supplied to a navigation system to alert the driver to upcoming traffic hazards or to allow for more detailed route generation, e.g. truck drivers can set up a route that avoids pedestrian crossings. The databases can also be used for automatic quality monitoring, thereby strongly reducing the need for manual quality supervision. Additionally, marking situations can be checked for safety analysis, such as markings at priority situations.
Currently, marking inspection is performed by manually inspecting the markings on roads. Typically, such inspections are performed reactively, e.g. after complaints from road users or after accidents. Road-marking recognition systems can help to automate this inspection. These systems often use images (street-level or aerial) to detect and position road markings, and specific image analysis algorithms can be used to recognize markings in these images. In this paper, the focus is on marking recognition from street-level images, specifically the street-level panoramic images created by Cyclomedia B.V. in the Netherlands, which are captured annually on all public roads.
Fig. 1. Three examples of road markings that are difficult to detect: (a) abrasion, (b) occlusion, (c) shadows.
Detection and recognition of road markings from images involves several challenges. Numerous factors make the recognition difficult, such as occlusions (e.g. by other vehicles), shadows cast by surrounding objects, varying weather conditions (affecting lighting) and marking deterioration due to abrasion from vehicles. Figure 1 shows examples of three common difficult detection situations (abrasions, occlusions and shadows on markings). Even if the detection succeeds in overcoming these challenges, recognition of the specific marking types can still be very complicated, because most recognition algorithms rely heavily on accurate shape extraction.
Further analysis of the most common road markings shows that these markings occur in specific periodic patterns (e.g. dashed lane markings occur at regular intervals) and in groups (e.g. block markings and arrows in an exit lane). Where individual recognition of (damaged) markings may be complicated, modeling high-level context information and patterns can help to improve the detection rate and to recognize road markings that are, for instance, partially damaged. In addition, images are often taken at regular intervals, which gives redundant marking information. In this case, the same markings are captured from different viewpoints, which can improve detection rates if a specific marking is occluded in some images (but not in all).
For roads in the Netherlands, which are the only roads considered here, the marking design standards are managed by the CROW [1]. They divide road markings into three categories:
  3. 3. (1) parallel markings (parallel to the driving direction), (2) perpendicular markings (perpendicular to the driving direction) and (3) symbol markings. Markings in the first category, which are the most common, include e.g. lane markings (continuous and dashed) and block markings. The second category includes e.g. markings of pedestrian and bike crossings, give-way triangles (“shark teeth”) and stop lines. The last category consists of a wide variety of symbols such as arrows, speed numbers, words and bike symbols, which do not occur in periodic patterns.
This research focuses on the automated generation of road-marking databases (type, position and orientation) in a geographical region of interest. This implies the automated detection, recognition and positioning of road markings. Our research builds upon previous work by Li et al. [2], which has mainly concentrated on the recognition of road markings on highways. This work extends that research to road markings in city and rural environments in several ways. Besides adding support for different road types, this work also contains significant algorithmic alterations and improvements, of which the most important contributions are (1) context inference using probabilistic modeling, (2) evaluation of marking placement models to identify marking clusters, and (3) multi-view positioning of markings to find real-world positions and generate marking maps. This work resulted in a generic road-marking recognition pipeline, which can be applied to the recognition of a wide variety of markings (e.g. crosswalks, give-way triangles and block markings). Furthermore, this system complements an existing traffic-sign recognition system [3] by providing both redundant information (i.e. several situations are indicated by both road markings and traffic signs) and complementary information (i.e. some situations are indicated by only signs or only markings).
Overall, this results in a more complete overview of high-quality driver signaling and traffic situations. Before we present our approach, some related work is discussed.
II. RELATED WORK
Commonly, road-marking recognition is developed for Advanced Driver Assistance Systems (ADAS) or for autonomous vehicles [4], using car-mounted cameras, which are sometimes combined with LIDAR systems. As the main goal of these ADAS is to help drivers stay in their lane, a significant portion of the related work focuses on lane detection [5][6]. As described in a survey paper [7], these systems commonly follow three principal steps: (1) pre-processing to remove noise and other unwanted image data, (2) feature extraction to find relevant parts such as edges, and (3) model fitting to verify the detected markings and to remove false positives.
Recognition of several marking types is performed e.g. by Foucher et al. [8], who present a system for the recognition of pedestrian crossings and arrows. After segmentation, the authors identify crossings based on the mutual relations between the connected components from the segmentation mask, where crosswalks are identified if the segments meet certain conditions. To recognize arrows, the connected components are compared to 63 models of arrows. Experimental results on a dataset containing 165 crosswalks and 151 arrows show true positive rates of 90% and 78%, respectively.
Li et al. [9] follow a similar approach for the recognition of crosswalks, stop lines and lane markings. After filtering the segmentation result with directed morphological operations, all connected components are analyzed, and the target markings are identified based on angular orientation and blob dimensions. Although no quantitative results are available, qualitative recognition results on urban images are presented.
Qin et al. [10] present a general framework for road-marking detection and analysis.
After Inverse Perspective Mapping (IPM), segmentation and contour extraction, the framework is split into different modules for specific markings (lanes, arrows, crosswalks and words). Every module includes a Support Vector Machine (SVM), which is trained for the classification of the corresponding marking, using geometric features such as Hu moments. Experiments show precision rates above 90%, though problems occur with the recognition of worn and shadow-covered markings.
Previous work in [2] describes a recognition pipeline that can accurately detect and recognize lane, stripe, block and arrow markings. From the IPM image, a marking segmentation is obtained with a local threshold. Then each connected component is translated to its centroid, scaled to lie within a unit interval and rotated to align its primary axis, making the components invariant to these three transformations. Next, the four different marking types are classified by SVMs, trained on the specific types, using shape features. These are the distances from the shape centroid to its contour at regular angular intervals. Lanes are then modeled using RANSAC for straight lanes and Catmull-Rom splines for curved lanes. On a dataset of 910 highway panoramic street-level images, the algorithm achieved precision and recall values of over 90%. This work is used as the starting point for our contributions, for which the approach is discussed in the next section.
III. APPROACH
Most related research is focused on in-car use, where images are captured with regular (non-panoramic) video cameras. These systems aim at the recognition of specific markings that are important for driver assistance and autonomous vehicles, such as lane markings and stop lines, and do not generate databases of the recognized markings with type, global position and orientation. In this research, we use the same basic processing steps (IPM, segmentation, contour classification, model evaluation) as commonly applied in the literature.
This pipeline is extended with two major processing stages that are novel compared to related work. First, context inference is added to incorporate information about the neighboring marking elements, thereby clearly improving the detection performance. Second, an accurate positioning stage is added, which uses recognized markings from several images to determine the real-world coordinates. The proposed system should satisfy the following requirements:
  4. 4. 1) follow a semi-supervised, generic, learning-based approach to recognize a variety of markings,
2) apply context inference to exploit contextual relations between neighboring elements,
3) apply road-marking models in a generic framework to retrieve marking clusters,
4) extract global marking positions of the identified clusters, allowing for the generation of road-marking maps.
Fig. 2. Example street-level panoramic image with its IPM image. (a) Street-level panoramic images from Cyclomedia (Cycloramas) used in the experiments. The horizontal axis corresponds to the horizontal angle (azimuth) around the camera, with φ_horizontal ranging from −π to π. The vertical axis corresponds to the vertical angle (altitude), with φ_vertical ranging from −½π to ½π, where φ_vertical = 0 indicates the horizon. (b) IPM of the street-level panoramic image.
These requirements result in a system capable of recognizing multiple marking types: pedestrian crossings, give-way lines, block lines and stripes. We have selected these types because they are the most common at intersections and convey information that is very important for road safety. It should be noted that recognition of lane markings is covered in [2].
The system is evaluated at two different levels: (1) single-image marking recognition, and (2) multi-view marking positioning. The first experiment assesses the performance at the different stages of the system. The second experiment evaluates the quality of the generated marking maps. Additionally, this experiment involves a combination with a traffic-sign recognition system [3], which has been published in [11].
The proposed road-marking recognition system is explained in Section V. However, we first elaborate on the characteristics of the source data (street-level panoramic images) used as input by the recognition pipeline.
IV. SOURCE DATA
The presented system for the recognition of road markings operates on street-level panoramic images, which provide a recent and accurate overview of the road infrastructure. These images are acquired at a large scale and are recorded on all public roads within the target area, using a capturing interval of 5 m. The recording vehicles drive along with regular traffic at normal speeds. The cars are utilized efficiently by capturing during daytime in all kinds of weather conditions, including sunny, cloudy and foggy weather, and directly after (but not during) rain or snow. The panoramic images have a resolution of 2,400 × 4,800 pixels and are stored as equi-rectangular images. The capturing location is also accurately known for each image, based on a high-quality positioning system featuring both GPS and IMU devices. Figure 2a displays an example equi-rectangular panoramic image.
The employed capturing systems are calibrated precisely, resulting in panoramic images that are mapped to a sphere, on which angular distances can be measured. The resulting equi-rectangular images have a linear relationship between the pixel coordinates within the image and the viewing directions in the horizontal and vertical dimensions. This allows for the precise calculation of real-world 3D positions based on triangulation. The position of an object can be retrieved with straightforward geometrical computations if multiple points (≥ 2) corresponding to the considered object are found in multiple images.
V. ROAD-MARKING RECOGNITION SYSTEM
This section presents a learning-based system for road-marking recognition, using street-level panoramic images captured from a vehicle. The system is split into two major processing blocks: single-image marking recognition and multi-view positioning. The first block independently processes all images in a specific geographical region of interest.
To recognize markings in images, this block consists of five consecutive processing steps, which are described in Section V-A. This results in a pixel location, an orientation and a marking type for each recognition.
The detection results from the single-image marking recognition (marking types, positions and orientations in images) are passed to the multi-view positioning block, which merges these results to obtain global marking positions. Section V-B elaborates further on this block. The complete recognition and positioning pipeline is shown in Figure 3.
  5. 5. Fig. 3. Road-marking recognition pipeline. (Diagram blocks: Single-image Marking Recognition, consisting of Pre-Processing, Image Segmentation, Contour Classification with Marking SVMs, Context Inference with Context Models and Model Evaluation with Marking Models, followed by Multi-view Positioning.) First, markings are recognized in each image separately by performing IPM, segmentation, contour classification, context inference and model evaluation. Then the results are combined with multi-view positioning. The five images below the diagram give the intermediate results for the single-image marking recognition. Note that the contour classification, which generates probability maps, is performed for each marking type (here shown for give-way markings). These maps are used by the context inference and the model evaluation, together with context and marking models, respectively.
A. Single-Image Marking Recognition
The recognition pipeline first processes all panoramic images individually and recognizes markings in each image in five sequential processing steps. Because street-level panoramic images (Cycloramas) are used, the first step is to (1) perform the aforementioned IPM. This results in a top-down view of the scene, centered around the vehicle. Then, (2) image segmentation is used to find the relevant regions (i.e. road markings) in the top-down image. This is followed by (3) the classification of each relevant region (or connected component) in the segmentation result. Features are extracted from these connected components and classified by trained SVMs. This results in a probability map for each marking type. To enhance the performance of the SVMs, context information (i.e. neighboring elements in the spatial placement patterns) is exploited in the next step by modeling the probability maps as Markov Random Fields (MRFs).
Next, (4) context inference, which selects the most likely marking type with respect to the context, is performed using Loopy Belief Propagation (LBP) on the MRFs. Finally, (5) the classified contours are evaluated by marking models, which merge single elements into multi-element markings (e.g. pedestrian crossings or give-way lines). The next five sections elaborate on these steps.
1) Image Pre-Processing: Direct recognition of road markings from the spherical, equi-rectangular panoramic images is challenging, due to the inherent perspective deformations. This can be observed in e.g. Fig. 2a, which illustrates that parallel lines on the ground plane are not parallel in the image plane, but instead converge to a single vanishing point. Therefore, such perspective-distorted images are commonly transformed to a top-down view, using an Inverse Perspective Mapping (IPM) (similar to [12][13]). This transformation remaps the image such that the image plane equals a pre-defined (ground) plane (e.g. the road). It should be noted that, since most roads are not perfectly flat (but slightly curved for drainage), small deformations may be visible. Nevertheless, the resulting images allow for easier detection, recognition and positioning of road markings, as illustrated e.g. by Fig. 2b. These top-down images are calculated by:
$$x = x_{car} + \frac{\arctan(y_{IPM} / x_{IPM})}{2\pi} \times n \bmod n, \tag{1}$$
$$y = m - \frac{\arctan(d / h)}{2\pi} \times n \bmod m. \tag{2}$$
In these equations, $(x, y)$ and $(x_{IPM}, y_{IPM})$ denote the horizontal and vertical image coordinates in the equi-rectangular panoramic image and the computed IPM image, respectively. The parameter $x_{car}$ represents the horizontal coordinate of the front of the car within the panoramic image, while $h$ and $d$ denote the camera height above the ground plane and the distance from pixel coordinate $(x_{IPM}, y_{IPM})$ to the center of the IPM image. Finally, $m$ and $n$ denote the resolution of the panoramic image.
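As an illustration of Eqs. (1) and (2), the following Python sketch maps a single IPM pixel to panorama coordinates. It rests on several assumptions that the text does not state: the IPM coordinates are taken relative to the IPM image center, n is the panorama width (spanning 2π) and m its height, the modulo applies to the whole expression, `atan2` replaces the arctangent of the ratio to keep the correct quadrant, and `metres_per_pixel` is a hypothetical scale parameter that converts the pixel distance into the ground distance d.

```python
import math


def ipm_to_panorama(x_ipm, y_ipm, x_car, h, n, m, metres_per_pixel=1.0):
    """Map an IPM pixel offset (relative to the IPM centre) to panorama (x, y).

    Sketch of Eqs. (1) and (2); all parameter names besides x_car, h, n and m
    are illustrative assumptions.
    """
    # Azimuth of the ground point around the camera. Eq. (1) writes
    # arctan(y/x); atan2 is used here (assumption) to keep the quadrant.
    azimuth = math.atan2(y_ipm, x_ipm)
    x = (x_car + azimuth / (2.0 * math.pi) * n) % n

    # Ground distance d from the IPM centre, converted to metres
    # (metres_per_pixel is a hypothetical scale parameter).
    d = math.hypot(x_ipm, y_ipm) * metres_per_pixel

    # arctan(d / h) is the viewing angle measured from straight down;
    # n / (2 * pi) converts radians to panorama pixels.
    y = (m - math.atan(d / h) / (2.0 * math.pi) * n) % m
    return x, y
```

In a full IPM, such a function would be evaluated for every pixel of the top-down image, after which the panorama is sampled (e.g. with interpolation) at the returned coordinates.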
2) Road-Marking Segmentation: The retrieved IPM image is segmented into two categories: road-marking and non-road-marking pixels. Road markings are typically brighter than the road and have a low saturation, as they are typically close to white. Therefore, image regions that have a high local intensity and a low saturation are extracted in a two-step process. Using this metric for the segmentation of road markings gives good results [14].
The first step involves the calculation of the intensity difference between the grayscale pixel values and the average grayscale intensity value in a rectangular window around each considered pixel.
  6. 6. Fig. 4. Illustration of the segmentation steps: (a) input top-down image, (b) segmentation result based on the local intensity measure, (c) segmentation based on both the local intensity and saturation measures.
Fig. 5. Example of shape features in two different shapes (block and triangle), showing the distances d_1, ..., d_8 from the centroid to the contour at angular intervals θ. Clearly, the vectors have different magnitudes for equal angles.
With $g_p$ the grayscale pixel value of pixel $p$, $g'_p$ the resulting intensity difference and $v, w$ the size of the local neighborhood around pixel $p$, this calculation can be expressed as:
$$g'_p = g_p - \frac{1}{vw} \sum_{i=-v/2}^{v/2} \sum_{j=-w/2}^{w/2} g_{ij}. \tag{3}$$
The size of the window is determined by the marking types of interest. A binary segmentation is then obtained by applying Otsu’s threshold method to the found differences.
The second step involves filtering based on the saturation value, where a thresholding operation removes the highly saturated pixels from the previously obtained mask. After morphological closing and hole filling, all connected components (groups of neighboring pixels) are extracted from the retrieved segmentation mask. Figure 4 illustrates this procedure.
3) Contour Classification: The next step is to classify each of the connected components in the segmentation result. First, the contour is extracted from each connected component, representing its outline in pixel positions. Then all contours are translated to the origin and rotated to align their primary axis, which makes them translation- and rotation-invariant. Scale invariance is often used as well, but in this case the scale is relevant to the marking type (e.g. small stripe markings and crosswalks may have the same shape but a different scale), so scale invariance is omitted. Next, the distance between the contour centroid and the contour edge is determined at set angular intervals, as shown in Figure 5. As road markings have highly regular (and mostly convex) shapes, these values can be used as a shape descriptor and are concatenated to form a feature vector.
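The shape descriptor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the centroid is approximated by the mean of the contour points, the primary axis is estimated from second-order central moments, and the distance at each angle is taken from the contour sample whose direction is closest to that angle. The function name and the `num_angles` parameter are hypothetical.

```python
import math


def shape_descriptor(contour, num_angles=8):
    """Centroid-to-contour distances at regular angular intervals.

    contour: list of (x, y) points on the outline (assumed densely sampled).
    """
    count = len(contour)
    # Translate the contour so that its centroid lies at the origin.
    cx = sum(x for x, _ in contour) / count
    cy = sum(y for _, y in contour) / count
    pts = [(x - cx, y - cy) for x, y in contour]

    # Estimate the primary axis from second-order central moments
    # (assumption: the paper does not specify how the axis is found).
    mu20 = sum(x * x for x, _ in pts)
    mu02 = sum(y * y for _, y in pts)
    mu11 = sum(x * y for x, y in pts)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

    # Rotate by -theta so the primary axis coincides with the x-axis.
    c, s = math.cos(-theta), math.sin(-theta)
    rotated = [(x * c - y * s, x * s + y * c) for x, y in pts]

    def angular_gap(a, b):
        # Absolute angle difference, wrapped to [0, pi].
        return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)

    # No scaling is applied, since scale is informative for the marking type.
    step = 2.0 * math.pi / num_angles
    descriptor = []
    for k in range(num_angles):
        target = k * step
        px, py = min(rotated, key=lambda p: angular_gap(math.atan2(p[1], p[0]), target))
        descriptor.append(math.hypot(px, py))
    return descriptor
```

Note that a primary axis is only defined up to 180°, which this sketch does not resolve; asymmetric shapes such as triangles would need an additional convention to fix the direction.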
To classify between the different marking types, the feature vectors are transformed to zero mean and unit standard deviation by subtracting the mean feature vector and dividing by the standard deviation, both computed from the training sets. This results in a set of N feature vectors for each marking, where each vector corresponds to a specific marking type. These vectors are then classified by SVM classifiers, each operating on the feature vector extracted for the marking type that it should recognize. Each SVM outputs the distance to its decision boundary, which is then converted to a probability measure using Platt scaling [15][16]. After evaluation of the SVMs, each segmented object has N probability measures, which all lie in the same unit interval, allowing for direct comparison between marking types. For each contour, a vector of N probability measures is created. Figures 6a and 6c show the obtained probability maps for two different marking categories.
4) Context Inference: Individual markings can be occluded (e.g. by other vehicles), may have a lowered visibility or can be damaged. This results in non-ideal shapes in the segmentation mask, which are recognized with a lowered probability or obtain high probabilities for other marking types. Therefore, we exploit the periodic spatial placement patterns in which road markings are typically placed to improve the recognition performance. We use this contextual information to update the recognition scores for each detected marking, based on the scores of the markings located at the expected locations for the respective marking types. To model this contextual information, we employ a Markov Random Field (MRF), which allows for updating the recognition probabilities based on contextual information, i.e. based on the probabilities of their neighbors.
Within the MRF, all detected road markings are modeled as nodes, where the initial probabilities of the nodes are set to the probabilities found by the SVMs in the classification step. All neighboring nodes with an inter-node distance smaller than a pre-defined threshold are connected by edges.
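The graph construction described above can be illustrated with a short Python sketch. It assumes that each node is represented by the 2D position of its marking and that the Euclidean distance is used; the function name and the `max_distance` parameter are hypothetical, and the actual threshold value is not given in the text.

```python
import math


def build_mrf_edges(positions, max_distance):
    """Connect all node pairs whose distance is below max_distance.

    positions: list of (x, y) marking positions, one per MRF node.
    Returns an adjacency list {node index: [neighbor indices]}.
    """
    neighbors = {i: [] for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Euclidean inter-node distance (assumed distance measure).
            if math.dist(positions[i], positions[j]) < max_distance:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

The pairwise loop is quadratic in the number of markings, which is usually acceptable for the number of contours in a single image; a spatial index would be an alternative for larger sets.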
  7. 7. Fig. 6. Illustration of the MRF processing. Left column: input probabilities for (a) give-way and (c) block markings, where red to green denotes low to high values. Right column: output probabilities for (b) shark teeth and (d) block markings. Note that the overlap between triangle and block markings disappears.
Fig. 7. Construction of context weighting functions using the marking distance and orientation difference from a training set for perpendicular markings. The heat map at the top shows the weights in the x-y plane for neighboring nodes in the MRF, where red indicates a high and blue a low weight. For instance, for perpendicular markings the weight to another node in the MRF is high if that node is at the periodic distance (e.g. 1 meter on the minor axis and < 0.5 meter on the major axis). The second figure shows the weighting function in terms of the orientation difference (between markings/nodes of the MRF).
In contrast to the conventional MRF, where all edges are unweighted, we assign weights to the edges. These weights represent the contextual influence of nodes on each other, based on how well their inter-node distance and relative orientations fit the expected marking placement pattern. This relationship is modeled by fitting a Gaussian Mixture Model (GMM) to the inter-node distances and orientations of a training set. Figure 7 shows an example context weight function for perpendicular markings. Because different marking types have varying contextual relations, the weights are determined for each marking type. Figure 8 provides an illustration of an example MRF for road markings. Details of this figure are further explained below.
Fig. 8. Example MRF network including edge weights, where each node has a probability of belonging to one out of 3 classes. The four nodes have probabilities $p_1 = [0, 0.01, 0.99]^T$, $p_2 = [0, 0.15, 0.85]^T$, $p_3 = [0, 0.6, 0.4]^T$ and $p_4 = [0, 0.1, 0.9]^T$, and the directed edge weights are $w_{1 \to 2} = [w^1_{1 \to 2}, w^2_{1 \to 2}, \ldots]^T$, $w_{2 \to 1}$, $w_{2 \to 3}$, $w_{3 \to 2}$, $w_{3 \to 4}$ and $w_{4 \to 3}$. Note that the weights are also specific for each marking type and are thus vectors.
Since finding the exact probabilities within an MRF is computationally expensive, the solution is typically approximated with Loopy Belief Propagation (LBP) [17][18], which updates the probabilities by passing messages along the edges. After all messages have been sent, new probability values are calculated from the probabilities of the previous iteration and the weighted messages. Below, we explain this process in detail.
With $p_i$ being the vector of probabilities that node $i$ belongs to each of the marking types, the message that will be sent from node $i$ to node $j$ in the next iteration, $\hat{m}_{i \to j}$, equals:
$$\hat{m}_{i \to j} = \frac{m_{i \to j}}{\|m_{i \to j}\|}, \quad \text{where} \tag{4}$$
$$m_{i \to j} = p_i + \sum_{k \in N \wedge k \neq j} w_{k \to i} \, \hat{m}_{k \to i}. \tag{5}$$
In these equations, $m$ denotes the vector containing the messages for each marking type, $N$ is the set of nodes connected to node $i$ and $w_{k \to i}$ denotes the vector of edge weights for each marking type on the edge from node $k$ to node $i$.
Next, the probabilities are updated. First, a belief value $b$ is calculated by incrementing the current probabilities with the weighted sum of all incoming messages.
  8. 8. The new probabilities that node $j$ belongs to each of the marking types, $p'_j$, are then found by normalizing the obtained belief values:
$$b_j = p_j + \sum_{i \to j} w_{i \to j} \, \hat{m}_{i \to j} \quad \text{and} \quad p'_j = \frac{b_j}{\|b_j\|}. \tag{6}$$
This process is repeated until all probabilities converge or the maximum number of iterations is reached. Based on the newly acquired probabilities, all segmented markings are assigned to a single marking class by selecting the class with the highest probability. Figure 6 displays the input and output of this processing stage. Because the orientation of (partially damaged) markings is subject to noise, each marking element is assigned an orientation co-determined by its context. In particular, we fit a line to neighboring elements of the same class using least squares and determine the dominant orientation of this line.
5) Model Evaluation: To recognize high-level marking elements (e.g. lines of stripes, crosswalks, give-way situations) and to remove falsely detected markings, we apply a marking model that exploits the periodic placement patterns in which most road markings occur. For example, pedestrian crossings are constructed from a number of equally sized rectangles at regular intervals with equal orientations, and give-way situations are constructed from multiple shark teeth located along a line, where each triangle is oriented parallel to the driving direction.
This placement model is evaluated as follows. First, we calculate the distances between all markings and their close-by neighbors along their major and minor axes, where the major axis is aligned with the orientation of the contour as extracted in the previous steps. Second, we connect all markings that adhere to the adjacency and orientation rules for the specific marking type, and ignore marking pairs that deviate from those rules. In this stage, elements are connected when they are located at the expected locations within the pattern.
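The pairwise connection rule and the subsequent grouping of connected elements can be sketched as follows. This is an illustrative reading of the placement model for elements that repeat along their minor axis (such as crosswalk stripes), not the rule set used in the paper: the tuple layout `(x, y, orientation)`, the function names, the tolerance parameters and their default values are all assumptions, and neighbors are accepted at once or twice the pattern period. Grouping is done with a simple union-find structure.

```python
import math


def connected(a, b, period, multiples=(1, 2),
              major_tol=0.5, minor_tol=0.25, orient_tol=0.2):
    """Check whether marking b lies at an expected pattern location of a.

    a, b: (x, y, orientation in radians); all tolerances are hypothetical.
    """
    ax, ay, a_theta = a
    bx, by, b_theta = b
    # Orientation difference modulo pi (a marking axis has no direction).
    d_theta = abs((a_theta - b_theta + math.pi / 2) % math.pi - math.pi / 2)
    if d_theta > orient_tol:
        return False
    # Offset from a to b, expressed along the major and minor axes of a.
    dx, dy = bx - ax, by - ay
    major = dx * math.cos(a_theta) + dy * math.sin(a_theta)
    minor = -dx * math.sin(a_theta) + dy * math.cos(a_theta)
    if abs(major) > major_tol:
        return False
    # Accept neighbors at an allowed multiple of the pattern period.
    return any(abs(abs(minor) - k * period) <= minor_tol for k in multiples)


def cluster_markings(markings, period, min_size=2, **rule_kwargs):
    """Group pair-wise connected markings; drop groups that are too small."""
    parent = list(range(len(markings)))

    def find(i):
        # Find the cluster representative, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(markings)):
        for j in range(i + 1, len(markings)):
            if connected(markings[i], markings[j], period, **rule_kwargs):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(markings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)
```

Allowing twice the period lets a cluster survive a single missed element, and `min_size` removes isolated detections that do not fit the pattern.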
These positions consist of the locations of neighbors at once or twice the period of the specific marking pattern. For example, two lane markings are connected, if they are positioned at a predefined distance interval on their major and minor axes within set deviation. Last, markings belonging to the same high-level element (such as a pedestrian crossing) are then grouped together using connectivity-based clustering, which creates clusters from markings which are pair-wise connected. Groups of recognized markings that contain too few elements can be removed to reduce the amount of falsely positives. This model evaluation results in a single position, orienta- tion and width for each found marking element, thereby allow- ing for the recognition of high-level elements (i.e. crosswalks, give-way situations and lane divisions). The evaluation of the placement model is illustrated by Fig. 9. B. Multi-View Positioning When all images in a certain region of interest are processed (e.g. in a city), the detected markings are processed by the multi-view positioning block. Because the images have a global position (in GPS coordinates, logged at capture time), the position of each detected marking can be calculated from Fig. 9. Illustration and evaluation of the placement model. Left: crop from input IPM image. Middle: found markings within the cropped region (pedestrian crossing element = red, block marking element = green, shark teeth = black, other = blue). Right: output. The pedestrian marking segments are coupled together, and also the block markings that are located on the same line. The two erroneously found block markings are ignored as they do not fit the model. 0 5 10 15 20 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Distance from car [m] DistanceWeight Distance Weighting Function Fig. 10. Distance weighting function constructed by fitting a Gaussian function to the AUC. The weight function is maximal around 6 m, i.e. this is the range where the least markings are missed. 
the relative position in the image, because the IPM transformation maps the image plane to the horizontal plane at ground level. This results in a marking map on which multiple detections of the same marking are present, where all detections originate from different images.

To obtain a result where each marking is detected only once, connectivity-based clustering is used again to merge detections. The connectivity criterion is based on the size and orientation of the marking. Specifically, Marking A is connected to Marking B if the centroid of A is within the shape of B. Each shape originates from the single-image detection stage, and is determined by its cluster size (number of elements times period) and its orientation. For each cluster size, the size of the shape is derived from [1].

After this procedure, clusters with only one element (a marking detected in only one image) are discarded, under the assumption that each marking is detected in at least two images. For each cluster, the final marking position is calculated as the weighted mean of all detections in the cluster. This weight is computed by fitting a function to a performance metric as a function of the detection distance (i.e. pixel distance from image center). In this case, we have determined the AUC at
different detection ranges and fitted a Gaussian function to the resulting values. From the fitted curve in Figure 10, it can be observed that the performance is optimal at 6 m. For distances smaller than the optimum, markings are occluded by nearby objects or the car itself. At distances larger than the optimum, markings can be occluded as well, but also suffer from perspective distortions.

VI. EXPERIMENTAL SETUP

The system consists of two major stages, the single-image marking recognition and the multi-view positioning. We evaluate the system at both stages, using different datasets and configurations. The datasets consist of the aforementioned street-level panoramic images with corresponding metadata (i.e. car orientation, global position, time stamp).

As general performance metrics we use the true positives (TPs), representing the correctly detected markings, false positives (FPs), indicating the falsely found markings, and false negatives (FNs), referring to the missed markings. Additionally, we apply recall-precision curves to show the performance for the specified marking types. The recall denotes the fraction of all ground-truth markings that are found, TP / (TP + FN), and the precision denotes the fraction of all found markings that are correct, TP / (TP + FP). In an ideal system, both the recall and precision are unity (all markings found and no false positives).

The first dataset contains 263 images of a large city in the Netherlands, in which the ground-truth positions of pedestrian crossings, block-shaped, give-way and stripe marking elements have been annotated. The set consists of 834, 1573, 805 and 771 single-marking elements for the previously mentioned four marking types, respectively.
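The recall and precision used throughout the evaluation follow directly from the TP, FP and FN counts. The sketch below illustrates the arithmetic and how a recall-precision curve can be traced by sweeping a threshold over the output probabilities. The function names and the example detections are hypothetical and not taken from the evaluated system; the sketch also assumes that every correct detection matches a distinct ground-truth marking.

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) from true/false positive and false negative counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def pr_curve(scored, n_ground_truth):
    """Trace recall-precision points by thresholding detection probabilities.

    scored: list of (probability, is_correct) tuples, one per detection.
    n_ground_truth: number of annotated markings.
    """
    points = []
    for threshold in sorted({p for p, _ in scored}, reverse=True):
        kept = [ok for p, ok in scored if p >= threshold]
        tp = sum(kept)               # correct detections above the threshold
        fp = len(kept) - tp          # falsely found markings
        fn = n_ground_truth - tp     # missed markings
        points.append((threshold, *recall_precision(tp, fp, fn)))
    return points


# Made-up detections, chosen only to illustrate the computation.
detections = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
curve = pr_curve(detections, n_ground_truth=4)
for threshold, recall, precision in curve:
    print(f"threshold={threshold:.1f} recall={recall:.2f} precision={precision:.2f}")
```

Lowering the threshold moves along the curve towards higher recall, typically at the cost of precision.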
This set is used to test the single-image marking recognition performance, where we specifically (1) assess the impact of using context information to enhance recognition rates, (2) evaluate the influence of the detection distance from the car/capture position, and (3) analyze the contribution of the marking model evaluation.

In addition to testing the recognition performance of single-marking elements, we also specifically evaluate the detection rate and positioning accuracy of clusters of markings, such as crosswalks or give-way lines. We calculate the average positioning error for each marking type and also create a curve of the detected percentage as a function of the distance from the ground truth. The single-image marking recognition is executed on the first dataset both with and without the context inference step. The dataset contains 25 pedestrian crossings, 60 block lines, 39 give-way lines and 34 dashed lines.

To test the performance of the complete marking recognition pipeline, we have created a large dataset in the municipality of Lingewaard with 84,387 images corresponding to 400 km of road. To test both the recognition and positioning performance of marking clusters, global GPS positions along with the size and orientation of the markings are used as ground truth for this set. We focus on the recognition and positioning accuracy of pedestrian crossings, give-way markings and block markings, of which the set contains 105, 729 and 141 instances, respectively.

After the evaluation of the complete road-marking recognition pipeline, we perform an additional experiment in order to relate our detected markings to traffic signs. Road markings and traffic signs often coexist, such that combined databases can be used for the evaluation of redundant and complementary information. In this case, we create such a database for the consistency checking of road markings and traffic signs, concentrating on crosswalks and give-way markings.
Markings and signs are consistent if they are within a set distance and have matching orientations. For the traffic-sign recognition, we use the system described in [3]. The consistency evaluation is performed on the full dataset of 84,387 images.

VII. DETECTION AND POSITIONING RESULTS

This section presents the results of (1) single road-marking recognition performance, (2) marking cluster detection performance, (3) 3D positioning (full pipeline) and (4) consistency checking.

A. Individual Road-Marking Recognition Performance

Figure 11 shows the recall-precision curves for pedestrian crossings, block-shaped markings, give-way triangles and stripe markings. The performance of each marking type has been evaluated for three pipeline stages (SVM, context inference and model evaluation) and for two detection ranges (distance within 10 m and 20 m of the car). We first analyze the recognition performance of individual markings for the SVM classification and then evaluate the performance impact of the added context inference and model evaluation.

SVM Performance: Within a range of 10 m from the car, over 90% of all markings are detected, with the exception of give-way markings (> 80%). However, when a distance of 20 m is considered, the performance drops significantly for crosswalks, give-way and stripe markings. First, markings that are farther away are more often occluded by other objects, such as vehicles. Second, perspective distortions occur at larger distances from the capturing location, as the IPM assumes a flat ground plane, whereas most roads are curved.

Influence of context inference: When context inference is used, the performance is equal to or, in most cases, better than the SVM performance. For pedestrian crossings, block markings and give-way triangles, the recognition performance increases only slightly. For stripe markings, the impact of context inference on the recognition performance is considerable.
This is due to the fact that stripe contours are easily distorted, resulting in a low probability output of the SVM. By exploiting the contextual placement patterns, stripe markings with low probabilities can be boosted. Within 10 m, the same recall of above 90% can be achieved at a much higher precision (>90%). For a distance of 20 m, the precision can be improved as well, albeit at a lower recall. Result of model evaluation: As mentioned before, the model evaluation is mainly used for clustering and removal of false positives. The output probabilities after context inference are set to zero if markings do not adhere to the model. This improves the final precision in most cases, but also lowers the
recall slightly. Since marking clusters should have at least 2 marking elements, single isolated markings are discarded, even though they might be correct. Furthermore, due to perspective distortions, distances between markings are altered. The IPM assumes a flat ground plane, but roads are often curved (e.g. for drainage). Markings that are far away from the car are particularly affected by these distortions.

Fig. 11. Recall-precision curves (horizontal axis: recall; vertical axis: precision) for single-image marking recognition for (a) pedestrian crossings, (b) block markings, (c) give-way markings and (d) stripe markings. For each marking, the performance is shown after SVM classification (blue), context inference (red) and model evaluation (green), for both detection ranges: within 10 m (solid line) and 20 m (dashed line).

B. Road-Marking Cluster Performance

The model evaluation produces road-marking clusters with a position, orientation and size. Table I shows the recognition results of road-marking clusters within 10 m of the car.
Considering the detection results, we observe that without context inference, between 79% and 88% of the clusters are found, except for block markings, of which only 40% of the clusters are found. With context inference, the detection performance is significantly improved: 88% of the crosswalks and 90% or more of the other markings are found. The positioning accuracy, which is expressed as the mean of the distances from the ground-truth cluster positions, is considerably improved by using context inference, except for give-way markings, where it is marginally worse (2 cm). However, this difference is within the measurement uncertainty.

TABLE I
MARKING GROUP PERFORMANCE FOR DETECTIONS WITHIN 10 M

Situation              Found #   Found %   False det.   Pos. error [m]
Pedestrian Crossing       22       88%         1            0.59
  with context            22       88%         3            0.53
Block Markings            24       40%         3            0.77
  with context            58       97%         5            0.16
Give-way Markings         31       79%         1            0.20
  with context            35       90%         6            0.22
Stripe Markings           30       88%         4            0.73
  with context            32       94%         5            0.15

TABLE II
MARKING GROUP PERFORMANCE FOR DETECTIONS WITHIN 20 M

Situation              Found #   Found %   False det.   Pos. error [m]
Pedestrian Crossing       51       56%         3            0.60
  with context            51       56%         5            0.55
Block Markings            63       38%        39            1.14
  with context           154       89%        32            0.40
Give-way Markings         90       58%         3            0.46
  with context           112       72%        12            0.34
Stripe Markings           54       49%         7            0.99
  with context            59       53%         9            0.63

Table II shows the same detection and accuracy metrics for a range of 20 m from the car. Overall, context inference still improves both the number of found markings and the positioning accuracy, but we observe that the results are hardly improved for pedestrian crossings and stripe markings
(56% and 53% found). For these two marking types, we now investigate the causes of the lower detection performance. Figure 12 shows two cases where crosswalks and stripe markings are not detected. Because crosswalk markings are relatively large, they tend to be occluded by other objects if they are farther away from the car. Stripe markings are relatively small and thus are more susceptible to the distortions of the IPM. This implies that even without context inference, all undistorted markings have been detected, such that added context information does not improve detection rates. Because most markings are detected within 10 m and the results from individual images are merged in the next stage, almost all markings can still be detected, as the same marking occurs in multiple images.

Fig. 12. Two situations where marking clusters are missed. In (a) the pedestrian crossing is occluded, in (b) the stripe markings are distorted by the IPM.

Regarding positioning accuracy, Figure 13 gives the percentage of detected markings that are within a certain range of the ground truth. From these curves, it is clear that the context inference improves the positioning accuracy of marking clusters in an image. When the pipeline is used only with the SVM classification, roughly 60% is detected within 1 m and above 80% within 2 m. Using context information, this can be improved to above 80% within 1 m and above 90% within 2 m. Positioning accuracy is particularly important for the multi-view positioning, which is the next step in the pipeline and is evaluated in the next section.

Fig. 13. Detected percentage as a function of distance from ground truth (horizontal axis: detection distance [m], 0 to 6; vertical axis: detected fraction; curves: SVM and +Context).

C. 3D Positioning Results

Table III shows the detection and positioning accuracy results for the multi-view positioning of crosswalks, block markings and give-way triangles. Overall, the recognition rate is at least 80%. For each marking type, we will evaluate the most common causes for false negatives (misses) and false positives (erroneously found markings), provided that they are significant. It should be noted that the impact of missed markings (FNs) is larger than that of falsely found markings (FPs): with little manual effort, all detections of the pipeline can be inspected and accepted or discarded accordingly.

TABLE III
MULTI-VIEW POSITIONING RESULTS FOR CROSSWALKS, BLOCK MARKINGS AND GIVE-WAY TRIANGLES

Situation              Found #   Found %   False det.   Accuracy [m]
Pedestrian Crossing       89       85%         3            0.60
Block Markings           130       92%        41            0.50
Give-way Markings        581       80%        23            0.38

For all classes, marking abrasion is the most common cause for missed detections. Furthermore, markings are missed if there are only a few images which capture them, which often occurs when they are far away. Below, we specifically address each marking type.

The main cause for missed detections in crosswalks is the merging of clusters. In most cases, a smaller crosswalk of only two elements is located close to a larger crosswalk. Due to positioning errors, caused by perspective distortions and GPS position errors, these clusters are 'connected' and merged, and one of them is counted as missed because both markings are annotated. Figure 14a shows an example of two smaller two-stripe crosswalks that are merged with the larger crosswalks in the middle.

The detection rate of block markings is already high (92%). However, there are many false positive detections. Closer inspection of the false positives reveals that the pipeline recognized tile patterns in gardens and on driveways and sidewalks, as shown in Figure 14b.

Give-way triangles have the lowest detection rate compared to the other markings and also have the highest number of elements in the dataset.
As there are around 150 missed markings, we break down the causes of this high number. Over 50% of all missed give-way markings were heavily damaged, which prevents their detection by the presented system. The second cause, at 23%, is give-way markings located on bicycle paths/lanes. These lanes are often located far from the road/capture location, resulting in distorted and occluded markings. At 13%, cluster merging is the third most common cause of FNs and is comparable to the merging aspects described for crosswalks. Other minor causes include occlusions and IPM distortions (e.g. when markings are on slopes). Figure 14c and Figure 14d show cases of damaged markings and far-away capture locations, respectively.
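The multi-view merging evaluated in this subsection can be illustrated with a small sketch: detections are connected when the centroid of one lies inside the oriented shape of another, connected detections form clusters, clusters with a single detection are discarded, and the remaining positions are averaged with a Gaussian distance weight. This is not the implementation used in the pipeline. The `Detection` fields, the local metric map frame, the rectangular shape and the width `sigma` of the weight function are assumptions made for illustration; only the 6 m optimum is taken from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    x: float       # centroid position, east [m] in a local map frame (assumed)
    y: float       # centroid position, north [m]
    theta: float   # orientation of the marking cluster [rad]
    length: float  # extent along the orientation [m]
    width: float   # extent across the orientation [m]
    dist: float    # distance between marking and capture position [m]


def contains(b, a):
    """True if the centroid of a lies inside the oriented rectangle of b."""
    dx, dy = a.x - b.x, a.y - b.y
    c, s = math.cos(b.theta), math.sin(b.theta)
    along = dx * c + dy * s
    across = -dx * s + dy * c
    return abs(along) <= b.length / 2 and abs(across) <= b.width / 2


def weight(dist, mu=6.0, sigma=5.0):
    """Gaussian distance weight; mu follows the 6 m optimum, sigma is hypothetical."""
    return math.exp(-0.5 * ((dist - mu) / sigma) ** 2)


def merge(dets):
    """Cluster connected detections and return one weighted-mean position per cluster."""
    parent = list(range(len(dets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, a in enumerate(dets):
        for j, b in enumerate(dets):
            if i != j and contains(b, a):
                parent[find(i)] = find(j)

    clusters = {}
    for i, d in enumerate(dets):
        clusters.setdefault(find(i), []).append(d)

    merged = []
    for members in clusters.values():
        if len(members) < 2:  # seen in only one image: discard
            continue
        w = [weight(d.dist) for d in members]
        total = sum(w)
        merged.append((sum(wi * d.x for wi, d in zip(w, members)) / total,
                       sum(wi * d.y for wi, d in zip(w, members)) / total))
    return merged


# Two detections of the same marking and one isolated detection (made-up values).
dets = [
    Detection(0.0, 0.0, 0.0, 4.0, 3.0, 6.0),
    Detection(0.4, 0.1, 0.0, 4.0, 3.0, 6.0),
    Detection(50.0, 50.0, 0.0, 4.0, 3.0, 12.0),
]
positions = merge(dets)
print(positions)
```

The sketch also shows how the merging-related misses arise: two annotated markings whose shapes overlap after positioning errors end up in the same connected component and yield a single merged position.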
Fig. 14. False and missed detections of 3D-positioned markings. (a) FNs (red) due to the merging of close-by pedestrian crossings (blue). (b) FP due to similar shapes and patterns. (c) FN due to severely damaged markings. (d) FN due to far-away capture locations (blue dots).

D. Road-Marking and Traffic-Sign Co-Occurrence Validation

The presented road-marking recognition system is applied in conjunction with the existing traffic-sign recognition system [3] to check the correct co-occurrence of signs and markings, in particular for pedestrian crossings and give-way situations. The goal of this experiment is to (1) evaluate the recognition of traffic situations where both markings and signs occur, and (2) explore consistency checking on databases containing the positions of signs and markings. The last aspect aims at the identification of situations where expected signs or markings are missing, potentially leading to dangerous traffic behavior. For instance, a pedestrian crossing should be indicated by both signs visible from all driving directions and a sufficiently large crosswalk marking.

TABLE IV
OVERVIEW OF THE COMPLETE COMBINED RESULTS AND FOR INDIVIDUAL SIGN AND MARKING RECOGNITION ONLY

Situation              Approach                    Correctly det.    False det.
Pedestrian crossings   Combined recognition         53   100%            8
                       Marking recognition only     51   96.2%           4
                       Sign recognition only        49   92.5%           4
Give-way situations    Combined recognition        694   96.5%          28
                       Marking recognition only    500   69.5%          23
                       Sign recognition only       598   83.1%           5

TABLE V
OVERVIEW OF THE CONSISTENCY CHECKING RESULTS

Situation              Consistent
Pedestrian crossings    34 / 53    64.2%
Give-way situations    349 / 719   48.5%

Combined Recognition: Table IV shows the number of recognized traffic situations for marking, sign and combined recognition. It should be noted that this table considers traffic situations and not single-marking clusters or signs, such that multiple signs and markings denoting the same situation are merged.
The results show that marking and sign recognition complement each other when considering the identification of situations, especially taking into account that a significant part of the situations is denoted exclusively by either a road marking or a sign.

Consistency Checking: Considering Table V, about two-thirds of the pedestrian crossings and about half of the give-way situations are marked as consistent, i.e. expected markings and signs are both detected. Compared to manual safety inspection, this approach reduces the number of situations that have to be verified by about a factor of two (or better).

VIII. CONCLUSIONS AND FUTURE WORK

In this paper we have presented a road-marking recognition system to create road-marking maps from street-level panoramic images. In addition to the general components of a marking recognition system, such as segmentation and contour classification, the proposed system improves performance in several respects. This system is able to (1) recognize a variety of markings, (2) exploit context relations between individual marking elements, (3) retrieve marking clusters, and (4) find the global positions of the recognized markings. These contributions have resulted in a system design with two consecutive processing stages. First, each image is processed individually to identify the markings that are present. This stage successively applies Inverse Perspective Mapping (IPM), segmentation, learning-based contour classification with SVMs, context inference and model evaluation to each individual image. Context inference is realized by modeling the SVM results in a Markov Random Field with weighted edges and
performing inference with Loopy Belief Propagation. From the context inference results, marking clusters are constructed by applying marking placement models, which exploit marking design rules provided by traffic legislation. In the second processing stage, recognition results from the separate images are combined with connectivity-based clustering to find the 3D positions of the markings.

First, the single-image processing stage has been evaluated for crosswalks, block, stripe and give-way markings. The base performance (SVM classification) within 10 m of the car has been found sufficient for crosswalks, give-way and stripe markings (≥79%), but not for block markings (40%). The use of context inference strongly improves both the detection performance and positioning accuracy, raising the detection rate of all marking types to 88% or higher (90% or higher for most marking types) and locating them within a few decimeters of the ground-truth locations. The performance within 20 m of the car is lower than for close-by detections, for some types even with context inference. However, because actual markings are captured in multiple images, this is not an issue, as is discussed below in the results of the multi-view positioning.

Applying the full processing pipeline to a dataset of a complete municipality in the Netherlands with 84,387 images (corresponding to more than 400 km of road) yields promising results. In this set, pedestrian crossings, block and give-way markings have been recognized at rates of 85%, 92% and 80%, respectively. Closer inspection of missed detections reveals that most undetected markings are severely damaged (give-way markings in particular), or are located far from the capture location, which results in more occlusions and perspective distortions.
When all detections of the system are verified manually to remove false positives, the system performs sufficiently well for creating road-marking maps for traffic safety inspection or navigation applications.

By exploiting high-level context information from other sources, such as traffic signs, a significant number of the missed detections can be identified with traffic situation analysis. Combined databases of markings and signs (1) supply a larger number of traffic situations than a single source, and (2) allow for consistency checking of sign and marking co-occurrences, which directs manual traffic safety inspection to aberrant cases. On our dataset, we have found 64.2% of the crosswalk situations and 48.5% of the give-way situations to be consistent, thereby reducing the manual verification by a factor of two or more.

Looking at the results, future work should be geared towards two objectives: (1) specific processing of damaged markings and (2) support for other marking types, such as lines, arrows and speed numbers. Improving segmentation performance and exploiting high-level context information of other markings and signs can help to increase recognition performance and give an additional indication of damaged markings. Shapes of alternative marking types can be learned by the system, and context relations and marking models can be set up.

Besides the development and validation of a road-marking recognition system in an industrial environment, this work has initiated and contributed to several international publications:
• L. Hazelhoff, I. Creusen, T. Woudsma, and P.H.N. de With, Combined generation of road marking and road sign databases applied to consistency checking of pedestrian crossings, Accepted for: IAPR Int. Conf. on Machine Vision and Applications, 2015.
• T. Woudsma, L. Hazelhoff, I. Creusen, and P.H.N. de With, Automated generation of road marking maps from street-level panoramic images, Submitted to Int. Conf.
on Intelligent Transportation Systems, 2015.
• L. Hazelhoff, I. Creusen, T. Woudsma, and P.H.N. de With, Exploiting automatically generated databases of traffic signs and road markings for contextual co-occurrence analysis, Submitted to Int. Journal on Electronic Imaging, 2015.

REFERENCES
[1] CROW, “Richtlijnen voor de bebakening en markering van wegen,” 2005.
[2] C. Li, I. Creusen, L. Hazelhoff, and P. H. N. de With, “Detection and recognition of road markings in panoramic images,” in ACCV Workshop on My Car Has Eyes - Intelligent Vehicles with Vision Technology, 2014.
[3] L. Hazelhoff, I. Creusen, and P. H. N. de With, “Exploiting street-level panoramic images for large-scale automated surveying of traffic signs,” Machine Vision and Applications, vol. 25, no. 7, pp. 1893–1911, 2014.
[4] S. Vacek, C. Schimmel, and R. Dillmann, “Road-marking analysis for autonomous vehicle guidance,” in EMCR.
[5] M. Fu, X. Wang, H. Ma, Y. Yang, and M. Wang, “Multi-lanes detection based on panoramic camera,” in Control Automation (ICCA), 11th IEEE International Conference on, June 2014, pp. 655–660.
[6] J. Huang, H. Liang, Z. Wang, T. Mei, and Y. Song, “Robust lane marking detection under different road conditions,” in Robotics and Biomimetics, 2013 IEEE International Conference on, Dec 2013, pp. 1753–1758.
[7] S. Yenikaya, G. Yenikaya, and E. Düven, “Keeping the vehicle on the road: A survey on on-road lane detection systems,” ACM Comput. Surv., vol. 46, no. 1, pp. 2:1–2:43, Jul. 2013.
[8] P. Foucher, Y. Sebsadji, and J.-P. Tarel et al., “Detection and recognition of urban road markings using images,” in Intelligent Transportation Systems, International IEEE Conference on, 2011, pp. 1747–1752.
[9] H. Li, M. Feng, and X. Wang, “Inverse perspective mapping based urban road markings detection,” in Cloud Computing and Intelligent Systems (CCIS), International Conference on, vol. 03, 2012, pp. 1178–1182.
[10] B. Qin, W. Liu, and X.
Shen et al., “A general framework for road marking detection and analysis,” in Intelligent Transportation Systems, 2013 16th International IEEE Conference on, 2013, pp. 619–625.
[11] L. Hazelhoff, I. Creusen, T. Woudsma, and P. H. N. de With, “Combined generation of road marking and road sign databases applied to consistency checking of pedestrian crossings,” in 14th IAPR International Conference on Machine Vision Applications (submitted to), 2015.
[12] J. Rebut, A. Bensrhair, and G. Toulminet, “Image segmentation and pattern recognition for road marking analysis,” in Industrial Electronics, IEEE Int. Symp. on, vol. 1, May 2004, pp. 727–732.
[13] T. Wu and A. Ranganathan, “A practical system for road marking detection and recognition,” in Intelligent Vehicles Symp. (IV), IEEE, June 2012, pp. 25–30.
[14] T. Veit, J.-P. Tarel, P. Nicolle, and P. Charbonnier, “Evaluation of road marking feature extraction,” in Intelligent Transportation Systems, 2008. 11th International IEEE Conference on, Oct 2008, pp. 174–181.
[15] J. C. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” in Advances in Large Margin Classifiers. MIT Press, 1999, pp. 61–74.
[16] H.-T. Lin, C.-J. Lin, and R. Weng, “A note on Platt’s probabilistic outputs for support vector machines,” Machine Learning, vol. 68, no. 3, pp. 267–276, 2007.
[17] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
[18] K. P. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation for approximate inference: An empirical study,” in Proceedings of the 15th Conf. on Uncertainty in Artificial Intelligence, 1999, pp. 467–475. [Online]. Available: