Visual Search

Image indexing using edge-detection and Hausdorff distance computation.

--- Technical Paper on 'Visual Search' by Group C6 of B.Tech. (CSE) for Minor Project, November 2008 ---

VISUAL SEARCH

Lov Loothra, Ashish Goel, Prateek and Shikha Vashistha
Department of Information Technology and Computer Science Engineering
Amity School of Engineering and Technology, Bijwasan

Abstract – This paper describes the implementation of an application that accepts an image as input from the user and finds images similar to it in a specified directory. Similar images may be defined as images that bear an exact (pixel-to-pixel) resemblance to the query image, or images that show some likeness to the query image in terms of their intensities (color), overall shape (texture), or a combination of these two factors. The application also aims to index, or sort, the images of the database in order of their similarity to the query image, i.e., from the most similar to the least similar image.

Index Terms – edge detection, Hausdorff distance, image codification, image comparison, image indexing, image similarity

1. INTRODUCTION

As of now, almost all popular search engines are text or tag based: they search for a web page, an image, a video, etc. on the basis of the keywords used to describe or store them. This provides extremely accurate and practical results when we want to search for a particular topic or for information contained in a web page. But the same method usually leads to somewhat inaccurate results when we are specifically searching for images, videos or related media, for the simple reason that one person's description may not be accurate enough to cover all keywords.

Instead, if we use an image itself as the search 'keyword' and check for images that are similar to it, we are bound to get more accurate results. This is especially useful when the user knows what he wants to obtain as a result of the search: it could be an image similar to the one he inputs, an image of higher quality (better resolution), or an image that 'contains' the image he has input.

2. IMAGE & IMAGE SIMILARITY

A digital image is a function f(x, y) which has been discretized in spatial coordinates and brightness. It can also be represented as a matrix, in which the row and column indices identify a point in the image, and the value stored in the matrix identifies the level of gray (or color) at that point (pixel).

The volume of data required for the storage (and processing) of an image makes it convenient to work on a codification of the image, i.e., on a minimal set of data which respects (and allows us to reconstruct) the most important characteristics of the image. Besides, codification usually allows the deletion of redundant information, and it is easy to work on the improvement and analysis of the image directly on its codified representation.

Obviously, the level of reduction of the original image data can be associated with a relative loss of information. It is always convenient that the codification admits inversion (i.e., recovering the original image, or an approximation of it with the slightest error). Also, despite modifications made to the image, such as color, scale or texture changes, it would be important to maintain codification invariability. But this, at the same time, requires the codified representation to store some extra information to make such an inversion possible.

Traditionally, the problem of image similarity analysis – i.e., the problem of finding the subset of an image bank with characteristics similar to those of a given image – has been solved by computing a "signature" (codification) of each image to be compared, so that the correspondence between the signatures can be analyzed by means of a distance function that measures the degree of approximation between two given signatures.

Traditional methods to compute signatures are based on some attributes of the image (for example, a color histogram, recognition of a fixed pattern, the number of components of a given type, etc.). This "linearity" of the signature makes it really difficult to obtain data about attributes which were not considered in the signature (and which could be relevant to the similarity or difference between two images). For instance, if we only take color histograms into account, we would ignore image texture, nor would we be able to recognize similar objects painted in different colors.

There are several well-researched methods in the domain of image processing that can be used to formulate a working visual-query based database search application. The techniques used in our project are briefly described below. Furthermore, this paper elucidates the nuances of the actual implementation of the visual search application.
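To make the signature-and-distance idea concrete, here is a minimal sketch (illustrative only, not part of the project's C# code) of a crude intensity-histogram signature and a distance function over it — exactly the kind of attribute-based codification whose limitations are discussed above:

```python
def histogram_signature(pixels, bins=4):
    """Normalized intensity histogram over 0-255 grayscale values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def signature_distance(sig_a, sig_b):
    """L1 distance between two signatures; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

dark = [10, 20, 30, 40]        # all intensities fall in the first bin
light = [220, 230, 240, 250]   # all intensities fall in the last bin
print(signature_distance(histogram_signature(dark),
                         histogram_signature(light)))  # 2.0
```

Note that such a histogram is blind to spatial arrangement: shuffling the pixels of an image leaves its signature (and hence its distance to any other image) unchanged, which is why the project combines several techniques instead of relying on one signature.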
3. HASHING

A cryptographic hash function is a transformation that takes an input (or message) and returns a fixed-size string, called the hash value. The ideal hash function has three main properties: it is extremely easy to calculate a hash for any given data; it is extremely difficult, or almost impossible in a practical sense, to find a text that has a given hash; and it is extremely unlikely that two different messages, however close, will have the same hash.

By computing and then comparing the hash of each image, it can be quickly ascertained whether two images are identical.

4. COLOR MAP

A pixel-by-pixel comparison can also determine whether two images are alike. This, however, becomes highly inefficient for large images, and at the same time does not take regional or spatial similarity or dissimilarity into account. Hence we use Color Maps. In our implementation, a Color Map represents an image divided into blocks. These blocks (of a predetermined size) are made up of a group of pixels and represent the average pixel intensity of a particular area of the image.

Corresponding blocks of two image maps can then be compared to determine similarity or dissimilarity.

5. EDGE DETECTION

Edges characterize boundaries and are, therefore, a problem of fundamental importance in image processing. Edges in images are areas with strong intensity contrasts – a jump in intensity from one pixel to the next. Detecting the edges of an image significantly reduces the amount of data and filters out useless information, while preserving the important structural properties of the image.

6. HAUSDORFF DISTANCE

The Hausdorff distance [1] measures the extent to which each point of a 'model' set lies near some point of an 'image' set and vice versa. Thus, this distance can be used to determine the degree of resemblance between two objects that are superimposed on one another. Computing the Hausdorff distance between all possible relative positions of the query image and the database image can solve the problem of detecting image containment. The Hausdorff distance computation differs from many other shape comparison methods in that no correspondence between the query image and the database image(s) is derived [1]. The method is quite tolerant of the small position errors that occur with edge detectors and other feature extraction methods. Moreover, the method extends naturally to the problem of comparing a portion of a model against an image.

7. DETAILS OF IMPLEMENTATION

The application, while searching, considers:
  • Exact match(es) (of the Source Image)
  • Color
  • Texture (Shape)

The first point involves searching the target directory for an image or images that are exact replicas of the query image. This is accomplished using the hashing technique (explained below). The second and third points involve searching for non-exact images that bear some degree of resemblance to the query image. For this, the images (query and database) are first subjected to the edge-detection filter and, subsequently, the Hausdorff metric of the filtered database images with respect to the query image is computed. Also, the generated Color Maps of the images are compared trivially to generate a difference metric. These are used to determine the degree of similarity. The nuances of the implementation of the above techniques are detailed below.

7.1 HASHING TECHNIQUE

The SHA hash functions are a set of cryptographic hash functions designed by the National Security Agency (NSA) and published by NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. The five algorithms are denoted SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512; the latter four variants are sometimes collectively referred to as SHA-2. SHA-1 produces a message digest that is 160 bits long; the number in each of the other four algorithm names denotes the bit length of the digest it produces. The classes used for computing these hashes are predefined in System.Security.Cryptography [6], which can be freely used in any .NET or Visual Studio implementation.

Hashing is a faster way to compare the images, allowing the tests to complete in a timely manner, rather than comparing the individual pixels of each image using GetPixel(x, y) [5][6]. Hashes of two images should match if and only if the corresponding images also match. Small changes to an image result in large, unpredictable changes to its hash. This property of the generated hashes can be used to find exact matches (duplicates) of the query image.

The ComputeHash [6] method takes a byte array of data as an input parameter and produces a 256-bit hash of that data. By computing and then comparing the hash of each image, we can quickly tell whether the images are identical. The problem was hence to devise a way to convert the image data stored in the Bitmap [5][6] objects into a form suitable for passing to the ComputeHash method, namely a byte array. The ImageConvertor [6] class was thus used to allow us to convert the Image (or Bitmap) objects to the hash-able byte array.

Examples: [7.1.1], [7.1.2].
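As a rough illustration of this exact-match step (the project itself uses ComputeHash from System.Security.Cryptography in C#; Python's hashlib stands in here), hashing the raw image bytes and comparing digests detects duplicates:

```python
import hashlib

def image_digest(pixel_bytes: bytes) -> str:
    """SHA-256 digest of an image's raw bytes; identical bytes give identical digests."""
    return hashlib.sha256(pixel_bytes).hexdigest()

a = bytes([0, 10, 20, 30])
b = bytes([0, 10, 20, 30])
c = bytes([0, 10, 20, 31])   # one byte differs

print(image_digest(a) == image_digest(b))  # True  -> exact duplicate
print(image_digest(a) == image_digest(c))  # False -> even a tiny edit changes the hash
```

As in the paper, this only detects byte-for-byte duplicates; a re-encoded or slightly edited copy of the same picture hashes completely differently, which is why the color-map and Hausdorff comparisons below are needed.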
7.2 COLOR MAPS

Color Maps can be generated easily and efficiently for small images by taking the respective Red, Green and Blue averages of a block (16x16 in our implementation) at a time, computed dynamically using the running average:

IntnstyAvg = (IntnstyAvg * (p - 1) + CIntnsty) / p

where p represents the number of pixels considered so far and CIntnsty represents the intensity value of the current pixel.

However, this method deteriorates quickly as image size increases and the number of pixels goes up to a few million. The most practical and efficient solution is to scale the image down to a fixed size. For this we need to know the scale factor, sf, based on the image dimensions and the fixed size itself:

MAX_DIM = Max(Img_Width, Img_Height)
sf = FIXED_SIZE / MAX_DIM

Therefore, we have:

New_Width = sf * Img_Width
New_Height = sf * Img_Height

Once an image is scaled, the intensity average for each block is computed and stored. The intensity of a particular pixel is obtained by the trivial GetPixel(x, y) method. These stored values of the regional blocks (say A1, B1 for two images A, B) can then be compared by a simple absolute difference scaled over the 8 bits used to represent each color component (RGB):

Difference = 1 - |Blk_A1_Avg - Blk_B1_Avg| / 255

Examples: [7.2.1], [7.2.2].

7.3 SOBEL EDGE DETECTION

There are many ways to perform edge detection. However, most of the different methods may be grouped into two categories: gradient and Laplacian. The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image. The Laplacian method searches for zero crossings in the second derivative of the image to find edges.

Suppose we have a signal with an edge shown by a jump in intensity, as in [FIG 7.3.1]. If we take the gradient of this signal (which, in one dimension, is just the first derivative with respect to t) we get a signal as shown in [FIG 7.3.2].

Clearly, the derivative shows a maximum located at the center of the edge in the original signal. This method of locating an edge is characteristic of the 'gradient filter' family of edge detection filters and includes the Sobel method [3]. A pixel location is declared an edge location if the value of the gradient exceeds some threshold. As mentioned before, edges will have higher pixel intensity values than those surrounding them.

Based on this one-dimensional analysis, the theory can be carried over to two dimensions as long as there is an accurate approximation for calculating the derivative of a two-dimensional image. The Sobel operator performs a 2-D spatial gradient measurement on an image. Typically it is used to find the approximate absolute gradient magnitude at each point of an input grayscale image.

The Sobel edge detector uses a pair of 3x3 convolution masks [3], one estimating the gradient in the x-direction (columns, Gx) and the other estimating the gradient in the y-direction (rows, Gy) [FIG 7.3.3]. A convolution mask is usually much smaller than the actual image. As a result, the mask is slid over the image, manipulating a square of pixels at a time. An approximate magnitude can then be calculated using:

|G| = |Gx| + |Gy| [3]

The actual algorithm involves the computation of the grayscale of the image (if required) followed by the application of the gradient masks.

In our implementation, we used the Bitmap class to represent the image. The GetPixel(x, y) method was used to obtain the Color [5][6] value of the pixel located at (x, y). The working loop traversed the entire dimensions of the image and obtained the Color value (a 24-bit value for modern images). By taking the average of the RGB components of the Color value, we converted it to an 8-bit grayscale. The computed value was then stored in a matrix as a simple integer between 0 and 255 for easy recall.

The active pixel region, consisting of the current pixel location (say x, y) and the 8 pixels adjacent to it, for a total of 9 pixels, was then subjected to the gradient: the region was correlated directly (using the Hadamard product) with the 3x3 gradient matrices and summed to produce the gradient values in the x and y directions. The computed gradient was then compared to the thresholds of the 8-bit Bitmap, i.e., 0 and 255, and an appropriate intensity value was assigned.

Examples: [7.3.4], [7.3.5].
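The running-average, scaling and difference formulas of section 7.2 can be sketched as follows (a Python stand-in for the C# implementation; the 2x2 toy block and the FIXED_SIZE value are illustrative assumptions, the project uses 16x16 blocks):

```python
FIXED_SIZE = 256  # assumed down-scaling target

def scaled_dims(img_width, img_height):
    """New_Width/New_Height from sf = FIXED_SIZE / MAX_DIM (section 7.2)."""
    sf = FIXED_SIZE / max(img_width, img_height)
    return round(sf * img_width), round(sf * img_height)

def block_average(gray, x0, y0, size):
    """Running average over one block, mirroring
    IntnstyAvg = (IntnstyAvg * (p - 1) + CIntnsty) / p."""
    avg, p = 0.0, 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            p += 1
            avg = (avg * (p - 1) + gray[y][x]) / p
    return avg

def block_similarity(avg_a, avg_b):
    """Difference = 1 - |Blk_A_Avg - Blk_B_Avg| / 255; 1.0 means identical averages."""
    return 1 - abs(avg_a - avg_b) / 255

print(scaled_dims(1024, 768))                       # (256, 192)
img_a = [[100, 100], [100, 100]]                    # 2x2 toy "images"
img_b = [[150, 150], [150, 150]]
sim = block_similarity(block_average(img_a, 0, 0, 2),
                       block_average(img_b, 0, 0, 2))
print(round(sim, 4))                                # 0.8039
```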
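A minimal sketch of the Sobel step described in section 7.3 (again a Python stand-in for the C# code; the masks are the standard Sobel masks referenced by [FIG 7.3.3]):

```python
# Standard Sobel convolution masks: Gx (columns) and Gy (rows)
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[ 1, 2, 1], [ 0, 0, 0], [-1, -2, -1]]

def sobel_magnitude(gray, x, y):
    """|G| = |Gx| + |Gy| at an interior pixel, via the Hadamard
    product-and-sum of the 3x3 neighbourhood with each mask."""
    gx = gy = 0
    for dy in range(-1, 2):
        for dx in range(-1, 2):
            v = gray[y + dy][x + dx]
            gx += GX[dy + 1][dx + 1] * v
            gy += GY[dy + 1][dx + 1] * v
    return abs(gx) + abs(gy)

flat = [[50] * 3 for _ in range(3)]             # uniform region -> no edge
step = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]  # vertical step edge
print(sobel_magnitude(flat, 1, 1))   # 0
print(sobel_magnitude(step, 1, 1))   # 1020 -> well above any 8-bit threshold
```

In the project's terms, the magnitude would then be thresholded and clamped to the 0-255 range before being written back as the pixel's new intensity.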
7.4 CANNY EDGE DETECTION

The Canny edge detection algorithm [2] is known to many as the optimal edge detector. It enhances the many edge detectors already available. It is important that edges occurring in images should not be missed and that there be NO responses to non-edges. Likewise, it is also important that the edge points be well localized: in other words, the distance between the edge pixels as found by the detector and the actual edge is to be at a minimum.

The detector draws upon the implementation of the Sobel filter discussed previously. But before applying the Sobel filter to the image, there is a need to eliminate noise from it. This noise removal is done with the help of a Gaussian filter, which basically blurs the image by applying a Gaussian mask over it. For the purpose of implementation, we used a 3x3 mask [FIG 7.4.1] and slid it over the image, manipulating a square of pixels at a time by simple convolution.

After the application of the Gaussian and Sobel filters, we obtain an image (over an 8-bit grayscale) that approximates the intensity-change areas of the image. The problem now is to suppress any gray value that appears to be a maximum locally but is a non-maximum when viewed w.r.t. its neighbors along the edge direction. This is known as non-maximum suppression and is done by determining the edge direction and then following it to remove the regional non-maximums. This step was clubbed with the implementation of the Sobel filter, as the direction could be trivially deduced as θ = tan⁻¹(Gy/Gx), with appropriate exceptions being made when Gx and/or Gy compute to 0, as: orientation = (Gy == 0) ? 0 : 90.

Once the edge direction is known, the next step is to relate it to a direction that can be traced in an image. If the pixels of a 5x5 image are aligned as in [FIG 7.4.2], then it can be seen, looking at the centre pixel a, that there are only four possible directions when describing the surrounding pixels:

  • 0 degrees (in the horizontal direction),
  • 45 degrees (along the positive diagonal),
  • 90 degrees (in the vertical direction), or
  • 135 degrees (along the negative diagonal).

Hence the obtained direction is resolved into whichever of these four directions it is closest to. As an example, if the orientation angle is found to be 3 degrees, it is made zero degrees. The resolved angle is stored in an array for further reference and recall.

Following the computation of the edge directions, we are now in a position to perform non-maximum suppression [2]. We trace along the edge in the edge direction and suppress any pixel value (set it equal to 0) that is not considered to be an edge (i.e., has a value less than a neighbor's). This gives a thin line in the output image. It is accomplished by simply comparing the current pixel value with its two nearest neighbors along the (previously determined) direction; the lower values can be ignored.

Finally, hysteresis is used as a means of eliminating streaking [2]. Streaking is the breaking up of an edge contour caused by the operator output fluctuating above and below a particular threshold. If a single threshold T1 is applied to an image, and an edge has an average strength equal to T1, then, due to noise, there will be instances where the edge dips below the threshold; equally, it will also extend above the threshold, making the edge look like a dashed line.

To avoid this, hysteresis uses two thresholds: a high threshold T1 and a low threshold T2. Any pixel in the image that has a value greater than T1 is presumed to be an edge pixel and is marked as such immediately. Then, any pixels that are connected to this edge pixel and that have a value greater than T2 are also selected as edge pixels: to follow an edge, start from a pixel whose gradient exceeds T1 and stop only when the gradient drops below T2. This step is very similar to the following of edges during the suppression of non-maximums, and hence the two can be clubbed together in the final implementation.

Example: [7.4.3].

7.5 HAUSDORFF DISTANCE COMPUTATION

Given two finite point sets A = {a1, ..., ap} and B = {b1, ..., bq}, the Hausdorff distance between them is defined as:

H(A, B) = max(h(A, B), h(B, A)) [1]

where h(A, B) = max a∈A min b∈B ||a - b||

and ||·|| is some underlying norm on the points of A and B (for a visual representation of the Hausdorff distance, refer to [7.5.1]).

The function h(A, B) is called the directed Hausdorff distance [1] from A to B. It identifies the point a ∈ A that is farthest from any point of B, and measures the distance from a to its nearest neighbor in B (using the given norm ||·||, Euclidean in this case). That is, h(A, B) in effect ranks each point of A based on its distance to the nearest point of B, and then uses the largest such rank as the distance (the most mismatched point of A). Intuitively, if h(A, B) = d, then each point of A must be within distance d of some point of B, and there is also some point of A that is at exactly distance d from its nearest point of B (the most mismatched point).
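Looking back at section 7.4, the resolution of θ into the four traceable directions can be sketched as follows (a Python stand-in; the function name is illustrative, and the Gx == 0 special case mirrors the ternary expression given in the text):

```python
import math

def quantize_direction(gx, gy):
    """Resolve the gradient orientation theta = atan(Gy/Gx) to the nearest
    traceable direction (0, 45, 90 or 135 degrees)."""
    if gx == 0:
        # the paper's special case: orientation = (Gy == 0) ? 0 : 90
        return 0 if gy == 0 else 90
    theta = math.degrees(math.atan(gy / gx)) % 180
    # snap to the nearest of the four directions (e.g. 3 degrees -> 0),
    # treating orientation as periodic with period 180 degrees
    return min((0, 45, 90, 135),
               key=lambda d: min(abs(theta - d), 180 - abs(theta - d)))

print(quantize_direction(100, 5))   # ~3 degrees  -> 0 (horizontal)
print(quantize_direction(10, 10))   # 45 degrees  -> 45
print(quantize_direction(0, 10))    # vertical    -> 90
```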
The Hausdorff distance, H(A, B), is the maximum of h(A, B) and h(B, A). Thus it measures the degree of mismatch between two sets by measuring the distance of the point of A that is farthest from any point of B, and vice versa. Intuitively, if the Hausdorff distance is d, then every point of A must be within a distance d of some point of B and vice versa. The notion of resemblance encoded by this distance is thus that each member of A be near some member of B and vice versa. Unlike most methods of comparing shapes, there is no explicit pairing of points of A with points of B (for example, many points of A may be close to the same point of B) [1].

The extraction of the point sets from the images is based on the result of the Canny edge detector. The implementation uses those points of the Canny-filtered image that actually constitute an edge. These points can be determined trivially by checking for the non-zero intensity pixels only.

The function h(A, B) can be trivially computed in time O(pq) for two point sets of size p and q respectively, using the following brute-force algorithm:

1. h = 0
2. for every point ai of A,
   2.1 shortest = INF
   2.2 for every point bj of B
         dij = d(ai, bj)
         if dij < shortest then shortest = dij
   2.3 if shortest > h then h = shortest

Our implementation used a slightly modified version of the above algorithm, which makes certain assumptions and eliminations in the computation of the Hausdorff metric. The steps taken to improve computation time are summarized below.

7.5.1 Termination at Zero Distance

This builds on the fact that the result of the distance norm (the Euclidean norm was used in our implementation, i.e., d = √{(x1 - x2)² + (y1 - y2)²}) can never be less than 0. Hence, once the inner loop of the above algorithm (Loop 2.2) computes the shortest distance to be 0, we can safely stop considering any further points from B for computing the distance from the particular point ai ∈ A. This considerably speeds up the computation by skipping a significant chunk of points.

7.5.2 Threshold Distance Window

We can eliminate the need to consider a point if it lies outside a particular threshold distance window, or block. This can be understood with the help of an example. Given a threshold distance τ and the point (Bx, By), we need only consider it for distance computation from the point (Ax, Ay) iff:

(Ax - τ) ≤ Bx ≤ (Ax + τ) AND (Ay - τ) ≤ By ≤ (Ay + τ)

This speeds up computation for smaller values of τ and limits the maximum possible Hausdorff distance. Visual inaccuracies may occur when seemingly similar but translated images are compared under this assumption.

7.5.3 Termination at Infinite Distance

It can be noted that the outer loop of the algorithm (Loop 2) retains the maximum distance. This assumption builds on the previous one in the sense that, given the boundaries of the threshold distance window, there may be a few points of A which are not in the vicinity of any point of B. For such points the computed distance will retain its initial value of infinity. Further consideration of any point thereafter is meaningless, as the maximum value of infinity has already been retained.

7.5.4 Scaling

Even after the application of the above techniques, computation efficiency rapidly deteriorates as image size increases and the number of pixels goes up to a few million. Hence, as discussed in section 7.2, the image is scaled down to a fixed size on the basis of a scale factor, to reduce the number of pixels significantly.

The above assumptions do affect the overall accuracy of the Hausdorff metric, but are useful nonetheless for a much-required speed-up.

7.6 CONCLUSION AND OBSERVATIONS

Hence, given any two images under consideration, we can easily compute their hash values and their mutual Hausdorff metric (after application of the Canny filter). While, on the one hand, the hash-value comparison can trivially determine whether or not the given images are exact in all respects, the Hausdorff metric signifies the 'closeness' of the two images. A Hausdorff metric of 0 indicates exactness as far as features are concerned, whereas larger values reveal increasing dissimilarity between the images.

This implementation can be extended intuitively to consider a database of images.

Examples:
[7.6.1] Source database
[7.6.2] Filtered images
[7.6.3] Hausdorff distances computed w.r.t. Firefox_Logo_Normal (source image); results sorted in order of decreasing similarity.
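The brute-force algorithm of section 7.5, together with the speed-ups of 7.5.1 through 7.5.3, can be sketched as follows (a Python stand-in for the C# implementation; the point sets and τ value are illustrative):

```python
import math

def directed_hausdorff(A, B, tau=None):
    """Brute-force h(A, B) with the speed-ups of 7.5.1-7.5.3:
    zero-distance early exit, an optional threshold window tau,
    and termination once a distance of infinity is retained."""
    h = 0.0
    for (ax, ay) in A:
        shortest = math.inf
        for (bx, by) in B:
            # 7.5.2: skip points outside the threshold window around (ax, ay)
            if tau is not None and not (ax - tau <= bx <= ax + tau and
                                        ay - tau <= by <= ay + tau):
                continue
            d = math.hypot(ax - bx, ay - by)   # Euclidean norm
            if d < shortest:
                shortest = d
            if shortest == 0:                  # 7.5.1: cannot get closer than 0
                break
        if shortest > h:
            h = shortest
        if h == math.inf:                      # 7.5.3: no point of B in the window
            break
    return h

def hausdorff(A, B, tau=None):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B, tau), directed_hausdorff(B, A, tau))

A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
print(hausdorff(A, B))  # 3.0 -> (4, 0) is 3 away from its nearest point of A
```

As the text notes, the window and infinity shortcuts trade some accuracy (especially for translated images) for the speed-up needed to rank a whole directory.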
8. SUMMARY OF IMPLEMENTATION

A summary of the implementation is presented below in the form of pseudo-code.

8.1 Input Source Image, SI
8.2 Input Target Directory, TD
-- Preprocessing Phase
8.3 For each image in the TD:
    8.3.1 Compute & store the hash value (HV)
    8.3.2 Compute & store Color Details (CD)
    8.3.3 Apply the Canny (Sobel-based) filter
    8.3.4 Compute the locations of non-zero pixels and store them in a matrix
-- Preparation Phase
8.4 Compute HV for SI
8.5 Compute & store Color Details of SI
8.6 Apply Canny filter to SI
8.7 Compute & store locations of non-zero pixels
-- Comparison Phase
8.8 For each image in the TD:
    8.8.1 Compare HV of SI with the stored HV of the image
    8.8.2 Compare CD of SI with the stored CD of the image
    8.8.3 Compute the Hausdorff metric b/w SI and the image using the stored locations of non-zero pixels
    8.8.4 Assign a rank to the image based on the HV comparison, the computed Hausdorff metric and the Color Details
-- Sorting Phase
8.9 Sort images of TD based on rank
8.10 Display images in sort order

9. REFERENCES

[1] Daniel P. Huttenlocher, Gregory A. Klanderman, and William J. Rucklidge. Comparing Images Using the Hausdorff Distance. IEEE Trans. Pattern Analysis and Machine Intelligence, September 1993.

[2] J. Canny. A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, November 1986.

[3] I. Sobel and G. Feldman. A 3x3 Isotropic Gradient Operator for Image Processing. Presented at a talk at the Stanford Artificial Intelligence Project, 1968; in Pattern Classification and Scene Analysis, 1973.

[4] H. Alt, B. Behrends and J. Blomer. Measuring the Resemblance of Polygon Shapes. Proc. Seventh ACM Symposium on Computational Geometry, 1991.

[5] Herbert Schildt. C# 2.0: The Complete Reference, Second Edition. Tata McGraw-Hill, 2006.

[6] MSDN Library. http://msdn.microsoft.com/en-us/library/default.aspx
FIGURES

[7.1.1], [7.1.2] – Hashing examples
[7.2.1], [7.2.2] – Color Map examples
[7.3.1], [7.3.2], [7.3.3] – Edge signal, its gradient, and the Sobel masks
[7.3.4], [7.3.5] – Sobel edge detection examples
[7.4.1], [7.4.2], [7.4.3] – Gaussian mask, pixel directions, and Canny example
[7.5.1] – Visual representation of the Hausdorff distance
[7.6.1], [7.6.2], [7.6.3] – Source database, filtered images, and sorted Hausdorff results