Searching Images by Color: Presented by Chris Becker, Shutterstock

Presented at Lucene/Solr Revolution 2014

Published in: Software

1. Searching Images by Color
    Chris Becker, Search Engineering @ Shutterstock
2. What is Shutterstock?
    • Shutterstock sells stock images, videos & music
    • Crowdsourced from artists around the world
    • Shutterstock reviews and indexes them for search
    • Customers buy a subscription and download them
3. Why search by color?
4. Stock photography on the internet…
5. Stock photography on the internet…
6. Color is one of several visual attributes that you can use to create an engaging image search experience
7. Shutterstock Labs
    www.shutterstock.com/labs
    • Spectrum
    • Palette
8. Diving into Color Data
9. Color Spaces
    • RGB
    • HSL
    • LCH
    • Lab
10. Calculating Distances Between Colors
    • Euclidean distance works reasonably well in any color space:
        distRGB = sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)
        distHSL = sqrt((h1-h2)^2 + (s1-s2)^2 + (l1-l2)^2)
        distLCH = sqrt((L1-L2)^2 + (C1-C2)^2 + (H1-H2)^2)
    • More sophisticated equations that better account for human perception can be found at http://en.wikipedia.org/wiki/Color_difference
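The RGB distance formula on slide 10 can be sketched in a few lines of Python. This is a minimal illustration rather than the speaker's code; the function name `rgb_distance` is hypothetical, and channels are assumed to be 0-255 integers.

```python
import math

def rgb_distance(c1, c2):
    """Euclidean distance between two (r, g, b) tuples with 0-255 channels."""
    return math.dist(c1, c2)

# Pure red versus an orange: only the green channel differs.
print(rgb_distance((255, 0, 0), (255, 153, 0)))
```

The same function works for any three-component color space, although a hue component is an angle and may need wraparound handling.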
11. Images are just numbers
    [
    [[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]],
    [[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]],
    [[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]],
    [[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]],
    [[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]],
    [[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]],
    [[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]],
    [[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]],
    [[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]],
    [[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]],
    [[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]],
    [[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]],
    [[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]],
    [[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]],
    [[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]],
    [[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]],
    [[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]],
    [[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]],
    [[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]],
    ]
12. Any operation you can do on a set of numbers, you can do on an image
    • getting histograms
    • computing median values
    • standard deviations / variance
    • other statistics
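To make the "images are just numbers" point concrete, here is a small Python sketch that computes a histogram, median, and standard deviation for one channel. It uses the first three pixels of the first two rows from slide 11; the helper names and the bin width of 64 are assumptions chosen for illustration.

```python
import statistics
from collections import Counter

# A tiny 2x3 "image": rows of (r, g, b) pixels, values taken from slide 11.
image = [
    [(54, 87, 58), (54, 116, 206), (17, 226, 194)],
    [(214, 238, 109), (64, 190, 104), (191, 24, 161)],
]

def channel_values(image, channel):
    """Flatten one channel (0=R, 1=G, 2=B) into a list of ints."""
    return [pixel[channel] for row in image for pixel in row]

def channel_stats(image, channel, bin_width=64):
    """Histogram (by bin index), median, and population stdev of a channel."""
    values = channel_values(image, channel)
    histogram = Counter(value // bin_width for value in values)
    return {
        "histogram": dict(histogram),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
    }

red = channel_stats(image, 0)
print(red)
```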
13. Extracting Color Data
14. Tools & Libraries
    • ImageMagick
    • Python Imaging Library (PIL)
    • ImageJ
15. Code Example
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Image::Magick;

    my $image = Image::Magick->new;
    $image->Read('SamplePhoto.jpg');
    # reduce the image to at most 64 colors before counting
    $image->Quantize(colorspace => 'RGB', colors => 64);
    my @histogram = $image->Histogram();
    my %colors;

    # Histogram() returns a flat list of (R, G, B, opacity, count) groups
    while ( my ($R, $G, $B, $opacity, $count) = splice(@histogram, 0, 5) ) {
        # convert r,g,b to a hex color value
        # (dividing by 256 assumes 16-bit channel values)
        my $hex = sprintf("%02x%02x%02x", $R / 256, $G / 256, $B / 256);
        $colors{$hex} += $count;
    }
16. Indexing & Searching in Solr
17. Indexing color histograms
    • index colors just like you would index text
    • volume of color == frequency of the term
        color_txt = "cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 2e6b2e 2e6b2e 2e6b2e ff0000 …"
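One way to turn a color histogram into a field value like `color_txt` is to repeat each hex color in proportion to its share of the pixels. The sketch below is an assumption about how that could be done, not the speaker's implementation; `histogram_to_terms` and `total_terms` are hypothetical names.

```python
def histogram_to_terms(histogram, total_terms=100):
    """Repeat each hex color in proportion to its share of the pixels."""
    total_pixels = sum(histogram.values())
    terms = []
    # Most frequent colors first, so the output is easy to read.
    for hex_color, count in sorted(histogram.items(), key=lambda kv: -kv[1]):
        repeats = round(total_terms * count / total_pixels)
        terms.extend([hex_color] * repeats)
    return " ".join(terms)

color_txt = histogram_to_terms(
    {"cfebc2": 500, "2e6b2e": 300, "ff0000": 200}, total_terms=10
)
print(color_txt)
```

Colors whose share rounds down to zero repeats are dropped, so `total_terms` controls how small a color patch can be and still be indexed.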
18. Solr Fields & Queries
    <field name="color" type="text_ws" …>
    • Easy to query
    • Can use Solr's default ranking effectively:
        /solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax…
    • Or access term frequencies directly to create specific sort functions:
        sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc
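The query and sort strings on slide 18 can be assembled programmatically. The following Python sketch builds the request parameters for a list of hex colors; the helper names are hypothetical, and combining the edismax query with the `tf` sort in a single request is an assumption made for illustration.

```python
from urllib.parse import urlencode

def color_sort(colors, field="color"):
    """Build a Solr sort clause from the term frequency of each color."""
    terms = ",".join(f'tf({field},"{c}")' for c in colors)
    if len(colors) == 1:
        return f"{terms} desc"
    return f"product({terms}) desc"

def color_query_params(colors, field="color"):
    """Request parameters for an edismax query over the color field."""
    return {
        "q": " ".join(colors),
        "qf": field,
        "defType": "edismax",
        "sort": color_sort(colors, field),
    }

params = color_query_params(["ff0000", "e2c2d2"])
query_string = urlencode(params)  # percent-encodes spaces, quotes, and commas
print("/solr/select?" + query_string)
```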
19. Indexing color statistics
    Represent aggregate statistics of each image:
        lightness:  median: 2, standard dev: 1, largest bin: 0, largest bin size: 50
        saturation: median: 0, standard dev: 0, largest bin: 0, largest bin size: 100
        …
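The slide does not say how these statistics were computed, so the sketch below is only one plausible reading: lightness is split into a small number of bins, and the median, standard deviation, largest bin, and largest bin size (as a percentage of pixels) are taken over the binned values. The field names and the bin count are hypothetical.

```python
import colorsys
import statistics
from collections import Counter

def lightness_stats(pixels, bins=5):
    """Summarize the lightness of (r, g, b) pixels as binned statistics."""
    binned = []
    for r, g, b in pixels:
        # colorsys works on 0.0-1.0 floats and returns (hue, lightness, saturation).
        _, lightness, _ = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        binned.append(min(int(lightness * bins), bins - 1))
    largest_bin, largest_count = Counter(binned).most_common(1)[0]
    return {
        "lightness_median": statistics.median_low(binned),
        "lightness_stdev": round(statistics.pstdev(binned)),
        "lightness_largest_bin": largest_bin,
        "lightness_largest_bin_size": round(100 * largest_count / len(binned)),
    }

pixels = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (128, 128, 128)]
doc = lightness_stats(pixels)
print(doc)
```

The same function shape would apply to saturation or hue by picking a different component of the `rgb_to_hls` result.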
20. Solr Fields & Queries
    <field name="hue_median" type="int" …>
    • Sort by the distance between the input parameter and the median value:
        /solr/select?q=*&sort=abs(sub($query,hue_median)) asc
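The sort on slide 20 ranks documents by the absolute difference between the requested value and the stored median. The Python sketch below reproduces that ordering client-side and adds an optional wraparound, which is an addition not taken from the slides: it assumes hue is stored in degrees from 0 to 360, where 350 and 10 are only 20 apart.

```python
def hue_distance(query_hue, hue_median, circular=True):
    """Absolute hue difference, optionally wrapping around the color wheel."""
    diff = abs(query_hue - hue_median)
    if circular:
        diff = min(diff, 360 - diff)
    return diff

# Hypothetical documents carrying only the hue_median field.
docs = [
    {"id": "a", "hue_median": 350},
    {"id": "b", "hue_median": 40},
    {"id": "c", "hue_median": 200},
]

ranked = sorted(docs, key=lambda d: hue_distance(10, d["hue_median"]))
linear = sorted(docs, key=lambda d: hue_distance(10, d["hue_median"], circular=False))
print([d["id"] for d in ranked], [d["id"] for d in linear])
```

With `circular=False` the function matches the plain `abs(sub(...))` expression, which ranks the hue-350 document last for a query hue of 10.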
21. Ranking & Relevance
22. How much of the image has the color [color swatch]?
23. Is this relevant if I search for [color swatch]?
24. Which image is more relevant if I search for [color swatch]?
25. Is this relevant if I search for [color swatch]?
26. How do we account for these factors?
27. How much of the image contains the selected color?
    • Score each color by number/percentage of pixels:
        sort=tf(color,"ff9900") desc
28. Color Accuracy
    • As you reduce your color space, you also reduce precision
    • Reducing the color space too much increases recall and lowers precision
    • Not reducing it enough lowers recall and raises precision
    • Reducing your color space to roughly 100 to 300 colors works well
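A simple way to land in the 100 to 300 color range is to snap each RGB channel to a fixed number of levels. The sketch below uses 6 levels per channel (6 × 6 × 6 = 216 colors); this uniform bucketing is an assumption for illustration and is not necessarily the quantization the speaker used.

```python
def quantize_channel(value, levels=6):
    """Snap a 0-255 channel value to the center of one of `levels` buckets."""
    bucket = min(value * levels // 256, levels - 1)
    return int((bucket + 0.5) * 256 / levels)

def quantize_hex(rgb, levels=6):
    """Reduce an (r, g, b) tuple to a hex term in the smaller color space."""
    return "".join(f"{quantize_channel(v, levels):02x}" for v in rgb)

# Two slightly different oranges collapse to the same indexed term.
print(quantize_hex((255, 153, 0)), quantize_hex((250, 150, 10)))
```

Applying the same function to the query color keeps the indexed terms and the search terms in the same reduced space.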
29. Weighing Multiple Colors Equally
    • If you search for 2 or more colors, the top result should have the most even distribution of those colors
    • Simple option:
        sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc
    • More complex: compute the stdev or variance of the matching color values in your Solr sort function, and sort the results with the lowest variance first
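The two options on slide 29 can be compared with a small client-side sketch. It shows why a product of term frequencies favors an even split, and how a variance-based ordering would behave. The document structure and function names are hypothetical, and the ranking is done in Python rather than inside a Solr sort function.

```python
import math
import statistics

def product_score(doc, colors):
    """Product of per-color term frequencies, as in the simple option."""
    return math.prod(doc["tf"].get(c, 0) for c in colors)

def evenness_rank(docs, colors):
    """Docs containing every color, lowest variance of frequencies first."""
    matching = [d for d in docs if all(d["tf"].get(c, 0) > 0 for c in colors)]
    return sorted(
        matching,
        key=lambda d: statistics.pvariance([d["tf"][c] for c in colors]),
    )

colors = ["ff9900", "2280e2"]
docs = [
    {"id": "skewed", "tf": {"ff9900": 90, "2280e2": 10}},
    {"id": "even", "tf": {"ff9900": 50, "2280e2": 50}},
    {"id": "orange_only", "tf": {"ff9900": 100}},
]

by_product = sorted(docs, key=lambda d: product_score(d, colors), reverse=True)
by_variance = evenness_rank(docs, colors)
print([d["id"] for d in by_product], [d["id"] for d in by_variance])
```

Variance alone ignores how much of the image the colors cover, so an image with 1% of each color would tie with a 50/50 image; in practice it would likely be combined with a coverage score.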
30. Accounting for Similar & Different Colors
    • The score for a particular color should reflect all the colors in the image
    • At indexing time, increase the score based on similar colors; decrease it based on differing colors
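The slides do not give a formula for this adjustment, so the following Python sketch is purely illustrative. Each color's pixel count is boosted by nearby colors and reduced by distant ones, using RGB Euclidean distance; the thresholds `near` and `far` and the weights `boost` and `penalty` are hypothetical values.

```python
import math

def hex_to_rgb(hex_color):
    """Convert a six-digit hex string such as 'ff0000' to an (r, g, b) tuple."""
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def adjusted_scores(histogram, near=60, far=200, boost=0.5, penalty=0.25):
    """Index-time score per color, adjusted by the other colors in the image."""
    scores = {}
    for color, count in histogram.items():
        score = float(count)
        for other, other_count in histogram.items():
            if other == color:
                continue
            distance = math.dist(hex_to_rgb(color), hex_to_rgb(other))
            if distance <= near:
                score += boost * other_count    # similar color reinforces
            elif distance >= far:
                score -= penalty * other_count  # differing color detracts
        scores[color] = max(score, 0.0)
    return scores

scores = adjusted_scores({"ff0000": 40, "f01010": 20, "0000ff": 40})
print(scores)
```

Here the two reds reinforce each other while the blue lowers both, so a search for red would rank this image above one where red and blue appear in equal amounts with no supporting shades.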
31. Conclusion
32. Conclusion
    • This talk provided a rough guide to building a basic search-by-color application
    • Lots of opportunity to do more sophisticated things in image search:
        • matching colors in certain parts of an image
        • identifying visual styles (blur vs sharp, high contrast, etc.)
        • patterns & textures
        • analyzing content in images (object detection)
33. One more demo…