Searching Images by Color 
Chris Becker 
Search Engineering @ Shutterstock
What is Shutterstock? 
• Shutterstock sells stock images, videos & music. 
• Crowdsourced from artists around the world 
• Shutterstock reviews and indexes them for search 
• Customers buy a subscription and download them
Why search by color?
Stock photography on the internet… 
images from www.shutterstock.com
Color is one of many visual attributes that you can use to create an engaging image search experience
Shutterstock Labs 
Spectrum 
Palette
Diving into Color Data
Color Spaces 
• RGB 
• HSL 
• Lab 
• LCH 
images from www.wikipedia.org
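As a rough illustration of moving between these spaces, the sketch below converts an RGB color to HSL with Python's standard `colorsys` module. The helper name `rgb_to_hsl` and the output scaling (degrees and percentages) are choices made here, not something from the slides. Lab and LCH need a conversion through XYZ that the standard library does not provide, so they are left out.

```python
import colorsys

def rgb_to_hsl(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation %, lightness %)."""
    # colorsys works on 0.0-1.0 floats and returns values in H, L, S order
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 360), round(s * 100), round(l * 100)

print(rgb_to_hsl(255, 0, 0))      # pure red
print(rgb_to_hsl(128, 128, 128))  # mid gray has no saturation
```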
Calculating Distances Between Colors 
• Euclidean distance works reasonably well in any color space 
distRGB = sqrt((r2 - r1)^2 + (g2 - g1)^2 + (b2 - b1)^2)
distHSL = sqrt((h2 - h1)^2 + (s2 - s1)^2 + (l2 - l1)^2)
distLCH = sqrt((L2 - L1)^2 + (C2 - C1)^2 + (H2 - H1)^2)
distLAB = sqrt((L2 - L1)^2 + (a2 - a1)^2 + (b2 - b1)^2)
• More sophisticated equations that better account for human perception can be found at http://en.wikipedia.org/wiki/Color_difference
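A minimal sketch of the Euclidean distance in Python, using only the standard library. The second helper is an addition that is not on the slides: hue in HSL and LCH is an angle, so a plain subtraction treats 350 and 10 degrees as far apart, and wrapping the hue difference is one common way to handle that. Both function names are illustrative.

```python
import math

def euclidean_distance(c1, c2):
    """Straight-line distance between two colors given as 3-tuples."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(c1, c2)))

def hue_aware_distance(c1, c2):
    """Variant for (hue_degrees, x, y) tuples such as HSL or LCH.

    Hue is an angle, so 350 and 10 degrees are 20 apart, not 340.
    """
    dh = abs(c2[0] - c1[0]) % 360
    dh = min(dh, 360 - dh)
    return math.sqrt(dh ** 2 + (c2[1] - c1[1]) ** 2 + (c2[2] - c1[2]) ** 2)

print(euclidean_distance((255, 0, 0), (0, 0, 0)))
print(hue_aware_distance((350, 50, 50), (10, 50, 50)))
```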
Images are just numbers 
[ 
[[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]], 
[[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]], 
[[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]], 
[[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]], 
[[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]], 
[[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]], 
[[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]], 
[[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]], 
[[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]], 
[[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]], 
[[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]], 
[[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]], 
[[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]], 
[[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]], 
[[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]], 
[[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]], 
[[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]], 
[[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]], 
[[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]], 
]
Any operation you can do on a set of numbers, you can do on an image
• getting histograms 
• computing median values 
• standard deviations / variance 
• other statistics
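To make this concrete, here is a small sketch that treats a few pixels as plain numbers. The pixel values are the first three pixels of the first two rows of the array above; the choice of the red channel and of four bins is arbitrary.

```python
import statistics
from collections import Counter

# A tiny 2x3 "image": rows of (r, g, b) pixels
pixels = [
    [(54, 87, 58), (54, 116, 206), (17, 226, 194)],
    [(214, 238, 109), (64, 190, 104), (191, 24, 161)],
]

# Flatten the rows and pull out one channel
flat = [px for row in pixels for px in row]
reds = [r for r, _, _ in flat]

red_median = statistics.median(reds)
red_stdev = statistics.pstdev(reds)             # population standard deviation
red_histogram = Counter(r // 64 for r in reds)  # 4 bins of width 64

print(red_median, red_stdev, dict(red_histogram))
```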
Extracting Color Data
Tools & Libraries 
• ImageMagick 
• Python Imaging Library (PIL)
• ImageJ
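# Python example: get a color histogram from an image
from pprint import pprint

from PIL import Image

# Convert to RGB so every pixel is an (r, g, b) tuple, even for palette or RGBA files
image = Image.open('./samplephoto.jpg').convert('RGB')
width, height = image.size

# getcolors() returns (count, (r, g, b)) pairs; using width*height as the limit
# means it never returns None for having too many distinct colors
colors = image.getcolors(width * height)

# Map each color's hex code to the number of pixels with that color
hist = {}
for count, (r, g, b) in colors:
    hex_color = '%02x%02x%02x' % (r, g, b)
    hist[hex_color] = count

pprint(hist)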
Indexing & Searching in Solr
Indexing color histograms 
• index colors just like you would index text 
• amount of color = frequency of the term 
color_txt = "cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2
95bf40 95bf40 95bf40 95bf40 95bf40 95bf40
2e6b2e 2e6b2e 2e6b2e
ff0000 …"
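One way such a field value could be produced from a histogram is sketched below. The function name, the pixel counts, and the fixed budget of 20 terms per image are assumptions for illustration; the slides do not say how the repetition counts were chosen.

```python
def histogram_to_terms(hist, total_terms=20):
    """Repeat each hex color in proportion to its share of the pixels."""
    total_pixels = sum(hist.values())
    terms = []
    # Most common colors first, only to keep the output readable
    for hex_color, count in sorted(hist.items(), key=lambda kv: -kv[1]):
        repeats = round(total_terms * count / total_pixels)
        terms.extend([hex_color] * repeats)
    return " ".join(terms)

# Hypothetical pixel counts for one image
hist = {"cfebc2": 500, "95bf40": 300, "2e6b2e": 150, "ff0000": 50}
color_txt = histogram_to_terms(hist)
print(color_txt)
```

Colors whose share rounds down to zero terms drop out of the field entirely, which is one place where the term budget trades accuracy for index size.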
Solr Schema & Queries 
<field name="color" type="text_ws" …> 
• Can use Solr's default ranking effectively
/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax… 
• or use term frequencies directly for specific sort functions: 
sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc
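The two request styles above could be assembled on the client roughly as follows. The endpoint URL is a hypothetical local Solr instance, and the helper names are made up for this sketch; only the parameter names and function syntax come from the slide.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # hypothetical endpoint

def color_query_url(colors):
    """edismax query that lets Solr's default ranking score the color field."""
    params = {"q": " ".join(colors), "qf": "color", "defType": "edismax"}
    return SOLR_SELECT + "?" + urlencode(params)

def tf_product_sort(colors, field="color"):
    """Sort clause multiplying the term frequency of every requested color."""
    tfs = ",".join('tf(%s,"%s")' % (field, c) for c in colors)
    return "product(%s) desc" % tfs

print(color_query_url(["ff0000", "e2c2d2"]))
print(tf_product_sort(["ff0000", "e2c2d2"]))
```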
Indexing color statistics 
Represent aggregate statistics of each image 
lightness: 
median: 2 
standard dev: 1 
largest bin: 0 
largest bin size: 50 
saturation:
median: 0 
standard dev: 0 
largest bin: 0 
largest bin size: 100 
…
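A sketch of how the lightness statistics might be computed for one image. Everything specific here is an assumption: the field names (modeled on a `hue_median`-style naming), the use of 10 equal-width bins, and reading "largest bin size" as a percentage of pixels.

```python
import colorsys
import statistics
from collections import Counter

def lightness_fields(pixels, bins=10):
    """Build hypothetical Solr fields summarizing the lightness of an image."""
    # Lightness of each (r, g, b) pixel on a 0.0-1.0 scale
    lightness = [colorsys.rgb_to_hls(r / 255, g / 255, b / 255)[1]
                 for r, g, b in pixels]
    # Bucket into equal-width bins; min() keeps a lightness of 1.0 in the last bin
    binned = [min(int(l * bins), bins - 1) for l in lightness]
    largest_bin, count = Counter(binned).most_common(1)[0]
    return {
        "lightness_median": statistics.median(binned),
        "lightness_stdev": round(statistics.pstdev(binned)),
        "lightness_largest_bin": largest_bin,
        "lightness_largest_bin_size": round(100 * count / len(pixels)),
    }

# Two black pixels, one white, one mid gray
fields = lightness_fields([(0, 0, 0), (0, 0, 0), (255, 255, 255), (128, 128, 128)])
print(fields)
```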
Solr Fields & Queries 
<field name="hue_median" type="int" …> 
• Sort by the distance between the input param and the median value for each image
/solr/select?q=*&sort=abs(sub($query,hue_median)) asc
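For intuition, the ordering that this sort produces can be emulated locally. The document ids and hue values below are made up, and the function is only a stand-in for what Solr computes per document.

```python
def rank_by_hue(docs, query_hue):
    """Order documents the way abs(sub($query,hue_median)) asc would."""
    return sorted(docs, key=lambda d: abs(query_hue - d["hue_median"]))

# Hypothetical indexed documents
docs = [
    {"id": "img1", "hue_median": 200},
    {"id": "img2", "hue_median": 115},
    {"id": "img3", "hue_median": 30},
]

print([d["id"] for d in rank_by_hue(docs, query_hue=120)])
```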
Ranking & Relevance
How much of the image has the color [color swatch]?
image from www.shutterstock.com
Is this relevant if I search for [color swatch]?
image from www.shutterstock.com
Which image is more relevant if I search for [color swatch]?
image from www.shutterstock.com
Is this relevant if I search for [color swatch]?
image from www.shutterstock.com
How do we account for these factors?
How much of the image contains the selected color?
• Score each color by the number of pixels 
sort=tf(color,"cfebc2") desc
Balance Precision and Recall 
• Reduce your colorspace enough to balance:
• color accuracy 
• index size 
• query complexity 
• result counts 
• Only 100-200 colors are needed for a good UX
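One simple way to reduce the colorspace is to snap each RGB channel to a few evenly spaced levels. This is only a sketch: the slides do not say which reduction method was used, and uniform RGB quantization is a stand-in chosen for brevity. With 5 levels per channel the palette has 125 colors, inside the 100-200 range mentioned above.

```python
def quantize(rgb, levels=5):
    """Snap an (r, g, b) color to a palette of levels**3 colors."""
    step = 255 / (levels - 1)
    # Round each channel to the nearest of the evenly spaced levels
    return tuple(round(round(c / step) * step) for c in rgb)

def to_hex(rgb):
    """Format an (r, g, b) tuple as the hex term used in the index."""
    return "%02x%02x%02x" % rgb

print(to_hex(quantize((207, 235, 194))))  # cfebc2 snapped to the reduced palette
```

Fewer levels mean fewer distinct terms per image and more results per query, at the cost of treating visibly different colors as the same term.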
Weighing Multiple Colors Together 
• If you search for 2 or more colors, the top result should have the most even distribution of those colors
• simple option: 
sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc 
• more complex: compute the standard deviation or variance of the term frequencies of matching color values for each image, and sort the results with the lowest variance first.
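The variance-based option can be sketched outside Solr as follows. The image ids and term frequencies are hypothetical, and requiring every requested color to be present is an assumption added here so that images missing a color are not ranked at all.

```python
import statistics

def evenness_rank(images, colors):
    """Rank images containing every requested color, most even mix first."""
    scored = []
    for image_id, tf in images.items():
        freqs = [tf.get(c, 0) for c in colors]
        if all(freqs):  # skip images missing any requested color
            scored.append((statistics.pvariance(freqs), image_id))
    return [image_id for _, image_id in sorted(scored)]

# Hypothetical term frequencies per image
images = {
    "a": {"ff9900": 10, "2280e2": 10},
    "b": {"ff9900": 18, "2280e2": 2},
    "c": {"ff9900": 20},
}

print(evenness_rank(images, ["ff9900", "2280e2"]))
```

Variance alone ignores how much of each color is present, so an image with one term of each color ties with image "a"; combining it with the product or sum of the frequencies is one way to break such ties.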
Weighing Similar & Different Colors 
• The score for one color should reflect all the colors in the image. 
• At indexing time, increase the score based on similar colors; 
decrease it based on differing colors.
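The slides do not give a formula for this adjustment, so the following is purely a hypothetical sketch of the idea: every color in the image contributes to a color's score with a weight that falls from +1 for an identical color to -1 for the most distant color in RGB. The weighting function and the clamp at zero are assumptions.

```python
import math

MAX_DIST = math.sqrt(3 * 255 ** 2)  # distance from black to white in RGB

def adjusted_scores(hist):
    """hist maps (r, g, b) -> pixel count; returns an adjusted score per color."""
    scores = {}
    for color in hist:
        total = 0.0
        for other, count in hist.items():
            # +1 for the same color, falling linearly to -1 at maximum distance
            weight = 1 - 2 * math.dist(color, other) / MAX_DIST
            total += count * weight
        scores[color] = max(total, 0.0)
    return scores

similar = {(255, 0, 0): 50, (250, 10, 10): 50}   # red next to a near-red
opposite = {(255, 0, 0): 50, (0, 255, 255): 50}  # red next to cyan

print(adjusted_scores(similar)[(255, 0, 0)])
print(adjusted_scores(opposite)[(255, 0, 0)])
```

In this sketch red scores higher in the first image than its raw pixel count of 50 and lower in the second, which matches the direction of the adjustment described above.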
Conclusion
• Steps for building color search in Solr: 
• Extract colors using a tool like the Python Imaging Library (PIL) 
• Score colors based on the number of pixels 
• Adjust scores based on similar / different colors 
• Index the colors into Solr as a text field 
• In your query, sort by the term frequency values for each color
One more demo…
