Searching Images by Color 
Chris Becker 
Search Engineering @ Shutterstock
What is Shutterstock? 
• Shutterstock sells stock images, videos & music. 
• Crowdsourced from artists around the world 
• Shutterstock reviews and indexes them for search 
• Customers buy a subscription and download them
Why search by color?
Stock photography on the internet… 
images from www.shutterstock.com
Color is one of many visual 
attributes that you can use 
to create an engaging 
image search experience
Shutterstock Labs 
Spectrum 
Palette
Diving into Color Data
Color Spaces 
• RGB 
• HSL 
• Lab 
• LCH 
images from www.wikipedia.org
Calculating Distances Between Colors 
• Euclidean distance works reasonably well in any color space 
distRGB = sqrt((r2 - r1)^2 + (g2 - g1)^2 + (b2 - b1)^2)
distHSL = sqrt((h2 - h1)^2 + (s2 - s1)^2 + (l2 - l1)^2)
distLCH = sqrt((L2 - L1)^2 + (C2 - C1)^2 + (H2 - H1)^2)
distLAB = sqrt((L2 - L1)^2 + (a2 - a1)^2 + (b2 - b1)^2)
• More sophisticated equations that better account for human 
perception can be found at 
http://en.wikipedia.org/wiki/Color_difference
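A minimal Python sketch of the Euclidean distance above, assuming both colors are 3-tuples in the same color space. The function name color_distance and the sample colors are illustrative, not from the talk. One caveat: in HSL and LCH the hue component is an angle, so a plain difference overstates the distance between hues on either side of the wrap-around point.

```python
import math

def color_distance(c1, c2):
    """Euclidean distance between two colors given as 3-tuples
    in the same color space (for example RGB or Lab)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

red = (255, 0, 0)
orange = (255, 153, 0)
blue = (0, 0, 255)

# Orange differs from red only in the green channel, so it is much
# closer to red than blue is.
print(color_distance(red, orange))  # 153.0
print(color_distance(red, blue))
```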
Images are just numbers 
[ 
[[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]], 
[[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]], 
[[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]], 
[[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]], 
[[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]], 
[[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]], 
[[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]], 
[[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]], 
[[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]], 
[[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]], 
[[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]], 
[[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]], 
[[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]], 
[[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]], 
[[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]], 
[[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]], 
[[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]], 
[[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]], 
[[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]], 
]
Any operation you can do on a set of 
numbers, you can do on an image 
• getting histograms 
• computing median values 
• standard deviations / variance 
• other statistics
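As a rough illustration of treating an image as numbers, the sketch below computes a median and standard deviation per channel using only the Python standard library. The tiny 2x3 image reuses the first pixels of the grid above; the helper name channel_stats is an assumption made for this example.

```python
import statistics

# A tiny 2x3 "image": rows of [R, G, B] pixels taken from the grid above.
image = [
    [[54, 87, 58], [54, 116, 206], [17, 226, 194]],
    [[214, 238, 109], [64, 190, 104], [191, 24, 161]],
]

# Flatten the rows into a single list of pixels.
pixels = [pixel for row in image for pixel in row]

def channel_stats(pixels, channel):
    """Median and population standard deviation of one channel (0=R, 1=G, 2=B)."""
    values = [p[channel] for p in pixels]
    return statistics.median(values), statistics.pstdev(values)

for index, name in enumerate("RGB"):
    median, stdev = channel_stats(pixels, index)
    print(name, median, round(stdev, 1))
```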
Extracting Color Data
Tools & Libraries 
• ImageMagick 
• Python Imaging Library (PIL) 
• ImageJ
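# Python example to get a histogram from an image
from pprint import pprint

from PIL import Image

# Convert to RGB so every entry from getcolors() is an (r, g, b) tuple
image = Image.open('./samplephoto.jpg').convert('RGB')
width, height = image.size

# getcolors() returns (count, color) pairs; passing width*height as
# maxcolors ensures it never returns None
colors = image.getcolors(width * height)

# Map each hex color string to the number of pixels with that color
hist = {}
for count, (r, g, b) in colors:
    hist['%02x%02x%02x' % (r, g, b)] = count

pprint(hist)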
Indexing & Searching 
in Solr
Indexing color histograms 
• index colors just like you would index text 
• amount of color = frequency of the term 
color_txt = "cfebc2 
cfebc2 cfebc2 cfebc2 
cfebc2 cfebc2 cfebc2 
cfebc2 cfebc2 cfebc2 
95bf40 95bf40 95bf40 
95bf40 95bf40 95bf40 
2e6b2e 2e6b2e 2e6b2e 
ff0000 …"
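One way to produce such a field value from a histogram is sketched below, assuming a dict of hex color to pixel count like the one built in the histogram example. The function name histogram_to_text and the total_terms parameter are hypothetical. Colors whose share rounds down to zero repeats drop out of the text entirely.

```python
def histogram_to_text(hist, total_terms=100):
    """Turn {hex_color: pixel_count} into a whitespace-separated string in
    which each color is repeated in proportion to its share of the pixels."""
    total_pixels = sum(hist.values())
    terms = []
    # Most common colors first, to mirror the example above.
    for color, count in sorted(hist.items(), key=lambda kv: kv[1], reverse=True):
        repeats = round(total_terms * count / total_pixels)
        terms.extend([color] * repeats)
    return " ".join(terms)

# Hypothetical pixel counts chosen to give the 10 / 6 / 3 / 1 split above.
hist = {"cfebc2": 5000, "95bf40": 3000, "2e6b2e": 1500, "ff0000": 500}
color_txt = histogram_to_text(hist, total_terms=20)
print(color_txt)
```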
Solr Schema & Queries 
<field name="color" type="text_ws" …> 
• Can use Solr’s default ranking effectively 
/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax… 
• or use term frequencies directly for specific sort functions: 
sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc
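The request parameters above can be assembled programmatically. The sketch below only builds the URL and does not contact a Solr server; the helper name multi_color_sort is an assumption, and it expects two or more colors.

```python
from urllib.parse import urlencode

def multi_color_sort(colors, field="color"):
    """Build a Solr sort clause that multiplies the term frequency of each color."""
    tfs = ",".join('tf({},"{}")'.format(field, c) for c in colors)
    return "product({}) desc".format(tfs)

colors = ["ff0000", "e2c2d2"]
params = {
    "q": " ".join(colors),
    "qf": "color",
    "defType": "edismax",
    "sort": multi_color_sort(colors),
}
# urlencode escapes the spaces, quotes and parentheses so the query is URL-safe.
print("/solr/select?" + urlencode(params))
```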
Indexing color statistics 
Represent aggregate statistics of each image 
lightness: 
median: 2 
standard dev: 1 
largest bin: 0 
largest bin size: 50 
saturation 
median: 0 
standard dev: 0 
largest bin: 0 
largest bin size: 100 
…
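A sketch of how statistics like these might be computed, using the standard library's colorsys module for the RGB to HLS conversion. The number of bins, the reading of "largest bin size" as a percentage of pixels, and the function names are assumptions made for illustration.

```python
import colorsys
import statistics
from collections import Counter

def binned_stats(values, bins=10):
    """Summarize values in [0.0, 1.0] after bucketing them into `bins` bins."""
    # min() keeps a value of exactly 1.0 in the last bin.
    binned = [min(int(v * bins), bins - 1) for v in values]
    largest_bin, largest_count = Counter(binned).most_common(1)[0]
    return {
        "median": statistics.median(binned),
        "standard_dev": statistics.pstdev(binned),
        "largest_bin": largest_bin,
        # Share of pixels in the largest bin, as a percentage.
        "largest_bin_size": round(100 * largest_count / len(binned)),
    }

def image_stats(rgb_pixels, bins=10):
    """Lightness and saturation statistics for a list of (r, g, b) pixels (0-255)."""
    hls = [colorsys.rgb_to_hls(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]
    return {
        "lightness": binned_stats([l for _, l, _ in hls], bins),
        "saturation": binned_stats([s for _, _, s in hls], bins),
    }

# Four gray pixels: saturation is 0 everywhere, lightness varies.
pixels = [(0, 0, 0), (0, 0, 0), (128, 128, 128), (255, 255, 255)]
print(image_stats(pixels))
```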
Solr Fields & Queries 
<field name="hue_median" type="int" …> 
• Sort by the distance between input param 
and median value for each image 
/solr/select?q=*&sort=abs(sub($query,hue_median)) asc
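To make the ordering concrete, the same sort can be emulated in Python on a handful of hypothetical documents. The sample data and the function name are illustrative only. Because hue is an angle, a plain absolute difference treats hues 350 and 10 as 340 apart rather than 20, so a production version would probably need to handle the wrap-around.

```python
def sort_by_hue_distance(images, query_hue):
    """Order images by |query_hue - hue_median|, smallest first."""
    return sorted(images, key=lambda img: abs(query_hue - img["hue_median"]))

# Hypothetical documents; hue_median is in degrees (0-359).
images = [
    {"id": "a", "hue_median": 200},
    {"id": "b", "hue_median": 35},
    {"id": "c", "hue_median": 120},
]
ranked = sort_by_hue_distance(images, query_hue=110)
print([img["id"] for img in ranked])  # ['c', 'b', 'a']
```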
Ranking & Relevance
How much of the image has the color ? 
image from www.shutterstock.com
is this relevant if I search for ? 
image from www.shutterstock.com
which image is more relevant if I search for ? 
image from www.shutterstock.com
is this relevant if I search for ? 
image from www.shutterstock.com
How do we account for these factors?
How much of the image contains the 
selected color? 
• Score each color by the number of pixels 
sort=tf(color,"cfebc2") desc
Balance Precision and Recall 
• Reduce your colorspace enough 
to balance: 
• color accuracy 
• index size 
• query complexity 
• result counts 
• only need 100-200 colors for a good UX 
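One simple way to reduce the color space is uniform quantization of each RGB channel, sketched below. With 5 levels per channel the palette has 125 colors, which falls in the 100-200 range mentioned above. This is an assumed approach for illustration; the talk does not say how the palette was reduced, and uniform RGB steps are not perceptually uniform.

```python
def quantize(rgb, levels=5):
    """Snap each channel of an (r, g, b) color to one of `levels` evenly
    spaced values, giving a palette of levels**3 colors (125 for levels=5)."""
    step = 255 / (levels - 1)
    return tuple(int(round(round(c / step) * step)) for c in rgb)

def to_hex(rgb):
    """Format an (r, g, b) tuple as a 6-digit hex string."""
    return "%02x%02x%02x" % rgb

# cfebc2 is (207, 235, 194); it snaps to the nearest palette color.
print(to_hex(quantize((207, 235, 194))))  # bfffbf
```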
Weighing Multiple Colors Together 
• If you search for 2 or more colors, the top result should have 
the most even distribution of those colors 
• simple option: 
sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc 
• more complex: compute the standard deviation or variance 
of the term frequencies of matching color values for each 
image, and sort the results with the lowest variance first.
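Both options can be compared on a few hypothetical images. The sketch below ranks them in Python rather than in Solr; the image names and term frequencies are invented for illustration.

```python
import statistics

# Hypothetical term frequencies of two query colors in three images.
images = {
    "even":     {"ff9900": 50, "2280e2": 50},
    "lopsided": {"ff9900": 95, "2280e2": 5},
    "sparse":   {"ff9900": 10, "2280e2": 10},
}
query = ["ff9900", "2280e2"]

def product_score(tfs, colors):
    """Simple option: multiply the term frequencies (higher is better)."""
    score = 1
    for color in colors:
        score *= tfs.get(color, 0)
    return score

def variance_score(tfs, colors):
    """More complex option: variance of the term frequencies (lower is better)."""
    return statistics.pvariance([tfs.get(color, 0) for color in colors])

by_product = sorted(images, key=lambda name: product_score(images[name], query), reverse=True)
by_variance = sorted(images, key=lambda name: variance_score(images[name], query))
print(by_product)   # ['even', 'lopsided', 'sparse']
print(by_variance)  # ['even', 'sparse', 'lopsided']
```

Note that variance alone cannot tell "even" from "sparse": both are perfectly balanced, but one contains far more of the requested colors. In practice the variance would likely need to be combined with the total amount of matching color.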
Weighing Similar & Different Colors 
• The score for one color should reflect all the colors in the image. 
• At indexing time, increase the score based on similar colors; 
decrease it based on differing colors.
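The talk does not give a formula for this adjustment, so the following is only one possible scheme, sketched under stated assumptions: similarity is RGB Euclidean distance, neighbors closer than a threshold add part of their pixel count, and neighbors farther away subtract part of it. The threshold, the linear weighting, and the function names are all hypothetical. It uses math.dist, which requires Python 3.8 or later.

```python
import math

def hex_to_rgb(hex_color):
    """Parse a 6-digit hex string into an (r, g, b) tuple."""
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def adjusted_scores(hist, similar_within=100.0):
    """Re-score each color using every color in the image.

    A neighbor closer than `similar_within` (RGB Euclidean distance) adds
    part of its pixel count; a neighbor farther away subtracts part of it.
    The linear weighting and the threshold are illustrative assumptions.
    """
    max_dist = math.sqrt(3 * 255 ** 2)  # distance from black to white
    scores = {}
    for color, count in hist.items():
        score = float(count)
        for other, other_count in hist.items():
            if other == color:
                continue
            dist = math.dist(hex_to_rgb(color), hex_to_rgb(other))
            if dist < similar_within:
                score += other_count * (1 - dist / similar_within)
            else:
                score -= other_count * (dist - similar_within) / (max_dist - similar_within)
        # Clamp at zero so a score can still be used as a term count.
        scores[color] = max(score, 0.0)
    return scores

# Two near-identical reds reinforce each other; the lone blue is penalized.
hist = {"ff0000": 40, "ee1111": 40, "0000ff": 20}
print(adjusted_scores(hist))
```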
Conclusion
• Steps for building color search in Solr: 
• Extract colors using a tool like the Python Imaging Library 
• Score colors based on the number of pixels 
• Adjust scores based on similar / different colors 
• Index colors into Solr as a text document 
• In your query, sort by the term frequency values for each color
One more demo…
