5. Fast Access & Linear Search
• Efficient coding of serialization
– transformation to byte[]
– run length coding for sparse vectors
• Custom Lucene codec
– Lucene field compression
– update to DocValues in v1.0
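As an illustration of run length coding for sparse vectors, here is a minimal sketch. It is not LIRE's actual byte layout: the functions `encode_sparse` and `decode_sparse` and the "zero marker plus run length" format are assumptions made for this example, and values are assumed to fit into one byte.

```python
def encode_sparse(values):
    """Run-length encode a histogram of small ints (0..255) into bytes.

    Hypothetical format: a non-zero value is stored as one byte; a run of
    zeros is stored as the marker byte 0 followed by the run length (1..255).
    """
    out = bytearray()
    i = 0
    while i < len(values):
        v = values[i]
        if not 0 <= v <= 255:
            raise ValueError("values must fit into one byte")
        if v == 0:
            # Count consecutive zeros, capped at 255 so the length fits a byte.
            run = 1
            while i + run < len(values) and values[i + run] == 0 and run < 255:
                run += 1
            out += bytes((0, run))
            i += run
        else:
            out.append(v)
            i += 1
    return bytes(out)


def decode_sparse(data):
    """Inverse of encode_sparse: expand zero runs back into a list of ints."""
    values = []
    i = 0
    while i < len(data):
        if data[i] == 0:
            values.extend([0] * data[i + 1])
            i += 2
        else:
            values.append(data[i])
            i += 1
    return values
```

The denser the vector, the smaller the gain: a histogram without zero runs is stored at one byte per bin, the same as a plain byte[] dump.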
6. Search with Sub-Linear Time Complexity
• Hashing based approach for global features
– Locality sensitive hashing
• bit sampling
– Proximity based hashing
• nearest neighbors as “buckets”,
• cf. work of G. Amato
• Local features supported
– SIFT, SURF, k-means, VLAD
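The bit sampling variant of locality sensitive hashing can be sketched as follows. This is a generic textbook version, assuming features have already been turned into 0/1 vectors; it is not LIRE's implementation, and all function names are hypothetical. Each hash function reads a fixed random subset of bit positions, so vectors with a small Hamming distance are likely to share a bucket in at least one table.

```python
import random


def make_bit_sampler(dim, num_bits, seed):
    """Return a hash function that reads num_bits fixed random positions."""
    rng = random.Random(seed)
    positions = rng.sample(range(dim), num_bits)

    def hash_fn(bits):
        key = 0
        for p in positions:
            key = (key << 1) | bits[p]
        return key

    return hash_fn


def build_index(vectors, hash_fns):
    """One bucket table per hash function: bucket key -> list of vector ids."""
    tables = [dict() for _ in hash_fns]
    for vid, bits in enumerate(vectors):
        for table, h in zip(tables, hash_fns):
            table.setdefault(h(bits), []).append(vid)
    return tables


def candidates(query, tables, hash_fns):
    """Collect ids that share a bucket with the query in any table."""
    found = set()
    for table, h in zip(tables, hash_fns):
        found.update(table.get(h(query), []))
    return found


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def search(query, vectors, tables, hash_fns, k):
    """Re-rank only the candidates by exact Hamming distance."""
    cands = candidates(query, tables, hash_fns)
    return sorted(cands, key=lambda vid: (hamming(query, vectors[vid]), vid))[:k]
```

Only the candidate set is compared exactly, which is where the sub-linear behaviour comes from; more tables raise recall at the cost of memory.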
7. Tools
• Parallel Indexing
– producer-consumer based
– throughput limited only by the capabilities of the VM / HDD
• Intermediate byte based data format
– small footprint, efficient, relative paths
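The producer-consumer pattern behind parallel indexing can be sketched like this. It is a simplified illustration, not LIRE's indexer: `extract_feature` is a hypothetical placeholder for image decoding and feature extraction, and the in-memory dict stands in for the real index.

```python
import queue
import threading


def extract_feature(path):
    """Hypothetical placeholder for decoding an image and extracting a feature."""
    return sum(path.encode("utf-8")) % 256


def parallel_index(paths, num_consumers=4):
    """One producer feeds paths to several consumers through a bounded queue."""
    tasks = queue.Queue(maxsize=100)  # bounded: producer blocks if consumers lag
    index = {}
    lock = threading.Lock()
    stop = object()  # sentinel telling a consumer to finish

    def producer():
        for p in paths:
            tasks.put(p)
        for _ in range(num_consumers):
            tasks.put(stop)

    def consumer():
        while True:
            item = tasks.get()
            if item is stop:
                break
            feature = extract_feature(item)
            with lock:
                index[item] = feature

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return index
```

In CPython, threads only help while workers wait on I/O; CPU-bound extraction would need processes instead, whereas JVM threads run in parallel on multiple cores.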
8. Extending LIRE
• Implement a global feature
– extraction, distance function, serialization
• LIRE takes care of the rest
– Parallel indexing, hashing, search
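The three parts of a global feature (extraction, distance function, serialization) could look as follows. This sketch only mirrors the contract named on the slide; the class and method names (`GlobalFeature`, `MeanColor`, `to_bytes`, `from_bytes`) are assumptions for illustration and not LIRE's Java interface.

```python
import struct
from abc import ABC, abstractmethod


class GlobalFeature(ABC):
    """Hypothetical contract: extraction, distance function, serialization."""

    @abstractmethod
    def extract(self, pixels):
        """Compute the feature from a list of (r, g, b) tuples."""

    @abstractmethod
    def distance(self, other):
        """Return a non-negative dissimilarity to another feature."""

    @abstractmethod
    def to_bytes(self):
        """Serialize the feature for storage in the index."""

    @classmethod
    @abstractmethod
    def from_bytes(cls, data):
        """Rebuild the feature from its serialized form."""


class MeanColor(GlobalFeature):
    """Toy feature: mean R, G and B over all pixels."""

    def __init__(self):
        self.mean = (0.0, 0.0, 0.0)

    def extract(self, pixels):
        n = len(pixels)
        self.mean = tuple(sum(p[c] for p in pixels) / n for c in range(3))

    def distance(self, other):
        # L1 (city block) distance between the two mean colors.
        return sum(abs(a - b) for a, b in zip(self.mean, other.mean))

    def to_bytes(self):
        return struct.pack(">3f", *self.mean)  # three big-endian floats

    @classmethod
    def from_bytes(cls, data):
        feature = cls()
        feature.mean = struct.unpack(">3f", data)
        return feature
```

With these methods in place, a framework can index, hash and search the feature without knowing anything else about it.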
9. Using Parts of LIRE
Take what you need …
• Feature implementations
– cf. work of Xinchao Li et al. at MediaEval 2013
• Image processing
– Canny Edge Detector, SWT (coming soon), …
• Tools & code base
– FastMap, Suffix Tree Clustering, …
10. UCID Data Set
Feature                        MAP     precision 10   ER
CEDD                           0.431   0.420          0.553
Color Correlogram              0.586   0.480          0.370
Color Layout                   0.277   0.285          0.679
Edge Histogram                 0.180   0.202          0.813
FCTH                           0.447   0.415          0.531
JCD                            0.470   0.435          0.508
Joint Histogram                0.348   0.313          0.603
LBP Opponent Joined            0.266   0.267          0.729
Local Binary Patterns (LBP)    0.228   0.221          0.714
Opponent Histogram             0.319   0.309          0.649
PHOG                           0.232   0.235          0.725
RGB Color Histogram            0.403   0.358          0.550
Rotation Invariant LBP         0.165   0.174          0.813
Scalable Color                 0.172   0.183          0.840
SPCEDD                         0.575   0.487          0.366
SPLBP                          0.264   0.251          0.683
SURF BoVW                      0.348   0.313          0.634
VLAD-SURF                      0.370   0.356          0.603
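For readers unfamiliar with the three measures reported here (MAP, precision 10, ER), the following sketch shows one common way to compute them from ranked result lists. The slides do not define the measures, so two things are assumptions: average precision is normalized by the number of relevant items, and ER is read as the error rate, i.e. the fraction of queries whose top-ranked result is not relevant.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def precision_at(ranked, relevant, k=10):
    """Fraction of the first k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def evaluate(runs):
    """runs: list of (ranked_ids, relevant_id_set) pairs, one per query.

    Returns (MAP, mean precision at 10, error rate). The error rate here is
    an assumed definition: share of queries whose first result is wrong.
    """
    n = len(runs)
    mean_ap = sum(average_precision(r, rel) for r, rel in runs) / n
    p10 = sum(precision_at(r, rel, 10) for r, rel in runs) / n
    er = sum(1 for r, rel in runs if not r or r[0] not in rel) / n
    return mean_ap, p10, er
```

Under these definitions, higher MAP and precision 10 are better, while a lower ER is better.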
11. SIMPLIcity Data Set
Feature                        MAP     precision 10   ER
CEDD                           0.513   0.706          0.193
Color Correlogram              0.498   0.740          0.159
Color Layout                   0.439   0.612          0.303
Edge Histogram                 0.333   0.500          0.401
FCTH                           0.499   0.703          0.207
JCD                            0.520   0.730          0.183
Joint Histogram                0.449   0.689          0.197
LBP Opponent Joined            0.418   0.569          0.347
Local Binary Patterns (LBP)    0.358   0.587          0.295
Opponent Histogram             0.450   0.635          0.270
PHOG                           0.365   0.547          0.355
RGB Color Histogram            0.450   0.704          0.191
Rotation Invariant LBP         0.338   0.520          0.375
Scalable Color                 0.305   0.470          0.464
SPCEDD                         0.599   0.772          0.144
SPLBP                          0.395   0.556          0.348
SURF BoVW                      0.338   0.464          0.475
VLAD-SURF                      0.365   0.518          0.407
14. Apache Solr Integration
• Motivation:
– Use a search and retrieval server with all its tools
• Objectives:
– indexing & management
– efficient content based image search
– content based ranking of results
15. Solr Plugin
• Custom Request Handler
– Uses Solr’s request and response framework
– Allows for content based retrieval
• Custom ValueSourceFunction
– Added to text based search queries
– Allows for ranking based on the distance function
16. Solr Plugin
• Custom type of index field
– DocValue based binary field
– transmitted base64 encoded
• Custom Indexer
– XML documents to be uploaded to Solr
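A minimal sketch of what such an upload document might look like: a serialized feature is base64 encoded and placed in a field of a Solr XML update message (`<add><doc><field name="...">`). The field name `feature_bin` and the function `solr_add_document` are hypothetical; the actual field names depend on the plugin's schema.

```python
import base64
import xml.etree.ElementTree as ET


def solr_add_document(doc_id, feature_bytes, field_name="feature_bin"):
    """Build a Solr XML <add> message with one base64 encoded binary field.

    field_name is a made-up example; use the name defined in the schema.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    encoded = base64.b64encode(feature_bytes).decode("ascii")
    ET.SubElement(doc, "field", name=field_name).text = encoded
    return ET.tostring(add, encoding="unicode")


def read_feature(xml_text, field_name="feature_bin"):
    """Decode the binary field again, e.g. to verify a generated document."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == field_name:
            return base64.b64decode(field.text)
    return None
```

Base64 keeps the binary feature safe inside a text format at the cost of roughly one third more bytes on the wire.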
18. Future Work
• DocValues based indexing
– make linear search faster
• Proximity hashing
– metric spaces approach
– more accurate
• Release version 1.0
– adding docs & feature freeze
19. Acknowledgements
I’d like to thank Anna-Maria Pasterk, Arthur Li, Arthur Pitman,
Bastian Hösch, Benjamin Sznajder, Christian Penz, Christine
Keim, Christoph Kofler, Dan Hanley, Daniel Pötzinger, Fabrizio
Falchi, Franz Graf, Giuseppe Amato, Glenn Macstravic, James
Charters, Janine Lachner, Katharina Tomanec, Lukas Esterle,
Manuel Oraze, Marian Kogler, Marko Keuschnig, Michael Riegler,
Rodrigo Carvalho Rezende, Roman Divotkey, Roman Kern,
Savvas Chatzichristofis and Sandeep Gupta.