Context Adaptation in Image Search

                       arjen@acm.org
Presentation about context adaptation in image search, given at the 4th Twente/SIKS workshop (held on the occasion of Robin Aly's PhD defense).

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
892
On SlideShare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Context Adaptation in Image Search

  1. Context Adaptation in Image Search
     arjen@acm.org
  2. Context Adaptation
     GOAL: Present different photos to a sports journalist who queries for Beckham than to the glossy magazine editor issuing the same query.
  3. IPTC Categories
     • ACE (arts, culture, entertainment)
     • CLJ (crime, law & justice)
     • DIS (disasters & accidents)
     • EBF (economy, business & finance)
     • EDU (education)
     • ENV (environment)
     • HTH (health)
     • HUM (human interest)
     • LAB (labour, work)
     • LIF (lifestyle & leisure)
     • POL (politics)
     • REL (religion)
     • SCI (science & technology)
     • SOI (social issues)
     • SPO (sports)
     • WAR (unrest, conflicts, war)
     • WEA (weather)
  4. What Context?
     • Collection context
       – One “main” IPTC category per image
         • 96,351 out of 97,760 images in the 100k Belga Collection
         • Note: noisy data, in spite of it being edited content! E.g., we found lifestyle Beckham images annotated as SPO, and even typos in IPTC category assignment!
     • User context
       – Classified 813 users into IPTC categories to represent their main interest (based on Belga input about the user’s organizations)
  5. Filter on IPTC?
     //image[@IPTC eq SPO][about(.,Beckham)]
     • Bad for recall:
       – Not all images have been assigned IPTC categories
     • Bad for precision:
       – Noisy assignment of IPTC categories to images
         • At least 4 of the top 10 SPO Beckham results do not show Beckham taking part in sporting activities
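The recall and precision problems of a hard IPTC filter can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Image` record, the `hard_filter` helper, and the four toy images are invented, and a plain substring test stands in for the ranked `about(.,Beckham)` clause of the query.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    image_id: str
    caption: str
    iptc: Optional[str]  # None when no IPTC category was assigned


def hard_filter(images: List[Image], category: str, term: str) -> List[Image]:
    """Keep images with a matching IPTC category whose caption mentions the term."""
    return [
        im for im in images
        if im.iptc == category and term.lower() in im.caption.lower()
    ]


# Invented toy collection (not Belga data).
images = [
    Image("a", "Beckham scores a free kick", "SPO"),
    Image("b", "Beckham at a fashion show", "SPO"),  # lifestyle image annotated as SPO
    Image("c", "Beckham takes a corner", None),      # sports image without a category
    Image("d", "Forest fires in Greece", "DIS"),
]

hits = hard_filter(images, "SPO", "Beckham")
hit_ids = [im.image_id for im in hits]
```

In this toy run the filter keeps "b" (a precision error caused by a noisy annotation) and drops "c" (a recall error caused by a missing annotation), which are the two failure modes listed on the slide.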
  6. Retrieval Model
     • Re-rank results based on cluster membership:
       λ·ρ_d(q) + (1−λ)·∑_{c ∈ Clusters} ρ_c(q)·ρ_c(d)
       where ρ_d(q) = P(Q|D), ρ_c(q) = P(Q|c), ρ_c(d) = P(D|c)
       – Modify scores based on document’s context
         Oren Kurland and Lillian Lee. ACM Transactions on Information Systems (TOIS), 27(3), 2009.
     • Novelty in Vitalas:
       – Modify scores based on user’s context
         • Cluster formation based on user clicks
         • Cluster selection based on user context
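The interpolation on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Vitalas implementation: the function name `rerank_scores`, the dictionary layout, and all numbers are invented, and the three inputs are simply taken to be precomputed values of ρ_d(q), ρ_c(q) and ρ_c(d).

```python
from typing import Dict


def rerank_scores(
    rho_d: Dict[str, float],               # document -> rho_d(q), the initial retrieval score
    rho_c_q: Dict[str, float],             # selected cluster -> rho_c(q)
    rho_c_d: Dict[str, Dict[str, float]],  # selected cluster -> {document -> rho_c(d)}
    lam: float,
) -> Dict[str, float]:
    """Compute lam * rho_d(q) + (1 - lam) * sum over clusters of rho_c(q) * rho_c(d)."""
    scores = {}
    for doc, base in rho_d.items():
        cluster_part = sum(
            rho_c_q[c] * rho_c_d[c].get(doc, 0.0) for c in rho_c_q
        )
        scores[doc] = lam * base + (1.0 - lam) * cluster_part
    return scores


# Invented toy scores for one query and a single selected cluster "SPO".
rho_d = {"fire": 0.6, "match": 0.4}
rho_c_q = {"SPO": 0.5}
rho_c_d = {"SPO": {"match": 1.0}}

adapted = rerank_scores(rho_d, rho_c_q, rho_c_d, lam=0.1)
unadapted = rerank_scores(rho_d, rho_c_q, rho_c_d, lam=1.0)
adapted_order = sorted(adapted, key=adapted.get, reverse=True)
unadapted_order = sorted(unadapted, key=unadapted.get, reverse=True)
```

With λ=1.0 the initial ranking is returned unchanged; with a small λ the cluster term dominates, so the document that belongs to the selected cluster moves to the top. Setting λ=0.0 drops the initial score entirely.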
  7. Retrieval Model
     • Cluster formation:
       – IPTC-image categories; forms disjoint clusters
       – IPTC-user categories of users who clicked the image; gives overlapping clusters
     • Cluster selection:
       – {d∈c}: cluster contains document
       – {u∈c}: cluster/@category corresponds to user's interests
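The two cluster formation strategies can be sketched as follows. The helper names `image_clusters` and `user_clusters`, the input layout, and the toy click log are assumptions for illustration only; they are not taken from the Vitalas system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def image_clusters(image_category: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Collection-based: one cluster per IPTC category assigned to images (disjoint)."""
    clusters = defaultdict(set)
    for image, category in image_category.items():
        if category is not None:  # images without a category join no cluster
            clusters[category].add(image)
    return dict(clusters)


def user_clusters(
    clicks: Iterable[Tuple[str, str]],  # (user, clicked image) pairs
    user_category: Dict[str, str],      # user -> IPTC category of main interest
) -> Dict[str, Set[str]]:
    """User-based: an image joins the cluster of every user category that clicked it."""
    clusters = defaultdict(set)
    for user, image in clicks:
        category = user_category.get(user)
        if category is not None:
            clusters[category].add(image)
    return dict(clusters)


# Invented toy data.
image_category = {"img1": "SPO", "img2": "DIS", "img3": None}
user_category = {"u1": "SPO", "u2": "POL"}
clicks = [("u1", "img1"), ("u1", "img3"), ("u2", "img2"), ("u2", "img3")]

by_image = image_clusters(image_category)
by_user = user_clusters(clicks, user_category)
```

In the toy data `img3` has no IPTC category, so it is absent from the collection-based clusters, yet it ends up in both the SPO and the POL user-based clusters because users from both groups clicked it.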
  8. Results on Click Prediction

     NDCG   D       image   image   image   image   user    user    user    user
                    0.0     0.1     0.4     0.7     0.0     0.1     0.4     0.7
     ACE    0.1724  0.1423  0.1741  0.1721  0.1721  0.2070  0.1978  0.1767  0.1747
     EBF    0.5527  0.4744  0.5460  0.5497  0.5504  0.4882  0.5519  0.5509  0.5509
     EDU    0.0145  0.0163  0.0145  0.0145  0.0145  0.0165  0.0167  0.0155  0.0146
     HTH    0.1308  0.1347  0.1308  0.1308  0.1308  0.6342  0.3712  0.1934  0.1414
     HUM    0.1849  0.1612  0.1798  0.1772  0.1849  0.2109  0.2043  0.1776  0.1760
     LAB    0.1331  0.1543  0.1331  0.1331  0.1331  0.2164  0.2339  0.1817  0.1380
     LIF    0.1245  0.0888  0.1234  0.1233  0.1232  0.1894  0.1555  0.1121  0.1253
     POL    0.0723  0.0586  0.0704  0.0717  0.0721  0.1054  0.0990  0.0916  0.0769
     SOI    0.2880  0.1806  0.2883  0.2880  0.2880  0.2964  0.2970  0.2968  0.3008
     SPO    0.1811  0.1801  0.1809  0.1806  0.1807  0.2151  0.2005  0.1839  0.1820

     Related literature on evaluation methodology: Carterette and Jones, NIPS 2007, and Carterette, Allan, and Sitaraman, SIGIR 2006.
  9. No Adaptation
     “Greece”
  10. SPO Adaptation
      “Greece, collection-based clusters, λ=0.1”
  11. SPO Adaptation
      “Greece, collection-based clusters, λ=0.0”
  12. SPO Adaptation
      “Greece, user-based clusters, λ=0.1”
  13. SPO Adaptation
      “Greece, user-based clusters, λ=0.0”
  14. SPO Observations
      • Re-ranking pushes the sports-related images to the top
        – No more images about the fires
        – When λ=0.0 the initial retrieval score is not taken into account (initial text ranking ignored)
      • Minimal differences between collection-based and user-based cluster formation
        – Archivists consider as sports-related those images that users with sports-related interests click on
  15. POL Adaptation
      “Greece, collection-based clusters, λ=0.1”
  16. POL Adaptation
      “Greece, collection-based clusters, λ=0.0”
  17. POL Adaptation
      “Greece, user-based clusters, λ=0.1”
  18. POL Adaptation
      “Greece, user-based clusters, λ=0.0”
  19. POL Observations
      • Re-ranking for a politics context shows a difference in interpretation between the archivist and the user group
        – Archivists focussed on the actual political rallies etc.
        – Users focussed on the forest fires
  20. ACE Adaptation
      “Greece, collection-based clusters, λ=0.1”
  21. ACE Adaptation
      “Greece, collection-based clusters, λ=0.0”
  22. ACE Observations
      • Re-ranking for arts, culture and entertainment requires λ=0.0, to ignore the initial ranking and let the right images shine
  23. No Adaptation
      “Beckham”
  24. SPO Adaptation
      “Beckham, collection-based clusters, λ=0.1”
  25. SPO Adaptation
      “Beckham, collection-based clusters, λ=0.0”
  26. HUM Adaptation
      “Beckham, collection-based clusters, λ=0.1”
  27. Conclusions so far
      • Adaptation also retrieves images that were not assigned an IPTC category, by considering clusters formed by the images clicked by users with the same interests
      • Alternative cluster formation approaches can be investigated; e.g., using visual features
      • Method easily adapted for personalised and/or collaborative search
  28. Potential for Personalization
      • Which queries have the potential to benefit from context adaptation (personalisation)?
      • The ones for which different users click on different results
        – Can be studied by looking at the nDCG of one user, assuming another user’s clicks are ideal
          Jaime Teevan, Susan T. Dumais and Eric Horvitz. Potential for Personalization. ACM Transactions on Computer-Human Interaction (ToCHI) special issue on Data Mining for Understanding User Needs, 17(1), March 2010.
      • Novel in Vitalas: compare IPTC-defined user groups (instead of individual users)
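One way to read the nDCG comparison between user groups is sketched below. This is only a sketch under assumptions: using raw click counts as gains, ranking results by one group's click counts, and the helper names `dcg`, `ndcg` and `cross_group_ndcg` are all choices made here for illustration, and the exact formulation used in the talk or in the cited paper may differ.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranking: List[str], gain: Dict[str, float]) -> float:
    """nDCG of a ranking, given per-document gains."""
    actual = dcg([gain.get(doc, 0.0) for doc in ranking])
    ideal = dcg(sorted(gain.values(), reverse=True)[: len(ranking)])
    return actual / ideal if ideal > 0 else 0.0


def cross_group_ndcg(clicks_a: Dict[str, int], clicks_b: Dict[str, int]) -> float:
    """Rank results by group A's click counts and score them with group B's clicks as gains."""
    docs = set(clicks_a) | set(clicks_b)
    ranking = sorted(docs, key=lambda d: (-clicks_a.get(d, 0), d))
    return ndcg(ranking, {d: float(n) for d, n in clicks_b.items()})


# Invented click counts for one query and two user groups.
agree = cross_group_ndcg({"x": 3, "y": 1}, {"x": 3, "y": 1})
disagree = cross_group_ndcg({"x": 3, "y": 1}, {"x": 1, "y": 3})
```

When both groups click on the same results the score is 1.0, which signals low potential for adaptation; the more the groups disagree, the lower the score and the higher the potential.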
  29. P4P in Belga 100K
  30. P4P in Belga 100K
      • nDCG high: low potential
        – Dean (0.8067)
        – King albert ii (0.7810)
        – greece (0.3910)
      • nDCG low: high potential
  31. No Adaptation
      “King Albert II”
  32. EBF Adaptation
      “King Albert II”
  33. POL Adaptation
      “King Albert II”
  34. No Adaptation
      “Dean”
  35. ACE Adaptation
      “Dean, user-based clusters”
  36. ACE Adaptation
      “Dean, collection-based clusters”
  37. Dean: Temporal Effect
      • Log files: “Dean” = “Hurricane Dean”
      • Still, query is quite ambiguous:
        – James Dean
        – Agyness Dean (a model)
        – a (university) dean
        – Dean Dealannoi
        – Howard Dean
        – Dean Martin
      • Context adaptation for “Dean” requires archivist
  38. Future Work
      • Address various normalization issues
        – In context adaptation (due to NLLR approximation)
        – In “potential for personalization”/adaptation
      • Explore temporal dimension
        – Combinations of collection and user context?
      • Explore cross-media cluster-based retrieval
        – Use visual features in cluster formation
  39. See also
      “CWI” Vitalas demonstrations:
      http://www.ins.cwi.nl/projects/M4/vitalas/
      Collection context instead of user context:
      http://www.ins.cwi.nl/projects/M4/vitalas/context_adaptation.html
      Detectors trained by query log:
      http://olympus.ee.auth.gr/diou/civr2009/