CERTH/CEA LIST at MediaEval Placing Task 2015

  1. CERTH/CEA LIST at MediaEval Placing Task 2015
     Giorgos Kordopatis-Zilos (1), Adrian Popescu (2), Symeon Papadopoulos (1) and Yiannis Kompatsiaris (1)
     (1) Information Technologies Institute (ITI), CERTH, Greece
     (2) CEA LIST, 91190 Gif-sur-Yvette, France
     MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany
  2. Summary
     Tag-based location estimation (2 runs)
     • Based on a geographic Language Model
     • Built upon the scheme of our 2014 participation [2] (Kordopatis-Zilos et al., MediaEval 2014)
     • Extensions from [3]: improved feature selection and weighting (Kordopatis-Zilos et al., PAISI 2015)
     Visual-based location estimation (1 run)
     • Geospatial clustering scheme of the most visually similar images
     Hybrid location estimation (2 runs)
     • Combination of the textual and visual approaches
     Training sets
     • Training set released by the organisers (≈4.7M geotagged items)
     • YFCC dataset, excluding images from users in the test set (≈40M geotagged items)
  3. Tag-based location estimation
     • Processing steps of the approach
       – Offline: language model construction
       – Online: location estimation
  4. Language Model (LM)
     • LM generation scheme
       – divide the earth's surface into rectangular cells with a side length of 0.01°
       – calculate tag-cell probabilities based on the users that used the tag inside the cell
     • LM-based estimation
       – the probability of each cell is the sum of the respective tag-cell probabilities
       – the Most Likely Cell (MLC) is the cell with the highest probability and is used to produce the estimate
     Inspired by [4] (Popescu, MediaEval 2013)
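The two LM stages on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the item layout `(user, lat, lon, tags)`, and the normalisation of tag-cell probabilities (users of the tag in the cell divided by the tag's user count over all cells) are assumptions.

```python
from collections import defaultdict
import math

CELL_SIDE = 0.01  # cell side length in degrees, as stated on the slide


def cell_of(lat, lon, side=CELL_SIDE):
    """Map a coordinate to the integer (row, col) index of its grid cell."""
    return (math.floor(lat / side), math.floor(lon / side))


def build_language_model(items, side=CELL_SIDE):
    """Offline step. items: iterable of (user, lat, lon, tags).

    Returns {tag: {cell: probability}}. Assumed normalisation: distinct users
    of the tag in the cell, divided by the sum of that count over all cells.
    """
    users = defaultdict(lambda: defaultdict(set))  # tag -> cell -> users
    for user, lat, lon, tags in items:
        cell = cell_of(lat, lon, side)
        for tag in set(tags):
            users[tag][cell].add(user)
    model = {}
    for tag, cells in users.items():
        total = sum(len(u) for u in cells.values())
        model[tag] = {cell: len(u) / total for cell, u in cells.items()}
    return model


def most_likely_cell(model, tags):
    """Online step: sum tag-cell probabilities per cell and pick the maximum."""
    scores = defaultdict(float)
    for tag in tags:
        for cell, prob in model.get(tag, {}).items():
            scores[cell] += prob
    if not scores:
        return None, {}  # no known tag: the LM cannot produce an estimate
    mlc = max(scores, key=scores.get)
    return mlc, dict(scores)
```

Returning the full score dictionary alongside the MLC keeps the cell distribution available for later steps such as a confidence measure.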
  5. Feature Selection and Weighting
     Feature Selection
     • The final tag set $T$ is the intersection of the accuracy-based tag set $T_a$ and the locality-based tag set $T_l$: $T = T_a \cap T_l$
     Feature Weighting
     • Locality weight function: sort the tags in $T$ by their locality score; the tag at position $j$ receives $w_l = \frac{|T| - (j - 1)}{|T|}$
     • Normalize the weights from the Spatial Entropy (SE) function: $w_{se} = \frac{N(e(t), \mu, \sigma)}{\max_{t \in T} N(e(t), \mu, \sigma)}$
     • Combine the two weighting functions: $w = \omega \cdot w_{se} + (1 - \omega) \cdot w_l$
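The three weighting formulas can be worked through in a short sketch. The slide does not say how $\mu$ and $\sigma$ are chosen or what value $\omega$ takes; here $\mu$ and $\sigma$ are assumed to be the mean and standard deviation of the entropy values over $T$, and $\omega$ is left as a required argument. All function names are hypothetical.

```python
import math


def locality_weights(locality):
    """locality: {tag: locality score}. Position j = 1 is the highest score."""
    ranked = sorted(locality, key=locality.get, reverse=True)
    n = len(ranked)
    return {tag: (n - (j - 1)) / n for j, tag in enumerate(ranked, start=1)}


def gaussian(x, mu, sigma):
    """Normal density N(x, mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def spatial_entropy_weights(entropy):
    """entropy: {tag: e(t)}. Assumption: mu and sigma are the mean and
    standard deviation of the entropy values themselves."""
    values = list(entropy.values())
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0:
        return {tag: 1.0 for tag in entropy}  # all entropies equal
    density = {tag: gaussian(e, mu, sigma) for tag, e in entropy.items()}
    peak = max(density.values())
    return {tag: d / peak for tag, d in density.items()}


def combined_weights(locality, entropy, omega):
    """w = omega * w_se + (1 - omega) * w_l for tags present in both inputs."""
    w_l = locality_weights(locality)
    w_se = spatial_entropy_weights(entropy)
    return {t: omega * w_se[t] + (1 - omega) * w_l[t] for t in w_l if t in w_se}
```

For example, `combined_weights({"london": 6975, "paris": 5452, "nyc": 3917}, {"london": 1.0, "paris": 2.0, "nyc": 3.0}, omega=0.5)` uses the locality scores quoted later in the deck together with made-up entropy values.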
  6. Accuracy
     • Partition the training set into $p$ folds ($p = 10$)
     • Withhold one partition at a time and build the LM with the remaining $p - 1$
     • Estimate the location of every item in the withheld partition
     • Accuracy score of every tag: $tgeo(t) = \frac{N_r}{N_t}$
       $N_r$: correctly geotagged items tagged with $t$
       $N_t$: total items tagged with $t$
     • Tags with a non-zero accuracy score form the tag set $T_a$
     From [3] (Kordopatis-Zilos et al., PAISI 2015)
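The cross-validation loop behind the accuracy score might look like the sketch below. The slide does not define when an item counts as correctly geotagged, so that test is passed in as a callable; `build_model`, `estimate`, and `is_correct` are hypothetical hooks, and the toy stand-ins at the bottom only exist to make the example run.

```python
from collections import Counter, defaultdict


def tag_accuracy(items, build_model, estimate, is_correct, p=10):
    """Return {tag: N_r / N_t} using p-fold cross-validation.

    items: list of dicts that contain at least a "tags" list.
    """
    folds = [items[k::p] for k in range(p)]
    hits, totals = defaultdict(int), defaultdict(int)
    for k, held_out in enumerate(folds):
        if not held_out:
            continue
        # Build the model from the other p - 1 folds.
        train = [it for j, fold in enumerate(folds) if j != k for it in fold]
        model = build_model(train)
        for item in held_out:
            ok = is_correct(estimate(model, item), item)
            for tag in set(item["tags"]):
                totals[tag] += 1       # contributes to N_t
                hits[tag] += int(ok)   # contributes to N_r
    return {tag: hits[tag] / totals[tag] for tag in totals}


# Toy stand-ins (assumptions): the "model" is simply the most frequent
# training cell, and an estimate is correct when it equals the item's cell.
def majority_model(train):
    return Counter(it["cell"] for it in train).most_common(1)[0][0]


def majority_estimate(model, item):
    return model


def same_cell(estimated, item):
    return estimated == item["cell"]


toy_items = [{"cell": "A", "tags": ["x"]}] * 4 + [{"cell": "B", "tags": ["y"]}]
scores = tag_accuracy(toy_items, majority_model, majority_estimate, same_cell, p=5)
tag_set_a = {tag for tag, score in scores.items() if score > 0}  # T_a
```

In practice the hooks would wrap the LM construction and MLC estimation steps, with correctness defined by a distance threshold of the implementer's choosing.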
  7. Locality
     • Captures the spatial awareness of tags
     • When a user uses a tag, the user is assigned to the respective location cell
     • Each cell has a set of users assigned to it
     • All users assigned to the same cell are considered neighbours
     • Locality score of every tag: $loc(t) = N_t \cdot \frac{\sum_{c \in C} \sum_{u \in U_{t,c}} |\{u' \mid u' \in U_{t,c},\ u' \neq u\}|}{N_t^2}$
       $N_t$: total occurrences of $t$
       $C$: set of all cells
       $U_{t,c}$: set of users that used tag $t$ inside cell $c$
     • Tags with a non-zero locality score form the tag set $T_l$
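A direct reading of the locality formula is sketched below. Since each user in $U_{t,c}$ has $|U_{t,c}| - 1$ neighbours, the inner double sum reduces to $|U_{t,c}| \cdot (|U_{t,c}| - 1)$ per cell. The item layout and the choice to count one occurrence per item carrying the tag are assumptions.

```python
from collections import defaultdict
import math


def locality_scores(items, side=0.01):
    """items: iterable of (user, lat, lon, tags). Returns {tag: loc(t)}."""
    users = defaultdict(lambda: defaultdict(set))  # tag -> cell -> users (U_{t,c})
    occurrences = defaultdict(int)                 # tag -> N_t
    for user, lat, lon, tags in items:
        cell = (math.floor(lat / side), math.floor(lon / side))
        for tag in set(tags):
            users[tag][cell].add(user)
            occurrences[tag] += 1
    scores = {}
    for tag, cells in users.items():
        n_t = occurrences[tag]
        # Each of the |U| users in a cell has |U| - 1 neighbours there.
        neighbour_count = sum(len(u) * (len(u) - 1) for u in cells.values())
        scores[tag] = n_t * neighbour_count / n_t ** 2
    return scores


def locality_tag_set(scores):
    """T_l: tags with a non-zero locality score."""
    return {tag for tag, score in scores.items() if score > 0}
```

A tag used by several users in the same cell scores high, while a tag whose users are each alone in their cell scores zero and is dropped from $T_l$.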
  8. Locality – value distribution
     • Example tags with high locality scores: london (6975), paris (5452), nyc (3917)
     • Example tags with low locality scores: luminancehdr (0.0035), dsc6362 (0.003), air photo (0.002)
  9. Extensions
     • Spatial Entropy (SE) function
       – calculate entropy values by applying the Shannon entropy formula to the tag-cell probabilities
       – build a Gaussian weight function based on the values of the tag SE
     • Internal Grid
       – build an additional LM using a finer grid, with a cell side length of 0.001°
       – combine the MLCs of the individual language models
     • Similarity search [6] (Van Laere et al., ICMR 2011)
       – determine the $k$ most similar training images in the MLC
       – their center-of-gravity is the final location estimate
     From [2] (Kordopatis-Zilos et al., MediaEval 2014)
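The spatial entropy of a tag is the Shannon entropy of its distribution over cells. A minimal sketch, assuming the natural logarithm (the slide does not state the base) and a hypothetical function name:

```python
import math


def spatial_entropy(cell_probs):
    """Shannon entropy of one tag's cell distribution.

    cell_probs: {cell: probability of the tag in that cell}.
    The natural logarithm is an assumption.
    """
    return -sum(p * math.log(p) for p in cell_probs.values() if p > 0)
```

A tag concentrated in a single cell has entropy 0, while a tag spread evenly over many cells has high entropy. The Gaussian weight function then scores each tag by where its entropy falls in the overall distribution of entropy values.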
  10. Visual-based location estimation
      Model building
      • CNN features adapted by fine-tuning the VGG model [5] (Simonyan & Zisserman, ICLR 2015)
      • Training: ~1K Points Of Interest (POIs), ~1200 images/POI
      • Caffe [1] (Jia et al., arXiv 2014) is fed directly with the CNN features
      • Outputs of the fc7 layer (4096d) are compressed to 128d using PCA
      • CNN features are used to compute image similarities $s_{vis,ij}$
      Location Estimation
      • Geospatial clustering of the $k = 20$ visually most similar images
      • If the $j$-th image is within 1 km of the closest of the previous $j - 1$ images, it is assigned to that image's cluster; otherwise it forms its own cluster
      • The largest cluster (or the first in case of equal size) is selected and its centroid is used as the location estimate
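The incremental clustering rule can be sketched as below. The haversine distance and the arithmetic-mean centroid are assumptions; the slide names neither. The input is the list of neighbour locations already ordered by decreasing visual similarity.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_estimate(neighbours, radius_km=1.0):
    """neighbours: (lat, lon) of the k most similar images, best match first.

    Returns the centroid of the largest cluster, or None for an empty list.
    """
    assigned = []  # (point, cluster index) for every image seen so far
    clusters = []  # clusters in order of creation
    for point in neighbours:
        idx = None
        if assigned:
            nearest, nearest_idx = min(
                assigned, key=lambda pc: haversine_km(point, pc[0]))
            if haversine_km(point, nearest) <= radius_km:
                idx = nearest_idx  # join the cluster of the closest earlier image
        if idx is None:
            idx = len(clusters)
            clusters.append([])    # otherwise start a new cluster
        clusters[idx].append(point)
        assigned.append((point, idx))
    if not clusters:
        return None
    largest = max(clusters, key=len)  # max() keeps the first cluster on ties
    lat = sum(p[0] for p in largest) / len(largest)
    lon = sum(p[1] for p in largest) / len(largest)
    return (lat, lon)
```

Because clusters are stored in creation order and `max` returns the first maximal element, ties resolve to the cluster seeded by the more similar image, matching the tie rule on the slide.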
  11. Hybrid-based location estimation
      Model building
      • Combination of the textual and visual approaches
      • Build the LM using the tag-based approach above and use it for MLC selection
      Similarity Calculation
      • Combination of the visual and textual similarities
      • Normalize the visual similarities to the range [0, 1]
      • Similarity between two images: $s_{ij} = \frac{s_{tex,ij} + s_{vis,ij}}{2}$
      • The final estimate is the center-of-gravity of the $k = 5$ most similar images
      Low Confidence Estimations
      • For test images with no estimate or with confidence lower than 0.02 (≈10% of the test set), the visual approach is used to produce the estimated locations
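The similarity fusion and the low-confidence fallback can be illustrated as follows. Min-max scaling for the normalisation step and an unweighted mean for the center-of-gravity are assumptions, as are the function names and the candidate layout.

```python
def min_max(values):
    """Scale values to [0, 1]; assumption: min-max normalisation."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # degenerate case: all similarities equal
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_estimate(candidates, k=5):
    """candidates: list of dicts with "loc" (lat, lon), "s_tex" and raw "s_vis".

    Returns the unweighted mean location of the k candidates with the highest
    fused similarity s = (s_tex + s_vis) / 2, or None if there are none.
    """
    if not candidates:
        return None
    vis = min_max([c["s_vis"] for c in candidates])
    fused = [((c["s_tex"] + v) / 2, c["loc"]) for c, v in zip(candidates, vis)]
    top = sorted(fused, key=lambda pair: pair[0], reverse=True)[:k]
    lat = sum(loc[0] for _, loc in top) / len(top)
    lon = sum(loc[1] for _, loc in top) / len(top)
    return (lat, lon)


def choose_estimate(lm_estimate, confidence, visual_estimate, threshold=0.02):
    """Fall back to the visual estimate when the LM gives no estimate or its
    confidence is below the threshold (0.02 on the slide)."""
    if lm_estimate is None or confidence < threshold:
        return visual_estimate
    return lm_estimate
```

Here `candidates` would be the training images inside the MLC, with `s_tex` computed by whatever textual similarity the similarity-search step uses.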
  12. Confidence
      • Evaluate the confidence of the LM estimation of each query image
      • Measures how localized the language model cell estimations are, based on cell probabilities
      • Confidence measure: $conf(i) = \frac{\sum_{c \in C} \{p(c|i) \mid dist(c, mlc) < l\}}{\sum_{c \in C} p(c|i)}$
        $p(c|i)$: cell probability of cell $c$ for image $i$
        $dist(c_1, c_2)$: distance between $c_1$ and $c_2$
        $mlc$: Most Likely Cell
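The confidence measure is the share of probability mass lying within distance $l$ of the MLC. In the sketch below, cells are keyed by their centre coordinates and distances are haversine distances in km; both choices, and the parameter name `radius_km` for $l$ (whose value the slide does not give), are assumptions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def confidence(cell_probs, radius_km):
    """cell_probs: {(lat, lon) of cell centre: p(c|i)} for one query image.

    Returns the fraction of total probability within radius_km of the MLC.
    """
    total = sum(cell_probs.values())
    if total == 0:
        return 0.0  # also covers an empty distribution
    mlc = max(cell_probs, key=cell_probs.get)
    near = sum(p for c, p in cell_probs.items()
               if haversine_km(c, mlc) < radius_km)
    return near / total
```

A value close to 1 means nearly all the probability mass sits around the MLC; a low value means the candidate cells are scattered, which is the case the hybrid approach hands over to the visual estimate.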
  13. Runs and Results
      measure             RUN-1   RUN-2   RUN-3   RUN-4   RUN-5
      acc(1m) (%)          0.15    0.01    0.15    0.16    0.16
      acc(10m) (%)         0.61    0.08    0.62    0.75    0.76
      acc(100m) (%)        6.40    1.76    6.52    7.73    7.83
      acc(1km) (%)        24.33    5.19   24.61   27.30   27.54
      acc(10km) (%)       43.07    7.43   43.41   46.48   46.77
      median error (km)      69    5663      61      24      22
      RUN-1: Tag-based location estimation + released training set
      RUN-2: Visual-based location estimation + released training set
      RUN-3: Hybrid location estimation + released training set
      RUN-4: Tag-based location estimation + YFCC dataset
      RUN-5: Hybrid location estimation + YFCC dataset
  14. Thank you!
      • Code:
      • Get in touch: @sympapadopoulos / @georgekordopatis
  15. References
      [1] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
      [2] G. Kordopatis-Zilos, G. Orfanidis, S. Papadopoulos, and Y. Kompatsiaris. SocialSensor at MediaEval Placing Task 2014. In MediaEval 2014 Placing Task, 2014.
      [3] G. Kordopatis-Zilos, S. Papadopoulos, and Y. Kompatsiaris. Geotagging social media content with a refined language modelling approach. In Intelligence and Security Informatics, pages 21–40, 2015.
      [4] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task. In MediaEval 2013 Placing Task, 2013.
      [5] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
      [6] O. Van Laere, S. Schockaert, and B. Dhoedt. Finding locations of Flickr resources using language models and similarity search. In ICMR '11, pages 48:1–48:8, New York, NY, USA, 2011. ACM.

Editor's Notes

  1. Different kinds of user classification:
     • topic-oriented (e.g., interest/expertise)
     • role-based/behavioral (e.g., bot/spammer)
     • geographical location
     Useful for advertising, user recommendation, expert search, etc.
     For personal accounts, user classification raises privacy concerns.
     Challenges: multi-linguality, brevity, informal language