Tye Rattenbury, Nathan Good, Mor Naaman* Yahoo! Research Berkeley (Yahoo! Advanced Development Division) Towards Automatic...
This Talk is Not About Photos <ul><li>Geo-referenced data </li></ul><ul><ul><li>Photos </li></ul></ul><ul><ul><li>Blog pos...
Find the Differences “ Yahoo! Headquarters” “ Dog” “ SIGIR 2007”
Data Description
Simple Model (photo_id, user_id, time,  latitude, longitude) (photo_id, tag)
Information? <ul><ul><li>Flickr “geotagged” in San Francisco </li></ul></ul>
Tag Maps - San Francisco
Tag-based Semantics <ul><li>Derive meaningful data about individual tags </li></ul><ul><li>Based on the tag’s metadata pat...
Tag Semantics: Why? <ul><li>Improved image (and web!) search through query semantics </li></ul><ul><li>Automatic place- an...
Next Few Minutes <ul><li>Extended model </li></ul><ul><li>Sample data </li></ul><ul><li>Problem definition </li></ul>
Extended Model (photo_id, user_id, time,  latitude, longitude) (photo_id, tag) (tag, location) (tag, time)
Tag Patterns
Tag Patterns
Tag Patterns
Definitions <ul><li>Event tag: expected to exhibit significant time-based patterns </li></ul><ul><li>Location tag: expecte...
Problem Definition <ul><li>Can we extract event/place semantics from unstructured tags given their usage distribution (tim...
Fundamental Issues <ul><li>“ Regions” of semantic relevance </li></ul><ul><ul><li>“ church” in a neighborhood is a place <...
Proposed Solutions <ul><li>Naïve Scan  </li></ul><ul><ul><li>Burst detection </li></ul></ul><ul><li>Spatial Scan Statistic...
Next Few Minutes <ul><li>Intuition behind the three methods </li></ul><ul><li>Evaluation </li></ul><ul><li>Summary </li></ul>
Naïve Scan
Spatial Scan
Issues <ul><li>First two methods just look for a local burst in usage (in either time or space) </li></ul>
Scale-structure Identification <ul><li>Select a scale </li></ul><ul><ul><li>Find the structure of the data for the scale <...
Scale-structure Identification
Scale-structure Identification Entropy = 1.421621 Entropy = 0.766263 Entropy = 0.555915 Entropy = 0.062760 “Barry Bonds” tag
San Francisco Experiments <ul><li>~43k photos </li></ul><ul><li>~800 tags </li></ul>San Francisco Dataset: 42,000 Photos 8...
Experiments byobw Ground Truth: BYOBW?
Experiments <ul><li>Given a ground truth: </li></ul><ul><ul><li>Run the three methods, vary thresholds </li></ul></ul><ul>...
The Obligatory Slide Where I Show You that Our Curve is Above the Other Curves
Experiments
Future Work <ul><li>Multiple regions </li></ul><ul><li>Different spatially and temporally encoded data (e.g., blogs) </li>...
Questions? <ul><li>Tye Rattenbury, Nathan Good, Mor Naaman   </li></ul><ul><li>Y!RB / Y!A.D.D. </li></ul><ul><li>Read more...
Tag Maps - Paris - Les Blogs?
Issues <ul><li>Noisy data </li></ul><ul><li>Photographer biases </li></ul><ul><ul><li>In locations </li></ul></ul><ul><ul>...
Upcoming SlideShare
Loading in...5
×

Extracting event and place semantics from Flickr tags

2,498

Published on

Presentation at SIGIR 2007.

Published in: Technology, Spiritual
2 Comments
7 Likes
Statistics
Notes
  • Dog photo by CC license from http://flickr.com/photos/pellegrini/5419830/
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Seems like Slideshare still don't have some Gotham fonts. Sorry, reader!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
2,498
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
2
Likes
7
Embeds 0
No embeds

No notes for slide
  • Mor… star interns One way to extract knowledge from Flickr tags. This is not about image search - it’s not even about photos
  • Extracting event and place semantics from Flickr tags

    1. 1. Tye Rattenbury, Nathan Good, Mor Naaman* Yahoo! Research Berkeley (Yahoo! Advanced Development Division) Towards Automatic Extraction of Event and Place Semantics from Flickr Tags
    2. 2. This Talk is Not About Photos <ul><li>Geo-referenced data </li></ul><ul><ul><li>Photos </li></ul></ul><ul><ul><li>Blog posts (GeoRSS) </li></ul></ul><ul><ul><li>Geo-annotated Web pages </li></ul></ul><ul><ul><li>KML-based collections </li></ul></ul><ul><li>Similar issues, similar potential </li></ul>
    3. 3. Find the Differences “ Yahoo! Headquarters” “ Dog” “ SIGIR 2007”
    4. 4. Data Description
    5. 5. Simple Model (photo_id, user_id, time, latitude, longitude) (photo_id, tag)
    6. 6. Information? <ul><ul><li>Flickr “geotagged” in San Francisco </li></ul></ul>
    7. 7. Tag Maps - San Francisco
    8. 8. Tag-based Semantics <ul><li>Derive meaningful data about individual tags </li></ul><ul><li>Based on the tag’s metadata patterns </li></ul><ul><li>E.g., ”Yahoo! Headquarters” , “SIGIR2007” , “Dog” . </li></ul>
    9. 9. Tag Semantics: Why? <ul><li>Improved image (and web!) search through query semantics </li></ul><ul><li>Automatic place- and event-gazetteers </li></ul><ul><li>Association of missing time/place data based on tags </li></ul><ul><li>Collection visualization </li></ul><ul><li>… </li></ul>
    10. 10. Next Few Minutes <ul><li>Extended model </li></ul><ul><li>Sample data </li></ul><ul><li>Problem definition </li></ul>
    11. 11. Extended Model (photo_id, user_id, time, latitude, longitude) (photo_id, tag) (tag, location) (tag, time)
    12. 12. Tag Patterns
    13. 13. Tag Patterns
    14. 14. Tag Patterns
    15. 15. Definitions <ul><li>Event tag: expected to exhibit significant time-based patterns </li></ul><ul><li>Location tag: expected to exhibit significant place-based patterns </li></ul>
    16. 16. Problem Definition <ul><li>Can we extract event/place semantics from unstructured tags given their usage distribution (time and location patterns)? </li></ul>
    17. 17. Fundamental Issues <ul><li>“ Regions” of semantic relevance </li></ul><ul><ul><li>“ church” in a neighborhood is a place </li></ul></ul><ul><ul><li>“ carnival” in Rio de Janeiro is an event </li></ul></ul><ul><ul><li>“ California” in San Francsico is not a place </li></ul></ul><ul><li>Semantic ambiguity </li></ul><ul><ul><li>“ apple” </li></ul></ul><ul><ul><li>“ turkey” </li></ul></ul><ul><ul><li>“ pride” </li></ul></ul>
    18. 18. Proposed Solutions <ul><li>Naïve Scan </li></ul><ul><ul><li>Burst detection </li></ul></ul><ul><li>Spatial Scan Statistics </li></ul><ul><ul><li>Borrowed from disease outbreak detection </li></ul></ul><ul><li>Scale-structure method </li></ul><ul><ul><li>Examine the “structure” of the tag’s patterns in multiple scales </li></ul></ul>
    19. 19. Next Few Minutes <ul><li>Intuition behind the three methods </li></ul><ul><li>Evaluation </li></ul><ul><li>Summary </li></ul>
    20. 20. Naïve Scan
    21. 21. Spatial Scan
    22. 22. Issues <ul><li>First two methods just look for a local burst in usage (in either time or space) </li></ul>
    23. 23. Scale-structure Identification <ul><li>Select a scale </li></ul><ul><ul><li>Find the structure of the data for the scale </li></ul></ul><ul><ul><li>Calculate the entropy of the structure </li></ul></ul><ul><li>Increase scale, repeat </li></ul><ul><li>Is the information “well organized”? </li></ul>
    24. 24. Scale-structure Identification
    25. 25. Scale-structure Identification Entropy = 1.421621 Entropy = 0.766263 Entropy = 0.555915 Entropy = 0.062760 “Barry Bonds” tag
    26. 26. San Francisco Experiments <ul><li>~43k photos </li></ul><ul><li>~800 tags </li></ul>San Francisco Dataset: 42,000 Photos 800+ popular tags
    27. 27. Experiments byobw Ground Truth: BYOBW?
    28. 28. Experiments <ul><li>Given a ground truth: </li></ul><ul><ul><li>Run the three methods, vary thresholds </li></ul></ul><ul><ul><li>Precision vs. recall </li></ul></ul>
    29. 29. The Obligatory Slide Where I Show You that Our Curve is Above the Other Curves
    30. 30. Experiments
    31. 31. Future Work <ul><li>Multiple regions </li></ul><ul><li>Different spatially and temporally encoded data (e.g., blogs) </li></ul><ul><li>Other metadata domains besides time and space </li></ul><ul><ul><li>Visual features: shape primitives, color </li></ul></ul><ul><ul><li>Semantic features </li></ul></ul>
    32. 32. Questions? <ul><li>Tye Rattenbury, Nathan Good, Mor Naaman </li></ul><ul><li>Y!RB / Y!A.D.D. </li></ul><ul><li>Read more, follow: http://www.whyrb.com </li></ul><ul><li>Slides: http://slideshare.net/mor </li></ul><ul><li>Contact: mor@yahoo-inc.com </li></ul>
    33. 33. Tag Maps - Paris - Les Blogs?
    34. 34. Issues <ul><li>Noisy data </li></ul><ul><li>Photographer biases </li></ul><ul><ul><li>In locations </li></ul></ul><ul><ul><li>In Tags </li></ul></ul><ul><li>Wrong data </li></ul>Whatever, color, city, spectrum, santa barbara, california, usa, Lookatme, Herbert Bayer Chromatic Gate

    ×