  • For each venue name in each language, together with its related spatial location, a query is built. The textual terms of the venue name compose a boolean query using the OR operator. A spatial and textual search is performed by the search engine for each input query related to each venue, and a threshold on the search engine's integrated score is used to select the results. The categorization is performed using the three textual metadata fields surrounding every picture. The output is a set of Flickr pictures, grouped by venue, that have SOCCER as their topic.
  • The similarity between two clusters is then based on the number of entity names they share.
  • In the first run the refinement step is not part of the process. Looking across all the measures, we can see a benefit from using the refinement step, and a further benefit from using entity names instead of the plain most frequent tags to refine the clusters (note the interesting increase in the recall measure). I would also remark on one point: clustering evaluation is not an easy task, since there are different metrics and different constraints that can be verified. In particular, two constraints that an evaluation metric should capture are COMPLETENESS and HOMOGENEITY. For the first, recall is a good indicator; for the second, precision and NMI are good indicators. With NMI, performance in terms of completeness also seems to increase with refinement, using both the top-100 tags and entity names.
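For reference, the NMI mentioned above can be written out. This is one common normalization; the notes do not say which variant the task evaluation used:

    \mathrm{NMI}(C, G) = \frac{2\, I(C; G)}{H(C) + H(G)},
    \qquad
    I(C; G) = \sum_{c \in C} \sum_{g \in G} p(c, g) \log \frac{p(c, g)}{p(c)\, p(g)}

where C is the produced clustering, G the ground-truth events, and H the entropy of each partition. NMI reaches 1 only when the clustering is both complete and homogeneous, which is why it complements recall and precision here.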
    1. NTNU@MediaEval 2011: Social Event Detection Task (SED)
       Massimiliano Ruocco, Heri Ramampiaro
       Data and Information Management Group
       Department of Computer and Information Science
       Norwegian University of Science and Technology
       [email_address]
       MediaEval 2011 Workshop, Pisa
    2. MediaEval2011 – SED Task: Outline
       • Proposed Approach
       • Experiments
       • Results
       • Conclusions and Future Work
    3. MediaEval2011 – SED Task: Framework Workflow
       [Workflow diagram] Dataset → Query Expansion (LastFM API / DBpedia SPARQL endpoint) → Search → Categorization → Clustering → Semantic Merge → Refinement → Clustered List
    4. Challenge 1
    5. MediaEval2011 – SED Task Framework – Challenge 1: Query Expansion
       • Football venue names based in Rome and Barcelona
       • Location of the venues (latitude and longitude)
       • Output: list of venue names in different languages with the related location
         V = {(v_1^1, …, v_{N_1}^1, g_1), …, (v_1^M, …, v_{N_M}^M, g_M)}
       Resources:
       • Query language: SPARQL
       • Database: DBpedia
       • Java interface: Jena
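To make the step concrete, here is a minimal sketch of how venue names and locations could be fetched from DBpedia through Jena. The vocabulary in the query (dbo:Stadium, dbo:location, geo:lat/geo:long) is an assumption: the slides only name SPARQL, DBpedia and Jena, not the exact query, and DBpedia's schema has varied across versions.

    // Minimal sketch of the query-expansion step with Apache Jena.
    // Assumption: dbo:Stadium / dbo:location / geo:lat / geo:long
    // model the venues; adjust to the DBpedia release actually used.
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;

    public class VenueExpansion {
        public static void main(String[] args) {
            String sparql =
                "PREFIX dbo:  <http://dbpedia.org/ontology/> " +
                "PREFIX dbr:  <http://dbpedia.org/resource/> " +
                "PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> " +
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
                "SELECT ?label ?lat ?long WHERE { " +
                "  ?venue a dbo:Stadium ; dbo:location dbr:Rome ; " +
                "         rdfs:label ?label ; geo:lat ?lat ; geo:long ?long . " +
                "}";
            try (QueryExecution qe = QueryExecutionFactory
                    .sparqlService("http://dbpedia.org/sparql", sparql)) {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    // One language variant v_i^j of the venue name,
                    // plus its location g_j (latitude/longitude).
                    System.out.printf("%s [%s]: %s, %s%n",
                        row.getLiteral("label").getString(),
                        row.getLiteral("label").getLanguage(),
                        row.getLiteral("lat"), row.getLiteral("long"));
                }
            }
        }
    }

Collecting the rdfs:label variants per venue yields the multilingual name sets paired with a location, i.e. the tuples (v_1^j, …, v_{N_j}^j, g_j) of the output set V.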
    6. MediaEval2011 – SED Task Framework – Challenge 1: Search + Categorization
       • Terms in OR, (Terms in OR + spatial constraint)
       • Categorization over different textual metadata (Title, Tag, Description)
       • Output: result list grouped by venue (topic: soccer)
         R = {(r_1^1, …, r_{N_1}^1), …, (r_1^M, …, r_{N_M}^M)}
       Resources:
       • Index: Solr
       • Categorization: SemanticHacker API
       • Categories: Open Directory Project
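A hedged sketch of the search side with SolrJ follows. The field names (text, location), the 5 km radius, and the score cut-off are assumptions rather than values from the slides, and the categorization call to the SemanticHacker API is left out.

    // Sketch of the search step: the terms of one venue name OR-ed
    // together, a spatial filter around the venue location, and a
    // threshold on the engine's integrated score.
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;

    public class VenueSearch {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/photos").build()) {
                SolrQuery q = new SolrQuery("text:(Stadio OR Olimpico)");
                // Spatial constraint: photos within 5 km of the venue.
                q.addFilterQuery("{!geofilt sfield=location pt=41.934,12.455 d=5}");
                q.setFields("id", "title", "score");

                double threshold = 0.5;  // assumed cut-off on the score
                for (SolrDocument doc : solr.query(q).getResults()) {
                    float score = (Float) doc.getFieldValue("score");
                    if (score >= threshold) {
                        System.out.println(doc.getFieldValue("id") + "  " + score);
                    }
                }
            }
        }
    }

Each surviving picture would then be sent, as its Title/Tag/Description text, to the categorizer and kept only if it falls under the soccer topic.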
    7. MediaEval2011 – SED Task Framework – Challenge 1: Clustering
       • Results grouped by temporal tag
       • Quality Threshold clustering (QT clustering)
       • Output: pictures grouped by temporal tag and venue
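QT clustering over timestamps reduces, in one dimension, to repeatedly extracting the densest time window whose span stays within the quality threshold. A sketch under that reading; the threshold value and the choice of timestamp field are assumptions:

    // Minimal 1-D specialisation of Quality Threshold clustering over
    // photo timestamps: take the densest window whose span does not
    // exceed the threshold, remove it, and repeat.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class QtClustering {
        public static List<List<Long>> cluster(List<Long> times, long maxSpanSeconds) {
            List<Long> remaining = new ArrayList<>(times);
            Collections.sort(remaining);
            List<List<Long>> clusters = new ArrayList<>();
            while (!remaining.isEmpty()) {
                int bestStart = 0, bestEnd = 1;  // [start, end) of densest window
                for (int start = 0, end = 0; start < remaining.size(); start++) {
                    while (end < remaining.size()
                            && remaining.get(end) - remaining.get(start) <= maxSpanSeconds) {
                        end++;
                    }
                    if (end - start > bestEnd - bestStart) {
                        bestStart = start;
                        bestEnd = end;
                    }
                }
                clusters.add(new ArrayList<>(remaining.subList(bestStart, bestEnd)));
                remaining.subList(bestStart, bestEnd).clear();
            }
            return clusters;
        }
    }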
    8. Challenge 2
    9. MediaEval2011 – SED Task Framework – Challenge 2: Query Expansion
       • Extraction of locations and venue names for "Paradiso" and "Parc del Fórum"
       • Output: list of venue names with the related location
         V = {(v_1^1, …, v_{N_1}^1, g_1), …, (v_1^M, …, v_{N_M}^M, g_M)}
       Resources:
       • Service: LastFM API
       • Database: LastFM
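Last.fm's web API exposes a venue.search method, which fits this step. The sketch below calls it over HTTP with a placeholder API key and prints the raw JSON reply; which response fields the authors actually consumed (name variants, city, geo point) is an assumption, so the layout should be checked against the Last.fm documentation.

    // Sketch of Challenge 2 expansion: resolve a venue name to its
    // name and geo location through the Last.fm venue.search service.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    public class LastFmVenueLookup {
        public static void main(String[] args) throws Exception {
            String venue = URLEncoder.encode("Parc del Fórum", "UTF-8");
            String url = "http://ws.audioscrobbler.com/2.0/?method=venue.search"
                       + "&venue=" + venue + "&api_key=YOUR_API_KEY&format=json";
            try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
                // The JSON reply carries matches with name, city and geo
                // lat/long, which populate the (v_i, g_i) tuples above.
                String line;
                while ((line = in.readLine()) != null) System.out.println(line);
            }
        }
    }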
    10. MediaEval2011 – SED Task Framework – Challenge 2: Search
        • (Terms in OR + spatial constraint), (Terms in AND)
        • Output: result list grouped by venue
          R = {(r_1^1, …, r_{N_1}^1), …, (r_1^M, …, r_{N_M}^M)}
        Resources:
        • Index: Solr
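Relative to Challenge 1 only the query shapes change. A small sketch of the two variants, with the same caveats as before (field names and radius assumed, coordinates approximate):

    // The two Challenge 2 query shapes against the assumed Solr schema.
    import java.util.Locale;
    import org.apache.solr.client.solrj.SolrQuery;

    public class Challenge2Queries {
        // Variant 1: venue-name terms OR-ed, restricted by a spatial filter.
        static SolrQuery orWithSpatial(String name, double lat, double lon) {
            SolrQuery q = new SolrQuery("text:(" + name.replace(" ", " OR ") + ")");
            q.addFilterQuery(String.format(Locale.ROOT,
                "{!geofilt sfield=location pt=%f,%f d=5}", lat, lon));
            return q;
        }

        // Variant 2: all venue-name terms required, no spatial constraint.
        static SolrQuery allTerms(String name) {
            return new SolrQuery("text:(" + name.replace(" ", " AND ") + ")");
        }

        public static void main(String[] args) {
            System.out.println(orWithSpatial("Parc del Fórum", 41.41, 2.22));  // approx. location
            System.out.println(allTerms("Parc del Fórum"));
        }
    }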
    11. MediaEval2011 – SED Task Framework – Challenge 2: Clustering + Semantic Merge
        • Results grouped by temporal tag
        • Quality Threshold clustering (QT clustering)
        • Semantic merge: based on entity names representing the artist and the event name
          • Semantic similarity: number of shared entity names
        • Output: pictures grouped by temporal tag and venue
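The semantic merge can be illustrated with a small sketch: each temporal cluster is represented by the entity names attached to its photos, and two clusters merge when they share at least a minimum number of names. The slides only state that similarity is the number of shared entity names, so the minShared threshold is an assumed parameter.

    // Sketch of the semantic-merge step over entity-name sets.
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class SemanticMerge {
        public static List<Set<String>> merge(List<Set<String>> clusters, int minShared) {
            List<Set<String>> merged = new ArrayList<>();
            for (Set<String> c : clusters) merged.add(new HashSet<>(c));
            boolean changed = true;
            while (changed) {
                changed = false;
                outer:
                for (int i = 0; i < merged.size(); i++) {
                    for (int j = i + 1; j < merged.size(); j++) {
                        Set<String> shared = new HashSet<>(merged.get(i));
                        shared.retainAll(merged.get(j));  // similarity = |shared names|
                        if (shared.size() >= minShared) {
                            merged.get(i).addAll(merged.remove(j));
                            changed = true;
                            break outer;  // rescan after each merge
                        }
                    }
                }
            }
            return merged;
        }
    }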
    12. MediaEval2011 – SED Task Framework – Challenge 2: Refinement
        • Refinement query for each cluster:
          • Top-k most frequent tags
          • Top-k most frequent entity names
        • Categorization: filter over the query result (using the search engine score)
        Resources:
        • Index: Solr
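A sketch of how a refinement query could be assembled from a cluster's top-k most frequent tags; the tag field name, the value of k, and the example tags are assumptions, and the same routine would apply to entity names.

    // Sketch of the refinement step: the top-k most frequent tags of a
    // cluster become a new OR query against the index; results above a
    // score threshold would then be pulled into the cluster.
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class Refinement {
        static String refinementQuery(List<String> clusterTags, int k) {
            Map<String, Long> freq = clusterTags.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
            return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.joining(" OR ", "tag:(", ")"));
        }

        public static void main(String[] args) {
            List<String> tags = Arrays.asList(
                "concert", "paradiso", "concert", "amsterdam", "paradiso", "concert");
            System.out.println(refinementQuery(tags, 2));  // tag:(concert OR paradiso)
        }
    }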
    13. MediaEval2011 – SED Task: Results and Experiments
        Challenge 2
        • Run 1: no refinement step
        • Run 2: refinement with top-100 tags
        • Run 3: refinement with entity names
        Challenge 1
        • Run 1: categorization with Tag only
        • Run 2: categorization with all textual metadata
    14. MediaEval2011 – SED Task: Results and Experiments (same runs as above)
        [Results table: homogeneity; values not recoverable from the transcript]
    15. MediaEval2011 – SED Task: Results and Experiments (same runs as above)
        [Results table: completeness; values not recoverable from the transcript]
    16. MediaEval2011 – SED Task: Conclusions and Future Work
        • Tag metadata is the most representative
        • Better performance when using entity names in event cluster refinement
        • The refinement block is useful for better completeness
        • Future work: use of the refinement block for general event clustering
    17. Thank you for your attention. Questions? http://www.idi.ntnu.no/~ruocco/