Getting the Most Out of Social Annotations for Web Page Classification

My presentation at DocEng 2009 on September 16th, 2009

Transcript of "Getting the Most Out of Social Annotations for Web Page Classification"

  1. Getting the Most Out of Social Annotations for Web Page Classification. DocEng 2009. Arkaitz Zubiaga, Raquel Martínez, Víctor Fresno, NLP & IR Group @ UNED. September 16th, 2009.
  2. Introduction: Index. 1 Introduction, 2 Dataset, 3 Experiments, 4 Conclusions, 5 Future Work. Zubiaga, Martínez, Fresno (UNED), Social Annotations for WPC, September 16th, 2009, 2 / 25
  3. Introduction: What is Web Page Classification? We have a set of documents $D = \{d_1, \ldots, d_{|D|}\}$ and a set of predefined categories $C = \{c_1, \ldots, c_{|C|}\}$. Web page classification is the task of deciding, for each pair $\langle d_j, c_i \rangle \in D \times C$, whether $d_j$ belongs to $c_i$.
  4. Introduction: What are Social Bookmarking Sites? (I) Web sites that allow users to save web links and attach metadata to them, e.g., Delicious (http://delicious.com).
  5. Introduction: What are Social Bookmarking Sites? (II)
  6. Introduction: Social Annotations. Tags: keywords, e.g., photography, web2.0, images. Notes: free texts describing web pages, e.g., "Flickr is a website for photo sharing and online photo management." Highlights: relevant parts of a page selected by the user. Reviews: free texts with subjective descriptions, e.g., "Interesting web page with photos." Ratings: grades, e.g., on a scale from 1 to 5.
  7. Introduction: Motivation. Classical web page classification methods rely on the content of web pages. Motivation: could social annotations help improve the results?
  8. Introduction: Related Work. Some works (Bao et al., 2007; Heymann et al., 2008) show the usefulness of tags for information retrieval. Ramage et al. (2009) show that tags can improve clustering. Noll and Meinel (2008) study tags and conclude that they could be useful for web page classification.
  9. Dataset
  10. Dataset: From December 2008 to January 2009, we monitored Delicious' recent feed for URLs annotated by more than 100 users, collecting 87,096 URLs. We then looked up their classification in the Open Directory Project (ODP, http://www.dmoz.org): 12,616 URLs matched, spread over 17 first-level categories with an unbalanced distribution. Annotations retrieved: the number of users annotating each URL, the top 10 list of tags, the Full Tag Activity (FTA), and notes (all from Delicious); reviews (from StumbleUpon, http://www.stumbleupon.com); highlights (from Diigo, http://diigo.com).
  11. Experiments
  12. Experiments: Configuration. Support Vector Machines (SVM), using SVMmulticlass (http://svmlight.joachims.org). Evaluation: accuracy. Several training sets, with 6 runs per set.
  13. Experiments: Classifying with Tags (I). Unweighted tags. Ranked tags. Tag fractions. Weighted tags (Top 10). Weighted tags (FTA).
  14. Experiments: Classifying with Tags (II)
  15. Experiments: Classifying with Comments (I). Only notes. Both notes and reviews.
  16. Experiments: Classifying with Comments (II)
  17. Experiments: Comparison with the Baseline (Content) (I). Content. Comments. Tags.
  18. Experiments: Comparison with the Baseline (Content) (II)
  19. Experiments: Combining Classifiers (I). Tags + content. Tags + comments. Comments + content. Tags + comments + content.
  20. Experiments: Combining Classifiers (II)
  21. Conclusions
  22. Conclusions: We analyzed and evaluated the use of social annotations for web page classification. Some types of annotations are not used widely enough to be useful. Tags and comments are widely used. Both tags and comments outperform classification based on page content. Combining the three data sources (tags, comments, and content) performs even better. We corroborate the conclusions of Noll and Meinel (2008), showing quantitatively that social annotations are useful for web page classification.
  23. Future Work
  24. Future Work: Classifying at lower levels of the category hierarchy. Filtering tags and comments (misbehavior detection).
  25. Future Work: Thank You. Achiu, Arigato, Danke, Dhannvaad, Dua Netjer en ek, Efcharisto, Gracias, Gràcies, Gratia, Grazie, Guishepeli, Hvala, Kiitos, Köszönöm, Mercé, Merci, Mila esker, Obrigado, Shukran, Tack, Tak, Takk, Shukriya, Tänan, Tapadh leat, Teşekkür ederim, Thank you, Toda
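Slide 3 frames web page classification as a decision over every pair in D × C. The Python sketch below illustrates that framing for the single-label case, assuming one category per page as in a multiclass setup. The documents, categories, and the `SCORES` table are hypothetical stand-ins for a trained model, not data from the study.

```python
from itertools import product

# Hypothetical document set D and category set C (illustration only).
D = ["d1", "d2", "d3"]
C = ["Arts", "Science", "Sports"]

# Hypothetical scores standing in for a trained classifier's output.
SCORES = {
    ("d1", "Arts"): 0.9, ("d1", "Science"): 0.2, ("d1", "Sports"): 0.1,
    ("d2", "Arts"): 0.1, ("d2", "Science"): 0.7, ("d2", "Sports"): 0.3,
    ("d3", "Arts"): 0.2, ("d3", "Science"): 0.1, ("d3", "Sports"): 0.8,
}

def score(doc: str, cat: str) -> float:
    return SCORES[(doc, cat)]

def classify(docs, cats):
    """Assign a Boolean to every pair <d_j, c_i> in D x C.

    Single-label assumption: each document is True only for its
    highest-scoring category and False for all others.
    """
    best = {d: max(cats, key=lambda c: score(d, c)) for d in docs}
    return {(d, c): best[d] == c for d, c in product(docs, cats)}

decisions = classify(D, C)
print(len(decisions))  # 9, i.e. |D| x |C| pairs
print([pair for pair, decided in decisions.items() if decided])
# [('d1', 'Arts'), ('d2', 'Science'), ('d3', 'Sports')]
```

In a multi-label setting the `max` step would be replaced by a per-pair threshold, so a page could receive several categories.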
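The annotation types listed on slide 6 can be pictured as one record per bookmarked URL. This is a minimal sketch, assuming a hypothetical `AnnotatedPage` schema. The field names and the per-tag user counts are illustrative; the example strings come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AnnotatedPage:
    """Social annotations attached to one bookmarked URL (hypothetical schema)."""
    url: str
    tags: Dict[str, int] = field(default_factory=dict)   # tag -> number of users who used it
    notes: List[str] = field(default_factory=list)       # free-text descriptions
    highlights: List[str] = field(default_factory=list)  # passages selected from the page
    reviews: List[str] = field(default_factory=list)     # subjective free-text opinions
    ratings: List[int] = field(default_factory=list)     # grades, e.g. on a 1-5 scale

    def mean_rating(self) -> Optional[float]:
        """Average grade, or None when nobody rated the page."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

page = AnnotatedPage(
    url="http://www.flickr.com",
    tags={"photography": 3, "web2.0": 2, "images": 1},  # made-up counts
    notes=["Flickr is a website for photo sharing and online photo management."],
    reviews=["Interesting web page with photos."],
    ratings=[4, 5],
)
print(page.mean_rating())  # 4.5
```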
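Slide 10 describes two filtering steps: keep URLs annotated by more than 100 users, then keep only those listed in ODP, labeled with their first-level category. The sketch below works under stated assumptions: the input dictionaries, the helper `build_dataset`, and the slash-separated category paths such as `Arts/Photography` are hypothetical formats, not the actual Delicious or ODP dumps. The last line checks the match rate implied by the slide, 12,616 of 87,096 URLs.

```python
from collections import Counter

def build_dataset(user_counts, odp_categories, min_users=100):
    """Keep URLs annotated by more than `min_users` users that appear in ODP,
    labeling each with its first-level category.

    user_counts: url -> number of users who bookmarked it (hypothetical input)
    odp_categories: url -> category path such as "Arts/Photography" (assumed format)
    """
    dataset = {}
    for url, n_users in user_counts.items():
        if n_users <= min_users:
            continue  # "more than 100 users" means strictly greater
        path = odp_categories.get(url)
        if path is None:
            continue  # URL not listed in ODP
        dataset[url] = path.split("/")[0]  # first-level category
    return dataset

users = {"a.com": 150, "b.com": 90, "c.org": 300, "d.net": 120}
odp = {"a.com": "Arts/Photography", "c.org": "Science/Biology", "b.com": "Sports/Soccer"}

data = build_dataset(users, odp)
print(data)                    # {'a.com': 'Arts', 'c.org': 'Science'}
print(Counter(data.values()))  # class distribution (unbalanced in the real data)
print(f"{12616 / 87096:.1%}")  # 14.5% of monitored URLs matched ODP
```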
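SVMmulticlass reads training data in the SVMlight text format: one example per line, an integer class label followed by `index:value` pairs with feature indices in increasing order. The sketch below uses a hypothetical helper `to_svmlight_line` to write one example in that format, then averages accuracy over repeated runs, as slide 12 does for each training set. The labels and predictions are made up, and three runs stand in for the six reported on the slide.

```python
from statistics import mean

def to_svmlight_line(label: int, features: dict) -> str:
    """Format one example as '<class> <index>:<value> ...'.

    Feature indices must be positive integers in increasing order;
    zero-valued features are omitted because the format is sparse.
    """
    items = sorted((i, v) for i, v in features.items() if v != 0)
    return " ".join([str(label)] + [f"{i}:{v:g}" for i, v in items])

def accuracy(gold, predicted):
    """Fraction of examples whose predicted class equals the gold class."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

print(to_svmlight_line(3, {7: 0.5, 2: 1.0}))  # 3 2:1 7:0.5

# Mean accuracy over repeated runs for one training set (made-up predictions).
gold = [1, 2, 3, 1]
runs = [[1, 2, 3, 2], [1, 2, 3, 1], [2, 2, 3, 1]]
avg = mean(accuracy(gold, p) for p in runs)
print(avg)
```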
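Slide 13 names five tag representations without defining them. The sketch below offers one plausible reading, purely as an assumption and not as the paper's definitions. Ranked weights decrease with position in the top-10 list. Fractions divide each tag's user count by the number of users annotating the URL. Weighted tags use raw user counts, taken either from the top-10 list or from the Full Tag Activity, which is assumed here to list every tag with its count. All numbers are made up.

```python
# Hypothetical tag representations; each weighting is an assumption.

def unweighted(top10):
    """Binary presence: each tag in the top-10 list gets weight 1."""
    return {tag: 1 for tag, _ in top10}

def ranked(top10):
    """Position-based weight (assumed scheme): the first tag gets
    len(top10), the last gets 1. Expects the list sorted by count."""
    n = len(top10)
    return {tag: n - rank for rank, (tag, _) in enumerate(top10)}

def fractions(top10, n_users):
    """Share of the URL's annotators who used each tag (assumed definition)."""
    return {tag: count / n_users for tag, count in top10}

def weighted(tag_counts):
    """Raw number of users per tag; apply it to the top-10 list
    or to the Full Tag Activity (FTA)."""
    return dict(tag_counts)

# (tag, number of users) pairs, made up for illustration.
top10 = [("photography", 80), ("flickr", 60), ("images", 30)]
fta = top10 + [("camera", 5)]

print(unweighted(top10))      # {'photography': 1, 'flickr': 1, 'images': 1}
print(ranked(top10))          # {'photography': 3, 'flickr': 2, 'images': 1}
print(fractions(top10, 200))  # {'photography': 0.4, 'flickr': 0.3, 'images': 0.15}
print(weighted(fta))
```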
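Slide 15 compares two ways of turning comments into features: notes alone, or notes plus reviews. A minimal bag-of-words sketch follows, assuming simple lowercase tokenization with no stemming or stop-word removal; the slides do not say how the text was processed. The example texts come from slide 6.

```python
import re
from collections import Counter

def bag_of_words(texts):
    """Lowercased term frequencies over all the given comments
    (assumed tokenization: runs of letters and digits)."""
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z0-9]+", text.lower()))
    return Counter(tokens)

notes = ["Flickr is a website for photo sharing and online photo management."]
reviews = ["Interesting web page with photos."]

only_notes = bag_of_words(notes)
notes_and_reviews = bag_of_words(notes + reviews)
print(only_notes["photo"])          # 2
print(notes_and_reviews["photos"])  # 1
```

Without stemming, "photo" and "photos" stay separate features; adding reviews enlarges the vocabulary available to the classifier.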
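Slide 19 lists which sources are combined but not how. One simple option, shown here as an assumption rather than the method used in the study, is late fusion: each source gets its own classifier, and their per-class scores are summed before picking the best class. The helper `combine` and the scores are hypothetical.

```python
def combine(score_lists):
    """Late fusion: sum per-class scores from several classifiers
    (e.g. one trained on tags, one on comments, one on content)
    and return the class with the highest total."""
    totals = {}
    for scores in score_lists:
        for cls, s in scores.items():
            totals[cls] = totals.get(cls, 0.0) + s
    return max(totals, key=totals.get)

# Made-up per-class scores from three single-source classifiers.
tags_scores = {"Arts": 0.6, "Science": 0.3, "Sports": 0.1}
comment_scores = {"Arts": 0.2, "Science": 0.5, "Sports": 0.3}
content_scores = {"Arts": 0.3, "Science": 0.4, "Sports": 0.3}

print(combine([tags_scores, content_scores]))                  # Arts
print(combine([tags_scores, comment_scores, content_scores]))  # Science
```

Raw SVM outputs from different feature sets can live on different scales, so in practice they may need normalizing before they are added.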