Finding High-Quality Content in Social Media

2 comments

  • + jboutelle Jonathan Boutelle 2 years ago
    Yahoo avatars are SO CUTE!
    They look like little anime people...
  • + ChaToX Carlos Castillo 2 years ago
    This talk corresponds to the following article:

    Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne: 'Finding High-Quality Content in Social Media'. First ACM Conference on Web Search and Data Mining. Stanford, CA, USA. February 2008.

    http://www.citeulike.org/user/ChaTo/article/1692096

Finding High-Quality Content in Social Media - Presentation Transcript

  1. Eugene Agichtein (Emory University, Atlanta, USA); Carlos Castillo, Debora Donato, Aris Gionis (Yahoo! Research, Barcelona, Spain); Gilad Mishne (Yahoo! S&A Sciences, Santa Clara, CA, USA)
  2. User-generated content ≠ Traditional publishing E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne: Finding High-Quality Content in Social Media. WSDM'08.
  3. Figure labels: Frequency, Quality, Traditional publishing, User-generated. Source: Chris Anderson: “The Long Tail”. Hyperion, 2006.
  4. Figure labels: Quantity, Quality, User-generated, Traditional publishing. Source: Chris Anderson: “The Long Tail”. Hyperion, 2006.
  5. <!--
  6. Quantity ? Quality
  7. Chris Martin from Coldplay in The Rolling Stone, Fortieth Anniversary, July 2007: “We think it's all about quality over quantity now, because there's so much noise everywhere, there's no point in putting anything out unless it's fucking amazing.” Figure labels: Quantity, Quality.
  8. Figure labels: Quantity, Quality, User-generated, Traditional publishing, A.
  9. Figure labels: Quantity, Quality, User-generated, Traditional publishing, F.A.
  10. -->
  11. ? Quantity Quality
  12. (Hard) problem. Figure labels: Quantity, Quality.
  14. Question (+ “Stars”). Best answer: picked by votes -or- picked by asker. All answers (+ “Thumbs up”, + “Thumbs down”).
  15. ¼ of questions want an opinion (informal polls); ¾ of questions seek information or advice.
  16. 17%-45% of answers were correct; 65%-90% of questions had at least one correct answer. Source: Q. Su, D. Pavlov, J.-H. Chow, W. C. Baker: “Internet-scale collection of human-reviewed data”. WWW'07.
  17. There are top contributors ... but they don't have all the answers
  18. Task: find high-quality items. Figure labels: Quantity, Quality.
  19. Existing tools
      ● Link-based ranking methods
      ● Propagation of trust/distrust
      ● Automatic text analysis
      ● Usage mining
      ● ...
  20. Sources of information
      ● Content analysis
      ● Usage data (clicks)
      ● Community ratings
  21. Sources of information
      ● Content analysis (with errors)
      ● Clicks (with noise)
      ● Community ratings (sparse, with spam)
  22. Text analysis, Clicks, Community
  23. Text analysis, Clicks, Community
  24. Text analysis: Readability statistics; Language modeling
  25. Text analysis: Readability statistics; Language modeling. Features: punctuation density, capitalization errors, number of words, plus spacing density, syllables per word, ...
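
The surface features named on slide 25 could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the feature definitions, the vowel-group syllable heuristic, and the rule that a sentence starting in lowercase counts as a capitalization error are all assumptions made for illustration.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_features(text: str) -> dict:
    """Surface features of a question or answer; definitions are illustrative."""
    words = re.findall(r"[A-Za-z']+", text)
    n_chars = max(1, len(text))
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumed definition: a sentence that starts with a lowercase letter is an error.
    cap_errors = sum(1 for s in sentences if s[0].islower())
    return {
        "num_words": len(words),
        "punctuation_density": sum(c in ".,;:!?" for c in text) / n_chars,
        "spacing_density": sum(c.isspace() for c in text) / n_chars,
        "capitalization_errors": cap_errors,
        "syllables_per_word": sum(map(count_syllables, words)) / max(1, len(words)),
    }
```

Calling `text_features("how do i fix this?? Thanks.")` returns one dictionary per text, which can then be fed to a classifier alongside the other feature groups.
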
  26. Text analysis: Readability statistics; Language modeling. Language model disagreement (G. Mishne, D. Carmel, R. Lempel: “Blocking blog spam with language model disagreement”. AIRWeb'05). Distributions of word n-grams and part-of-speech sequences, e.g. when|how|why – “to” – verb (“how to identify ...”); when|how|why – verb – verb – pronoun – verb (“how do I remove ...”).
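
Slide 26 mentions distributions of word n-grams. A minimal sketch of counting them is shown here; the whitespace tokenizer, the lowercasing, and the `n_max` parameter are assumptions, and part-of-speech sequences are not covered because they would need a tagger.

```python
from collections import Counter

def word_ngrams(text: str, n_max: int = 3) -> Counter:
    """Count all word n-grams of length 1..n_max (whitespace tokens, lowercased)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

For the slide's own example, `word_ngrams("how do I remove a virus")` contains the trigram `"how do i"` once.
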
  27. Text analysis, Clicks, Community
  28. Clicks: If we know that a question is clicked 100 times, and another question is clicked 10,000 times ... we still know nothing
  29. Clicks: Clicks; Per-category average; Per-category stdev.; Question age
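
Slides 28 and 29 suggest that raw click counts only become informative next to the per-category average, the per-category standard deviation, and the question age. One way to assemble these is sketched below. The input schema (`category`, `clicks`, `age_days`) and the derived `clicks_z` value are hypothetical; the slides list the four quantities but do not say how they are combined.

```python
from collections import defaultdict
from statistics import mean, pstdev

def click_features(questions):
    """questions: list of dicts with 'category', 'clicks', 'age_days' (hypothetical schema)."""
    by_cat = defaultdict(list)
    for q in questions:
        by_cat[q["category"]].append(q["clicks"])
    # Per-category average and (population) standard deviation of clicks.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_cat.items()}
    out = []
    for q in questions:
        avg, sd = stats[q["category"]]
        out.append({
            "clicks": q["clicks"],
            "category_avg": avg,
            "category_stdev": sd,
            "age_days": q["age_days"],
            # Assumed extra feature: deviation from the category mean in units
            # of stdev (0.0 when the category has no spread).
            "clicks_z": (q["clicks"] - avg) / sd if sd > 0 else 0.0,
        })
    return out
```

With this normalization, 100 clicks in a quiet category and 10,000 clicks in a busy one can be compared on the same scale.
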
  30. Text analysis, Clicks, Community
  31. Power laws
  32. Askers, Answerers. Source: P. Jurczyk, E. Agichtein: “Discovering authorities in Q.A. communities by using link analysis”. CIKM'07.
  33. Community: answers; votes +; votes -; picks as best
  34. Community: Degree-based metrics: # answers given, # answers received, # votes + given, # votes + received, etc.
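
The degree-based metrics on slide 34 are plain counts over the interaction graphs. A minimal sketch follows, assuming interactions are available as tuples; the tuple layouts and the function name are hypothetical.

```python
from collections import Counter

def degree_metrics(answers, votes):
    """answers: (answerer, asker) pairs; votes: (voter, answerer, sign) triples.

    Both input formats are assumptions made for this sketch.
    """
    m = {
        "answers_given": Counter(),
        "answers_received": Counter(),
        "pos_votes_given": Counter(),
        "pos_votes_received": Counter(),
    }
    for answerer, asker in answers:
        m["answers_given"][answerer] += 1
        m["answers_received"][asker] += 1
    for voter, answerer, sign in votes:
        if sign > 0:  # only positive votes are counted here
            m["pos_votes_given"][voter] += 1
            m["pos_votes_received"][answerer] += 1
    return m
```

Negative votes and "picks as best" could be counted the same way with two more pairs of counters.
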
  35. Community: Propagation-based metrics, computed on each graph: 1. PageRank score 2. HITS hub score 3. HITS authority score
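
To illustrate the propagation-based metrics on slide 35, here is a small power-iteration sketch of HITS hub and authority scores. It is a generic textbook version, not the authors' code. Drawing an edge from asker to answerer is an assumption, and the iteration count is arbitrary; PageRank is not shown.

```python
import math

def hits(edges, iterations=50):
    """edges: (source, target) pairs. Returns (hub, authority) dicts, L2-normalized."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of the nodes pointing to it.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a node: sum of authority scores of the nodes it points to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += auth[t]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

On an asker-to-answerer graph, a user who answers many askers gets a high authority score, and an asker whose questions attract such users gets a high hub score.
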
  36. Text analysis, Clicks, Community, Relations, Training labels, Learning
  37. Text analysis, Clicks, Community, Relations, Training labels, Learning
  38. Answer quality for medium-quality questions: High 15%, Medium 76%, Low 9% (total 100%)
  39. Answer quality for medium- and low-quality questions: Medium column as on slide 38; Low column: High 8%, Medium 74%, Low 18% (total 100%)
  40. Answer quality (rows) by question quality (columns):

                          Question quality
      Answer quality    High    Medium    Low
      High               41%      15%      8%
      Medium             53%      76%     74%
      Low                 6%       9%     18%
      Total             100%     100%    100%

  41. Same table as slide 40, with the conclusion: Question quality and answer quality are not independent
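
The claim on slide 41 can be checked informally against the percentages shown. If answer quality were independent of question quality, the three columns would be roughly identical. The sketch below measures how far apart they are. It is only a descriptive check, not a significance test, because the slides give percentages and not counts; the function name `max_spread` is made up for this example.

```python
# P(answer quality | question quality), in percent, as shown on slide 41.
# Outer key: question quality; inner key: answer quality.
table = {
    "High":   {"High": 41, "Medium": 53, "Low": 6},
    "Medium": {"High": 15, "Medium": 76, "Low": 9},
    "Low":    {"High": 8,  "Medium": 74, "Low": 18},
}

def max_spread(table):
    """Per answer-quality level, the largest gap between any two question-quality columns."""
    levels = ["High", "Medium", "Low"]
    return {
        a: max(col[a] for col in table.values()) - min(col[a] for col in table.values())
        for a in levels
    }

spread = max_spread(table)
```

The share of high-quality answers differs by 33 percentage points between high-quality and low-quality questions, which is consistent with the slide's conclusion.
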
  42. Relations: questions. Diagram labels: Question being evaluated; User asking question; Answers to the question being evaluated; Answerers of question being evaluated; Questions asked; Answers given; Votes given; Answers to questions asked.
  43. Relations: answers. Diagram labels: Answer being evaluated; Answerer; Question being answered; Asker of question being answered; Other answers to the same question; Questions asked; Answers given; Votes given; Answers to questions asked.
  44. Text analysis, Clicks, Community, Relations, Training labels, Learning
  45. Text analysis, Clicks, Community, Relations. Labeled data: 6K questions, 8K answers. Learning: stochastic gradient boosted trees (J. H. Friedman: “Stochastic gradient boosting”. Comp. Stat. Data. Anal., 38(4), 367-378, 2002). Evaluation: Precision, Recall (F1); Area under ROC curve.
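
Slide 45 names stochastic gradient boosted trees as the learner. The toy sketch below shows the core idea in pure Python: each round fits a small tree to the residuals of the current model on a random subsample drawn without replacement. It is a simplification under stated assumptions, not the setup used in the paper: the trees are depth-1 stumps, the loss is squared error on 0/1 labels, and `n_rounds`, `rate`, and `subsample` are arbitrary values.

```python
import random

def fit_stump(X, residuals):
    """Best single-feature threshold split minimizing squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    return best[1:] if best else None

def predict(model, row):
    """Base value plus the shrunken sum of all stump outputs."""
    base, rate, stumps = model
    return base + rate * sum(lm if row[j] <= thr else rm for j, thr, lm, rm in stumps)

def fit_sgb(X, y, n_rounds=50, rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    model = (sum(y) / len(y), rate, [])
    for _ in range(n_rounds):
        # Stochastic part: a fresh random subsample (without replacement) per round.
        idx = rng.sample(range(len(X)), max(2, int(subsample * len(X))))
        Xs = [X[i] for i in idx]
        # For squared loss, the negative gradient is the residual y - F(x).
        res = [y[i] - predict(model, X[i]) for i in idx]
        stump = fit_stump(Xs, res)
        if stump is not None:
            model[2].append(stump)
    return model
```

A prediction above 0.5 can be read as "high quality" in this toy setting; sweeping that threshold is what produces a ROC curve.
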
  46. Task: high-quality questions

      Features              Precision    Recall    AUC
      N-grams (N)                 65%       48%    0.52
      N + text analysis           76%       65%    0.65
      N + clicks                  68%       57%    0.58
      N + relations               74%       65%    0.66
      All                         79%       77%    0.76

  47. Task: high-quality answers

      Features              Precision    Recall    AUC
      N-grams (N)                 67%       86%    0.81
      N + text analysis           71%       93%    0.88
      N + clicks                    -         -       -
      N + relations               69%       85%    0.82
      All                         73%       91%    0.87
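
Slide 45 lists F1 as an evaluation measure, but slides 46 and 47 report only precision and recall. F1 is their harmonic mean, so it can be derived from the reported values. The snippet does this for the "All" rows; the variable names are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) for the "All" feature set, as reported on slides 46 and 47.
reported = {"questions": (0.79, 0.77), "answers": (0.73, 0.91)}
f1_scores = {task: round(f1(p, r), 3) for task, (p, r) in reported.items()}
```

This gives an F1 of about 0.78 for questions and about 0.81 for answers.
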
  48. In the paper ...
      ● Framework for quality estimation in social media
      ● Graph-based model of contributor relationships
      ● Details on the relative importance of (sets of) features
  49. What did we learn?
      ● Human assessments for this task
        – ... have relatively low agreement
      ● Classifying questions/answers
        – ... is substantially different from document classification
      ● Look at orthogonal feature spaces
  50. Future work: Text analysis, Clicks, Community, Relations, Relational learning, Learning
  51. Thank you!
  54. ROC curve: high-quality questions. Curve labels: Best, N-grams.
  55. ROC curve: high-quality answers. Curve labels: Best, N-grams.

CC Attribution License
