Eugene Agichtein
Emory University
Atlanta, USA
Carlos Castillo
Debora Donato
Aris Gionis
Yahoo! Research
Barcelona, Spain
Gilad Mishne
Yahoo! S&A Sciences
Santa Clara, CA, USA
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne: Finding High-Quality Content in Social Media. WSDM'08.
User-generated content ≠ Traditional publishing
Chris Anderson: “The Long Tail”. Hyperion, 2006.
[Figure: axes Frequency and Quality; labels “Traditional publishing” and “User-generated”]
Chris Anderson: “The Long Tail”. Hyperion, 2006.
[Figure: axes Quantity and Quality; labels “User-generated” and “Traditional publishing”]
<!--
[Figure: axes Quantity and Quality, marked “?”]
Chris Martin from Coldplay in The Rolling Stone, Fortieth Anniversary, July 2007.
“We think it's all about quality over quantity now, because there's so much noise everywhere, there's no point in putting anything out unless it's fucking amazing.”
[Figure: axes Quantity and Quality; labels “A”, “User-generated”, “Traditional publishing”]
[Figure: axes Quantity and Quality; labels “F.A.”, “User-generated”, “Traditional publishing”]
-->
[Figure: axes Quantity and Quality, marked “?”]
[Figure: axes Quantity and Quality, marked “(Hard) problem”]
Best answer
Picked by votes
-or-
Picked by asker
All answers
+ “Thumbs up”
+ “Thumbs down”
Question
+ “Stars”
¼ of questions want an opinion: informal polls
¾ of questions seek information or advice
Q. Su, D. Pavlov, J.-H. Chow, W. C. Baker: “Internet-scale collection of human-reviewed data”. WWW'07.
17%-45% of answers were correct
65%-90% of questions had at least one correct answer
There are top contributors ...
... but they don't have all the answers
[Figure: axes Quantity and Quality]
Task: find high-quality items
Existing tools
● Link-based ranking methods
● Propagation of trust/distrust
● Automatic text analysis
● Usage mining
● ...
Sources of information
● Content analysis (with errors)
● Usage data: clicks (with noise)
● Community ratings (sparse, with spam)
Text analysis Clicks Community
Text analysis
● Readability statistics
● Language modeling
Text analysis: readability statistics
● Punctuation density
● Capitalization errors
● Number of words
● + spacing density, syllables per word, ...
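A minimal sketch of how surface statistics like these might be computed. The function names, the punctuation set, the vowel-run syllable heuristic, and the definition of a capitalization error (a sentence that starts with a lowercase letter) are all assumptions made for illustration; the slides do not specify them.

```python
import re

PUNCTUATION = set(".,;:!?\"'()-")  # assumed punctuation set


def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability_features(text):
    """Return simple surface statistics for one question or answer."""
    words = re.findall(r"[A-Za-z']+", text)
    n_chars = max(len(text), 1)  # avoid division by zero on empty input
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumed definition: a sentence whose first character is a lowercase letter.
    cap_errors = sum(1 for s in sentences if s[0].isalpha() and s[0].islower())
    return {
        "num_words": len(words),
        "punctuation_density": sum(c in PUNCTUATION for c in text) / n_chars,
        "spacing_density": sum(c.isspace() for c in text) / n_chars,
        "capitalization_errors": cap_errors,
        "syllables_per_word": (
            sum(count_syllables(w) for w in words) / len(words) if words else 0.0
        ),
    }


features = readability_features("how do i fix this?? it is broken. Thanks!")
print(features)
```

Each value is a cheap, language-agnostic signal; on its own none of them decides quality, which is why they are fed to a learner together with the other feature groups.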
Text analysis: language modeling
G. Mishne, D. Carmel, R. Lempel: “Blocking blog spam with language model disagreement”. AIRWeb'05.
Language model disagreement
Distributions of word n-grams and part-of-speech sequences
● when|how|why – “to” – verb: “how to identify ...”
● when|how|why – verb – verb – pronoun – verb: “how do I remove ...”
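One way to compare two distributions of word n-grams is sketched below, assuming KL divergence between add-one-smoothed n-gram models as the disagreement measure. The slide does not state the measure or the smoothing, so both are assumptions, as are all function names. Part-of-speech sequences would need a tagger, which is not shown here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_model(text, n, vocab, alpha=1.0):
    """Additively smoothed n-gram distribution over a shared vocabulary."""
    counts = Counter(ngrams(text.lower().split(), n))
    total = sum(counts.values()) + alpha * len(vocab)
    return {g: (counts[g] + alpha) / total for g in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same keys."""
    return sum(p[g] * math.log(p[g] / q[g]) for g in p)


def disagreement(text_a, text_b, n=2):
    """Assumed disagreement score: KL divergence between the two n-gram models."""
    vocab = set(ngrams(text_a.lower().split(), n)) | set(
        ngrams(text_b.lower().split(), n)
    )
    return kl_divergence(
        ngram_model(text_a, n, vocab), ngram_model(text_b, n, vocab)
    )


print(disagreement("how do i remove a virus", "how do i remove a virus"))
print(disagreement("how do i remove a virus", "buy cheap pills online now"))
```

Identical texts score zero, and the score grows as the two n-gram distributions drift apart.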
Text analysis Clicks Community
Clicks
If we know that a question is clicked 100 times,
and another question is clicked 10,000 times ...
... we still know nothing
Clicks
● Clicks relative to the per-category average
● Clicks relative to the per-category stdev.
● Question age
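Raw click counts say little until they are put in context. The sketch below shows one plausible normalization, assuming clicks are compared with the mean and standard deviation of their category and scaled by question age. The field names and the exact formulas are assumptions, not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev


def click_features(items):
    """items: list of dicts with 'category', 'clicks' and 'age_days' (assumed fields)."""
    clicks_by_category = defaultdict(list)
    for item in items:
        clicks_by_category[item["category"]].append(item["clicks"])
    stats = {c: (mean(v), pstdev(v)) for c, v in clicks_by_category.items()}

    features = []
    for item in items:
        mu, sigma = stats[item["category"]]
        features.append({
            "clicks_over_category_mean": item["clicks"] / mu if mu else 0.0,
            "clicks_z_score": (item["clicks"] - mu) / sigma if sigma else 0.0,
            "clicks_per_day": item["clicks"] / max(item["age_days"], 1),
        })
    return features


items = [
    {"category": "niche", "clicks": 100, "age_days": 10},
    {"category": "niche", "clicks": 300, "age_days": 10},
    {"category": "popular", "clicks": 10000, "age_days": 10},
    {"category": "popular", "clicks": 30000, "age_days": 10},
]
for item, feats in zip(items, click_features(items)):
    print(item["clicks"], feats)
```

In this toy data the question with 100 clicks and the one with 10,000 clicks get the same category-relative scores, which is the point of normalizing.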
Text analysis Clicks Community
Power laws
P. Jurczyk, E. Agichtein: “Discovering authorities in Q.A. communities by using link analysis”. CIKM'07.
[Figure labels: Askers, Answerers]
Community
answers
votes +
votes -
picks as best
Community
Degree-based metrics
● # answers given
● # answers received
● # votes + given
● # votes + received
● etc.
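Degree-based metrics are plain counts over the interaction graphs. A minimal sketch, assuming each interaction is stored as a hypothetical (source user, target user, kind) triple, where the source performed the action and the target owns the content it was performed on:

```python
from collections import Counter


def degree_metrics(edges):
    """Count interactions given and received per (user, kind).

    edges: iterable of (source_user, target_user, kind) triples; this
    representation is an assumption made for the example.
    """
    given, received = Counter(), Counter()
    for source, target, kind in edges:
        given[(source, kind)] += 1
        received[(target, kind)] += 1
    return given, received


edges = [
    ("bob", "alice", "answer"),    # bob answered alice's question
    ("carol", "alice", "answer"),  # carol answered alice's question
    ("alice", "bob", "vote+"),     # alice voted up bob's answer
]
given, received = degree_metrics(edges)
print(given[("bob", "answer")], received[("alice", "answer")])
```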
Community
Propagation-based metrics
1. PageRank score
2. HITS hub score
3. HITS authority score
Computed on each graph
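Both scores can be computed by simple iteration over a graph. The sketch below assumes a graph stored as a dict from user to the users they link to, with edges running from asker to answerer; the edge direction, damping factor, and iteration count are assumptions, and the same functions would be run once per graph (answers, votes, best picks).

```python
import math


def _nodes(graph):
    return sorted(set(graph) | {v for targets in graph.values() for v in targets})


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread their score evenly."""
    nodes = _nodes(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


def _normalize(scores):
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {u: s / norm for u, s in scores.items()}


def hits(graph, iterations=50):
    """Return (hub, authority) scores, L2-normalized after every update."""
    nodes = _nodes(graph)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        auth = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in graph.get(u, []):
                auth[v] += hub[u]
        auth = _normalize(auth)
        hub = _normalize(
            {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        )
    return hub, auth


# Hypothetical answers graph: an edge asker -> answerer.
answers_graph = {"alice": ["bob", "carol"], "dave": ["bob"]}
rank = pagerank(answers_graph)
hub, auth = hits(answers_graph)
print(rank, hub, auth)
```

With this edge direction, users who answer many askers score high as authorities, and users whose questions draw answers from strong authorities score high as hubs.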
Text analysis Clicks Community
Relations
Learning
Training labels
Answer quality by question quality (each column sums to 100%)

                  Question quality
Answer quality    High    Medium   Low
High              41%     15%      8%
Medium            53%     76%      74%
Low               6%      9%       18%
Total             100%    100%     100%

Question quality and answer quality are not independent
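The dependence can be read straight off the table: if answer quality were independent of question quality, every question-quality column would show the same distribution of answer quality. A small check, assuming the columns are question quality and the rows are answer quality (variable and function names are illustrative):

```python
# P(answer quality | question quality), copied from the table above.
answer_given_question = {
    "High": {"High": 0.41, "Medium": 0.53, "Low": 0.06},
    "Medium": {"High": 0.15, "Medium": 0.76, "Low": 0.09},
    "Low": {"High": 0.08, "Medium": 0.74, "Low": 0.18},
}


def spread_by_answer_level(table):
    """Largest gap across question-quality columns, per answer-quality level."""
    levels = next(iter(table.values())).keys()
    return {
        level: max(col[level] for col in table.values())
        - min(col[level] for col in table.values())
        for level in levels
    }


def looks_independent(table, tolerance=0.02):
    """Independence would make every column (nearly) identical."""
    return all(s <= tolerance for s in spread_by_answer_level(table).values())


spreads = spread_by_answer_level(answer_given_question)
print(spreads, looks_independent(answer_given_question))
```

The share of high-quality answers ranges from 8% to 41% depending on question quality, far from the identical columns independence would require.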
Relations: questions
[Figure: graph with nodes marked Q, A, V and U around the question being evaluated. Labels:]
● Question being evaluated
● Answers to the question being evaluated
● User asking question
● Answerers of question being evaluated
● Questions asked
● Answers given
● Votes given
● Answers to questions asked
Relations: answers
[Figure: graph with nodes marked Q, A, V and U around the answer being evaluated. Labels:]
● Answer being evaluated
● Answerer
● Question being answered
● Asker of question being answered
● Other answers to the same question
● Questions asked
● Answers given
● Votes given
● Answers to questions asked
Text analysis Clicks Community
Relations
Learning: stochastic gradient boosted trees
J. H. Friedman: “Stochastic gradient boosting”. Comp. Stat. Data Anal., 38(4), 367-378, 2002.
Labeled data: 6K questions, 8K answers
Evaluation: Precision, Recall (F1); Area under ROC curve
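The slides name the learner but give no settings. The sketch below illustrates the general idea of stochastic gradient boosting: each round fits a small tree to the current residuals on a random subsample and adds it with a shrinkage factor. Squared loss, depth-1 trees, and every name and hyperparameter here are simplifying assumptions, not the configuration used in the paper.

```python
import random


def fit_stump(rows, residuals):
    """Fit a depth-1 regression tree: (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no split possible: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_sgb(X, y, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    k = max(1, int(subsample * len(y)))
    for _ in range(n_rounds):
        idx = rng.sample(range(len(y)), k)  # subsample without replacement
        rows = [X[i] for i in idx]
        # For squared loss, the negative gradient is the plain residual.
        residuals = [y[i] - preds[i] for i in idx]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        for i, row in enumerate(X):
            preds[i] += learning_rate * stump_predict(stump, row)
    return base, learning_rate, stumps


def predict_sgb(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Toy data: one feature, two well-separated classes (1.0 = high quality).
X = [[i * 0.01] for i in range(20)] + [[1.0 + i * 0.01] for i in range(20)]
y = [0.0] * 20 + [1.0] * 20
model = fit_sgb(X, y)
print(predict_sgb(model, [0.05]), predict_sgb(model, [1.1]))
```

Thresholding the prediction at 0.5 turns the score into a high-quality / not-high-quality decision; the subsampling is what makes the procedure "stochastic".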
Task: high-quality questions

Features             Precision   Recall   AUC
N-grams (N)          65%         48%      0.52
N + text analysis    76%         65%      0.65
N + clicks           68%         57%      0.58
N + relations        74%         65%      0.66
All                  79%         77%      0.76

Task: high-quality answers

Features             Precision   Recall   AUC
N-grams (N)          67%         86%      0.81
N + text analysis    71%         93%      0.88
N + clicks           -           -        -
N + relations        69%         85%      0.82
All                  73%         91%      0.87
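For reference, the reported measures can be computed from labels and classifier scores as in the sketch below. It uses the standard definitions, with AUC taken as the probability that a random positive item is scored above a random negative one (ties count half); the function names and toy data are illustrative only.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = high quality)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_f1(y_true, y_pred), auc(y_true, scores))
```

Precision and recall depend on the decision threshold (0.5 in this toy run), while AUC summarizes the ranking over all thresholds, which is why the tables report both.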
In the paper ...
● Framework for quality estimation in social media
● Graph-based model of contributor relationships
● Details on the relative importance of (sets of) features
What did we learn?
● Human assessments for this task
– ... have relatively low agreement
● Classifying questions/answers
– ... is substantially different from document classification
● Look at orthogonal feature spaces
Text analysis Clicks Community
Relations
Learning
Future work: relational learning
Thank you!
[Figure: ROC curve, high-quality questions. Curves: Best, N-grams]
[Figure: ROC curve, high-quality answers. Curves: Best, N-grams]
