
Capturing the Ineffable: Collecting, Analysing, and Automating Web Document Quality Assessments



Presentation at EKAW 2016.



1. Capturing the Ineffable: Collecting, Analysing, and Automating Web Document Quality Assessments
   Davide Ceolin, Julia Noordegraaf, Lora Aroyo

2. Outline
   • Introduction
   • Nichesourcing Web Document Quality Assessments
   • User studies
   • Conclusion and Future Work

3. Introduction

4. Web Document Quality Assessment
   • Source criticism
     • A methodological practice from the humanities
     • e.g., from the American Library Association:
       • How was the source located?
       • What type of source is it?
       • Who is the author, and what are the author's qualifications regarding the topic discussed?
       • When was the information published?
       • In which country was it published?
       • What is the reputation of the publisher?
       • Does the source show a particular cultural or political bias?
   • How does it apply to Web sources?

5. Web Document Quality Assessment
   What is the quality of each of these documents?
   • Authoritative source ✓, Accurate ✓, Precise ✓, Complete ✓, Neutral (?)
   • Blog post (?), Accurate (?), Precise (?), Complete (?), Neutral ✗

6. Quality and Quality Markers
   • We adapt source criticism to Web documents and aim to automate quality estimation by:
     • gathering quality assessments (mostly from experts);
     • looking for markers (document features) that correlate with them.

7. Objectives
   • Analyse the consistency of quality assessments.
     • Are quality assessments consistent among users, over time, etc.?
   • Analyse users' ability to interpret document features.
     • Can users estimate the quality of a document from its sentiment or trustworthiness level?
   • Analyse the predictability of quality assessments.
     • Can we automatically estimate the quality of a document?

8. Nichesourcing Web Document Quality Assessments

9. Dataset, Features, and Quality Dimensions
   • Dataset: documents about vaccinations
     • Initially, 50 documents from various sources (blogs, authorities, etc.)
   • Features
     • Information extracted automatically from the documents using AlchemyAPI and Web of Trust
     • Entities, topics, sentiment, emotions, trustworthiness
   • Quality dimensions
     • Overall quality, accuracy, completeness, precision, trustworthiness, readability, neutrality
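
The slides do not say how the extracted features are encoded before prediction. A minimal sketch, assuming a hypothetical per-document record (this is not the response format of AlchemyAPI or Web of Trust): sentiment, emotion scores, and a trustworthiness score become numeric entries, while topics and entities become binary indicators over a fixed vocabulary, so every document maps to a vector of the same length. `DocumentFeatures`, `to_vector`, the emotion labels, and all value ranges are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical emotion labels; which emotions the extractor reports is an assumption.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness")


@dataclass
class DocumentFeatures:
    """Hypothetical container for the features extracted from one document."""
    sentiment: float                 # assumed polarity score in [-1, 1]
    emotions: dict[str, float]       # emotion label -> assumed score in [0, 1]
    trustworthiness: float           # assumed reputation score in [0, 100]
    topics: set[str] = field(default_factory=set)
    entities: set[str] = field(default_factory=set)


def to_vector(doc: DocumentFeatures, vocabulary: list[str]) -> list[float]:
    """Flatten the features into a fixed-order numeric vector.

    Topics and entities become binary indicators over a fixed vocabulary,
    so every document maps to a vector of the same length.
    """
    vector = [doc.sentiment]
    vector += [doc.emotions.get(label, 0.0) for label in EMOTIONS]
    vector.append(doc.trustworthiness / 100.0)  # rescale to [0, 1]
    terms = doc.topics | doc.entities
    vector += [1.0 if term in terms else 0.0 for term in vocabulary]
    return vector
```

For example, `to_vector(DocumentFeatures(0.5, {"joy": 0.8}, 80.0, {"vaccines"}, {"CDC"}), ["vaccines", "measles", "CDC"])` yields a 10-element vector that a classifier such as an SVC could consume.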

10. WebQ: Nichesourcing Web Quality Assessments
   • Setup:
     • 6 documents per participant
     • Random selection
     • Even distribution of assessments
   • Scenario: "Suppose you are asked to write an article about the debate on vaccinations triggered by the 2015 measles outbreak at Disneyland in California."
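
The setup combines random selection with an even distribution of assessments. One way to satisfy both, sketched here as an assumption rather than as WebQ's actual procedure, is to give each participant the least-assessed documents, breaking ties at random. `assign_documents` and its parameters are hypothetical names.

```python
import random


def assign_documents(participants, documents, per_participant=6, seed=None):
    """Assign each participant `per_participant` distinct documents.

    Documents are drawn at random among the currently least-assessed ones,
    so the number of assessments per document never differs by more than one.
    """
    if per_participant > len(documents):
        raise ValueError("not enough documents for one participant")
    rng = random.Random(seed)
    counts = {doc: 0 for doc in documents}
    assignment = {}
    for person in participants:
        pool = list(documents)
        rng.shuffle(pool)                    # random order breaks ties randomly
        pool.sort(key=lambda d: counts[d])   # stable sort: least-assessed first
        chosen = pool[:per_participant]
        for doc in chosen:
            counts[doc] += 1
        assignment[person] = chosen
    return assignment
```

Because Python's sort is stable, shuffling first and then sorting by count keeps the choice random within each count level while always serving the least-assessed documents first.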

11. WebQ: Task 1
   • Documents are anonymized.
   • Users choose the documents that meet their quality criteria based on features only.
   • All feature values are shown, both individually and combined.

12. WebQ: Task 2
   • Read each of the 6 articles.
   • Assess each one:
     • Rate completeness, accuracy, etc. on a 1-5 Likert scale.
     • Annotate the article to explain the ratings.
   • Articles are proxied and annotated through AnnotatorJS.
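
A single Task 2 assessment can be thought of as a set of 1-5 ratings, one per quality dimension, plus the annotations that justify them. The sketch below is a hypothetical record layout (not WebQ's or AnnotatorJS's storage format); `Assessment`, `Annotation`, and their fields are illustrative names, and the dimension list is taken from the quality dimensions listed earlier in the deck.

```python
from dataclasses import dataclass, field

# Quality dimensions as listed on the "Dataset, Features, and Quality Dimensions" slide.
DIMENSIONS = ("overall", "accuracy", "completeness", "precision",
              "trustworthiness", "readability", "neutrality")


@dataclass
class Annotation:
    """A highlighted passage plus the assessor's explanation (hypothetical layout)."""
    quote: str
    comment: str


@dataclass
class Assessment:
    """One participant's ratings of one document, validated on creation."""
    participant: str
    document: str
    ratings: dict[str, int]
    annotations: list[Annotation] = field(default_factory=list)

    def __post_init__(self):
        missing = set(DIMENSIONS) - set(self.ratings)
        if missing:
            raise ValueError(f"missing ratings: {sorted(missing)}")
        for dim, value in self.ratings.items():
            if dim not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dim}")
            # Likert scale: integers 1 (lowest) to 5 (highest)
            if not (isinstance(value, int) and 1 <= value <= 5):
                raise ValueError(f"{dim} must be an integer from 1 to 5, got {value!r}")
```

Validating at construction time means malformed ratings are rejected before they reach any of the later analyses.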

13. User Studies

14. Setup
   • User Study 1
     • Participants: 20 final-year UvA journalism students
     • Duration: 60 minutes
   • User Study 2
     • Participants: 20 RMA media scholars
     • Duration: 45 minutes
     • Improvements based on lessons learnt from User Study 1

15. Results
   • Data collected:
     • 104 (US1) + 47 (US2) assessments
     • 238 (US1) + 89 (US2) annotations
   • No significant difference between the two user studies (Wilcoxon signed-rank test).
     • The assessments from both studies can therefore be combined.
   • Assessment predictability (SVC):
     • Up to 63% accuracy (5 classes)
     • Up to 89% accuracy (2 classes)
     • Promising predictability; we will try other algorithms.
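
The Wilcoxon signed-rank test compares paired samples. The slides do not say how the two studies were paired; a plausible assumption is per-document mean ratings in US1 and US2 for documents assessed in both. The stdlib-only sketch below computes the statistic W+ and a two-sided p-value via the normal approximation with a tie correction. `scipy.stats.wilcoxon` is the usual choice in practice and may use an exact distribution for small samples, so its p-values can differ from this approximation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test on paired samples x, y.

    Returns (W+, p): W+ is the sum of ranks of positive differences; p uses the
    normal approximation with a tie correction and no continuity correction.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    magnitudes = [abs(d) for d in diffs]
    ranks = average_ranks(magnitudes)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    tie_groups = {}
    for m in magnitudes:
        tie_groups[m] = tie_groups.get(m, 0) + 1
    variance -= sum(t**3 - t for t in tie_groups.values()) / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p
```

A large p-value means the data give no evidence of a systematic shift between the paired ratings, which is the condition under which pooling the two studies' assessments is reasonable.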

16. Results
   • Highest correlation with overall quality:
     • Accuracy
     • Trustworthiness
     • Precision
     • Completeness
   • Given the task at hand, neutrality is not relevant.
   • Weak correlation between Task 1 selections and overall quality ratings (Task 2).
     • Users were mostly unable to interpret the features.
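
The slides do not name the correlation coefficient used. As an assumption, the sketch below uses Spearman's rank correlation, a common choice for ordinal Likert ratings, and ranks each quality dimension by its correlation with overall quality. `spearman`, `rank_dimensions`, and the input layout (each dimension mapped to a list of ratings aligned by assessment) are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def rank_dimensions(ratings, target="overall"):
    """Return (dimension, rho) pairs sorted by correlation with `target`, highest first."""
    scores = {dim: spearman(values, ratings[target])
              for dim, values in ratings.items() if dim != target}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The same `spearman` function could relate Task 1 selections (for instance, how often each document was chosen) to its mean overall rating from Task 2.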

17. Conclusion

18. Conclusion
   • We collected Web document quality assessments:
     • WebQ, a nichesourcing application
     • 2 user studies with experts
     • Clearly defined task
     • Controlled dataset
   • We analysed the assessments and automated their prediction:
     • The task matters more than subjectivity.
     • Assessments are quite uniform and coherent.
     • Features in isolation are not very meaningful.
     • The application setup is important.

19. (Current and) Future Work
   • Ongoing and planned work:
     • Extending the dataset (currently ~1,500 documents)
     • Scaling up the experiments and gathering more assessments
     • Involving laymen via crowdsourcing
     • Extending the analyses
     • Utilising other automated reasoning approaches

20. Thank you!
   https://qupid-project.net/
   d.ceolin@vu.nl
