Measuring the Quality of Web Content using Factual Information


  1. Measuring the Quality of Web Content using Factual Information
     WebQuality 2012 workshop at WWW 2012, 16 April 2012
     Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer
     www.know-center.at
     © Know-Center 2012, funded by the Kompetenzzentrenprogramm (competence center program)
  2. Agenda
     • Motivation
     • Approach
     • Results
     • Summary and Outlook
  3. Motivation
     • People's decisions are often based on Web content
       - Lacking quality control, no verification
       - Inaccurate, incorrect information
       - No fact checking
     • Measures are needed to capture credibility and quality aspects
       - With respect to facts!
  4. Approach
     • Measure information quality based on factual information
     • 3 approaches:
       - Use simple statistics about the facts obtained from text
       - Exploit relational information contained in facts
       - Use semantic relationships like meronymy and hypernymy
     • First approach:
       - Use simple statistical features about facts in a document
       - Indicates how informative a document is
       - Derive facts from Web content using Open Information Extraction
  5. Definition of Factual Density
     • Fact Count
     • Factual Density
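The formulas on this slide did not survive in the transcript. The following is a minimal sketch, assuming that fact count is the number of facts extracted from a document and that factual density is that count normalized by document length. Measuring length in whitespace-separated words is an assumption, and the names `fact_count` and `factual_density` are hypothetical.

```python
from typing import List, Tuple

# A fact is assumed to be a binary relation triple: (e1, relation, e2).
Fact = Tuple[str, str, str]


def fact_count(facts: List[Fact]) -> int:
    """Number of facts extracted from one document."""
    return len(facts)


def factual_density(facts: List[Fact], text: str) -> float:
    """Fact count divided by document size.

    Assumption: size is the number of whitespace-separated words.
    Returns 0.0 for an empty document to avoid dividing by zero.
    """
    size = len(text.split())
    if size == 0:
        return 0.0
    return fact_count(facts) / size


# Toy document with hand-written triples standing in for Open IE output.
doc_text = "The river flows through the city. The city hosts universities."
doc_facts = [
    ("The river", "flows through", "the city"),
    ("The city", "hosts", "universities"),
]

print(fact_count(doc_facts))
print(factual_density(doc_facts, doc_text))
```

Normalizing by length is what lets the measure compare documents of different sizes, which matters because longer documents tend to contain more facts.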
  6. Experiments
     • Wikipedia: 1000 Featured and Good articles versus 1000 Non-Featured (randomly selected)
       - Featured: a comprehensive coverage of the major facts in the context of the article's subject
     • Baseline: Word Count [Blumenstock 2008]
       - Featured articles longer than non-featured
       - Bias: longer docs contain more facts
     • Evaluation: 2 datasets
       - Unbalanced: articles differ in length
       - Balanced: articles similar in length
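The word-count baseline can be illustrated with a short sketch: predict "featured" whenever an article reaches a length threshold. The threshold value, the toy documents and the function names are assumptions for illustration, not values taken from the slides or from [Blumenstock 2008].

```python
from typing import List, Tuple


def word_count(text: str) -> int:
    """Length of a document in whitespace-separated words."""
    return len(text.split())


def predict_featured(text: str, threshold: int) -> bool:
    """Baseline rule: long articles are predicted to be featured."""
    return word_count(text) >= threshold


def accuracy(docs: List[Tuple[str, bool]], threshold: int) -> float:
    """Fraction of (text, is_featured) pairs the baseline labels correctly."""
    if not docs:
        return 0.0
    correct = sum(predict_featured(text, threshold) == label for text, label in docs)
    return correct / len(docs)


# Toy corpus: the last article is long but not featured, so a pure
# length rule misclassifies it.
toy_docs = [
    ("word " * 50, True),
    ("word " * 40, True),
    ("word " * 10, False),
    ("word " * 45, False),
]

print(accuracy(toy_docs, threshold=30))
```

On a balanced dataset, where featured and non-featured articles have similar lengths, a rule like this has little to work with, which is why the evaluation uses both datasets.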
  7. Distributions of docs in both datasets with respect to word count
  8. Precision/Recall curves of Factual Density
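The curves themselves are not part of the transcript. As a sketch of how such a curve can be computed, the code below ranks documents by a score (for example factual density) and reports precision and recall at every cutoff. The scores and labels are made-up toy values, and `precision_recall_points` is a hypothetical helper name.

```python
from typing import List, Tuple


def precision_recall_points(
    scores: List[float], labels: List[bool]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each cutoff of the ranked list.

    Documents are ranked by descending score; the top k are treated as
    predicted positives for k = 1 .. n. Tied scores are not merged.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, label in ranked if label)
    points = []
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
        precision = true_positives / k
        recall = true_positives / total_positives if total_positives else 0.0
        points.append((recall, precision))
    return points


# Toy scores (e.g. factual density) with True = featured/good article.
scores = [0.20, 0.15, 0.10, 0.05]
labels = [True, False, True, False]

points = precision_recall_points(scores, labels)
for recall, precision in points:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```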
  9. Results
     • Factual Density on balanced corpus
  10. Experiments – Relational Features
     • Approach 2: exploiting relational information contained in facts
     • Extract relational features from articles
       - Use relations from ReVerb: binary relations (e1, relation, e2)
     • Use them to train a classifier to discriminate between featured/good and non-featured
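The slides do not say which classifier or which feature encoding was used. The sketch below is one possible reading, assuming that each article is represented by the counts of its relation phrases (the middle element of each triple) and that a small multinomial Naive Bayes model with add-one smoothing is trained on them. The class name, the toy triples and the choice of Naive Bayes are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (e1, relation, e2), as in ReVerb-style output


def relation_features(triples: List[Triple]) -> Counter:
    """Bag of relation phrases: counts of the middle element of each triple."""
    return Counter(relation.lower() for _, relation, _ in triples)


class RelationNaiveBayes:
    """Multinomial Naive Bayes over relation phrases (illustrative only)."""

    def fit(self, docs: List[List[Triple]], labels: List[str]) -> "RelationNaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.rel_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for triples, label in zip(docs, labels):
            self.rel_counts[label].update(relation_features(triples))
        self.vocab = set()
        for counts in self.rel_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, triples: List[Triple]) -> str:
        features = relation_features(triples)
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = self.classes[0], -math.inf
        for c in self.classes:
            total = sum(self.rel_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for relation, n in features.items():
                # Add-one smoothing so unseen relations do not zero out a class.
                p = (self.rel_counts[c][relation] + 1) / (total + len(self.vocab))
                score += n * math.log(p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hand-written toy triples standing in for extracted relations.
train_docs = [
    [("the river", "is located in", "the valley"), ("the town", "was founded in", "the past")],
    [("I", "like", "this town"), ("we", "like", "the food")],
]
train_labels = ["featured", "non-featured"]

model = RelationNaiveBayes().fit(train_docs, train_labels)
print(model.predict([("the lake", "is located in", "the north")]))
print(model.predict([("they", "like", "the view")]))
```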
  12. Summary
     • Simple fact-related measure: Factual Density
     • Based on Factual Density, featured/good articles can be separated from non-featured ones if article length is similar
     • If articles differ in length, use word count instead
       - For future work, a combination of both
     • Plan to incorporate edit history: more editors, higher factual density
     • Preliminary experiments with relational features
       - Promising results, more work in this direction
       - Goal here is to bring semantics into the field of Information Quality (IQ)
       - We expect this to unlock several IQ dimensions, e.g. generality vs specificity
  13. Thank you for your attention!
     Elisabeth Lex, elex@know-center.at
