Measuring the Quality of Web Content using Factual Information

    • Measuring the Quality of Web Content using Factual Information
      WebQuality 2012 workshop at WWW 2012, 16 April 2012
      Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer
      www.know-center.at
      © Know-Center 2012, funded by the Kompetenzzentrenprogramm (Competence Centers Programme)
    • Agenda
      - Motivation
      - Approach
      - Results
      - Summary and Outlook
    • Motivation
      People's decisions are often based on Web content that:
      - lacks quality control and verification
      - may contain inaccurate or incorrect information
      - is not fact-checked
      Measures are needed to capture credibility and quality aspects, specifically with respect to facts.
    • Approach
      Measure information quality based on factual information. Three approaches:
      - Use simple statistics about the facts obtained from text
      - Exploit relational information contained in facts
      - Use semantic relationships such as meronymy and hypernymy
      First approach:
      - Use simple statistical features about the facts in a document
      - These indicate how informative a document is
      - Derive facts from Web content using Open Information Extraction
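    • Definition of Factual Density
      - Fact count $fc(d)$: the number of facts extracted from document $d$
      - Factual density: the fact count normalized by document length, $fd(d) = \frac{fc(d)}{size(d)}$, where $size(d)$ is the number of words in $d$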
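Factual density can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes facts have already been extracted as (arg1, relation, arg2) triples by an Open IE system such as ReVerb, that factual density is the fact count divided by the number of words, and that words are counted by whitespace tokenization. The helper names are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical representation of an extracted fact: (arg1, relation, arg2),
# the shape of a binary Open IE relation such as those produced by ReVerb.
Fact = Tuple[str, str, str]


def fact_count(facts: Sequence[Fact]) -> int:
    """Number of facts extracted from a document."""
    return len(facts)


def word_count(text: str) -> int:
    """Document size in words; whitespace tokenization is an assumption."""
    return len(text.split())


def factual_density(text: str, facts: Sequence[Fact]) -> float:
    """Fact count normalized by document length (0.0 for an empty document)."""
    size = word_count(text)
    return fact_count(facts) / size if size else 0.0


doc = "Marie Curie discovered polonium. She won two Nobel Prizes."
facts = [
    ("Marie Curie", "discovered", "polonium"),
    ("She", "won", "two Nobel Prizes"),
]
print(fact_count(facts), word_count(doc), factual_density(doc, facts))
```

Here two extracted facts over nine words give a density of 2/9, roughly 0.22. Normalizing by length is what lets a short but fact-rich article score as high as a long one.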
    • Experiments
      Wikipedia: 1000 Featured and Good articles versus 1000 randomly selected Non-Featured articles
      - Featured articles: comprehensive coverage of the major facts in the context of the article's subject
      Baseline: Word Count [Blumenstock 2008]
      - Featured articles are longer than non-featured ones
      - Bias: longer documents contain more facts
      Evaluation on 2 datasets:
      - Unbalanced: articles differ in length
      - Balanced: articles are similar in length
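The word-count baseline amounts to a single threshold on article length. The sketch below assumes the threshold is chosen to maximize accuracy on labeled articles; the slides do not say how [Blumenstock 2008] set it, and the function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def best_length_threshold(lengths: Sequence[int], labels: Sequence[bool]) -> Tuple[int, float]:
    """Choose the word-count threshold that maximizes accuracy.

    An article is predicted featured/good when its length >= threshold.
    Returns (threshold, accuracy); ties keep the smallest threshold.
    """
    if not lengths or len(lengths) != len(labels):
        raise ValueError("need equally sized, non-empty inputs")
    best_t, best_acc = 0, -1.0
    for t in sorted(set(lengths)):
        # Count articles whose predicted label matches the true label.
        correct = sum((n >= t) == y for n, y in zip(lengths, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical article lengths (in words) and featured/good labels.
lengths = [300, 500, 2500, 4000]
labels = [False, False, True, True]
print(best_length_threshold(lengths, labels))
```

On a balanced dataset, where both classes have similar lengths, such a threshold has little to work with, which is why a length-normalized measure is evaluated there.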
    • Distributions of documents in both datasets with respect to word count
    • Precision/Recall curves of Factual Density
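A precision/recall curve of this kind can be obtained by ranking articles by factual density and sweeping a threshold over the scores. The following is a minimal, dependency-free sketch with made-up scores; the function name is hypothetical, and scikit-learn's `sklearn.metrics.precision_recall_curve` offers an equivalent, optimized computation.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """(threshold, precision, recall) for each distinct score, from high to low.

    Articles with score >= threshold are predicted featured/good.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive (featured/good) article")
    points = []
    for t in sorted(set(scores), reverse=True):
        # Never empty: t is itself one of the scores.
        predicted = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(predicted)  # true positives among predicted articles
        points.append((t, tp / len(predicted), tp / total_pos))
    return points


# Hypothetical factual-density scores and featured/good labels.
scores = [0.09, 0.08, 0.04, 0.02]
labels = [True, False, True, False]
for threshold, precision, recall in pr_points(scores, labels):
    print(f"t={threshold:.2f}  P={precision:.2f}  R={recall:.2f}")
```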
    • Results: Factual Density on the balanced corpus
    • Experiments – Relational Features
      Approach 2: exploiting relational information contained in facts
      Extract relational features from articles:
      - Use relations from ReVerb: binary relations (e1, relation, e2)
      Use them to train a classifier that discriminates between featured/good and non-featured articles
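One plausible way to turn ReVerb's binary relations into classifier input is a bag-of-relations: count how often each normalized relation phrase occurs in an article. The slides do not specify the exact feature set or classifier, so this encoding, the helper names, and the example extractions are assumptions; the resulting vectors could be passed to any standard classifier.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A binary relation (e1, relation, e2), the shape of a ReVerb extraction.
Fact = Tuple[str, str, str]


def relation_features(facts: Sequence[Fact]) -> Dict[str, int]:
    """Bag-of-relations: occurrences of each lower-cased relation phrase."""
    return dict(Counter(rel.strip().lower() for _, rel, _ in facts))


def build_vocabulary(docs: Sequence[Dict[str, int]]) -> List[str]:
    """Sorted list of all relation phrases seen in the training articles."""
    return sorted({rel for features in docs for rel in features})


def to_vector(features: Dict[str, int], vocab: Sequence[str]) -> List[int]:
    """Fixed-length count vector; relations outside the vocabulary are ignored."""
    return [features.get(rel, 0) for rel in vocab]


# Hypothetical extractions from two articles.
article_a = [("Curie", "discovered", "polonium"), ("Curie", "Discovered", "radium")]
article_b = [("Paris", "is the capital of", "France")]
docs = [relation_features(article_a), relation_features(article_b)]
vocab = build_vocabulary(docs)
print(vocab, [to_vector(d, vocab) for d in docs])
```

scikit-learn's `DictVectorizer` performs the same dictionary-to-vector step if a library classifier is used downstream.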
    • Summary
      Simple fact-related measure: Factual Density
      - Based on Factual Density, featured/good articles can be separated from non-featured ones if article lengths are similar
      - If articles differ in length, word count alone suffices
      - Future work: a combination of both
      Plan to incorporate edit history (hypothesis: more editors, higher factual density)
      Preliminary experiments with relational features:
      - Promising results; more work in this direction
      - The goal is to bring semantics into the field of Information Quality
      - We expect this to unlock several IQ dimensions, e.g. generality vs. specificity
    • Thank you for your attention! Elisabeth Lex, elex@know-center.at