16 April 2012
                                                                                               www.know-center.at




                        Measuring the Quality of Web
                        Content using Factual
                        Information


                        WebQuality 2012 workshop
                        at WWW 2012




                     Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti,
                     Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer
© Know-Center 2012                                            funded by the Competence Centers Programme (Kompetenzzentrenprogramm)
Agenda


Motivation
Approach
Results
Summary and Outlook




Motivation


People's decisions are often based on Web content
  lacking quality control, no verification

           Inaccurate, incorrect information
           No fact checking

Measures needed to capture credibility and quality aspects
  With respect to facts!




Approach

Measure information quality based on factual information
3 Approaches:
  Use simple statistics about the facts obtained from text
  Exploit relational information contained in facts
  Use semantic relationships like meronymy and hypernymy
First approach:
  Use simple statistical features about facts in a document
  Indicates how informative a document is
  Derive facts from Web content using Open Information Extraction


Definition of Factual Density


Fact Count




Factual Density
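The formulas on this slide did not survive extraction. A plausible formalisation, assuming that the fact count of a document d is the number of Open IE triples extracted from it, and that factual density normalises this count by document length in words (so that a long article is not favoured merely for its length), is sketched below; the symbols F(d), fc, fd and size are our notation, not necessarily the authors'.

```latex
\mathrm{fc}(d) = \lvert F(d) \rvert,
\qquad F(d) = \text{multiset of } (e_1, r, e_2) \text{ triples extracted from } d

\mathrm{fd}(d) = \frac{\mathrm{fc}(d)}{\mathrm{size}(d)},
\qquad \mathrm{size}(d) = \text{number of words in } d
```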




Experiments


Wikipedia: 1000 Featured and Good articles versus 1000 non-featured
articles (randomly selected)
  Featured: comprehensive coverage of the major facts in
   the context of the article's subject
Baseline: Word Count [Blumenstock 2008]
  Featured articles are longer than non-featured ones
  Bias: longer docs contain more facts
Evaluation: 2 Datasets
  Unbalanced: articles differ in length
  Balanced: articles similar in length
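To make the length bias and the two evaluation settings concrete, here is a minimal Python sketch. It assumes the facts have already been extracted (e.g. by an Open IE system), so each article is reduced to its word count and fact count. The `Article` type, `factual_density`, and the greedy length matching in `balance_by_length` (with its relative `tolerance` parameter) are hypothetical illustrations, not the procedure the authors used to build their balanced dataset.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical per-article summary; facts are assumed to be pre-extracted."""
    title: str
    word_count: int   # baseline feature [Blumenstock 2008]
    fact_count: int   # number of extracted (e1, relation, e2) triples
    featured: bool    # True for featured/good, False for non-featured


def factual_density(article: Article) -> float:
    """Facts per word; 0.0 for an empty article."""
    if article.word_count == 0:
        return 0.0
    return article.fact_count / article.word_count


def balance_by_length(articles: list[Article], tolerance: float = 0.1) -> list[Article]:
    """Greedily pair each featured article with the closest-length unused
    non-featured article whose word count is within +/- tolerance (relative).
    Articles that cannot be paired are dropped, so the result is length-balanced."""
    featured = sorted((a for a in articles if a.featured), key=lambda a: a.word_count)
    pool = sorted((a for a in articles if not a.featured), key=lambda a: a.word_count)
    balanced: list[Article] = []
    for f in featured:
        candidates = [c for c in pool
                      if abs(c.word_count - f.word_count) <= tolerance * f.word_count]
        if not candidates:
            continue  # no non-featured article of similar length is left
        best = min(candidates, key=lambda c: abs(c.word_count - f.word_count))
        pool.remove(best)
        balanced.extend([f, best])
    return balanced
```

On the balanced output, word count carries little signal by construction, so any remaining separation between the classes has to come from a length-normalised feature such as `factual_density`.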

Distributions of documents in both datasets with respect to word count




Precision/Recall curves of Factual Density
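Precision/recall curves like the one on this slide are obtained by ranking articles by a score and sweeping a cutoff down the ranking. A minimal sketch of that computation follows, assuming featured/good articles are the positive class and that a higher factual density (or word count, for the baseline) ranks an article higher. `precision_recall_curve` is a hypothetical helper, not the authors' evaluation code.

```python
def precision_recall_curve(scores: list[float], labels: list[bool]) -> list[tuple[float, float]]:
    """Rank documents by descending score (e.g. factual density) and return
    (precision, recall) after each cutoff k = 1..n, treating label True
    (featured/good) as the positive class. Ties keep their input order."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # sorted() is stable even with reverse=True, so tied scores keep input order
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = []
    true_pos = 0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        true_pos += is_pos
        curve.append((true_pos / k, true_pos / total_pos))
    return curve
```

Running it once with factual density and once with word count as `scores` gives the two curves to compare on the same articles.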




Results
Factual Density on balanced corpus




Experiments – Relational Features


Approach 2: exploiting relational information contained in facts
Extract relational features from articles
  Use relations from ReVerb: binary relations (e1, relation, e2)
Use them to train a classifier that discriminates between
featured/good and non-featured articles
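The slide leaves the feature design and the classifier open. As one illustrative sketch only, assume each article is represented by the relative frequencies of its normalised relation phrases, taken from ReVerb-style (e1, relation, e2) triples, and classified with a nearest-centroid rule under cosine similarity. The names `relation_features`, `train_centroids` and `predict`, the normalisation, and the choice of classifier are all our assumptions, not the setup used in these experiments.

```python
import math
from collections import Counter, defaultdict

# (e1, relation, e2): the binary-relation format produced by ReVerb
Triple = tuple[str, str, str]


def relation_features(triples: list[Triple]) -> dict[str, float]:
    """Relative frequency of each normalised relation phrase in one article."""
    counts = Counter(rel.strip().lower() for _, rel, _ in triples)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()} if total else {}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(docs: list[list[Triple]], labels: list[bool]) -> dict[bool, dict[str, float]]:
    """Average the relation-feature vectors of each class (nearest-centroid training)."""
    sums: dict[bool, defaultdict] = {}
    sizes: Counter = Counter()
    for triples, label in zip(docs, labels):
        acc = sums.setdefault(label, defaultdict(float))
        for rel, w in relation_features(triples).items():
            acc[rel] += w
        sizes[label] += 1
    return {label: {rel: w / sizes[label] for rel, w in acc.items()}
            for label, acc in sums.items()}


def predict(centroids: dict[bool, dict[str, float]], triples: list[Triple]) -> bool:
    """Assign the class whose centroid is most cosine-similar to the article."""
    features = relation_features(triples)
    return max(centroids, key=lambda label: cosine(features, centroids[label]))
```

A nearest-centroid rule keeps the sketch dependency-free; any standard classifier could consume the same sparse relation vectors instead.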




Summary

Simple fact-related measure: Factual Density
Based on Factual Density, featured/good articles can be separated
from non-featured ones if the articles are similar in length
If articles differ in length, word count is the better discriminator; for future work,
a combination of both
Plan to incorporate edit history: more editors, higher factual density
Preliminary experiments with relational features
  Promising results, more work in this direction
 Goal here is to bring semantics into the field of Information
  Quality
 We expect this to unlock several IQ dimensions, e.g. generality
  vs specificity
Thank you for your attention!

          Elisabeth Lex
       elex@know-center.at




