Information Quality Assessment in the WIQ-EI EU Project

http://www.dirinfo.unsl.edu.ar/noticias/articulo/charla-dra-elisabeth-lex-know-center-austria.html

Transcript

  • 1. Information Quality in Social Media. Presentation at UNSL, November 17th, 2011. Elisabeth Lex. www.know-center.at
  • 2. Agenda
      - The Know-Center
      - The WIQ-EI project
      - Why Information Quality on the Web?
      - Selected Results
      - Conclusion
  • 3. The Know Center – We are...
      - Austria's Competence Center for Knowledge Management and Knowledge Technologies
      - Link between Science and Industry
      - A multi-disciplinary team of 40+ Scientists and Developers
      - Over 575 publications since 2001
      - 100 Master theses, 26 PhD theses, 4 habilitations
      - Editors of 2 Journals: Journal of Universal Knowledge Management, Journal of Universal Computer Science
      - Organizer of the International Conference on Knowledge Management and Knowledge Technologies (I-KNOW)
  • 4. The Know Center: 2 Areas of Research
      - Knowledge Relationship Discovery:
        - Detecting semantic entities, semantic relations in unstructured data
        - Cross-language and cross-domain search and retrieval
        - Automatic analysis of information structure and quality
        - User interfaces for visual analysis of large information repositories
      - Knowledge Services:
        - Web 2.0, Collective Intelligence and Social Network Analysis
        - Semantic Technologies, Semantic Web, Semantic Retrieval
        - Communication and Collaboration Technologies
        - Mobile Technologies
  • 5. The WIQ-EI Project - Goals
      - Web Information Quality Evaluation Initiative
      - 3 Objectives:
        - Development of Web Content Information Quality Measures
        - Plagiarism Detection and Authorship Attribution
        - Multilingual Opinion and Sentiment Mining
        - Derive algorithms, tools and test data sets
  • 6. The WIQ-EI Project - Implementation
      - On a global scale:
        - Researcher exchanges between organisations from European (Austria, Germany, Spain, Greece) and non-European countries with expertise in topic-relevant fields (Argentina, Mexico, India)
        - Carry out research secondments, training and dissemination activities, challenges, workshops
  • 7. Agenda
      - The Know-Center
      - Why Information Quality on the Web?
      - Selected Results
      - Conclusion
  • 8. Introduction
      - On the Web: large amount of potentially useful content
        - Navigating is challenging
      - Web is changing: User Generated Content, Social Media
  • 9. Introduction
      - On the Web: large amount of potentially useful content
        - Navigating is challenging
      - Web is changing: User Generated Content, Social Media
        - Social media up to date
        - Wide audience, highly dynamic
        - Open to (almost) anyone
        - Powerful, e.g. for media resonance analysis
  • 10. Introduction
      - On the Web: large amount of potentially useful content
        - Navigating is challenging
      - Web is changing: User Generated Content, Social Media
        - Social media up to date
        - Wide audience, highly dynamic
        - Open to (almost) anyone
        - Powerful, e.g. for media resonance analysis
      - Information Quality of Social Media is questionable!
  • 11. What is Information Quality?
      - A multi-dimensional concept [Klein, 2001]
      - Different Types of Information Quality (IQ) [Knight2005]
      - E.g. [Wang1996]:
        - Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation
        - Accessibility IQ: Accessibility, Security
        - Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information, Presence of Author information [Katerattanakul1999]
        - Representational IQ: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation
  • 12. Information Quality – Link to Information Retrieval, Data Mining
      - Diagram labels: The Information Retrieval Process
  • 13. Information Quality – Link to Information Retrieval, Text Mining
      - Diagram labels: Text Mining; The Information Retrieval Process
  • 14. Information Quality – Link to Information Retrieval, Data Mining
      - Text Mining enables retrieval of core information from unstructured text
        - Information Extraction
        - Clustering
        - ...
      - Diagram labels: Text Mining; The Information Retrieval Process
  • 15. Information Quality – Link to Information Retrieval, Data Mining
      - Text Mining enables retrieval of core information from unstructured text
        - Information Extraction
        - Clustering
        - ...
      - Diagram labels: Text Mining; Faceted Search; The Information Retrieval Process
  • 16. Information Quality – Link to Information Retrieval, Data Mining
      - Diagram labels: Text Mining; Faceted Search; The Information Retrieval Process
  • 17. Information Quality – Link to Information Retrieval, Data Mining
      - IQ Dimensions: Objectivity, Accuracy, ...
      - Diagram labels: Text Mining; Faceted Search; The Information Retrieval Process
  • 18. Our work – Focus on Media Domain
      - Goal: Assess intrinsic Information Quality in social media, traditional media, arbitrary Web content
      - Several IQ dimensions:
        - Objectivity
        - Emotionality
        - Credibility
        - Readability
        - In-depth versus Shallow
        - Expert versus Non-Expert
        - Personal versus Official
  • 19. Agenda
      - The Know-Center
      - Why Information Quality in Media Domain?
      - Selected Results
      - Conclusion
  • 20. Results. Information Quality Dimension: Objectivity
      - Task: Objectivity Classification in Blogs
      - Use features based on style properties
      - Dataset: TREC Blogs08 (83 blogs, 12844 blog posts)
      - Results: Accuracy of 87% for Objectivity Classification in Blogs
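Slide 20 mentions style-based features for objectivity classification, but the transcript does not list them. The following is a minimal sketch of what such features might look like. The feature names, the pronoun list, and the `style_features` function are illustrative assumptions, not the feature set behind the reported 87% accuracy.

```python
import re

# Hypothetical pronoun list; the talk does not specify one.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def style_features(text):
    """Return a few simple, topic-independent style measurements for a text."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n_tokens = len(tokens) or 1   # avoid division by zero on empty input
    n_chars = len(text) or 1
    return {
        # Mean number of characters per word token.
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        # Share of tokens that are first-person pronouns.
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        # Punctuation marks per character of raw text.
        "exclamation_ratio": text.count("!") / n_chars,
        "question_ratio": text.count("?") / n_chars,
    }


# Two made-up blog snippets, one personal and one report-like.
print(style_features("I love my blog!"))
print(style_features("The council approved the budget on Tuesday."))
```

Feature dictionaries of this kind would typically be turned into vectors and passed to a supervised classifier trained on posts labelled as objective or subjective.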
  • 21. Results. Information Quality Dimension: Credibility
      - Rank blogs by credibility
      - Compare blogs with credible source:
        - Quantity structure
        - Content similarity: Nouns, Verbs + Adjectives
      - Dataset: APA news articles, crawled blogs
      - Results:
        - Average precision of 83% for blog credibility ranking
        - Correlation between quantity structures of blogs and news, e.g. Query "Frankreich", Pearson Correlation Coeff: 0.79
      - [Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW'2009.
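Slide 21 reports a Pearson correlation coefficient between the quantity structures of blogs and news. As a sketch of that computation, the code below assumes a quantity structure is the number of documents matching a query per time interval; the slide does not define the term, so this reading and the sample counts are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily document counts for one query (not data from the talk).
blog_counts = [3, 8, 15, 9, 4]
news_counts = [2, 6, 12, 10, 3]
print(round(pearson(blog_counts, news_counts), 2))
```

A value near 1 would indicate that blog activity for the query rises and falls together with coverage in the verified news source.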
  • 22. Results. Web Genre and Quality Classification
      - ECML/PKDD Discovery Challenge 2010
        - Task 1: Web Genre and Quality Facets
          - News/Editorial, Educational, Discussion, Commercial, Personal/Leisure, Web Spam
          - Bias, Trustworthiness, Neutrality
        - Task 2: English Content Quality: Combination of Facets
          - Quality Score
        - Task 3: Multilingual Content Quality: German, French
      - Dataset: English, German, French Web hosts: NLP Features, Content Features, Terms, Links
      - Approach: Ensemble Classifier Approach (J48, CFC, SVM)
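Slide 22 names an ensemble of three classifiers (J48, CFC, SVM) but does not say how their outputs were combined. One common option is majority voting, sketched below. The voting rule, the tie-breaking behaviour, and the sample labels are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per item.

    `predictions` holds one list of labels per classifier, all the same
    length. On a tie, the label proposed by the earliest classifier among
    the tied ones wins, because Counter keeps first-seen order.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Made-up outputs of three classifiers for three Web hosts.
tree_pred = ["news", "spam", "educational"]
centroid_pred = ["news", "educational", "educational"]
svm_pred = ["personal", "spam", "educational"]
print(majority_vote([tree_pred, centroid_pred, svm_pred]))
```

Other combination schemes, such as averaging class probabilities or stacking, would fit the slide's description equally well.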
  • 23. Combined Quality Score
      - Use Case: Web Archival
  • 24. Results. Web Genre and Quality Classification
      - Challenges:
        - Unbalanced and low-quality training data (training data also contained Hungarian, Czech, ... hosts)
        - News and Educational hard to separate
        - Too little training data for German and French hosts
      - Results:
        - Method performs best for Educational/Research (NDCG 0.688), Commercial (0.694), and Personal/Leisure (0.583)
        - English quality task: NDCG 0.844
        - Multilingual quality task: Use topic-independent features from English hosts
          - German: NDCG 0.792
          - French: NDCG 0.823
      - [Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
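The results on slide 24 are reported as NDCG (normalized discounted cumulative gain). The slides do not state which gain and discount variant the challenge used, so the sketch below assumes the common form with linear gain and a log2 rank discount; the relevance values are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance of hosts in the order a system ranked them (illustrative).
print(ndcg([3, 2, 1]))   # already in ideal order
print(ndcg([1, 3, 2]))   # best host ranked second
```

A score of 1.0 means the ranking matches the ideal ordering, and lower values mean relevant hosts were placed further down the list.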
  • 25. Agenda
      - The Know-Center
      - Why Information Quality in Social Media?
      - Selected Results
      - Conclusion
  • 26. Conclusions: Summary
      - Information Quality (IQ) consists of multiple dimensions
      - Depends on Use Case
        - BUT: Several dimensions are commonly agreed upon
      - IQ dimensions can be combined in one quality score
      - Supervised Classification often used to assess IQ
        - However, training data needed!
      - Simple style-based features suited to assess IQ dimensions
  • 27. Thank you for your attention!