
Information Quality Assessment in the WIQ-EI EU Project

http://www.dirinfo.unsl.edu.ar/noticias/articulo/charla-dra-elisabeth-lex-know-center-austria.html

Published in: Technology, Education

  1. Information Quality in Social Media. Presentation at UNSL, Elisabeth Lex, November 17th, 2011. www.know-center.at
  2. Agenda
     - The Know-Center
     - The WIQ-EI project
     - Why Information Quality on the Web?
     - Selected Results
     - Conclusion
  3. The Know Center – We are...
     - Austria’s Competence Center for Knowledge Management and Knowledge Technologies
     - Link between Science and Industry
     - A multi-disciplinary team of 40+ Scientists and Developers
     - Over 575 publications since 2001
     - 100 Master theses, 26 PhD theses, 4 habilitations
     - Editors of 2 Journals: Journal of Universal Knowledge Management, Journal of Universal Computer Science
     - Organizer of the International Conference on Knowledge Management and Knowledge Technologies (I-KNOW)
  4. The Know Center: 2 Areas of Research
     - Knowledge Relationship Discovery:
       - Detecting semantic entities, semantic relations in unstructured data
       - Cross-language and cross-domain search and retrieval
       - Automatic analysis of information structure and quality
       - User interfaces for visual analysis of large information repositories
     - Knowledge Services:
       - Web 2.0, Collective Intelligence and Social Network Analysis
       - Semantic Technologies, Semantic Web, Semantic Retrieval
       - Communication and Collaboration Technologies
       - Mobile Technologies
  5. The WIQ-EI Project - Goals
     Web Information Quality Evaluation Initiative
     3 Objectives:
     - Development of Web Content Information Quality Measures
     - Plagiarism Detection and Authorship Attribution
     - Multilingual Opinion and Sentiment Mining
     - Derive algorithms, tools and test data sets
  6. The WIQ-EI Project - Implementation
     On a global scale:
     - Researcher exchanges between organisations from European (Austria, Germany, Spain, Greece) and non-European countries (Argentina, Mexico, India) with expertise in topic-relevant fields
     - Carry out research secondments, training and dissemination activities, challenges, workshops
  7. Agenda
     - The Know-Center
     - Why Information Quality on the Web?
     - Selected Results
     - Conclusion
  8. Introduction
     - On the Web: large amount of potentially useful content
       - Navigating is challenging
     - Web is changing: User Generated Content, Social Media
  9. Introduction
     - On the Web: large amount of potentially useful content
       - Navigating is challenging
     - Web is changing: User Generated Content, Social Media
       - Social media up to date
       - Wide audience, highly dynamic
       - Open to (almost) anyone
       - Powerful, e.g. for media resonance analysis
  10. Introduction
      - On the Web: large amount of potentially useful content
        - Navigating is challenging
      - Web is changing: User Generated Content, Social Media
        - Social media up to date
        - Wide audience, highly dynamic
        - Open to (almost) anyone
        - Powerful, e.g. for media resonance analysis
      - Information Quality of Social Media is questionable!
  11. What is Information Quality?
      - A multi-dimensional concept [Klein, 2001]
      - Different Types of Information Quality (IQ) [Knight2005]
      - E.g. [Wang1996]:
        - Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation
        - Accessibility IQ: Accessibility, Security
        - Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information, Presence of Author information [Katerattanakul1999]
        - Representational IQ: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation
  12. Information Quality – Link to Information Retrieval, Data Mining
      - The Information Retrieval Process
  13. Information Quality – Link to Information Retrieval, Text Mining
      - Text Mining
      - The Information Retrieval Process
  14. Information Quality – Link to Information Retrieval, Data Mining
      - Enables retrieval of core information from unstructured text
      - Text Mining: Information Extraction, Clustering, ...
      - The Information Retrieval Process
  15. Information Quality – Link to Information Retrieval, Data Mining
      - Enables retrieval of core information from unstructured text
      - Text Mining: Information Extraction, Clustering, ...
      - Faceted Search
      - The Information Retrieval Process
  16. Information Quality – Link to Information Retrieval, Data Mining
      - Text Mining
      - Faceted Search
      - The Information Retrieval Process
  17. Information Quality – Link to Information Retrieval, Data Mining
      - IQ Dimensions: Objectivity, Accuracy, ...
      - Text Mining
      - Faceted Search
      - The Information Retrieval Process
  18. Our work – Focus on Media Domain
      Goal: Assess intrinsic Information Quality in social media, traditional media, arbitrary Web content
      Several IQ dimensions:
      - Objectivity
      - Emotionality
      - Credibility
      - Readability
      - In-depth versus Shallow
      - Expert versus Non-Expert
      - Personal versus Official
  19. Agenda
      - The Know-Center
      - Why Information Quality in Media Domain?
      - Selected Results
      - Conclusion
  20. Results. Information Quality Dimension: Objectivity
      - Task: Objectivity Classification in Blogs
      - Use features based on style properties
      - Dataset: TREC Blogs08 (83 blogs, 12844 blog posts)
      - Results: Accuracy of 87% for Objectivity Classification in Blogs
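The slide does not list which style properties were used, so the following is only a sketch of what simple style-based features for a blog post could look like. The feature names, the `FIRST_PERSON` word list and the `style_features` function are hypothetical and not taken from the talk.

```python
import re

# Hypothetical word list; the talk does not name the style features it used.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def style_features(text):
    """Return a few illustrative style features for one blog post."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Share of first-person pronouns, a common cue for subjective writing.
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        # Punctuation marks per token.
        "exclamation_ratio": text.count("!") / n_tokens,
        "question_ratio": text.count("?") / n_tokens,
        # Average number of tokens per sentence.
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
    }


subjective = style_features("I love this! We were so happy with our trip!")
objective = style_features(
    "The council approved the budget. The vote took place on Monday."
)
```

Feature dictionaries like these could then be fed to any supervised classifier trained on posts labelled as objective or subjective.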
  21. Results. Information Quality Dimension: Credibility
      - Rank blogs by credibility
      - Compare blogs with credible source:
        - Quantity structure
        - Content similarity: Nouns, Verbs + Adjectives
      - Dataset: APA news articles, crawled blogs
      - Results:
        - Average precision of 83% for blog credibility ranking
        - Correlation between quantity structures of blogs and news, e.g. Query “Frankreich”, Pearson Correlation Coeff: 0.79
      [Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW‘2009.
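As a rough illustration of the correlation reported above, the sketch below computes a Pearson correlation coefficient between two count series. It assumes that a "quantity structure" is the number of documents matching a query per time interval, which the slide does not state; the counts are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Made-up counts per day for one query (assumption, not data from the talk).
news_counts = [2, 5, 9, 4, 1]
blog_counts = [1, 4, 10, 6, 2]
r = pearson(news_counts, blog_counts)
```

A value of `r` close to 1 would mean that the blog posts rise and fall together with the news coverage for that query.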
  22. Results. Web Genre and Quality Classification
      ECML/PKDD Discovery Challenge 2010
      - Task 1: Web Genre and Quality Facets
        - News/Editorial, Educational, Discussion, Commercial, Personal/Leisure, Web Spam
        - Bias, Trustworthiness, Neutrality
      - Task 2: English Content Quality: Combination of Facets → Quality Score
      - Task 3: Multilingual Content Quality: German, French
      - Dataset: English, German, French Web hosts: NLP Features, Content Features, Terms, Links
      - Approach: Ensemble Classifier Approach (J48, CFC, SVM)
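The slide names the base classifiers (J48, CFC, SVM) but not how their outputs were combined. The sketch below assumes plain majority voting over the predicted labels, which is one common way to build an ensemble and not necessarily the one used in the challenge entry; `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    `predictions` holds one label per base classifier for a single Web host.
    Ties go to the label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Made-up outputs of three base classifiers for one host.
label = majority_vote(["News/Editorial", "Educational", "News/Editorial"])
```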
  23. Combined Quality Score
      - Use Case: Web Archival
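How the facets were combined is not given on the slide. One simple possibility is a weighted average, sketched below under the assumption that every facet score lies in [0, 1] and is oriented so that higher means better quality; the weights and facet values are invented for illustration.

```python
def combined_quality_score(facets, weights):
    """Weighted average of facet scores (hypothetical combination rule)."""
    total_weight = sum(weights[name] for name in facets)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(facets[name] * weights[name] for name in facets) / total_weight


# Invented facet scores and weights for one Web host.
facets = {"trustworthiness": 0.8, "neutrality": 0.6, "unbiasedness": 0.9}
weights = {"trustworthiness": 2.0, "neutrality": 1.0, "unbiasedness": 1.0}
score = combined_quality_score(facets, weights)
```

For a use case such as Web archival, hosts could then be ranked by `score` to decide which ones to archive first.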
  24. Results. Web Genre and Quality Classification
      - Challenges:
        - Unbalanced and low quality training data (training data also contained Hungarian, Czech, ... hosts)
        - News and Educational hard to separate
        - Too little training data for German and French hosts
      - Results:
        - Method performs best for Educational/Research (NDCG 0.688), Commercial (0.694), and Personal/Leisure (0.583)
        - English quality task: NDCG 0.844
        - Multilingual quality task: Use topic independent features from English hosts
          - German: NDCG 0.792
          - French: NDCG 0.823
      [Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
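The results above are reported as NDCG (normalized discounted cumulative gain). The slides do not say which NDCG variant the challenge used, so the sketch below shows the common linear-gain form with a log2 rank discount; the relevance values are made up.

```python
from math import log2


def dcg(relevances):
    """Discounted cumulative gain of relevance values in ranked order."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Made-up relevance labels of hosts in the order a system ranked them.
perfect = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
```

A ranking that places the most relevant hosts first scores 1.0, and any other ordering of the same hosts scores lower.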
  25. Agenda
      - The Know-Center
      - Why Information Quality in Social Media?
      - Selected Results
      - Conclusion
  26. Conclusions: Summary
      - Information Quality (IQ) consists of multiple dimensions
      - Depends on Use Case
        - BUT: Several dimensions are commonly agreed upon
      - IQ dimensions can be combined in one quality score
      - Supervised Classification often used to assess IQ
        - However, training data needed!
      - Simple style based features suited to assess IQ dimensions
  27. Thank you for your attention!