November 17th, 2011
                       www.know-center.at




Information Quality in
Social Media
Presentation at UNSL


Elisabeth Lex
Agenda

- The Know-Center
- The WIQ-EI project
- Why Information Quality on the Web?
- Selected Results
- Conclusion

The Know Center – We are...

Austria’s Competence Center for Knowledge Management
and Knowledge Technologies
Link between Science and Industry
A multi-disciplinary team of 40+ Scientists and Developers
Over 575 publications since 2001
100 Master theses, 26 PhD theses, 4 habilitations
Editors of 2 Journals: Journal of Universal Knowledge
Management, Journal of Universal Computer Science
Organizer of the International Conference on Knowledge
Management and Knowledge Technologies (I-KNOW)

The Know Center

2 Areas of Research:
  - Knowledge Relationship Discovery:
      - Detecting semantic entities and semantic relations in unstructured data
      - Cross-language and cross-domain search and retrieval
      - Automatic analysis of information structure and quality
      - User interfaces for visual analysis of large information repositories

  - Knowledge Services:
      - Web 2.0, Collective Intelligence and Social Network Analysis
      - Semantic Technologies, Semantic Web, Semantic Retrieval
      - Communication and Collaboration Technologies
      - Mobile Technologies

The WIQ-EI Project - Goals

Web Information Quality Evaluation Initiative
3 Objectives:
  - Development of Web Content Information Quality Measures
  - Plagiarism Detection and Authorship Attribution
  - Multilingual Opinion and Sentiment Mining

  - Derive algorithms, tools and test data sets

The WIQ-EI Project - Implementation


On a global scale:
  - Researcher exchanges between organisations from European countries
    (Austria, Germany, Spain, Greece) and non-European countries
    (Argentina, Mexico, India) with expertise in topic-relevant fields
  - Carry out research secondments, training and dissemination
    activities, challenges, workshops

Introduction

- On the Web - large amount of potentially useful content
    - Navigating is challenging
- Web is changing: User Generated Content, Social Media

  - Social media up to date
  - Wide audience, highly dynamic
  - Open to (almost) anyone
  - Powerful e.g. for media resonance analysis

Information Quality of Social Media is questionable!

What is Information Quality?

A multi-dimensional concept [Klein, 2001]
Different Types of Information Quality (IQ) [Knight2005]
E.g. [Wang1996]:
  - Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation
  - Accessibility IQ: Accessibility, Security
  - Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness,
    Amount of Information, Presence of Author information
    [Katerattanakul1999]
  - Representational IQ: Interpretability, Ease of Understanding,
    Concise Representation, Consistent Representation

Information Quality – Link to Information
Retrieval, Data Mining

[Figure: The Information Retrieval Process, annotated with:]
  - Text Mining: enables retrieval of core information from unstructured text
      - Information Extraction
      - Clustering
      - ...
  - Faceted Search
  - IQ Dimensions:
      - Objectivity
      - Accuracy
      - ...

Our work – Focus on Media Domain

Goal: Assess intrinsic Information Quality in social
media, traditional media, and arbitrary Web content
Several IQ dimensions:
  - Objectivity
  - Emotionality
  - Credibility
  - Readability
  - In-depth versus Shallow
  - Expert versus Non-Expert
  - Personal versus Official

Results
Information Quality Dimension: Objectivity

Task:
  - Objectivity Classification in Blogs
Use features based on style properties
Dataset: TREC Blogs08 - 83 blogs, 12844 blog posts

Results:
  - Accuracy of 87% for Objectivity Classification in Blogs

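The slide does not list which style properties were used. As a minimal sketch only, assuming a handful of hypothetical stylometric features (pronoun usage, punctuation, word length) that are not taken from the original work, style-based feature extraction for one blog post could look like this:

```python
import re

# Hypothetical feature set for illustration only; the slides do not list
# the style properties that were actually used.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def style_features(text):
    """Return a few simple style-based features for one blog post."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    return {
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        "exclamation_ratio": text.count("!") / n_chars,
        "question_ratio": text.count("?") / n_chars,
    }
```

Feature dictionaries like these could then be turned into vectors and passed to any supervised classifier trained on posts labelled as objective or subjective.
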
Results
Information Quality Dimension: Credibility

  - Rank blogs by credibility
      - Compare blogs with a credible source:
          - Quantity structure
          - Content similarity: Nouns, Verbs + Adjectives

  - Dataset: APA news articles, crawled blogs

  - Results:
      - Average precision of 83% for blog credibility ranking
      - Correlation between quantity structures of blogs and news,
        e.g. query “Frankreich” (German for "France"): Pearson correlation coefficient 0.79

[Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW 2009.

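For reference, the Pearson correlation coefficient reported above can be computed as sketched below. The example assumes that a "quantity structure" is a series of document counts per time interval for one query; that reading and the sample numbers are illustrative assumptions, not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily counts of blog posts and news articles matching one query.
blog_counts = [3, 5, 9, 4, 2, 8, 12]
news_counts = [2, 4, 10, 5, 1, 6, 11]
r = pearson(blog_counts, news_counts)
```

A value of r close to 1 means that blog and news volumes rise and fall together for that query.
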
Results
Web Genre and Quality Classification

- ECML/PKDD Discovery Challenge 2010
    - Task 1: Web Genre and Quality Facets
        - News/Editorial, Educational, Discussion, Commercial,
          Personal/Leisure, Web Spam
        - Bias, Trustworthiness, Neutrality
    - Task 2: English Content Quality: Combination of Facets → Quality Score
    - Task 3: Multilingual Content Quality: German, French

- Dataset: English, German, French Web hosts: NLP Features,
  Content Features, Terms, Links

- Approach: Ensemble Classifier Approach (J48, CFC, SVM)

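The slide names the base classifiers (J48, CFC, SVM) but not how their outputs were combined. A common choice is majority voting, sketched here under that assumption. The three rule functions are toy stand-ins with hypothetical feature names, not the trained models from the challenge entry.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most base classifiers.

    Ties are broken in favour of the classifier listed first.
    """
    predictions = [clf(sample) for clf in classifiers]
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in classifier order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label


# Toy stand-ins for trained models (hypothetical rules, not J48/CFC/SVM).
def by_length(host):
    return "News/Editorial" if host["avg_doc_length"] > 500 else "Discussion"


def by_links(host):
    return "Commercial" if host["outlinks"] > 100 else "News/Editorial"


def by_terms(host):
    return "Discussion" if "forum" in host["terms"] else "News/Editorial"


host = {"avg_doc_length": 800, "outlinks": 20, "terms": {"politics", "economy"}}
label = majority_vote([by_length, by_links, by_terms], host)
```

Other combination schemes, such as weighting each classifier by its validation performance, would fit the same interface.
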
Combined Quality Score




             Use Case: Web Archival
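The slides do not give the formula behind the combined quality score. Purely as an illustration, the sketch below assumes a weighted sum of per-facet scores in [0, 1]. The weights, facet keys and host names are hypothetical.

```python
# Hypothetical weights: positive facets raise the score, negative ones lower it.
WEIGHTS = {
    "trustworthiness": 1.0,
    "neutrality": 1.0,
    "bias": -1.0,
    "web_spam": -2.0,
}


def quality_score(facets, weights=WEIGHTS):
    """Combine per-facet scores in [0, 1] into a single number."""
    return sum(weights[name] * facets.get(name, 0.0) for name in weights)


# Made-up hosts with made-up facet scores.
hosts = {
    "example-news.org": {"trustworthiness": 0.9, "neutrality": 0.8, "bias": 0.1, "web_spam": 0.0},
    "example-spam.biz": {"trustworthiness": 0.1, "neutrality": 0.5, "bias": 0.4, "web_spam": 0.9},
}
ranked = sorted(hosts, key=lambda h: quality_score(hosts[h]), reverse=True)
```

In a Web archival setting, a ranking like this could be used to decide which hosts to crawl and preserve first.
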
Results
Web Genre and Quality Classification
  - Challenges:
      - Unbalanced and low-quality training data (the training data also
        contained Hungarian, Czech, ... hosts)
      - News and Educational hard to separate
      - Too little training data for German and French hosts

  - Results:
      - Method performs best for Educational/Research (NDCG 0.688),
        Commercial (0.694), and Personal/Leisure (0.583)
      - English quality task: NDCG 0.844
      - Multilingual quality task: use topic-independent features from English
        hosts
          - German: NDCG 0.792
          - French: NDCG 0.823

[Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.

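The results above are reported as NDCG (normalized discounted cumulative gain). The sketch below uses one common formulation, with linear gain and a log2 rank discount. The slides do not state which variant the challenge used, so treat the exact formula as an assumption.

```python
from math import log2


def dcg(relevances):
    """Discounted cumulative gain of relevance grades given in ranked order."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking that lists hosts in exactly descending order of true quality scores 1.0, and misplacing high-quality hosts lower in the list reduces the value.
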
Conclusions
Summary

- Information Quality (IQ) consists of multiple dimensions
- Depends on the use case
    - BUT: several dimensions are commonly agreed upon
- IQ dimensions can be combined into one quality score
- Supervised classification is often used to assess IQ
    - However, training data is needed!
- Simple style-based features are suited to assess IQ dimensions

Thank you for your attention!





Information Quality Assessment in the WIQ-EI EU Project

  • 1. November 17th, 2011 www.know-center.at Information Quality in Social Media Presentation at UNSL Elisabeth Lex
  • 2. Agenda: The Know-Center; The WIQ-EI project; Why Information Quality on the Web?; Selected Results; Conclusion
  • 3. The Know Center – We are... Austria’s Competence Center for Knowledge Management and Knowledge Technologies; a link between science and industry; a multi-disciplinary team of 40+ scientists and developers; over 575 publications since 2001; 100 Master theses, 26 PhD theses, 4 habilitations; editors of 2 journals: Journal of Universal Knowledge Management, Journal of Universal Computer Science; organizer of the International Conference on Knowledge Management and Knowledge Technologies (I-KNOW)
  • 4. The Know Center: 2 Areas of Research. Knowledge Relationship Discovery: detecting semantic entities and semantic relations in unstructured data; cross-language and cross-domain search and retrieval; automatic analysis of information structure and quality; user interfaces for visual analysis of large information repositories. Knowledge Services: Web 2.0, Collective Intelligence and Social Network Analysis; Semantic Technologies, Semantic Web, Semantic Retrieval; Communication and Collaboration Technologies; Mobile Technologies
  • 5. The WIQ-EI Project - Goals. Web Information Quality Evaluation Initiative. 3 Objectives: development of Web content information quality measures; plagiarism detection and authorship attribution; multilingual opinion and sentiment mining. Derive algorithms, tools and test data sets
  • 6. The WIQ-EI Project - Implementation. On a global scale: researcher exchanges between organisations from European (Austria, Germany, Spain, Greece) and non-European countries with expertise in topic-relevant fields (Argentina, Mexico, India); carry out research secondments, training and dissemination activities, challenges, workshops
  • 7. Agenda: The Know-Center; Why Information Quality on the Web?; Selected Results; Conclusion
  • 10. Introduction. On the Web: a large amount of potentially useful content, but navigating it is challenging. The Web is changing: User Generated Content, Social Media. Social media is up to date; reaches a wide audience and is highly dynamic; is open to (almost) anyone; is powerful, e.g. for media resonance analysis. Information Quality of Social Media is questionable!
  • 11. What is Information Quality? A multi-dimensional concept [Klein, 2001]. Different types of Information Quality (IQ) [Knight2005], e.g. [Wang1996]: Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation; Accessibility IQ: Accessibility, Security; Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information, Presence of Author Information [Katerattanakul1999]; Representational IQ: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation
  • 12. Information Quality – Link to Information Retrieval, Data Mining. [Figure: The Information Retrieval Process]
  • 13. Information Quality – Link to Information Retrieval, Text Mining. [Figure: The Information Retrieval Process, with Text Mining]
  • 14. Information Quality – Link to Information Retrieval, Data Mining. Text Mining (Information Extraction, Clustering, ...): enables retrieval of core information from unstructured text. [Figure: The Information Retrieval Process]
  • 15. Information Quality – Link to Information Retrieval, Data Mining. Text Mining (Information Extraction, Clustering, ...): enables retrieval of core information from unstructured text. Faceted Search. [Figure: The Information Retrieval Process]
  • 16. Information Quality – Link to Information Retrieval, Data Mining. Text Mining; Faceted Search. [Figure: The Information Retrieval Process]
  • 17. Information Quality – Link to Information Retrieval, Data Mining. IQ Dimensions: Objectivity, Accuracy, ... Text Mining; Faceted Search. [Figure: The Information Retrieval Process]
  • 18. Our work – Focus on Media Domain. Goal: assess intrinsic Information Quality in social media, traditional media, and arbitrary Web content. Several IQ dimensions: Objectivity; Emotionality; Credibility; Readability; In-depth versus Shallow; Expert versus Non-Expert; Personal versus Official
  • 19. Agenda: The Know-Center; Why Information Quality in Media Domain?; Selected Results; Conclusion
  • 20. Results. Information Quality Dimension: Objectivity. Task: objectivity classification in blogs. Features: based on style properties. Dataset: TREC Blogs08 - 83 blogs, 12,844 blog posts. Results: accuracy of 87% for objectivity classification in blogs
  • 21. Results. Information Quality Dimension: Credibility. Rank blogs by credibility: compare blogs with a credible source by quantity structure and by content similarity (nouns, verbs + adjectives). Dataset: APA news articles, crawled blogs. Results: average precision of 83% for blog credibility ranking; correlation between quantity structures of blogs and news, e.g. query “Frankreich” (“France”), Pearson correlation coefficient: 0.79. [Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW 2009.
  • 22. Results. Web Genre and Quality Classification: ECML/PKDD Discovery Challenge 2010. Task 1: Web Genre and Quality Facets (News/Editorial, Educational, Discussion, Commercial, Personal/Leisure, Web Spam; Bias, Trustworthiness, Neutrality). Task 2: English Content Quality: combination of facets into a quality score. Task 3: Multilingual Content Quality: German, French. Dataset: English, German, French Web hosts: NLP features, content features, terms, links. Approach: ensemble classifier (J48, CFC, SVM)
  • 23. Combined Quality Score. Use Case: Web Archival
  • 24. Results. Web Genre and Quality Classification. Challenges: unbalanced and low-quality training data (the training data also contained Hungarian, Czech, ... hosts); News and Educational are hard to separate; too little training data for German and French hosts. Results: the method performs best for Educational/Research (NDCG 0.688), Commercial (0.694), and Personal/Leisure (0.583); English quality task: NDCG 0.844; multilingual quality task: use topic-independent features from English hosts; German: NDCG 0.792; French: NDCG 0.823. [Lex et al., 2010] Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
  • 25. Agenda: The Know-Center; Why Information Quality in Social Media?; Selected Results; Conclusion
  • 26. Conclusions. Summary: Information Quality (IQ) consists of multiple dimensions; it depends on the use case, BUT several dimensions are commonly agreed upon; IQ dimensions can be combined in one quality score; supervised classification is often used to assess IQ (however, training data is needed!); simple style-based features are suited to assess IQ dimensions
  • 27. Thank you for your attention!
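
The credibility results above quote a Pearson correlation coefficient between the quantity structures of blogs and news. The sketch below shows how such a coefficient is computed. It assumes that "quantity structure" means a count of matching documents per time slot, which the slides do not define, and the counts are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-day hit counts for one query (illustrative numbers only).
news_counts = [12, 15, 30, 22, 9, 7, 14]
blog_counts = [3, 5, 11, 9, 4, 2, 4]

r = pearson(news_counts, blog_counts)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would indicate that blog activity for the query rises and falls together with news coverage.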
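
The Web genre and quality results above are reported as NDCG (normalised discounted cumulative gain). As a reminder of what that number measures, here is a minimal sketch using the common log2 discount. The exact variant used in the challenge (gain function, cutoff) is not given in the slides, so treat this formulation and the sample labels as assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks start at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of Web hosts, in the order a classifier ranked them.
ranking = [1, 0, 1, 1, 0]
print(f"NDCG = {ndcg(ranking):.3f}")
```

An NDCG of 1.0 means the ranking places all relevant hosts ahead of the irrelevant ones; lower values penalise relevant hosts that appear further down the list.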
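
The conclusions state that IQ dimensions can be combined in one quality score. The slides do not say how the combination was done, so the following is only one plausible sketch: a weighted mean of per-dimension scores. The function name, the dimension scores, and the weights are all hypothetical.

```python
def combined_quality(scores, weights):
    """Weighted mean of per-dimension scores; both dicts are keyed by dimension name."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


# Hypothetical classifier outputs in [0, 1] for one Web host.
scores = {"objectivity": 0.9, "credibility": 0.6, "readability": 0.75}
# Hypothetical weights; in practice they would depend on the use case.
weights = {"objectivity": 2.0, "credibility": 1.0, "readability": 1.0}

quality = combined_quality(scores, weights)
print(f"combined quality score = {quality:.4f}")
```

Keeping the weights separate from the scores reflects the point that quality depends on the use case: the same per-dimension scores can yield different overall scores, for example for Web archival versus media analysis.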