WikiSym 2012




  Mutual Evaluation of Editors and Texts
for Assessing Quality of Wikipedia Articles

  Yu Suzuki          Nagoya University, Japan
  Masatoshi Yoshikawa Kyoto University, Japan


Have you ever used Wikipedia?

[Figure: percentage usage (0 to 1.0) of Wikipedia and blogs by age group (-18, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74 years old). Source: Oxford University, SPIRE Project, "Results and analysis of Web2.0 services survey", http://spire.conted.ox.ac.uk/]

Users younger than 18 and older than 65 (= novice users) use Wikipedia frequently.

What is the main purpose?

  For Study     36%
  For Fun       28%
  For Work      20%
  Never used     8%
  Never heard    8%

56% of users use Wikipedia for work and study.
Wikipedia is trusted by many users.

But really?

Are Wikipedia articles high quality?

[Figure: histogram of the number of articles (0 to 7000) by quality degree, from low to high, calculated using our proposed method]

80% of all articles are low quality.

Objectives

• Calculate quality values for articles automatically and accurately.

  •   For readers: Readers can judge which articles are high quality.

  •   For editors: Editors can decide which articles need to be edited.

  •   For administrators: Administrators can decide which articles are not
      appropriate for Wikipedia, to maintain the quality of articles.

Output of our proposed system

[Figure: an article shown with "Quality Value: 40%"; the legend distinguishes high quality parts from low quality parts]

What is quality?

  From the dictionary
   【Quality】the degree of excellence of something
   【Credibility】the quality of being trusted and believed in
  From psychology (Fogg 2003)
   Trustworthiness: how many users believe something
   Expertise: experts’ opinions

We use “trustworthiness” as the definition of quality.
 Quality is not true or false, but how many users believe.

Related Work

Link analysis based methods [Bellomi 2005, Chin 2011]
 Identify high quality articles using HITS and PageRank.
 These methods can easily identify major articles, but cannot identify minor but high
 quality articles.
Using editor reputation [Adler 2007, Wilkinson 2007]
 We use this method.
 Identify which articles are high quality using the reputation of editors, as judged by
 the editors themselves.
Good point: These methods can calculate quality accurately,
 because editors and viewers do not directly decide text quality.
Bad point: Vandals (bad editors) can easily change text quality.

Plan for Calculating Quality

Who evaluates?
・readers (voting)
・the readers themselves (personalization)
・editors (reputation-based)

What quality do we measure?
・whole article
・a part of an article
・editor

How to evaluate?
・readers’ voting
・article analysis
・article edit history

Plan to Measure Quality
• Why do we use a reputation-based approach?

 • Users’ votes are not always reliable.

   • On YouTube, almost all votes are 5 stars (the highest score).

• Why do we calculate editors’ quality?

 • We assume that the same editor writes texts of the same quality.

• Why do we use edit history?

 • Our proposed system should be language independent.

Overview

1. Identify editors of articles.
2. Get edit history of each editor.
3. Calculate text’s Quality Value (QV).
4. Calculate editor’s QV.
5. Calculate article’s QV.

[Figure: Editor A and Editor B (step 1) are linked to their edit history (step 2); a text QV of 60% (step 3); two editor QVs of 70% and 40% (step 4); an article quality degree of 55% (step 5)]

Key Idea

High quality texts survive beyond multiple edits.
・If a text remains, the QV of the text ↑
・If a text is deleted, the QV of the text ↓

[Figure: Editors A, B, and C edit the same article in turn, with "add" and "delete" operations]

Calculate text’s quality values

[Figure: number of letters of A’s text remaining at each version number: version 1 = 100 (written by A), version 2 = 80 (20 deleted by B), version 3 = 20 (60 deleted by C), version 4 = 20]

• A writes 100 letters.
  • A’s texts do not gain QV from this edit.
  • A cannot evaluate her own text.
• B deletes 20 letters.
  • B leaves 80 of A’s letters.
  • B judges that A’s 80 letters are good.
• C deletes 60 letters.
  • C leaves 20 of A’s letters.
  • C judges that A’s 20 letters are good.
• A’s text QV = log 80 + log 20

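The arithmetic on this slide can be reproduced with a minimal sketch. The function name `text_qv` and the default natural logarithm are assumptions made for illustration; the slide writes "log" without stating a base.

```python
import math

def text_qv(surviving_letters, base=math.e):
    """Sum the log of the letters left in place at each review by another editor.

    `base` is an assumption: the slide does not say which logarithm is used.
    """
    return sum(math.log(n, base) for n in surviving_letters if n > 0)

# B leaves 80 of A's letters, then C leaves 20 of them.
qv_of_a_text = text_qv([80, 20])
print(qv_of_a_text)  # log 80 + log 20, which equals log 1600
```

A's own edit contributes nothing to the sum, which reflects the rule that an editor cannot evaluate her own text.
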
Problem
• Editor’s quality is not considered.

  • C deletes A’s text.
    • A’s QV decreases.
  • If C has low quality, C may delete high quality texts.
    • A’s QV should NOT be decreased.
  • If C has high quality, C should delete low quality texts.
    • A’s QV should be decreased.

[Figure: Editors A, B, and C edit the same article in turn, with "add" and "delete" operations]

Use editor’s QV for text’s QV

[Figure: number of letters of A’s text remaining at each version number. Without editor’s QV: 100, 80, 20, 20. With editor’s QV: 100, 80, 50, 50. B deletes 20 letters at version 2 and C deletes 60 letters at version 3.]

• If B’s QV is 100%
  • B should delete only low quality texts.
  • A’s text is counted as having 20 letters deleted by B.
• If C’s QV is 50%
  • Half of the texts that C deletes may be high quality.
  • A’s text is counted as having 30 letters deleted (60 letters × 50%) by C.

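The weighting on this slide can be sketched as follows, assuming that each deletion is simply multiplied by the QV of the editor who made it, as in the "60 letters × 50%" example. The function name `remaining_letters` and the input layout are hypothetical.

```python
def remaining_letters(initial, deletions):
    """Return the letters credited as remaining after each deletion.

    `deletions` is a list of (letters_deleted, editor_qv) pairs, where
    editor_qv lies between 0 and 1. A deletion counts only in proportion
    to the QV of the deleting editor (an assumption for this sketch).
    """
    history = [float(initial)]
    for letters_deleted, editor_qv in deletions:
        history.append(history[-1] - letters_deleted * editor_qv)
    return history

# B (QV 100%) deletes 20 letters, then C (QV 50%) deletes 60 letters.
with_editor_qv = remaining_letters(100, [(20, 1.0), (60, 0.5)])
# Ignoring editor quality is the same as treating every editor's QV as 100%.
without_editor_qv = remaining_letters(100, [(20, 1.0), (60, 1.0)])
print(with_editor_qv, without_editor_qv)
```

With these inputs the credited lengths are 100, 80, 50 with editor's QV and 100, 80, 20 without it, the same values as the two series in the figure.
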
Chicken-or-the-egg problem

[Figure: the same pipeline as in the Overview, except that step 3 (text QV = 60%) now uses the editor’s QV from step 4 (70% and 40%)]

• Text’s QV is calculated from both the edit history and the editor’s QV.

• Editor’s QV is calculated from the text’s QV.

 • Editor’s QV ⇆ text’s QV is a chicken-or-the-egg problem.

Mutually calculate editors’ and texts’ QVs until they converge.

Our proposed method
1. Identify editors of articles.
2. Get edit history of each editor.
3. Calculate text’s QV using editor’s QV.
  • The first time, every editor’s QV is set to 1 (the highest value).
4. Calculate editor’s QV.
5. If the texts’ QVs and editors’ QVs have not converged, return to step 3.
6. Calculate article’s QV.

[Figure: the Overview pipeline with an added step 5 that loops from editor’s QV (step 4) back to text’s QV (step 3)]

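The loop in steps 3 to 5 can be sketched as a fixed-point iteration. This is a toy under stated assumptions, not the authors' formulas: the update rules (text QV as the surviving share of letters, editor QV as the mean QV of that editor's texts), the name `mutual_qv`, and the input layout are all hypothetical, and step 6 is left out.

```python
def mutual_qv(reviews, max_iters=100, tol=1e-9):
    """Iterate text QVs and editor QVs until the editor QVs stop changing.

    `reviews` is a list of (author, reviewer, letters_before, letters_deleted).
    """
    editors = {name for author, reviewer, _, _ in reviews
               for name in (author, reviewer)}
    editor_qv = {e: 1.0 for e in editors}  # first pass: every editor starts at 1
    text_qv = []
    for _ in range(max_iters):
        # Step 3: a text's QV is the share of letters that survive, with each
        # deletion discounted by the deleting editor's current QV.
        text_qv = [
            (author, 1.0 - editor_qv[reviewer] * deleted / before)
            for author, reviewer, before, deleted in reviews
        ]
        # Step 4: an editor's QV is the mean QV of the texts that editor wrote.
        new_qv = dict(editor_qv)
        for e in editors:
            scores = [qv for author, qv in text_qv if author == e]
            if scores:
                new_qv[e] = sum(scores) / len(scores)
        # Step 5: stop once no editor QV moves by more than `tol`.
        change = max((abs(new_qv[e] - editor_qv[e]) for e in editors), default=0.0)
        editor_qv = new_qv
        if change < tol:
            break
    return editor_qv, text_qv

reviews = [("A", "B", 100, 20), ("A", "C", 80, 60), ("C", "A", 50, 40)]
editor_qv, text_qv = mutual_qv(reviews)
print(editor_qv)
```

In this toy history, B never has text reviewed and therefore keeps the initial QV of 1, while the QVs of A and C settle near 0.75 and 0.4.
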
Experimental Setup
• Data set

  • Japanese Wikipedia edit history data (as of Nov. 2, 2010)

    • 1,889,129 articles, 2,178,003 editors (including bots and anonymous IP users)

• High quality articles (correct dataset)

  • “Featured articles” and “Good articles” selected by Wikipedians.

• Evaluation measure

  • 11-point interpolated recall-precision graph

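For readers unfamiliar with the evaluation measure, the following is a minimal sketch of the standard 11-point interpolated precision: at each recall level 0.0, 0.1, ..., 1.0, take the highest precision observed at any recall at or above that level. The function name and the toy ranking are hypothetical and are not taken from the experiment.

```python
def eleven_point_interpolated_precision(ranked_relevance, total_relevant):
    """Return interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    `ranked_relevance` lists, in ranked order, whether each retrieved
    article is in the correct dataset (True) or not (False).
    """
    points = []  # (recall, precision) measured at each relevant article
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    curve = []
    for i in range(11):
        level = i / 10
        # Small tolerance guards against floating-point error in the recall values.
        candidates = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(candidates, default=0.0))
    return curve

# Toy ranking: the 1st and 3rd articles are high quality, 2 relevant in total.
curve = eleven_point_interpolated_precision([True, False, True, False, False], 2)
print(curve)
```

For this toy ranking, the interpolated precision is 1.0 for recall levels 0.0 to 0.5 and 2/3 for recall levels 0.6 to 1.0.
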
Experimental result

[Figure: 11-point interpolated recall-precision graph (recall 0 to 1, precision 0 to 0.10) comparing "with editor’s QV" and "without editor’s QV"]

• Precision improves by about 10%.

• At recall 0 to 0.5, precision improves by about 20%, whereas precision does not improve at recall 0.6 to 1.

• When an article about current events is high quality, our system can judge it as high quality, but it is not a featured article.

• When one editor writes excellent texts and the other editors do not edit them, the article is “featured” but is not judged as high quality.

• Texts’ and editors’ QVs converge when we calculate the QVs 20 times each.

Conclusion
• Calculate texts’ quality values using editors’ QVs.
    •   The relation between texts’ quality values and editors’ quality values is a chicken-or-the-egg problem.

    •   Mutually calculate texts’ quality values and editors’ quality values until they converge.

    •   Average precision improves by about 10%.

        •   At low recall, precision improves by about 20%.

•   Future work
    •   Confidence of quality values

        •   When A edits 100 articles many times, B edits only ONE article once, and A and B have the same QV,
            the system judges the qualities of A and B to be the same. But these should differ because
            the confidence is different.

    •   Other effective assumptions

        •   When a high quality editor confirms a text, the text should be high quality even if it was written by
            a low quality editor.

Open problems
• Using content analysis

 • Estimate terms that appear frequently in high quality articles but do not
   appear in low quality articles.

• Using articles in multiple languages

 • If an article in Japanese is similar to its English counterpart, is the article high
   quality?

• For Web documents, SNS, ...

 • How can we calculate quality degrees without edit history?

Thank you!

ありがとうございました!

非常感謝!

ขอบคุณ ครับ !
고마웠습니다 !
                22

More Related Content

Similar to Wikisym 2012

Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014Adam Etkin
 
Measuring Impact: Towards a data citation metric
Measuring Impact: Towards a data citation metricMeasuring Impact: Towards a data citation metric
Measuring Impact: Towards a data citation metricEdward Baker
 
Wikipedia の情報信頼性検証技術
Wikipedia の情報信頼性検証技術Wikipedia の情報信頼性検証技術
Wikipedia の情報信頼性検証技術Yu Suzuki
 
Exploring perspectives in digital library evaluation
Exploring perspectives in digital library evaluationExploring perspectives in digital library evaluation
Exploring perspectives in digital library evaluationGiannis Tsakonas
 
The changing scholarly communications landscape: what does this mean for peer...
The changing scholarly communications landscape: what does this mean for peer...The changing scholarly communications landscape: what does this mean for peer...
The changing scholarly communications landscape: what does this mean for peer...Research Information Network
 
Bing Social Search - Rise
Bing Social Search - Rise Bing Social Search - Rise
Bing Social Search - Rise ayazook
 
GeoWeb Community Development: How Web 2.0 are you?
GeoWeb Community Development: How Web 2.0 are you?GeoWeb Community Development: How Web 2.0 are you?
GeoWeb Community Development: How Web 2.0 are you?Allan Laframboise
 
Publishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research PapersPublishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research Papersjjuhlrich
 
(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...
(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...
(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...Carolyn Hank
 
COSC 426 Lect. 7: Evaluating AR Applications
COSC 426 Lect. 7: Evaluating AR ApplicationsCOSC 426 Lect. 7: Evaluating AR Applications
COSC 426 Lect. 7: Evaluating AR ApplicationsMark Billinghurst
 
Publishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research PapersPublishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research Papersjjuhlrich
 

Similar to Wikisym 2012 (20)

Robinson sage open uksg presentation
Robinson sage open uksg presentationRobinson sage open uksg presentation
Robinson sage open uksg presentation
 
Robinson sage open uksg presentation
Robinson sage open uksg presentationRobinson sage open uksg presentation
Robinson sage open uksg presentation
 
Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014
 
The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...
The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...
The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...
 
The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...
The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...
The Ebook, The Whole Ebook, and Nothing But The Ebook: A Holistic View of Ebo...
 
Measuring Impact: Towards a data citation metric
Measuring Impact: Towards a data citation metricMeasuring Impact: Towards a data citation metric
Measuring Impact: Towards a data citation metric
 
Enhancing article visibility and impact
Enhancing article visibility and impactEnhancing article visibility and impact
Enhancing article visibility and impact
 
Bmc Oaspa Webinar
Bmc Oaspa WebinarBmc Oaspa Webinar
Bmc Oaspa Webinar
 
Wikipedia の情報信頼性検証技術
Wikipedia の情報信頼性検証技術Wikipedia の情報信頼性検証技術
Wikipedia の情報信頼性検証技術
 
Exploring perspectives in digital library evaluation
Exploring perspectives in digital library evaluationExploring perspectives in digital library evaluation
Exploring perspectives in digital library evaluation
 
Dlf 2012
Dlf 2012Dlf 2012
Dlf 2012
 
The changing scholarly communications landscape: what does this mean for peer...
The changing scholarly communications landscape: what does this mean for peer...The changing scholarly communications landscape: what does this mean for peer...
The changing scholarly communications landscape: what does this mean for peer...
 
Bing Social Search - Rise
Bing Social Search - Rise Bing Social Search - Rise
Bing Social Search - Rise
 
Want to be a Wikipedian?
Want to be a Wikipedian?Want to be a Wikipedian?
Want to be a Wikipedian?
 
GeoWeb Community Development: How Web 2.0 are you?
GeoWeb Community Development: How Web 2.0 are you?GeoWeb Community Development: How Web 2.0 are you?
GeoWeb Community Development: How Web 2.0 are you?
 
Publishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research PapersPublishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research Papers
 
(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...
(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...
(Nov 2011) Blogademia Today, Tomorrow? Scholar Bloggers' Preservation Percept...
 
How to check indexing of publications
How to check indexing of publicationsHow to check indexing of publications
How to check indexing of publications
 
COSC 426 Lect. 7: Evaluating AR Applications
COSC 426 Lect. 7: Evaluating AR ApplicationsCOSC 426 Lect. 7: Evaluating AR Applications
COSC 426 Lect. 7: Evaluating AR Applications
 
Publishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research PapersPublishing Scientific Research & How to Write High-Impact Research Papers
Publishing Scientific Research & How to Write High-Impact Research Papers
 

Recently uploaded

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 

Recently uploaded (20)

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes

WikiSym 2012

  • 1. WikiSym 2012. Mutual Evaluation of Editors and Texts for Assessing Quality of Wikipedia Articles. Yu Suzuki (Nagoya University, Japan), Masatoshi Yoshikawa (Kyoto University, Japan)
  • 2. Have you ever used Wikipedia? [Bar chart: percentage usage of Wikipedia and blogs by age group (-18, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74). Source: Oxford University SPIRE Project, Results and analysis of Web2.0 services survey, http://spire.conted.ox.ac.uk/]
  • 3. Have you ever used Wikipedia? Users younger than 18 and older than 65 (novice users) use Wikipedia frequently. [Same bar chart as slide 2]
  • 4. What is the main purpose? 56% of users use Wikipedia for work and study. But really?
  • 5. What is the main purpose? [Pie chart: For Study 36%, For Fun 28%, For Work 20%, Never used 8%, Never heard 8%] 56% of users use Wikipedia for work and study. Wikipedia is trusted by many users. But really?
  • 6. Are Wikipedia articles high quality? 80% of all articles are low quality. [Histogram: number of articles (0 to 7000) versus quality degree (0 = low, 1 = high), calculated using our proposed method]
  • 7. Objectives • Calculate quality values for articles automatically and accurately. • For readers: readers can judge which articles are high quality. • For editors: editors can decide which articles need to be edited. • For administrators: administrators can decide which articles are not appropriate for Wikipedia, to maintain the quality of articles.
  • 8. Output of our proposed system [Screenshot: a Wikipedia article with its overall quality value (40%) and markings for the high quality part and the low quality part]
  • 9. What is quality? From the dictionary: 【Quality】 the degree of excellence of something. 【Credibility】 the quality of being trusted and believed in. From psychology (Fogg 2003): Trustworthiness: how many users believe something. Expertise: experts' opinion. We use "trustworthiness" as the definition of quality: quality is not true or false, but how many users believe.
  • 10. Related Work • Link analysis based method [Bellomi 2005, Chin 2011]: identify high quality articles using HITS or PageRank. This method can easily identify major articles, but cannot identify minor but high quality articles. • Using editor reputation [Adler 2007, Wilkinson 2007] (we use this method): identify which articles are high quality using the reputation of editors, as judged by the editors themselves. Good point: these methods can calculate quality accurately, because editors and viewers do not directly decide text quality. Bad point: vandals (bad editors) can easily change text quality.
  • 11-14. Plan for Calculating Quality • Who evaluates? ・reader (voting) ・reader themselves (personalization) ・editor (reputation-based) • What quality do we measure? ・whole article ・a part of an article ・editor • How to evaluate? ・reader's voting ・article analysis ・article edit history
  • 15. Plan to Measure Quality • Why do we use a reputation-based approach? • Users' votes are not always true. • On YouTube, almost all votes are 5 stars (the highest score). • Why do we calculate the editor's quality? • We assume that the same editor writes articles of the same quality. • Why do we use edit history? • Our proposed system should be language independent.
  • 16. Overview 1. Identify editors of articles. 2. Get the edit history of each editor. 3. Calculate the text's quality value (QV). 4. Calculate the editor's QV. 5. Calculate the article's QV. [Diagram: editors A and B are identified from the edit history; labels read QV = 70%, QV = 40%, and QV = 60%; the article's quality degree is 55%]
  • 17. Key Idea: High quality texts survive beyond multiple edits. ・If a text remains, the QV of the text increases (↑). ・If a text is deleted, the QV of the text decreases (↓). [Diagram: Editor A writes, Editor B adds, Editor C deletes]
  • 18. Calculate text's quality values • A writes 100 letters. • A's text does not gain QV, because A cannot evaluate herself. • B deletes 20 letters. • B leaves 80 of A's letters. • B evaluates A's 80 letters as good. • C deletes 60 letters. • C leaves 20 of A's letters. • C evaluates A's 20 letters as good. • A's text QV = log 80 + log 20. [Chart: number of A's letters by version number: 100 at version 1, 80 at version 2 (20 deleted by B), 20 at versions 3 and 4 (60 deleted by C)]
  • 19. Problem • The editor's quality is not considered. • C deletes A's text. • A's QV decreases. • If C has low quality, C may delete high quality texts. • Then A's QV should NOT be decreased. • If C has high quality, C should delete only low quality texts. • Then A's QV should be decreased. [Diagram: A writes, B adds, C deletes]
  • 20. Use editor's QV for text's QV • If B's QV is 100%: • B should delete only low quality texts. • A's text is counted as losing 20 letters to B (20 letters × 100%). • If C's QV is 50%: • C may delete 50% of high quality texts. • A's text is counted as losing 30 letters to C (60 letters × 50%). [Chart: number of A's letters by version number. Without editor's QV: 100, 80, 20, 20. With editor's QV: 100, 80, 50, 50. B deletes 20 letters, C deletes 60 letters]
  • 21. Chicken-or-the-egg problem • The text's QV is calculated from both the edit history and the editor's QV. • The editor's QV is calculated from the text's QV. • Editor's QV ⇆ text's QV is a chicken-or-the-egg problem. Mutually calculate the editor's and text's QVs until they converge. [Diagram: the overview figure from slide 16, with step 3 now using the editor's QV]
  • 22. Our proposed method 1. Identify editors of articles. 2. Get the edit history of each editor. 3. Calculate the text's QV using the editor's QV. • The first time, every editor's QV is taken to be 1 (the highest value). 4. Calculate the editor's QV. 5. If the text's QVs and editor's QVs have not converged, return to step 3. 6. Calculate the article's QV. [Diagram: the overview figure, with steps 1 to 5 marked]
  • 23. Experimental Setup • Data set • Japanese Wikipedia edit history data (as of Nov. 2, 2010) • 1,889,129 articles, 2,178,003 editors (including bots and anonymous IP users) • High quality articles (correct dataset) • "Featured articles" and "Good articles" selected by Wikipedians. • Evaluation measure • 11-point interpolated recall-precision graph
  • 24. Experimental result • Precision improves by about 10%. • At recall 0 to 0.5, precision improves by about 20%, whereas precision does not improve at recall 0.6 to 1. • When an article is about current events and is high quality, our system can judge it as high quality, but it is not a featured article. • When one editor writes excellent text and the other editors do not edit it, the article is "featured" but is not judged as high quality. • Text's and editor's QVs converge when we calculate the QVs 20 times each. [Recall-precision graph: precision (0 to 0.10) versus recall (0 to 1), with and without editor's QV]
  • 25. Conclusion • Calculate texts' quality values using editors' QVs. • The relation between texts' quality values and editors' quality values is a chicken-or-the-egg problem. • Mutually calculate texts' quality values and editors' quality values until they converge. • Average precision improves by about 10%. • At low recall, precision improves by about 20%. • Future Work • Confidence of quality values • When A edits 100 articles many times, B edits only ONE article once, and A and B have the same QV, the system judges the qualities of A and B to be the same. But they should differ, because the confidence is different. • Other effective assumptions • When a high quality editor confirms a text, the text should be high quality even if it was written by a low quality editor.
  • 26. Open problems • Using content analysis • Estimate terms which appear frequently in high quality articles, but do not appear in low quality articles. • Using articles in multiple languages • If an article in Japanese is similar to that in English, is the article high quality? • For Web documents, SNS, ... • How to calculate quality degrees without edit history?
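
The letter-counting example on slides 18 and 20 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' implementation: the function name `text_quality`, the natural logarithm (the slides do not state the base), and the reuse of the same log-sum on the weighted counts are all illustrative choices.

```python
import math


def text_quality(initial_letters, deletions, editor_qv=None):
    """Return (surviving letter counts per later version, summed log score).

    deletions: list of (editor, letters_deleted) in version order.
    editor_qv: optional dict mapping editor -> QV in [0, 1]. When given,
               each deletion is scaled by the deleting editor's QV, as on
               slide 20. When omitted, every deletion counts in full.
    """
    remaining = float(initial_letters)
    survivors = []
    for editor, deleted in deletions:
        weight = 1.0 if editor_qv is None else editor_qv[editor]
        remaining -= deleted * weight
        survivors.append(remaining)
    # Assumption: natural log, and versions with nothing left add nothing.
    score = sum(math.log(r) for r in survivors if r > 0)
    return survivors, score


# A writes 100 letters; B deletes 20, then C deletes 60.
edits = [("B", 20), ("C", 60)]
plain_counts, plain_score = text_quality(100, edits)
weighted_counts, weighted_score = text_quality(100, edits, {"B": 1.0, "C": 0.5})
print(plain_counts, weighted_counts)
```

Without editor weights the surviving counts are 80 and 20 letters, giving log 80 + log 20. With B at 100% and C at 50%, C's deletion of 60 letters counts as only 30, so the surviving counts are 80 and 50.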

Editor's Notes

  1. I am Yu Suzuki, from the Information Technology Center at Nagoya University. The title of today's presentation is quality assessment of Wikipedia articles using edit history. The purpose of this presentation is to explain how to calculate quality values for Wikipedia articles.
  2. I show data about the age of users and the percentage usage of services. This survey was conducted by the SPIRE Project at Oxford University. The red bars show Wikipedia, and the blue bars show blogs. From this graph, users younger than 18 and older than 65 use Wikipedia more frequently than other Web services. These users may not have enough knowledge, so if there is a wrong story in Wikipedia, they will believe it. This is a problem.
  3-4. I show another graph, about the purpose of using Wikipedia. From this graph, 56 percent of users use Wikipedia for work and study. This shows that Wikipedia is trusted by many users; at least 56 percent of users trust it. However, do you think Wikipedia is reliable?
  5. This graph shows the relationship between quality degrees and the number of articles. The quality is calculated by our proposed system, which I will talk about later. According to our system's quality values, about 80% of all articles are not credible. This means that almost all users trust Wikipedia, whereas most articles are not credible. So I think quality values are important to keep users from believing wrong articles.
  6. The objective of this study is to calculate quality degrees automatically, quickly, and accurately. This quality degree is useful for readers, editors, and administrators. Readers can judge which articles are credible. Editors can decide which articles need to be edited. And administrators can decide which articles are not appropriate for Wikipedia, to maintain the quality of articles.
  7. This is the output of our proposed system. In our system, the original Wikipedia article is overlaid with lines in two colors. Blue lines mark credible parts, and red lines mark parts that are not credible. The upper left shows the overall quality degree, and the blue and red bars show the ratio of credible and not credible parts.
  8. First, we should define what quality is. This is a very difficult question, but in the dictionary, quality is defined as the degree of excellence of something. Credibility is defined as the quality of being trusted and believed in. But this definition is ambiguous, so I borrowed a definition from psychology. Fogg said that quality has two meanings: trustworthiness and expertise. Trustworthiness is how many users believe something, and expertise is experts' opinion. In our study, we use trustworthiness as the definition of quality. Therefore, quality is not true or false but how many users believe.
  9. Next, we introduce several related works. There are two approaches: the link analysis based method and the editor reputation based method. The link analysis based method identifies high quality articles using link analysis such as HITS or PageRank. This method can easily identify major articles, but cannot identify minor but high quality articles. The other method uses editor reputation, that is, the reputation of editors as judged by the other editors. Our method is based on this one. The good point of these methods is that they can calculate quality accurately, because editors and viewers of articles do not decide text quality directly but through implicit decisions. The bad point is that vandals, that is, bad editors, can easily change text quality.
  10-12. To calculate quality values, I should define the quality measurement method. To do so, I should consider three questions: who evaluates articles, what quality we measure, and how to evaluate articles. Readers' decisions can be used, such as voting or personalization. In our system, I select editors' reputation, because I think this method is fair. Next, I measure editors' quality instead of measuring articles or parts of articles, because I think the same users write articles of the same quality. And I evaluate using edit history, because this method is simple and effective.
  13. I will talk again about the plan to measure quality. I used the reputation-based approach because users' votes are not always true. On YouTube, almost all votes are the highest score. I used editors' quality because we assume that the same editor writes articles of the same quality. I used edit history because this method is simple, and our proposed system should be language independent. If I used linguistic analysis, the system would be language dependent.
  14. This is an overview of our proposed system. First, I analyze an article and identify its editors. In this example, I identified editors A and B from the edit history. Next, I get the edit history of these editors for the other articles. Then I analyze this edit history and calculate the texts' and editors' quality values. Here, the quality value of A is 70% and that of B is 40%. Finally, by combining these two editors' quality values, I calculate the article's quality value. In this case, the article's quality degree is 55%.
  15. The key idea is the survival ratio of texts. If a part of an article is high quality, it is not deleted by the other editors. If a part is low quality, it is soon deleted or replaced. Consider the situation where editor A writes one part, editor B adds another part, and editor C deletes editor A's part and replaces it. In this case, editor B leaves editor A's part, so editor B judges editor A's part to be high quality. Editor C leaves editor B's part, so editor C judges editor B's part to be high quality. However, editor C deletes editor A's part, so editor C judges editor A's part to be low quality.
  16. I explain how to calculate the quality value of texts. First, A writes 100 letters in an article, then B deletes 20 letters from A's text, then editor C deletes 60 letters from A's text. In this case, at version 1, A gains no quality value because A cannot evaluate herself. Next, at version 2, B leaves 80 of A's letters, so B evaluates A's 80 letters as good, and A gains 80 positive evaluations from B. Next, at version 3, C leaves 20 of A's letters, so C evaluates A's 20 letters as good, and A gains 20 positive evaluations from C. As a result, from this edit history, editor A gains a quality value of log 80 plus log 20 from editors B and C.
  17. However, the problem is that this system does not consider the editor's quality. In this case, C deletes A's text, so our system decreases A's quality value. However, if C has low quality, C may delete high quality texts. In this case, A's quality value should not be decreased. But if C has high quality, C should delete only low quality texts. Then A's quality value should be decreased. Therefore, the editor's quality is important for calculating text quality values.
  18. I explain how to calculate the quality value of texts using editors' quality values. If B's quality value is 100%, B is a high quality editor, so B should delete only low quality texts. Therefore, when B deletes 20 letters, A's text is counted as having lost 20 letters. However, if C deletes 60 letters and C's quality value is 50%, C may delete 50% of high quality texts. Therefore, A's text is counted as having lost 30 letters, half of the letters actually deleted by C.
  19. However, there is another problem. The text quality value is calculated from both the edit history and the editor's quality value. The editor's quality value is calculated from the text quality value. Therefore, calculating editors' quality values and text quality values is a kind of chicken-or-the-egg problem. To solve this problem, we mutually calculate the editors' and texts' quality values until these values converge.
  20. Based on these discussions, we improve our proposed method. First, we identify the editors of articles. Then, we get the edit history of each editor. Then, we calculate the texts' quality values using the editors' quality values. The first time we calculate text quality values, we assume that every editor's quality value is 1, the highest value. Then we calculate the editors' quality values. Next, if the text quality values and editors' quality values have not converged, we return to step 3. Finally, we calculate the article quality values.
  21. I used Japanese Wikipedia edit history data from the Wikipedia site. I used 85,028 articles, about 13.6% of all articles. These articles were written by 705,713 editors, excluding bots. As credible articles, I used the featured articles and good articles selected by Wikipedians. In this experiment I used Japanese Wikipedia, but I can use Wikipedia in any language. However, the edit history of the English version of Wikipedia is not available now, so I cannot use the English version.
  22. This is an experimental result. From this recall-precision graph, we can confirm that precision improves by about 10%. From recall 0 to 0.5, precision improves by about 20%, whereas precision does not improve at recall 0.6 to 1. When an article is about current events and is high quality, our system can judge it as high quality, but such articles are not among the featured articles. When one editor writes excellent text and the other editors do not edit it, the article is a featured article but is not judged as high quality by our method. Moreover, the quality values of texts and editors converge when we calculate the quality values 20 times each.
  23. Finally, I conclude our study. In this study, we calculate texts' quality values using editors' quality values. The relation between text and editor quality values is a kind of chicken-or-the-egg problem. To solve this problem, we mutually calculate text and editor quality values until they converge. As a result, we improve average precision by about 10%. At low recall, precision improves by about 20%. Next, I introduce our future work. The first topic is the confidence of quality values. When A edits 100 articles many times, B edits only one article once, and A and B have the same quality value, the system judges the qualities of A and B to be the same. But they should differ, because the confidence of A and B is different. Another topic is other effective assumptions. When a high quality editor confirms a text, the text should be high quality even if it was written by a low quality editor.
  24. I consider several open problems, such as content analysis techniques. In this method, I estimate terms which appear frequently in credible articles but do not appear in articles that are not credible. Next, I would use articles in multiple languages. I think English Wikipedia is the richest; therefore, if an article in Japanese is similar to that in English, the article may be credible or rich. I also want to adapt my system to Web documents and SNS, but there is no edit history for Web documents. So I should discover how to calculate quality without edit history.
  25. Thank you!
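
The mutual calculation described in notes 19 and 20 can be sketched as a fixed-point iteration. The sketch below is only an illustration under stated assumptions: the slides do not say how an editor's QV is derived from text QVs or how editor QVs are combined into an article QV, so the surviving-fraction score in `text_score`, the per-editor mean in `mutual_qv`, the plain average in `article_quality`, and the toy data in `TEXTS` are all hypothetical. Only the initialization of every editor's QV to 1 and the loop-until-convergence structure come from the talk.

```python
def text_score(text, editor_qv):
    """Fraction of a text that survives, each deletion weighted by the deleter's QV."""
    remaining = float(text["letters"])
    for editor, deleted in text["deletions"]:
        remaining -= deleted * editor_qv[editor]
    return max(remaining, 0.0) / text["letters"]


def mutual_qv(texts, tol=1e-9, max_iter=200):
    """Alternate between text scores and editor QVs until the editor QVs settle."""
    editors = {t["author"] for t in texts}
    editors |= {e for t in texts for e, _ in t["deletions"]}
    if not editors:
        return {}, [], 0
    editor_qv = {e: 1.0 for e in editors}  # first round: every editor has QV 1
    scores, rounds = [], 0
    for rounds in range(1, max_iter + 1):
        scores = [text_score(t, editor_qv) for t in texts]
        new_qv = {}
        for e in editors:
            own = [s for t, s in zip(texts, scores) if t["author"] == e]
            # Assumption: an editor's QV is the mean score of the texts they wrote.
            new_qv[e] = sum(own) / len(own) if own else editor_qv[e]
        change = max(abs(new_qv[e] - editor_qv[e]) for e in editors)
        editor_qv = new_qv
        if change < tol:
            break
    return editor_qv, scores, rounds


def article_quality(editor_qv, article_editors):
    """Assumption: plain average of the editors' QVs (70% and 40% give 55%)."""
    return sum(editor_qv[e] for e in article_editors) / len(article_editors)


# Hypothetical edit histories: who wrote each text and who deleted how much.
TEXTS = [
    {"author": "A", "letters": 100, "deletions": [("B", 20), ("C", 60)]},
    {"author": "B", "letters": 50, "deletions": [("C", 10)]},
    {"author": "C", "letters": 40, "deletions": [("A", 30)]},
]
editor_qv, scores, rounds = mutual_qv(TEXTS)
print(rounds, {e: round(q, 3) for e, q in sorted(editor_qv.items())})
```

On this toy data the update is a contraction, so the loop stops well before `max_iter`. Real edit histories need not behave this way, which is why the iteration cap is kept.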