Slide 1: Wikipedia の情報信頼性検証技術 (Techniques for Verifying the Credibility of Wikipedia Information)
Sep. 8, Shirahama

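Slide 2: Outline
1. Motivation (5 min): What is credibility? Why compute it?
2. Proposed system (20 min): How is credibility computed? How can the computation be sped up?
3. Experimental evaluation (5 min)
4. Conclusion and future work (5 min)
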
Slide 3: [Section divider: motivation]

Slide 4: [Figure: percentage of respondents using Wikipedia (red) and blogs (blue), by age group (under 18, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74); y-axis 0-100%]
Source: Oxford University SPIRE Project, "Results and analysis of Web2.0 services survey", http://spire.conted.ox.ac.uk/

Slide 4 (second build): [Figure: the same chart, with the under-18 and 65-74 age groups highlighted for Wikipedia]

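Slide 5: [Figure: pie chart of the purposes for which people use Wikipedia; segments of 36%, 28%, 20%, 8%, and 8%, with 56% highlighted as use for work or study]
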
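Slide 6: [Figure: number of Wikipedia articles (y-axis, 0-7,000) by credibility degree, as computed by the proposed system; callout: 80%, which the speaker notes give as the share of articles that are not credible]
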
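Slide 7: [Objectives; bullet text lost in extraction. Per the speaker notes: compute credibility automatically, quickly, and accurately, to support readers, editors, and administrators]
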
Slide 8: [Section divider: the proposed system for computing the credibility of Wikipedia articles]

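Slide 9: [Screenshot: system output, showing a Wikipedia article overlaid with colored lines and an overall Credibility Degree of 0.4]
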
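According to the speaker notes, blue, red, and yellow lines mark the credible, non-credible, and unknown parts of the article, and bars show the proportion of each. The sketch below shows that summary step. It assumes that each text segment carries a credibility value in [0, 1], or none when the value is unknown. The threshold `CREDIBLE_THRESHOLD = 0.5` is hypothetical because the presentation does not state its cut-off. Computing the overall degree as a length-weighted mean of the known segments is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cut-off between "credible" and "not credible"; the real value is not given.
CREDIBLE_THRESHOLD = 0.5

Segment = Tuple[int, Optional[float]]  # (length in characters, credibility or None if unknown)


def color_of(credibility: Optional[float]) -> str:
    """Map a segment's credibility to the overlay color described in the speaker notes."""
    if credibility is None:
        return "yellow"  # unknown
    return "blue" if credibility >= CREDIBLE_THRESHOLD else "red"


def summarize(segments: List[Segment]) -> Tuple[Dict[str, float], Optional[float]]:
    """Return the share of text per color and a length-weighted overall degree.

    The overall degree averages only segments whose credibility is known
    (an assumption about how the displayed value is aggregated).
    """
    ratios = {"blue": 0.0, "red": 0.0, "yellow": 0.0}
    total = sum(length for length, _ in segments)
    if total == 0:
        return ratios, None
    for length, credibility in segments:
        ratios[color_of(credibility)] += length / total
    known = [(length, c) for length, c in segments if c is not None]
    known_length = sum(length for length, _ in known)
    overall = sum(length * c for length, c in known) / known_length if known_length else None
    return ratios, overall


# Three hypothetical segments: credible, not credible, and unknown.
ratios, overall = summarize([(100, 0.9), (300, 0.2), (100, None)])
print(ratios, overall)
```

Measuring the ratios by character length makes the bars reflect how much of the article each color covers, rather than how many segments there are.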
Slide 10: [Table of design questions: who evaluates, what is measured, and how it is evaluated; cell text lost in extraction]

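Slide 11 (bullet text largely lost in extraction; reconstructed from the speaker notes):
• Who evaluates? Editor reputation rather than reader votes (on YouTube, almost all votes are top scores)
• What is measured? The editor's credibility
• How? From the edit history (simple and language independent)
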
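Slide 12: [Figure: system overview
1. Identify the editors of the target article (here, A and B).
2. Collect those editors' edit histories on other articles.
3. Compute each editor's credibility (A = 70%, B = 40%).
4. Combine the editor credibilities into the article's credibility (55%).]
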
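In the overview, editor credibilities of A = 70% and B = 40% combine into an article credibility of 55%, which is exactly their plain average. The presentation does not give the combination formula. The sketch below assumes a weighted mean, where each editor's weight is a hypothetical `text_share` (how much of the article that editor wrote). With equal shares, this reduces to the plain average and reproduces the slide's 55%.

```python
from typing import Dict


def article_credibility(editor_credibility: Dict[str, float],
                        text_share: Dict[str, float]) -> float:
    """Weighted mean of editor credibilities.

    text_share maps each editor to how much of the article they wrote
    (a hypothetical weighting; the presentation does not state the formula).
    """
    total = sum(text_share.values())
    if total == 0:
        raise ValueError("article has no attributed text")
    return sum(editor_credibility[e] * w for e, w in text_share.items()) / total


# Values from the overview slide: A = 70%, B = 40%, equal shares assumed.
score = article_credibility({"A": 0.70, "B": 0.40}, {"A": 1.0, "B": 1.0})
print(f"{score:.0%}")  # 55%
```

Weighting by text share means a low-credibility editor who wrote only one sentence pulls the score down less than one who wrote most of the article. That behavior is plausible, but it is an assumption here.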
Slide 13: [Figure: editors A, B, and C successively editing one article, illustrating the remain ratio]

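The speaker notes illustrate the remain-ratio idea with an A/B/C example. When an editor keeps someone else's text, that counts as an implicit "credible" judgment. When an editor deletes it, that counts as a "not credible" judgment. The sketch below uses a hypothetical model in which each revision is a list of text units tagged with their author, and units match only by exact equality. Real revision diffing is considerably more involved, and the presentation's exact scoring is not given.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Unit = Tuple[str, str]             # (author, text): a hypothetical unit of article text
Revision = Tuple[str, List[Unit]]  # (editor who saved the revision, units it contains)


def remain_ratios(revisions: List[Revision]) -> Dict[str, float]:
    """Share of each author's text that later editors kept rather than deleted.

    Each editor judges the units present in the revision they edited:
    a unit still present afterwards counts as kept (credible), a missing one
    as removed (not credible). Editors do not judge their own text.
    """
    kept: Dict[str, int] = defaultdict(int)
    judged: Dict[str, int] = defaultdict(int)
    for (_, before), (editor, after) in zip(revisions, revisions[1:]):
        surviving = set(after)
        for unit in before:
            author = unit[0]
            if author == editor:
                continue
            judged[author] += 1
            if unit in surviving:
                kept[author] += 1
    return {author: kept[author] / judged[author] for author in judged}


# The A/B/C example from the speaker notes.
history = [
    ("A", [("A", "a-part")]),
    ("B", [("A", "a-part"), ("B", "b-part")]),
    ("C", [("C", "replacement"), ("B", "b-part")]),
]
print(remain_ratios(history))  # {'A': 0.5, 'B': 1.0}
```

A's text is kept once (by B) and deleted once (by C), giving 0.5. B's text is kept by C, giving 1.0. C gets no score because no later editor has judged C's text yet, which matches the "unknown" case in the system output.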
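Slide 14:
• Wikipedia: over 600,000 articles
• 400 GByte
• Wikipedians: over 700,000
• Maximum edits per person: 120,000 / month
(Side column: 30 days = 720 hours = 43,200 minutes = 2,592,000 seconds)
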
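The slide's side column appears to be a unit conversion of one 30-day month, and the figures match exactly. The sketch below checks that arithmetic. It then divides by the quoted maximum of 120,000 edits per editor per month to show how often such an editor saves an edit. Interpreting the column this way is an assumption, since the slide's own label for it was lost.

```python
# Arithmetic behind the slide's side column (one 30-day month).
DAYS = 30
hours = DAYS * 24          # 720
minutes = hours * 60       # 43,200
seconds = minutes * 60     # 2,592,000

# Maximum edits per editor per month, as quoted on the slide.
MAX_EDITS_PER_MONTH = 120_000
seconds_per_edit = seconds / MAX_EDITS_PER_MONTH

print(hours, minutes, seconds)               # 720 43200 2592000
print(f"{seconds_per_edit:.1f} s per edit")  # 21.6 s per edit
```

At that rate, a single heavy editor produces an edit roughly every 22 seconds around the clock. Every one of those edits has to be diffed against its neighbors when computing remain ratios.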
Slide 15:
• 80% / 20% (Zipf)
• 20%
[Figure: number of editors (×1,000; 0-10) versus reliability degree (-10 to 10 degrees); uncredible editors on the left, credible editors on the right]

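The slide's 80%/20% figure motivates analyzing only a small set of key persons. The presentation's own selection methods did not survive extraction. The sketch below is therefore a simple stand-in, not the presented method: it takes the most active editors until their edits cover 80% of all edits. The editor names and counts in the usage example are hypothetical.

```python
from typing import Dict, List


def key_persons(edit_counts: Dict[str, int], coverage: float = 0.8) -> List[str]:
    """Smallest set of most-active editors whose edits cover `coverage` of all edits.

    A hypothetical selection rule illustrating the 80/20 idea; it is not the
    method from the presentation, whose details did not survive extraction.
    """
    total = sum(edit_counts.values())
    chosen: List[str] = []
    covered = 0
    for editor, count in sorted(edit_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(editor)
        covered += count
    return chosen


# Ten hypothetical editors; two of them make 80 of the 100 edits.
counts = {"e1": 60, "e2": 20, "e3": 5, "e4": 3, "e5": 3,
          "e6": 3, "e7": 2, "e8": 2, "e9": 1, "e10": 1}
print(key_persons(counts))  # ['e1', 'e2']
```

Here 20% of the editors (2 of 10) account for 80% of the edits. Remain ratios would then only need to be computed for those two editors.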
Slide 16 (descriptions largely lost in extraction):
• Method 1
• Method 2
• Method 3: Methods 1 + 2 (TF-IDF)

Slide 17: [Section divider; text lost in extraction]

Slide 18 (text largely lost in extraction):
• Wikipedia
• 85,028 (13.6%); 705,713 (Bot; rest of label lost)
• 98 (accompanying quoted terms lost)

Slide 19: [Bulleted list with questions; text lost in extraction]

Slide 20: [Figure: computation time in ms (y-axis, 0-1,100,000) versus percentage (x-axis, 10-100%), with series labeled 1, 1 + 2, and 3]

Slide 21: [Figure: bar chart (y-axis 0-6) comparing the numbered methods; a value of 5.8 is labeled]

Slide 22 (results summary; text mostly lost in extraction):
• 40%
• 0.02

Slide 23: [Section divider; text lost in extraction]

Slide 24: [Bulleted list with questions; text lost in extraction]

Slide 25 (built up over six animation steps): [Figure: editors A, B, C, and D, later grouped into Group A and Group B]

Slide 26: [Bulleted list; text mostly lost in extraction; legible fragment: "Web SNS"]

Slide 27: Thank you!
ขอบคุณ ครับ! (Thai: "thank you")


Wikipedia の情報信頼性検証技術 (Techniques for Verifying the Credibility of Wikipedia Information)


Editor's Notes

  1. I appreciate the opportunity to give this presentation. I am Yu Suzuki of the Information Technology Center at Nagoya University. The title of today's presentation is "Credibility Assessment of Wikipedia Articles Using Edit History." Its purpose is to show how to calculate credibility degrees for Wikipedia articles.
  2. Today, I would like to talk about how to calculate credibility, one aspect of quality, for Wikipedia articles. First, I explain what credibility values are and why we should calculate them. Next, I present our proposed system: first how it calculates credibility degrees, and then, because this calculation is time-consuming, how to speed it up. Finally, I present the experimental evaluation, the conclusion, and future work.
  3. First, I talk about the motivation for this study: what article quality is, why it matters, and how it is useful to users.
  4. This graph shows the percentage of users of each service by age group, from a survey by the SPIRE Project at Oxford University. Red bars show Wikipedia and blue bars show blogs. Users under 18 and users aged 65 to 74 use Wikipedia more frequently than the other Web services. These users may not have enough background knowledge to spot errors, so if a Wikipedia article contains a wrong statement, they are likely to believe it. This is a problem.
  5. This graph shows the purposes for which people use Wikipedia. 56 percent of users use Wikipedia for work or study, which suggests that at least 56 percent of users trust it. But is Wikipedia actually reliable?
  11. This graph shows the relationship between credibility degree and the number of articles. The credibility values are calculated by our proposed system, which I describe later. If our system's values are accurate, about 80% of all articles are not credible. In other words, most users trust Wikipedia, yet most of its articles are not credible. Credibility values are therefore important for keeping users from believing wrong articles.
  12. The objective of this study is to calculate credibility degrees automatically, quickly, and accurately. Credibility degrees are useful for readers, editors, and administrators. Readers can judge which articles are credible, editors can decide which articles need editing, and administrators can identify articles that are inappropriate for Wikipedia and so maintain article quality. This is state-of-the-art research. The long-term goal of this study is to calculate the quality of articles; in this presentation, I calculate credibility, one aspect of quality.
  13. Next, I present our proposed system, namely how it calculates the credibility values of Wikipedia articles.
  14. This is the output of our proposed system. The original Wikipedia article is overlaid with three kinds of colored lines: blue marks credible parts, red marks non-credible parts, and yellow marks parts of unknown credibility. The upper-left area shows the overall credibility degree, and the blue, red, and yellow bars show the proportions of credible, non-credible, and unknown parts.
  15. To calculate credibility values, we first need to define how credibility is measured. Three questions must be settled: who evaluates articles, what is measured, and how articles are evaluated. Existing approaches rely on readers' judgments, such as voting or personalization. Our system instead uses editor reputation, because I think this method is fair. Next, I measure the credibility of editors rather than of articles or parts of articles, because I assume that the same editor writes articles of consistent quality. Finally, I evaluate editors using the edit history, because this method is simple and effective.
  18. To recap the plan for measuring credibility: I use a reputation-based approach because user votes are not always trustworthy; on YouTube, for example, almost all votes give the highest score. I measure editor credibility because we assume that the same editor writes articles of consistent quality. I use the edit history because this method is simple and keeps the system language independent.
  19. This is an overview of our proposed system. First, when I analyze an article, I identify its editors from the edit history; in this example, editors A and B. Next, I collect these editors’ edit histories on other articles. I then analyze those histories and calculate each editor’s credibility value: 70% for A and 40% for B. Finally, I combine the two editors’ credibility values into a credibility value for the article. In this case, the article’s credibility degree is 55%.
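The final combination step can be sketched as follows. The slide gives only the numbers (70%, 40%, 55%), and 55% is the arithmetic mean of the two editor values, so this minimal sketch assumes a plain mean. The function name `article_credibility` is hypothetical, and the real system may weight editors differently (for example by how much text each one contributed).

```python
from statistics import mean


def article_credibility(editor_credibility, editors):
    """Combine the credibility values of an article's editors.

    Assumption: a plain arithmetic mean, which reproduces the slide's example
    (A = 0.70, B = 0.40 -> 0.55). The authors' actual rule may differ.
    """
    return mean(editor_credibility[e] for e in editors)


credibility = {"A": 0.70, "B": 0.40}  # per-editor values from the example
score = article_credibility(credibility, ["A", "B"])
print(f"{score:.2f}")  # 0.55
```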
  20. The key idea is the remain ratio. If a part of an article is credible, other editors do not delete it; if a part is not credible, it is soon deleted or replaced. Consider this situation: editor A writes a part, editor B adds another part, and editor C deletes A’s part and replaces it. Because B keeps A’s part, B judges A’s part to be credible. Because C keeps B’s part, C judges B’s part to be credible. However, because C deletes A’s part, C judges A’s part to be not credible.
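The remain-ratio idea can be illustrated on the slide’s A/B/C scenario. This is a minimal sketch under a hypothetical data model (each revision is an editor plus the set of text-chunk ids present after the edit, and `author_of` records who first wrote each chunk); the function names are illustrative, not the authors’ implementation. Each later editor who keeps a chunk counts as a “credible” judgment of its author, and each deletion counts as a “not credible” judgment.

```python
from collections import defaultdict

# Hypothetical data model (an assumption, not the authors' format):
# `history` is a list of (editor, chunks) pairs, where `chunks` is the set of
# text-chunk ids present after that edit; `author_of` maps each chunk id to
# the editor who first wrote it.


def judgments(history, author_of):
    """Count, for each (judge, author) pair, chunks kept and chunks deleted."""
    counts = defaultdict(lambda: [0, 0])  # [kept, deleted]
    previous = set()
    for editor, chunks in history:
        for chunk in previous:
            author = author_of[chunk]
            if author == editor:
                continue  # editing one's own text is not counted as a judgment
            if chunk in chunks:
                counts[(editor, author)][0] += 1  # kept: judged credible
            else:
                counts[(editor, author)][1] += 1  # deleted: judged not credible
        previous = chunks
    return counts


def remain_ratio(counts):
    """Fraction of each author's judged chunks that later editors kept."""
    kept, total = defaultdict(int), defaultdict(int)
    for (_judge, author), (k, d) in counts.items():
        kept[author] += k
        total[author] += k + d
    return {author: kept[author] / total[author] for author in total}


# The slide's scenario: A writes a1, B adds b1, C deletes a1 and writes c1.
history = [("A", {"a1"}), ("B", {"a1", "b1"}), ("C", {"b1", "c1"})]
author_of = {"a1": "A", "b1": "B", "c1": "C"}
counts = judgments(history, author_of)
print(remain_ratio(counts))  # {'A': 0.5, 'B': 1.0}
```

Comparing only consecutive revisions keeps the sketch simple; a chunk that survives many edits is judged once by each later editor.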
  21. However, this method is time-consuming, because I have to analyze the remain ratio of every editor. Wikipedia has more than 600,000 articles (more than 400 GB of data), more than 700,000 active Wikipedians, and up to 120,000 edits per person per month. These numbers show that the calculation cost of this method is too large, so reducing the calculation time is important.
  22. To reduce the calculation cost, I use a method that identifies key persons. This graph shows the assumed relationship between credibility degree and the number of editors: both highly credible and non-credible editors are few. Assuming a Zipf-like distribution (the 80/20 rule), 20% of all editors contribute 80% of the articles. Therefore, if I can identify these 20% key persons, I can reduce the calculation time and also improve the accuracy of the credibility values, because editors who are not key persons appear to act as noise.
  23. I propose three methods to identify key persons. Method 1 is called number of words: editors who write many words are key persons. Method 2 is called number of articles: editors who write in many articles are key persons. Method 3 combines methods 1 and 2: editors who write many words in a small number of articles are key persons. These methods come from the information retrieval field: method 1 corresponds to term frequency, method 2 to document frequency, and method 3 to TF-IDF.
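The three scoring rules can be sketched as follows. The data layout, the function names, and the exact Method 3 formula (total words × log(N / articles edited), by analogy with TF × IDF) are assumptions for illustration; the slide names only the TF, DF, and TF-IDF analogies. `key_persons` keeps roughly the top 20% of editors, following the 80/20 assumption above. The toy contribution data is made up.

```python
import math

# Hypothetical input (an assumption): contributions[editor][article] is the
# number of words that editor added to that article.


def method1_words(contributions):
    """Method 1 (term-frequency analogue): total words written."""
    return {e: sum(arts.values()) for e, arts in contributions.items()}


def method2_articles(contributions):
    """Method 2 (document-frequency analogue): number of articles edited."""
    return {e: len(arts) for e, arts in contributions.items()}


def method3_tfidf(contributions, n_articles):
    """Method 3 (TF-IDF analogue, assumed form): many words in few articles."""
    return {
        e: sum(arts.values()) * math.log(n_articles / len(arts))
        for e, arts in contributions.items()
    }


def key_persons(scores, fraction=0.2):
    """Keep roughly the top `fraction` of editors by score (at least one)."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


contributions = {
    "A": {"x": 500},
    "B": {"x": 50, "y": 60, "z": 40},
    "C": {"y": 10},
    "D": {"z": 5},
    "E": {"x": 3, "y": 2},
}
print(key_persons(method1_words(contributions)))      # ['A']
print(key_persons(method2_articles(contributions)))   # ['B']
print(key_persons(method3_tfidf(contributions, 10)))  # ['A']
```

Only the selected key persons would then go through the remain-ratio analysis, which is where the time saving comes from.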
  24. Next, I show the experimental evaluation.
  25. I used Japanese Wikipedia edit history data from the Wikipedia site: 85,028 articles, about 13.6% of all articles. These articles were written by 705,713 editors, excluding bots. As credible articles, I used the featured articles and good articles selected by Wikipedians. In this experiment I used the Japanese Wikipedia, but the method can be applied to any language edition. However, the English Wikipedia edit history is not available now, so I could not use the English version.
  26. In this evaluation experiment, I use two metrics: calculation time and precision. I do not rely on the precision and recall ratios generally used in the information retrieval field, because these ratios are too small to compare. I also discuss which articles are judged credible, and whether ignoring editors with small contributions is effective for better accuracy.
  27. This graph shows the relationship between the reduction ratio of editors and the calculation time. The two are directly proportional: if I set the reduction ratio to 40%, the calculation time is also about 40%. Therefore, reducing the number of editors reduces the calculation time.
  28. This graph shows the average increase in rank of featured and good articles when all articles are ordered by credibility values computed with methods 1, 2, and 3 and with the original method. Method 3 improves the rank by about 5.8 positions, so it improves accuracy. This is because method 3 ignores the many editors with small contributions.
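The “average increase in rank” metric can be read as: for each featured or good article, its rank under the original ordering minus its rank under a key-person method, averaged over those articles. This is a minimal sketch of that reading (an assumption; the slide does not spell out the formula), with hypothetical function names and made-up toy scores, not the experimental data.

```python
def ranks(scores):
    """1-based rank of each article, sorted by descending credibility."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {article: i for i, article in enumerate(order, start=1)}


def average_rank_gain(original, method, featured):
    """Mean number of positions that featured/good articles move up."""
    before, after = ranks(original), ranks(method)
    return sum(before[a] - after[a] for a in featured) / len(featured)


# Toy credibility scores; f1 and f2 play the role of featured/good articles.
original = {"p": 0.9, "q": 0.8, "f1": 0.7, "r": 0.6, "f2": 0.5}
method3 = {"p": 0.6, "q": 0.5, "f1": 0.9, "r": 0.4, "f2": 0.7}
print(average_rank_gain(original, method3, ["f1", "f2"]))  # 2.5
```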
  29. We can reduce the calculation cost to 40%. However, the average precision is only about 0.02, which is too small; the results are still inaccurate. This is because the method cannot cover all types of articles: several types of articles unexpectedly receive high credibility degrees, for example long articles and articles that are only ever added to, such as lists of TV shows, anime programs, and so on.
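If “average precision” is read in its usual information-retrieval sense (an assumption; the slides do not define it), with featured and good articles as the relevant set, it can be computed from the credibility-ordered list as below. The function name and the toy ranking are illustrative only.

```python
def average_precision(ranking, relevant):
    """Standard IR average precision of a ranked list against a relevant set."""
    hits, precision_sum = 0, 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this relevant position
    return precision_sum / len(relevant) if relevant else 0.0


ranking = ["p", "f1", "q", "f2"]  # articles sorted by credibility (toy data)
print(average_precision(ranking, {"f1", "f2"}))  # 0.5
```

A value near 0.02 means the featured and good articles sit far down the ranking, behind many articles (such as long lists) that the method scores highly.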
  30. In this study, I want to measure information quality in a way that matches human intuition. I think information quality should be calculated from many factors. As I mentioned, who evaluates, what quality we measure, and how we evaluate are the three key factors to consider. I will try several methods based on these three factors. In the next slide, I describe one example of future work.
  31. One future work plan is visualizing author relationships. In this work, I estimate relationships between authors such as opposition, subordination, and cooperation. In this example, users A and B make opposing edits, A and C have a subordinate relationship, and B and D make cooperative edits. Using these relationships, I can group the authors and then calculate the quality of each group.
  38. I am also considering several other problems. The first is content analysis: I estimate terms that appear frequently in credible articles but do not appear in non-credible articles. The second is using articles in multiple languages: I think the English Wikipedia is the richest, so if a Japanese article is similar to the corresponding English article, the article is likely to be credible or rich. Finally, I want to apply my system to Web documents and SNS, but Web documents have no edit history, so I need to find a way to calculate quality without it.
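The content-analysis idea (terms frequent in credible articles but rare in non-credible ones) can be sketched with a smoothed log ratio of document frequencies. This scoring rule is one simple choice assumed for illustration, not the method the talk commits to, and the function names and toy texts are hypothetical. Tokenization here is plain whitespace splitting; Japanese text would need a morphological analyzer (for example MeCab) instead of `str.split`.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Number of documents each term appears in (whitespace tokenization)."""
    return Counter(term for doc in docs for term in set(doc.split()))


def discriminative_terms(credible_docs, other_docs, top=5):
    """Terms frequent in credible articles but rare in non-credible ones."""
    cred_df = document_frequency(credible_docs)
    other_df = document_frequency(other_docs)

    def score(term):
        # Add-one smoothed document rates; a log ratio > 0 favours credible docs.
        p = (cred_df[term] + 1) / (len(credible_docs) + 2)
        q = (other_df[term] + 1) / (len(other_docs) + 2)
        return math.log(p / q)

    return sorted(cred_df, key=score, reverse=True)[:top]


credible = ["citation needed source", "source reference history", "reference source"]
other = ["fan trivia list", "episode list trivia", "source list"]
print(discriminative_terms(credible, other, top=1))  # ['reference']
```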
  39. Thank you!