Content-Driven
Author Reputation and Text Trust
          for the Wikipedia
              Luca de Alfaro
               UC...
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Text Trus...
Reputation: Our guiding principles

• Do not alter the Wikipedia user experience
  – Compute reputation from content evolu...
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose ...
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose ...
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose ...
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose ...
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose ...
Content-driven reputation mitigates
reputation wars
                                      -2
Wars in user-driven reputatio...
Content-driven reputation mitigates
reputation wars
                                      -2
Wars in user-driven reputatio...
Content-driven reputation mitigates
reputation wars
                                             -2
Wars in user-driven re...
Content-driven reputation mitigates
reputation wars
                                              -2
Wars in user-driven r...
Validation: Does our reputation have
predictive value?
              Time
Article 1
Article 2
Article 3
Article 4
  ...


...
Validation: Does our reputation have
predictive value?
                       Time
Article 1
Article 2
                   ...
Building a content-driven
             reputation system
                for Wikipedia



This is a summary; for details s...
What is a “contribution”?

     Text                        Edit

     bla   ei        bla   yak            bla bla


    ...
Text


version 9      bla   bla wuga boink
                5     8   9    6

 version 10    bla   bla   wuga   wuga   wuga...
Text: the destiny of a contribution


      of words
      number



                                       time
         ...
Text: Longevity

            Tk
                                                 j-k
                                   Tk...
Text: Reputation update

          Tk
                                 Tj
        of words
        number



             ...
Measuring surviving text

                                          “Dead” text
                “Live” text

Version
     ...
Edit

 k-1 d(k
        -   1,
                 k)

                      k judged
  d(k




                      d(k, j)
...
Edit: good or bad?

        the past
                                                     k judged
  k-1
                 ...
Edit: Longevity
 the past
                                                   Edit Longevity:
                        “w
  ...
Edit: Updating reputation
 the past
                                         Edit Longevity:
                        “w
  ...
Data Sets

• English till Feb 07 1,988,627 pages, 40,455,416 versions

• French till Feb 07    452,577 pages,   5,643,636 ...
Results: English Wikipedia, in detail

                                      % of edits below a given longevity

         ...
Results: English Wikipedia, in detail

                                       % of edits below a given longevity

        ...
Predictive power of low reputation

Low-reputation:      Lower 20% of range


            Short-lived edits         Short-...
Text trust

      Yadda yadda wuga wuga bla bla bla bing bong
                                A


Yadda yadda wuga wuga ya...
Text trust

      Yadda yadda wuga wuga bla bla bla bing bong
                           A


Yadda yadda wuga wuga yak yak...
Demo: http://trust.cse.ucsc.edu/
Text trust: How is “Fogh” spelled?
Text Trust: more examples from the demo
Text Trust: Details

Trust depends on:
• Authorship: Author lends 50% of their reputation to
  the text they create.
   – ...
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
Batch Implementation


                      periodic xml dumps
                          (to initialize)

               ...
On-Line Implementation

Process edits as they arrive:
• Benefit: real-time colorization of text
• Need to integrate the co...
My questions:
• Feedback?
• Do you like it?
• Should we try to set up a “trust server” with
  an edit feed from the Wikipe...
Upcoming SlideShare
Loading in …5
×

A Content-Driven Reputation System for the Wikipedia

1,223 views
1,100 views

Published on

http://nextlib.lifegoo.com/user/pluskid/article/1888

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,223
On SlideShare
0
From Embeds
0
Number of Embeds
25
Actions
Shares
0
Downloads
19
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

A Content-Driven Reputation System for the Wikipedia

  1. 1. Content-Driven Author Reputation and Text Trust for the Wikipedia Luca de Alfaro UC Santa Cruz Joint work with Bo Adler, Ian Pye, Caitlin Sadowski (UCSC) Wikimania, August 2007
  2. 2. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions.
  3. 3. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions. Text Trust: • Goal: provide a measure of the reliability of the text. • Method: computed from the reputation of the authors who create and revise the text.
  4. 4. Reputation: Our guiding principles • Do not alter the Wikipedia user experience – Compute reputation from content evolution, rather than user-to-user comments. • Be welcoming to all users – Never publicly display user reputation values. Authors know only their own reputation. • Be objective – Rely on content evolution rather than comments. – Quantitatively evaluate how well it works.
  5. 5. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article time
  6. 6. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits time
  7. 7. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits time B builds on A’s edit
  8. 8. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits + time B builds on A’s edit
  9. 9. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits + time + B builds on A’s edit - C reverts to A’s version
  10. 10. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B
  11. 11. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3
  12. 12. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3 Wars in content-driven reputation: • B can badmouth A by undoing her work A - • But this is risky: if others then B re-instate A’s work, it is B’s reputation that suffers.
  13. 13. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3 Wars in content-driven reputation: • B can badmouth A by undoing her work A - • But this is risky: if others then + B re-instate A’s work, it is B’s - reputation that suffers. others?
  14. 14. Validation: Does our reputation have predictive value? Time Article 1 Article 2 Article 3 Article 4 ... = edits by user A
  15. 15. Validation: Does our reputation have predictive value? Time Article 1 Article 2 E Article 3 Article 4 ... The longevity of an edit E The reputation of author A depends on the history at the time of an edit E depends after the edit. on the history before the edit. Can we show a correlation between author reputation and edit longevity ?
  16. 16. Building a content-driven reputation system for Wikipedia This is a summary; for details see: B.T. Adler, L. de Alfaro. A Content Driven Reputation System for the Wikipedia. In Proc. of WWW 2007.
  17. 17. What is a “contribution”? Text Edit bla ei bla yak bla bla buy viagra! bla yak ei yak bla bla bla We measure how long the added We measure how long the “edit” text survives. (reorganization) survives. Based on text Based on edit distance. tracking.
  18. 18. Text version 9 bla bla wuga boink 5 8 9 6 version 10 bla bla wuga wuga wuga boink 5 8 10 10 9 6 We label each word with the version where it was introduced. This enables us to keep track of how long it lives.
  19. 19. Text: the destiny of a contribution of words number time (versions) Amount of Amount of new text surviving text The life of the text introduced at a revision.
  20. 20. Text: Longevity Tk j-k Tk ¢ α text of words number time j k (versions) • Text longevity: the α text 2 [0,1] that yields the best geometrical approximation for the amount of residual text. • Short-lived text: α text < 0.2 (at most 20% of the text makes it from one version to the next).
  21. 21. Text: Reputation update Tk Tj of words number time j k (versions) Ak Aj (authors) As a consequence of edit j, we increase the reputation of Ak by an amount proportional to Tj and to the reputation of Aj
  22. 22. Measuring surviving text “Dead” text “Live” text Version wuga boing bla ble stored as “dead” 9 7 9 6 6 wuga boing bla ble buy viagra now! 10 7 9 6 6 10 10 10 t es h bc at m wuga boing bla ble 11 7 9 6 6 We track authorship of deleted text, and we match the text of new versions both with live and with dead text.
  23. 23. Edit k-1 d(k - 1, k) k judged d(k d(k, j) -1, k<j j) j judge We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance)
  24. 24. Edit: good or bad? the past k judged k-1 the past k-1 k judged , j) d(k d(k d(k, j) d(k -1, -1, j) j) j j judge judge the future the future k is good: d(k-1, j) > d(k, j) k is bad: d(k-1, j) < d(k, j) “k went towards the future” “k went against the future”
  25. 25. Edit: Longevity the past Edit Longevity: “w ork k-1 d(k done -1 ” ,k) d(k “ pr -1,j k ogr )-d (k,j es s The fraction of change that is in ) ” the same direction of the future. α edit ' 1: k is a good edit • j α edit ' -1: k is reverted • the future
  26. 26. Edit: Updating reputation the past Edit Longevity: “w ork k-1 d(k done -1 ” ,k) d(k “ pr -1,j kA ogr )-d (k,j es s k Reputation update: ) ” The reputation of Ak • increases if α edit > 0, j Aj • decreases if α edit < 0. the future (see paper for details)
  27. 27. Data Sets • English till Feb 07 1,988,627 pages, 40,455,416 versions • French till Feb 07 452,577 pages, 5,643,636 versions • Italian till May 07 301,584 pages, 3,129,453 versions The entire Wikipedias, with the whole history, not just a sample (we wanted to compute the reputation using all edits of each user).
  28. 28. Results: English Wikipedia, in detail % of edits below a given longevity Bin %_data l<0.8 l<0.4 l<0.0 l<-0.4 l<-0.8 0 16.922 93.11 91.65 89.15 83.76 73.53 1 1.191 77.24 69.83 65.60 61.11 56.00 2 1.335 69.53 57.08 49.79 45.71 41.25 log (1 + reputation) 3 1.627 38.00 28.61 20.23 16.16 13.62 4 2.780 32.84 22.31 13.32 9.57 8.04 5 4.408 41.70 15.76 5.90 3.80 2.57 6 6.698 29.40 16.74 7.54 4.35 3.12 7 8.281 32.04 15.16 5.44 2.25 1.40 8 12.233 34.06 16.64 6.78 3.79 2.73 9 44.524 32.55 15.51 5.05 1.88 1.14
  29. 29. Results: English Wikipedia, in detail % of edits below a given longevity Bin %_data l<0.8 l<0.4 l<0.0 l<-0.4 l<-0.8 low 0 16.922 93.11 91.65 89.15 83.76 73.53 rep 1 1.191 77.24 69.83 65.60 61.11 56.00 2 1.335 69.53 57.08 49.79 45.71 41.25 log (1 + reputation) 3 1.627 38.00 28.61 20.23 16.16 13.62 4 2.780 32.84 22.31 13.32 9.57 8.04 5 4.408 41.70 15.76 5.90 3.80 2.57 6 6.698 29.40 16.74 7.54 4.35 3.12 7 8.281 32.04 15.16 5.44 2.25 1.40 8 12.233 34.06 16.64 6.78 3.79 2.73 9 44.524 32.55 15.51 5.05 1.88 1.14 Short-Lived
  30. 30. Predictive power of low reputation Low-reputation: Lower 20% of range Short-lived edits Short-lived text α text α edit · 0.2 · -0.8 (less than 20% (almost entirely undone) survives each revision)
  31. 31. Text trust Yadda yadda wuga wuga bla bla bla bing bong A Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong Old text is colored according New text is colored to the reputation of its original according to the author, and of all subsequent reputation of A revisors (including A).
  32. 32. Text trust Yadda yadda wuga wuga bla bla bla bing bong A Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong • On the English Wikipedia, we should be able to spot untrusted content with over 80% recall and 60% precision! – In fact, we do even better than this, as new content is always flagged lower trust (see next).
  33. 33. Demo: http://trust.cse.ucsc.edu/
  34. 34. Text trust: How is “Fogh” spelled?
  35. 35. Text Trust: more examples from the demo
  36. 36. Text Trust: Details Trust depends on: • Authorship: Author lends 50% of their reputation to the text they create. – Thus, even text from high-rep authors is only medium- rep when added: high trust is achieved only via multiple reviews, never via a single author. • Revision: When an author of reputation r preserves a word of trust t < r, the word increases in trust to t + 0.3(r – t) • The algorithms still need fine-tuning.
  37. 37. From fresh to trusted text
  38. 38. From fresh to trusted text
  39. 39. From fresh to trusted text
  40. 40. From fresh to trusted text
  41. 41. From fresh to trusted text
  42. 42. Batch Implementation periodic xml dumps (to initialize) edit feed (to keep updated) Trust server Wikipedia servers • No need to affect the main Wikipedia servers • People can click “check trust” and visit the trust server. • Good for experimenting with new ideas • Necessary to color the past (come up to speed).
  43. 43. On-Line Implementation Process edits as they arrive: • Benefit: real-time colorization of text • Need to integrate the code in MediaWiki • Time to process an edit: < 1s (not much longer than parsing it). • Storage required: proportional to the size of the last revision (not to the total history size!) • Can be easily used for other Wikis
  44. 44. My questions: • Feedback? • Do you like it? • Should we try to set up a “trust server” with an edit feed from the Wikipedia? • Try the demo: http://trust.cse.ucsc.edu/ Your questions?

×