SlideShare a Scribd company logo
Content-Driven
Author Reputation and Text Trust
          for the Wikipedia
              Luca de Alfaro
               UC Santa Cruz

               Joint work with
  Bo Adler, Ian Pye, Caitlin Sadowski (UCSC)


                            Wikimania, August 2007
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Text Trust:
• Goal: provide a measure of the reliability of the text.
• Method: computed from the reputation of the authors
  who create and revise the text.
Reputation: Our guiding principles

• Do not alter the Wikipedia user experience
  – Compute reputation from content evolution, rather
    than user-to-user comments.
• Be welcoming to all users
  – Never publicly display user reputation values.
    Authors know only their own reputation.
• Be objective
  – Rely on content evolution rather than comments.
  – Quantitatively evaluate how well it works.
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article
time
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article

                        A
                             edits
time
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article

                        A
                             edits
time




                        B
                             builds on A’s edit
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article

                        A
                             edits
                   +
time




                        B
                             builds on A’s edit
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                  A Wikipedia article

                         A
                              edits
                   +
time




              +          B
                              builds on A’s edit
                   -
                         C    reverts to A’s version
Content-driven reputation mitigates
reputation wars
                                      -2
Wars in user-driven reputation:   A        B
Content-driven reputation mitigates
reputation wars
                                      -2
Wars in user-driven reputation:   A        B
                                      -3
Content-driven reputation mitigates
reputation wars
                                             -2
Wars in user-driven reputation:      A                 B
                                             -3

Wars in content-driven reputation:

                   • B can badmouth A by undoing
                      her work
         A
             -     • But this is risky: if others then
         B            re-instate A’s work, it is B’s
                      reputation that suffers.
Content-driven reputation mitigates
reputation wars
                                              -2
Wars in user-driven reputation:      A                  B
                                              -3

Wars in content-driven reputation:

                     • B can badmouth A by undoing
                       her work
         A
             -       • But this is risky: if others then
                 +
         B             re-instate A’s work, it is B’s
           -           reputation that suffers.
          others?
Validation: Does our reputation have
predictive value?
              Time
Article 1
Article 2
Article 3
Article 4
  ...


              = edits by user A
Validation: Does our reputation have
predictive value?
                       Time
Article 1
Article 2
                                   E
Article 3
Article 4
  ...


                                       The longevity of an edit E
   The reputation of author A
                                        depends on the history
at the time of an edit E depends
                                            after the edit.
 on the history before the edit.


              Can we show a correlation between
            author reputation and edit longevity ?
Building a content-driven
             reputation system
                for Wikipedia



This is a summary; for details see:
B.T. Adler, L. de Alfaro. A Content Driven Reputation
System for the Wikipedia. In Proc. of WWW 2007.
What is a “contribution”?

     Text                        Edit

     bla   ei        bla   yak            bla bla


                                        buy viagra!
    bla    yak ei   yak    bla

                                          bla bla
We measure how
long the added
                    We measure how long the “edit”
text survives.
                    (reorganization) survives.
Based on text
                    Based on edit distance.
tracking.
Text


version 9      bla   bla wuga boink
                5     8   9    6

 version 10    bla   bla   wuga   wuga   wuga boink
                5     8     10     10     9    6


 We label each word with the version where it was
 introduced. This enables us to keep track of how
 long it lives.
Text: the destiny of a contribution


      of words
      number



                                       time
                                    (versions)

    Amount of        Amount of
     new text      surviving text



    The life of the text introduced at a revision.
Text: Longevity

            Tk
                                                 j-k
                                   Tk ¢ α text
          of words
          number



                                                  time
                               j
                     k                     (versions)

• Text longevity: the α text 2 [0,1] that yields the best
  geometrical approximation for the amount of residual
  text.
• Short-lived text: α text < 0.2 (at most 20% of the text
  makes it from one version to the next).
Text: Reputation update

          Tk
                                 Tj
        of words
        number



                                         time
                            j
                   k                  (versions)
                   Ak       Aj        (authors)


As a consequence of edit j, we increase the reputation
  of Ak by an amount proportional to Tj and to the
  reputation of Aj
Measuring surviving text

                                          “Dead” text
                “Live” text

Version
         wuga     boing bla ble     stored as “dead”
 9
          7        9     6     6


                                           wuga   boing bla ble
          buy    viagra now!
 10
                                            7      9     6    6
          10       10    10           t
                                   es h
                                   bc
                                     at
                                   m
         wuga     boing bla ble
 11
          7        9     6     6
      We track authorship of deleted text, and we match the text of
      new versions both with live and with dead text.
Edit

 k-1 d(k
        -   1,
                 k)

                      k judged
  d(k




                      d(k, j)
     -1,




                                 k<j
    j)




                 j     judge




We compute the edit distance between versions k-1, k, and
 j, with k < j
                                       (see paper for details on the distance)
Edit: good or bad?

        the past
                                                     k judged
  k-1
                                 the past
                                      k-1
                 k judged




                                                    , j)
  d(k




                                    d(k
                 d(k, j)




                                                 d(k
     -1,




                                       -1,
        j)




                                       j)
                                             j
             j    judge                          judge

     the future                       the future


k is good: d(k-1, j) > d(k, j)     k is bad: d(k-1, j) < d(k, j)
 “k went towards the future”        “k went against the future”
Edit: Longevity
 the past
                                                   Edit Longevity:
                        “w
                          ork
       k-1
                        d(k done
                           -1     ”
                              ,k)
  d(k
“ pr
     -1,j




                                    k
 ogr
          )-d
              (k,j
              es s




                                        The fraction of change that is in
                   )
                    ”




                                          the same direction of the future.
                                            α edit ' 1: k is a good edit
                                        •
                                j           α edit ' -1: k is reverted
                                        •
                  the future
Edit: Updating reputation
 the past
                                         Edit Longevity:
                        “w
                          ork
       k-1
                        d(k done
                           -1     ”
                              ,k)
  d(k
“ pr
     -1,j




                                  kA
 ogr
          )-d
              (k,j
              es s




                                     k
                                         Reputation update:
                   )
                    ”




                                         The reputation of Ak

                                         • increases if α edit > 0,
                                j Aj
                                         • decreases if α edit < 0.
                  the future

                                          (see paper for details)
Data Sets

• English till Feb 07 1,988,627 pages, 40,455,416 versions

• French till Feb 07    452,577 pages,   5,643,636 versions

• Italian till May 07   301,584 pages,   3,129,453 versions



The entire Wikipedias, with the whole history, not just a
  sample (we wanted to compute the reputation using all edits
  of each user).
Results: English Wikipedia, in detail

                                      % of edits below a given longevity

                       Bin   %_data    l<0.8   l<0.4   l<0.0   l<-0.4      l<-0.8
                         0   16.922    93.11   91.65   89.15   83.76       73.53
                         1    1.191    77.24   69.83   65.60   61.11       56.00
                         2    1.335    69.53   57.08   49.79   45.71       41.25
log (1 + reputation)




                         3    1.627    38.00   28.61   20.23   16.16       13.62
                         4    2.780    32.84   22.31   13.32    9.57        8.04
                         5    4.408    41.70   15.76    5.90    3.80        2.57
                         6    6.698    29.40   16.74    7.54    4.35        3.12
                         7    8.281    32.04   15.16    5.44    2.25        1.40
                         8   12.233    34.06   16.64    6.78    3.79        2.73
                         9   44.524    32.55   15.51    5.05    1.88        1.14
Results: English Wikipedia, in detail

                                       % of edits below a given longevity

                        Bin   %_data    l<0.8   l<0.4   l<0.0   l<-0.4      l<-0.8
low                       0   16.922    93.11   91.65   89.15   83.76       73.53
rep                       1    1.191    77.24   69.83   65.60   61.11       56.00
                          2    1.335    69.53   57.08   49.79   45.71       41.25
 log (1 + reputation)




                          3    1.627    38.00   28.61   20.23   16.16       13.62
                          4    2.780    32.84   22.31   13.32    9.57        8.04
                          5    4.408    41.70   15.76    5.90    3.80        2.57
                          6    6.698    29.40   16.74    7.54    4.35        3.12
                          7    8.281    32.04   15.16    5.44    2.25        1.40
                          8   12.233    34.06   16.64    6.78    3.79        2.73
                          9   44.524    32.55   15.51    5.05    1.88        1.14

                                                                  Short-Lived
Predictive power of low reputation

Low-reputation:      Lower 20% of range


            Short-lived edits         Short-lived text
                                          α text
                  α edit                           · 0.2
                           · -0.8
                                         (less than 20%
           (almost entirely undone)
                                      survives each revision)
Text trust

      Yadda yadda wuga wuga bla bla bla bing bong
                                A


Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong




  Old text is colored according       New text is colored
  to the reputation of its original   according to the
  author, and of all subsequent       reputation of A
  revisors (including A).
Text trust

      Yadda yadda wuga wuga bla bla bla bing bong
                           A


Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong


• On the English Wikipedia, we should be able to spot
  untrusted content with over 80% recall and 60%
  precision!
   – In fact, we do even better than this, as new content
     is always flagged lower trust (see next).
Demo: http://trust.cse.ucsc.edu/
Text trust: How is “Fogh” spelled?
Text Trust: more examples from the demo
Text Trust: Details

Trust depends on:
• Authorship: Author lends 50% of their reputation to
  the text they create.
   – Thus, even text from high-rep authors is only medium-
     rep when added: high trust is achieved only via multiple
     reviews, never via a single author.
• Revision: When an author of reputation r preserves a
  word of trust t < r, the word increases in trust to
                          t + 0.3(r – t)
• The algorithms still need fine-tuning.
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
Batch Implementation


                      periodic xml dumps
                          (to initialize)

                           edit feed
                       (to keep updated)
                                              Trust server
Wikipedia servers

• No need to affect the main Wikipedia servers
• People can click “check trust” and visit the trust server.
• Good for experimenting with new ideas
• Necessary to color the past (come up to speed).
On-Line Implementation

Process edits as they arrive:
• Benefit: real-time colorization of text
• Need to integrate the code in MediaWiki
• Time to process an edit: < 1s (not much longer than
  parsing it).
• Storage required: proportional to the size of the last
  revision (not to the total history size!)
• Can be easily used for other Wikis
My questions:
• Feedback?
• Do you like it?
• Should we try to set up a “trust server” with
  an edit feed from the Wikipedia?
• Try the demo:

       http://trust.cse.ucsc.edu/

Your questions?

More Related Content

Viewers also liked

E-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenusE-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenus
Régis Vansnick
 
Communiquer sur les réseaux sociaux
Communiquer sur les réseaux sociauxCommuniquer sur les réseaux sociaux
Communiquer sur les réseaux sociaux
Quinchy Riya
 
Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]
HUB INSTITUTE
 
Online Reputation Management
Online Reputation ManagementOnline Reputation Management
Online Reputation Management
Jenny-Rebecca Schmitt
 
Social Networking Ppt
Social Networking PptSocial Networking Ppt
Social Networking Ppt
kmlaughl
 
Reputation Management and Social Media
Reputation Management and Social MediaReputation Management and Social Media
Reputation Management and Social Media
Paul Marsden
 
Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux
Hicham Sabre
 
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoringE-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
HUB INSTITUTE
 
Datagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vfDatagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vf
Emarketing.fr
 
Social networking PPT
Social networking PPTSocial networking PPT
Social networking PPT
varun0912
 
Reputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender SystemsReputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender Systems
Ahmad Jawdat
 
E-reputation : le livre blanc
E-reputation : le livre blancE-reputation : le livre blanc
E-reputation : le livre blanc
Aref Jdey
 
E-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by VanksenE-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by Vanksen
Vanksen
 

Viewers also liked (13)

E-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenusE-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenus
 
Communiquer sur les réseaux sociaux
Communiquer sur les réseaux sociauxCommuniquer sur les réseaux sociaux
Communiquer sur les réseaux sociaux
 
Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]
 
Online Reputation Management
Online Reputation ManagementOnline Reputation Management
Online Reputation Management
 
Social Networking Ppt
Social Networking PptSocial Networking Ppt
Social Networking Ppt
 
Reputation Management and Social Media
Reputation Management and Social MediaReputation Management and Social Media
Reputation Management and Social Media
 
Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux
 
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoringE-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
 
Datagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vfDatagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vf
 
Social networking PPT
Social networking PPTSocial networking PPT
Social networking PPT
 
Reputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender SystemsReputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender Systems
 
E-reputation : le livre blanc
E-reputation : le livre blancE-reputation : le livre blanc
E-reputation : le livre blanc
 
E-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by VanksenE-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by Vanksen
 

More from nextlib

Nio
NioNio
Nio
nextlib
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
nextlib
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
nextlib
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
nextlib
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
nextlib
 
Social Graph
Social GraphSocial Graph
Social Graph
nextlib
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
nextlib
 
Closures for Java
Closures for JavaClosures for Java
Closures for Java
nextlib
 
SVD review
SVD reviewSVD review
SVD review
nextlib
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
nextlib
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategynextlib
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學nextlib
 
Comparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering SystemsComparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering Systems
nextlib
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
nextlib
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
nextlib
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
nextlib
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲nextlib
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
nextlib
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Bigtable
BigtableBigtable
Bigtable
nextlib
 

More from nextlib (20)

Nio
NioNio
Nio
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
 
Social Graph
Social GraphSocial Graph
Social Graph
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
 
Closures for Java
Closures for JavaClosures for Java
Closures for Java
 
SVD review
SVD reviewSVD review
SVD review
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategy
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學
 
Comparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering SystemsComparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering Systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Bigtable
BigtableBigtable
Bigtable
 

Recently uploaded

Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 

Recently uploaded (20)

Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 

A Content-Driven Reputation System for the Wikipedia

  • 1. Content-Driven Author Reputation and Text Trust for the Wikipedia Luca de Alfaro UC Santa Cruz Joint work with Bo Adler, Ian Pye, Caitlin Sadowski (UCSC) Wikimania, August 2007
  • 2. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions.
  • 3. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions. Text Trust: • Goal: provide a measure of the reliability of the text. • Method: computed from the reputation of the authors who create and revise the text.
  • 4. Reputation: Our guiding principles • Do not alter the Wikipedia user experience – Compute reputation from content evolution, rather than user-to-user comments. • Be welcoming to all users – Never publicly display user reputation values. Authors know only their own reputation. • Be objective – Rely on content evolution rather than comments. – Quantitatively evaluate how well it works.
  • 5. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article time
  • 6. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits time
  • 7. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits time B builds on A’s edit
  • 8. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits + time B builds on A’s edit
  • 9. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits + time + B builds on A’s edit - C reverts to A’s version
  • 10. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B
  • 11. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3
  • 12. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3 Wars in content-driven reputation: • B can badmouth A by undoing her work A - • But this is risky: if others then B re-instate A’s work, it is B’s reputation that suffers.
  • 13. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3 Wars in content-driven reputation: • B can badmouth A by undoing her work A - • But this is risky: if others then + B re-instate A’s work, it is B’s - reputation that suffers. others?
  • 14. Validation: Does our reputation have predictive value? Time Article 1 Article 2 Article 3 Article 4 ... = edits by user A
  • 15. Validation: Does our reputation have predictive value? Time Article 1 Article 2 E Article 3 Article 4 ... The longevity of an edit E The reputation of author A depends on the history at the time of an edit E depends after the edit. on the history before the edit. Can we show a correlation between author reputation and edit longevity ?
  • 16. Building a content-driven reputation system for Wikipedia This is a summary; for details see: B.T. Adler, L. de Alfaro. A Content Driven Reputation System for the Wikipedia. In Proc. of WWW 2007.
  • 17. What is a “contribution”? Text Edit bla ei bla yak bla bla buy viagra! bla yak ei yak bla bla bla We measure how long the added We measure how long the “edit” text survives. (reorganization) survives. Based on text Based on edit distance. tracking.
  • 18. Text version 9 bla bla wuga boink 5 8 9 6 version 10 bla bla wuga wuga wuga boink 5 8 10 10 9 6 We label each word with the version where it was introduced. This enables us to keep track of how long it lives.
  • 19. Text: the destiny of a contribution of words number time (versions) Amount of Amount of new text surviving text The life of the text introduced at a revision.
  • 20. Text: Longevity Tk j-k Tk ¢ α text of words number time j k (versions) • Text longevity: the α text 2 [0,1] that yields the best geometrical approximation for the amount of residual text. • Short-lived text: α text < 0.2 (at most 20% of the text makes it from one version to the next).
  • 21. Text: Reputation update Tk Tj of words number time j k (versions) Ak Aj (authors) As a consequence of edit j, we increase the reputation of Ak by an amount proportional to Tj and to the reputation of Aj
  • 22. Measuring surviving text “Dead” text “Live” text Version wuga boing bla ble stored as “dead” 9 7 9 6 6 wuga boing bla ble buy viagra now! 10 7 9 6 6 10 10 10 t es h bc at m wuga boing bla ble 11 7 9 6 6 We track authorship of deleted text, and we match the text of new versions both with live and with dead text.
  • 23. Edit k-1 d(k - 1, k) k judged d(k d(k, j) -1, k<j j) j judge We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance)
  • 24. Edit: good or bad? the past k judged k-1 the past k-1 k judged , j) d(k d(k d(k, j) d(k -1, -1, j) j) j j judge judge the future the future k is good: d(k-1, j) > d(k, j) k is bad: d(k-1, j) < d(k, j) “k went towards the future” “k went against the future”
  • 25. Edit: Longevity the past Edit Longevity: “w ork k-1 d(k done -1 ” ,k) d(k “ pr -1,j k ogr )-d (k,j es s The fraction of change that is in ) ” the same direction of the future. α edit ' 1: k is a good edit • j α edit ' -1: k is reverted • the future
  • 26. Edit: Updating reputation the past Edit Longevity: “w ork k-1 d(k done -1 ” ,k) d(k “ pr -1,j kA ogr )-d (k,j es s k Reputation update: ) ” The reputation of Ak • increases if α edit > 0, j Aj • decreases if α edit < 0. the future (see paper for details)
  • 27. Data Sets • English till Feb 07 1,988,627 pages, 40,455,416 versions • French till Feb 07 452,577 pages, 5,643,636 versions • Italian till May 07 301,584 pages, 3,129,453 versions The entire Wikipedias, with the whole history, not just a sample (we wanted to compute the reputation using all edits of each user).
  • 28. Results: English Wikipedia, in detail % of edits below a given longevity Bin %_data l<0.8 l<0.4 l<0.0 l<-0.4 l<-0.8 0 16.922 93.11 91.65 89.15 83.76 73.53 1 1.191 77.24 69.83 65.60 61.11 56.00 2 1.335 69.53 57.08 49.79 45.71 41.25 log (1 + reputation) 3 1.627 38.00 28.61 20.23 16.16 13.62 4 2.780 32.84 22.31 13.32 9.57 8.04 5 4.408 41.70 15.76 5.90 3.80 2.57 6 6.698 29.40 16.74 7.54 4.35 3.12 7 8.281 32.04 15.16 5.44 2.25 1.40 8 12.233 34.06 16.64 6.78 3.79 2.73 9 44.524 32.55 15.51 5.05 1.88 1.14
  • 29. Results: English Wikipedia, in detail % of edits below a given longevity Bin %_data l<0.8 l<0.4 l<0.0 l<-0.4 l<-0.8 low 0 16.922 93.11 91.65 89.15 83.76 73.53 rep 1 1.191 77.24 69.83 65.60 61.11 56.00 2 1.335 69.53 57.08 49.79 45.71 41.25 log (1 + reputation) 3 1.627 38.00 28.61 20.23 16.16 13.62 4 2.780 32.84 22.31 13.32 9.57 8.04 5 4.408 41.70 15.76 5.90 3.80 2.57 6 6.698 29.40 16.74 7.54 4.35 3.12 7 8.281 32.04 15.16 5.44 2.25 1.40 8 12.233 34.06 16.64 6.78 3.79 2.73 9 44.524 32.55 15.51 5.05 1.88 1.14 Short-Lived
  • 30. Predictive power of low reputation Low-reputation: Lower 20% of range Short-lived edits Short-lived text α text α edit · 0.2 · -0.8 (less than 20% (almost entirely undone) survives each revision)
  • 31. Text trust Yadda yadda wuga wuga bla bla bla bing bong A Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong Old text is colored according New text is colored to the reputation of its original according to the author, and of all subsequent reputation of A revisors (including A).
  • 32. Text trust Yadda yadda wuga wuga bla bla bla bing bong A Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong • On the English Wikipedia, we should be able to spot untrusted content with over 80% recall and 60% precision! – In fact, we do even better than this, as new content is always flagged lower trust (see next).
  • 34. Text trust: How is “Fogh” spelled?
  • 35. Text Trust: more examples from the demo
  • 36. Text Trust: Details Trust depends on: • Authorship: Author lends 50% of their reputation to the text they create. – Thus, even text from high-rep authors is only medium- rep when added: high trust is achieved only via multiple reviews, never via a single author. • Revision: When an author of reputation r preserves a word of trust t < r, the word increases in trust to t + 0.3(r – t) • The algorithms still need fine-tuning.
  • 37. From fresh to trusted text
  • 38. From fresh to trusted text
  • 39. From fresh to trusted text
  • 40. From fresh to trusted text
  • 41. From fresh to trusted text
  • 42. Batch Implementation periodic xml dumps (to initialize) edit feed (to keep updated) Trust server Wikipedia servers • No need to affect the main Wikipedia servers • People can click “check trust” and visit the trust server. • Good for experimenting with new ideas • Necessary to color the past (come up to speed).
  • 43. On-Line Implementation Process edits as they arrive: • Benefit: real-time colorization of text • Need to integrate the code in MediaWiki • Time to process an edit: < 1s (not much longer than parsing it). • Storage required: proportional to the size of the last revision (not to the total history size!) • Can be easily used for other Wikis
  • 44. My questions: • Feedback? • Do you like it? • Should we try to set up a “trust server” with an edit feed from the Wikipedia? • Try the demo: http://trust.cse.ucsc.edu/ Your questions?