STiki: An Anti-Vandalism Tool for Wikipedia using Spatio-Temporal Analysis of Revision Metadata
A.G. West, S. Kannan, and I. Lee
Wikimania '10, July 10, 2010

STiki = Huggle++
STiki = Huggle, but:
  • CENTRALIZED: STiki is always scoring edits, in bot-like fashion.
  • QUEUING: STiki uses 15+ ML features to set presentation order (not a static rule set).

Outline/Summary
Vandalism detection methodology [6]
  • Wikipedia revision metadata (not the article or diff text) can be used to detect vandalism
  • ML over simple features and aggregate reputation values for articles, editors, and spatial groups thereof
The STiki software tool
  • Straightforward application of the above technique
  • Demonstration of the tool and functionality
  • Alternative uses for the open-source code

Metadata
Wikipedia provides metadata via dumps/API:

Labeling Vandalism
ROLLBACK is used to label edits as vandalism:
  • Only true rollback, no software-based ones
  • Edit summaries used to locate rollbacks (native, Huggle, Twinkle, ClueBot)
  • Bad ones = {OFFENDING EDITS (OEs)}, others = {UNLABELED}
Why rollback?
  • Automated (vs. manual)
  • High-confidence
  • Per case (vs. definition)
Why do edits need labels?
  • (2) Building block of reputation building
[Figure: Prevalence/Source of Rollbacks]
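
The rollback-based labeling described on this slide could be sketched roughly as follows. This is a minimal illustration, not STiki's code: the summary patterns in `ROLLBACK_SUMMARY`, the function name `label_revisions`, and the rule that only the single edit immediately before a rollback is marked are all assumptions made for the example. A real rollback can undo several consecutive edits by the same user.

```python
import re

# Illustrative edit-summary patterns only (assumptions, not taken from STiki).
ROLLBACK_SUMMARY = re.compile(
    r"^(Reverted edits by|Reverting possible vandalism by)\b", re.IGNORECASE
)

def label_revisions(revisions):
    """Label one page's revisions as "OE" (offending edit) or "UNLABELED".

    revisions: chronological list of (rev_id, edit_summary) tuples.
    Simplification: a rollback marks only the revision directly before it.
    """
    labels = {rev_id: "UNLABELED" for rev_id, _ in revisions}
    for i, (_, summary) in enumerate(revisions):
        if i > 0 and ROLLBACK_SUMMARY.match(summary or ""):
            labels[revisions[i - 1][0]] = "OE"
    return labels

# Made-up history for a single page.
history = [
    (1, "added sources"),
    (2, "blanked section"),
    (3, "Reverted edits by 192.0.2.1 (talk) to last version by Alice"),
    (4, "fixed typo"),
]
labels = label_revisions(history)
```

Everything that is not rolled back stays UNLABELED rather than "good", matching the slide's two label sets.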

Simple Features
  • Spatial props: appropriate wherever a size, distance, or membership function can be defined
[Table: SIMPLE FEATURES]
* Discussion abbreviated to concentrate on aggregate ones

Edit Time, Day-of-Week
  • Use IP geolocation data to determine origin time zone, adjust UTC timestamp
  • Vandalism most prevalent during working hours/week: kids are in school(?)
  • Fun fact: vandalism almost twice as prevalent on a Tuesday versus a Sunday
[Figure: Local time-of-day when edits made]
[Figure: Local day-of-week when edits made]
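
As a hedged sketch of the timestamp adjustment described here: given an edit's UTC timestamp and a UTC offset for the editor's location, the local hour and day-of-week features follow directly. The function name `local_edit_time` is hypothetical, and the offset is assumed to come from an IP geolocation lookup that is not shown.

```python
from datetime import datetime, timedelta, timezone

def local_edit_time(utc_timestamp, utc_offset_hours):
    """Return (local_hour, local_weekday) for an edit.

    utc_timestamp: seconds since the Unix epoch (UTC).
    utc_offset_hours: offset of the editor's time zone, assumed to be
        supplied by an IP geolocation lookup (not implemented here).
    local_weekday uses Python's convention: Monday = 0 ... Sunday = 6.
    """
    utc_dt = datetime.fromtimestamp(utc_timestamp, tz=timezone.utc)
    local_dt = utc_dt.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local_dt.hour, local_dt.weekday()

# An edit made at 02:30 UTC on Saturday 2010-07-10, from a UTC-4 location.
ts = datetime(2010, 7, 10, 2, 30, tzinfo=timezone.utc).timestamp()
hour, weekday = local_edit_time(ts, -4)
```

In this example the edit falls on Friday evening local time, even though the UTC timestamp is already Saturday, which is why the adjustment matters for the time-of-day and day-of-week features.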

Time-Since (TS)…
  • High-edit pages most often vandalized
  • ≈2% of pages have 5+ OEs, yet these pages have 52% of all edits
  • Other work [3] has shown these are also the articles most visited
  • “Registration”: timestamp of first edit made by user
  • Sybil attack to abuse benefits?
  • Huge contributors, but rarely vandalize

PreSTA Algorithm
PreSTA [5]: Model for ST-rep
CORE IDEA: No entity-specific data? Examine spatially-adjacent entities (homophily)

$$\mathrm{Rep}(\mathit{group}) = \frac{\sum \mathrm{time\_decay}(TS_{\mathit{vandalism}})}{\mathrm{size}(\mathit{group})}$$

where $TS_{\mathit{vandalism}}$ are the timestamps (TS) of vandalism incidents by group members.
  • Grouping functions (spatial) define memberships
  • Observations of misbehavior form feedback, and observations are decayed (temporal)
[Figure: Higher-Order Reputation. Entity A (Alice) is a member of the groups Polish and Europeans, giving rep(A), rep(POL), rep(EUR)]
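
The reputation formula on this slide can be illustrated with a short sketch. The slide does not specify the decay function, so the exponential half-life decay below, the default half-life value, and the names `time_decay` and `reputation` are assumptions for illustration only.

```python
def time_decay(ts_event, now, half_life):
    """Weight of one vandalism incident: 1.0 when fresh, halving every
    half_life seconds (exponential decay is an assumption here)."""
    return 0.5 ** ((now - ts_event) / half_life)

def reputation(vandalism_timestamps, group_size, now, half_life=5 * 24 * 3600):
    """Rep(group) = sum of decayed vandalism incidents / size(group).

    vandalism_timestamps: Unix timestamps of incidents by group members.
    group_size: number of members in the group (1 for a single entity).
    Higher values indicate worse behavior.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    decayed = sum(
        time_decay(ts, now, half_life)
        for ts in vandalism_timestamps
        if ts <= now  # ignore incidents dated in the future
    )
    return decayed / group_size

now = 1_278_720_000  # arbitrary reference time
week = 7 * 24 * 3600
# One incident exactly one half-life ago, in a group of two members.
rep_small_group = reputation([now - week], 2, now, half_life=week)
```

Dividing by the group size normalizes for large groups, and a single-member group reduces to an individual's reputation.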

Article Reputation
  • Intuitively, some topics are controversial and likely targets for vandalism (or temporally so).
  • 85% of OEs have non-zero rep (just 45% of random)
[Figure: CDF of Article Reputation]
[Table: Articles w/ most OEs]

Category Reputation
  • Wiki provides categories/memberships; use only topical ones.
  • 97% of OEs have non-zero reputation (85% in article case)
[Table: Categories with most OEs]
[Figure: Example of Category Rep. Calculation. The article Barack Obama belongs to the categories Presidents (with articles such as Abraham Lincoln and G.W. Bush) and Lawyers; each category has a reputation, and the feature value is the MAXIMUM(?) over them]
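
A rough sketch of the category calculation in this example follows. It assumes each article already has a decayed vandalism score, computes each category's reputation as the size-normalized sum over member articles, and takes the maximum over an article's categories as the feature value (the slide itself marks the maximum as tentative). The function names and the scores are made up for illustration.

```python
from collections import defaultdict

def category_reputations(memberships, article_scores):
    """memberships: article -> list of topical categories.
    article_scores: article -> already-decayed sum of vandalism incidents.
    Returns category -> total member score / number of member articles."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for article, categories in memberships.items():
        for category in categories:
            totals[category] += article_scores.get(article, 0.0)
            sizes[category] += 1
    return {category: totals[category] / sizes[category] for category in totals}

def category_feature(article, memberships, cat_rep):
    """Feature value: maximum reputation over the article's categories."""
    return max(
        (cat_rep.get(c, 0.0) for c in memberships.get(article, ())),
        default=0.0,
    )

# Hypothetical memberships and scores, loosely following the slide's example.
memberships = {
    "Barack Obama": ["Presidents", "Lawyers"],
    "Abraham Lincoln": ["Presidents"],
    "G.W. Bush": ["Presidents"],
}
article_scores = {"Barack Obama": 3.0, "Abraham Lincoln": 1.0, "G.W. Bush": 2.0}
cat_rep = category_reputations(memberships, article_scores)
obama_feature = category_feature("Barack Obama", memberships, cat_rep)
```

Using the maximum means one badly behaved category is enough to raise an article's feature value, which suits articles that sit in many mostly quiet categories.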

Editor Reputation
  • Straightforward use of the rep() function with one-editor groups
[Figure: CDF of Editor Reputation]
  • Mediocre performance. Meaningful correlation with other features, however.

Off-line Performance
  • Use as an intelligent routing (IR) tool
  • Recall: % of total OEs classified correctly
  • Precision: % of edits classified OE that are vandalism
  • 50% precision @ 50% recall
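
The two metrics defined here can be computed as in the following minimal sketch. The scores, the threshold, and the function name `precision_recall` are illustrative assumptions; the slide does not say how the classifier's operating point was chosen.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for edits classified as OE at a score threshold.

    scores: classifier scores, higher = more likely vandalism.
    labels: True if the edit is a known offending edit (OE).
    """
    tp = fp = fn = 0
    for score, is_oe in zip(scores, labels):
        predicted_oe = score >= threshold
        if predicted_oe and is_oe:
            tp += 1          # OE correctly flagged
        elif predicted_oe:
            fp += 1          # flagged, but not an OE
        elif is_oe:
            fn += 1          # OE that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up scores for four edits, two of which are OEs.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [True, False, True, False]
precision, recall = precision_recall(scores, labels, threshold=0.5)
```

Raising the threshold generally trades recall for precision, so a figure such as "50% @ 50%" describes one point on that trade-off curve.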

STiki
STiki [4]: A real-time, on-Wikipedia implementation of the technique


Popped: GUI client shows likely vandalism first

STiki Performance
Competition inhibits maximal performance
  • Metric: hit rate (% of edits displayed that are vandalism)
  • Offline analysis shows it could be 50%+
  • Competing (often autonomous) tools make it ≈10%
STiki successes and use cases
  • Has reverted over 5000 instances of vandalism
  • May be more appropriate in less patrolled installations, such as any of Wikipedia's foreign-language editions
  • Embedded vandalism: that which escapes initial detection. The median age of a STiki revert is 4.25 hours, 200× that of rollbacks (RBs). Further, the average STiki revert had 210 views during its active duration.

Alternative Uses
All code is available [4] and open source (Java)
Backend (server-side) re-use
  • Large portion of MediaWiki API implemented (bots)
  • Trivial to add new features (including NLP ones)
Frontend (client-side) re-use
  • Useful whenever edits require human inspection
  • Offline inspection tool for corpus building
Data re-use
  • Incorporate vandalism score into more robust tools
  • Willing to provide data to other researchers

Crowd-sourcing
Shared queue := Pending changes trial
  • Abuse of “pass” by an edit-hoarding user
  • Do ‘reviewers’ need to be reviewed? Where does it stop?
  • Multi-layer verification checks to find anomalies
  • Could reviewer reputations also be created?
Threshold for queue access?
  • Registered? Auto-confirmed? Or more?
Catch-22: Use vs. perceived success
  • More users = more vandalism found. But deep in the queue, vandalism is unlikely = user abandonment.

References
[1] S. Hao, N.A. Syed, N. Feamster, A.G. Gray, and S. Krasser. Detecting spammers with SNARE: Spatiotemporal network-level automated reputation engine. In 18th USENIX Security Symposium, 2009.
[2] M. Potthast, B. Stein, and R. Gerling. Automatic vandalism detection in Wikipedia. In Advances in Information Retrieval, 2008.
[3] R. Priedhorsky, J. Chen, S.K. Lam, K. Panciera, L. Terveen, and J. Riedl. Creating, destroying, and restoring value in Wikipedia. In GROUP '07, 2007.
[4] A.G. West. STiki: A vandalism detection tool for Wikipedia. http://en.wikipedia.org/wiki/Wikipedia:STiki. Software, 2010.
[5] A.G. West, A.J. Aviv, J. Chang, and I. Lee. Mitigating spam using spatio-temporal reputation. Technical report UPENN-MS-CIS-10-04, Feb. 2010.
[6] A.G. West, S. Kannan, and I. Lee. Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata. In EUROSEC '10, April 2010.