SlideShare a Scribd company logo
Vandalism Detection in Wikidata
Stefan Heindorf1, Martin Potthast2, Benno Stein2, Gregor Engels1
CIKM 2016
October 25, 2016
1 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
3Stefan HeindorfVandalism Detection in Wikidata
3Stefan HeindorfVandalism Detection in Wikidata
Item head
3Stefan HeindorfVandalism Detection in Wikidata
Item head
Item body
3Stefan HeindorfVandalism Detection in Wikidata
Item head
Item body
Revisions
3Stefan HeindorfVandalism Detection in Wikidata
(Feb 22, 2013)
(May 13, 2013)
(May 30, 2013)
Item head
Item body
Revisions
3Stefan HeindorfVandalism Detection in Wikidata
(Feb 22, 2013)
(May 13, 2013)
(May 30, 2013)
Item head
Item body
Revisions
3Stefan HeindorfVandalism Detection in Wikidata
(Feb 22, 2013)
(May 13, 2013)
(May 30, 2013)
Item head
Item body
Revisions
3Stefan HeindorfVandalism Detection in Wikidata
Item head
Item body
Revisions
Why is it a problem?
4
Patrolling Reverting Warning Blocking Protecting
• Over 2 Mio manual edits per month
• A lot of tedious work
• Vandalism is not detected in time
Stefan HeindorfVandalism Detection in Wikidata
Research Question
How to detect damaging changes to
crowdsourced knowledge bases?
5Stefan HeindorfVandalism Detection in Wikidata
Our Approach
Vandalism Detection in Wikidata Stefan Heindorf 6
Our Approach
1. Label Dataset  Vandalism Corpus [SIGIR’15]
Vandalism Detection in Wikidata Stefan Heindorf 6
Our Approach
1. Label Dataset  Vandalism Corpus [SIGIR’15]
2. Study Vandalism Characteristics  47 Features
Vandalism Detection in Wikidata Stefan Heindorf 6
Our Approach
1. Label Dataset  Vandalism Corpus [SIGIR’15]
2. Study Vandalism Characteristics  47 Features
3. Experiment with ML  Multiple-Instance Learning
Vandalism Detection in Wikidata Stefan Heindorf 6
Our Approach
1. Label Dataset  Vandalism Corpus [SIGIR’15]
2. Study Vandalism Characteristics  47 Features
3. Experiment with ML  Multiple-Instance Learning
4. Compare with state of the art  2 Baselines
Vandalism Detection in Wikidata Stefan Heindorf 6
Corpus [SIGIR ’15]
Revisions over time
7
Corpus [SIGIR ’15]
Revisions over time
7Month
Corpus [SIGIR ’15]
Revisions over time
7Month
Corpus [SIGIR ’15]
Revisions over time
7Month
103,000 vandalism revisions
Corpus [SIGIR ’15]
Revisions over time
7Month
103,000 vandalism revisions
24 million manual revisions
Corpus [SIGIR ’15]
Revisions over time
7Month
103,000 vandalism revisions
24 million manual revisions
 0.4% vandalism
Corpus [SIGIR ’15]
Revisions over time
7
Item head
(1.3% vandalism)
Month
103,000 vandalism revisions
24 million manual revisions
 0.4% vandalism
Corpus [SIGIR ’15]
Revisions over time
7
Item head
(1.3% vandalism)
Item body
(0.2% vandalism)
Month
103,000 vandalism revisions
24 million manual revisions
 0.4% vandalism
Corpus [SIGIR ’15]
Revisions over time
7
Item head
(1.3% vandalism)
Item body
(0.2% vandalism)
Training
Month
103,000 vandalism revisions
24 million manual revisions
 0.4% vandalism
Corpus [SIGIR ’15]
Revisions over time
7
Item head
(1.3% vandalism)
Item body
(0.2% vandalism)
Training
Validation
Month
103,000 vandalism revisions
24 million manual revisions
 0.4% vandalism
Corpus [SIGIR ’15]
Revisions over time
7
Item head
(1.3% vandalism)
Item body
(0.2% vandalism)
Training
TestValidation
Month
103,000 vandalism revisions
24 million manual revisions
 0.4% vandalism
Content Features
11 Character features (e.g., lowerCaseRatio, digitRatio)
9 Word features (e.g., badWordRatio)
4 Sentence features (e.g., commentSitelinkSimilarity)
3 Statement features (e.g., propertyFrequency)
Context Features
10 User features (e.g., userCountry)
2 Item features (e.g., logItemFrequency)
8 Revision features (e.g., revisionTag, revisionLanguage)
Features (47 in total)
Stefan Heindorf 8Vandalism Detection in Wikidata
Features (47 in total)
Stefan Heindorf 8Vandalism Detection in Wikidata
revisionTag
Features (47 in total)
Stefan Heindorf 8Vandalism Detection in Wikidata
revisionTag Vand. Total Prob.
Rev. with tags 52 T 8,619 T 0.60%
By abuse filter 49 T 122 T 39.90%
By editing tools 3 T 8,496 T 0.03%
Rev. w/o tags 52 T 15,386 T 0.34%
revisionTag
Features (47 in total)
Stefan Heindorf 8Vandalism Detection in Wikidata
revisionTag Vand. Total Prob.
Rev. with tags 52 T 8,619 T 0.60%
By abuse filter 49 T 122 T 39.90%
By editing tools 3 T 8,496 T 0.03%
Rev. w/o tags 52 T 15,386 T 0.34%
revisionTag
Features (47 in total)
Stefan Heindorf 8Vandalism Detection in Wikidata
revisionTag Vand. Total Prob.
Rev. with tags 52 T 8,619 T 0.60%
By abuse filter 49 T 122 T 39.90%
By editing tools 3 T 8,496 T 0.03%
Rev. w/o tags 52 T 15,386 T 0.34%
revisionTag
Features (47 in total)
Stefan Heindorf 8Vandalism Detection in Wikidata
revisionTag Vand. Total Prob.
Rev. with tags 52 T 8,619 T 0.60%
By abuse filter 49 T 122 T 39.90%
By editing tools 3 T 8,496 T 0.03%
Rev. w/o tags 52 T 15,386 T 0.34%
revisionTag
Multiple-Instance Learning
Vandalism Detection in Wikidata Stefan Heindorf 9
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
Vandalism Detection in Wikidata Stefan Heindorf 9
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
Vandalism Detection in Wikidata Stefan Heindorf 9
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
Vandalism Detection in Wikidata Stefan Heindorf 9
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
Vandalism Detection in Wikidata Stefan Heindorf 9
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
Session 1
Session 2
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
• Idea: Apply Multiple-Instance Learning
Vandalism Detection in Wikidata Stefan Heindorf 9
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
Session 1
Session 2
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
• Idea: Apply Multiple-Instance Learning
Vandalism Detection in Wikidata Stefan Heindorf 9
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
Session 1
Session 2
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
• Idea: Apply Multiple-Instance Learning
Vandalism Detection in Wikidata Stefan Heindorf 9
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
Session 1
Session 2
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
10Vandalism Detection in Wikidata
Stefan Heindorf
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
10Vandalism Detection in Wikidata
Stefan Heindorf
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10Vandalism Detection in Wikidata
Stefan Heindorf
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
FILTER
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
ORES
FILTER
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
ORES
FILTER
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
ORES
FILTER
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
ORES
FILTER
PR-AUC: 0.491
ROC-AUC: 0.991
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
 Detect and revert 30% vandalism
fully automatically
ORES
FILTER
Test Dataset (0.2% vandalism)
WDVD vs. Baselines
• WDVD (our approach)
Wikidata Vandalism Detector
• FILTER (baseline)
Wikidata Abuse Filter
• ORES (baseline)
Objective Revision Evaluation
Service
10
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Precision
Recall
Vandalism Detection in Wikidata
Stefan Heindorf
 Detect and revert 30% vandalism
fully automatically
ORES
FILTER
• Reduce workload by factor 10
(precision 2% instead of 0.2%)
 Still find 98.8% of all vandalism
Test Dataset (0.2% vandalism)
Conclusion and Outlook
Stefan Heindorf 11Vandalism Detection in Wikidata
Conclusion and Outlook
Conclusion
• Vandalism: Concentration on item heads (currently)
• Features: Content & Context
• Model: Multiple-Instance
• PR-AUC: 0.491
• ROC-AUC: 0.991
Stefan Heindorf 11Vandalism Detection in Wikidata
Conclusion and Outlook
Conclusion
• Vandalism: Concentration on item heads (currently)
• Features: Content & Context
• Model: Multiple-Instance
• PR-AUC: 0.491
• ROC-AUC: 0.991
Stefan Heindorf 11Vandalism Detection in Wikidata
Code + Data:
http://www.heindorf.me/
wdvd.html
Conclusion and Outlook
Conclusion
• Vandalism: Concentration on item heads (currently)
• Features: Content & Context
• Model: Multiple-Instance
• PR-AUC: 0.491
• ROC-AUC: 0.991
Outlook
• Goal: Better detection (on item bodies)
• Idea: Double-check with other sources
Stefan Heindorf 11Vandalism Detection in Wikidata
Code + Data:
http://www.heindorf.me/
wdvd.html
Conclusion and Outlook
Conclusion
• Vandalism: Concentration on item heads (currently)
• Features: Content & Context
• Model: Multiple-Instance
• PR-AUC: 0.491
• ROC-AUC: 0.991
Outlook
• Goal: Better detection (on item bodies)
• Idea: Double-check with other sources
Stefan Heindorf 11Vandalism Detection in Wikidata
Code + Data:
http://www.heindorf.me/
wdvd.html
Join the competition:
Vandalism Detection @WSDM Cup 2017
http://www.wsdm-cup-2017.org/
Conclusion and Outlook
Conclusion
• Vandalism: Concentration on item heads (currently)
• Features: Content & Context
• Model: Multiple-Instance
• PR-AUC: 0.491
• ROC-AUC: 0.991
Outlook
• Goal: Better detection (on item bodies)
• Idea: Double-check with other sources
Acknowledgement
• German Research Foundation (DFG)
• SIGIR Student Travel Grant
Stefan Heindorf 11Vandalism Detection in Wikidata
Code + Data:
http://www.heindorf.me/
wdvd.html
Join the competition:
Vandalism Detection @WSDM Cup 2017
http://www.wsdm-cup-2017.org/
Conclusion and Outlook
Conclusion
• Vandalism: Concentration on item heads (currently)
• Features: Content & Context
• Model: Multiple-Instance
• PR-AUC: 0.491
• ROC-AUC: 0.991
Outlook
• Goal: Better detection (on item bodies)
• Idea: Double-check with other sources
Acknowledgement
• German Research Foundation (DFG)
• SIGIR Student Travel Grant
Stefan Heindorf 11Vandalism Detection in Wikidata
Code + Data:
http://www.heindorf.me/
wdvd.html
Join the competition:
Vandalism Detection @WSDM Cup 2017
http://www.wsdm-cup-2017.org/
Thank you!

More Related Content

Recently uploaded

一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
Vineet
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
yuvarajkumar334
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
hiju9823
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
Vineet
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
22ad0301
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
nhero3888
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
newdirectionconsulta
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 

Recently uploaded (20)

一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 

Featured

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
Alireza Esmikhani
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
Project for Public Spaces & National Center for Biking and Walking
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
Erica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Saba Software
 

Featured (20)

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 

Vandalism Detection in Wikidata (CIKM '16 Best Paper)

  • 1. Vandalism Detection in Wikidata Stefan Heindorf1, Martin Potthast2, Benno Stein2, Gregor Engels1 CIKM 2016 October 25, 2016 1 2
  • 2. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 3. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 4. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 5. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 6. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 7. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 8. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 9. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 11. 3Stefan HeindorfVandalism Detection in Wikidata Item head
  • 12. 3Stefan HeindorfVandalism Detection in Wikidata Item head Item body
  • 13. 3Stefan HeindorfVandalism Detection in Wikidata Item head Item body Revisions
  • 14. 3Stefan HeindorfVandalism Detection in Wikidata (Feb 22, 2013) (May 13, 2013) (May 30, 2013) Item head Item body Revisions
  • 15. 3Stefan HeindorfVandalism Detection in Wikidata (Feb 22, 2013) (May 13, 2013) (May 30, 2013) Item head Item body Revisions
  • 16. 3Stefan HeindorfVandalism Detection in Wikidata (Feb 22, 2013) (May 13, 2013) (May 30, 2013) Item head Item body Revisions
  • 17. 3Stefan HeindorfVandalism Detection in Wikidata Item head Item body Revisions
  • 18. Why is it a problem? 4 Patrolling Reverting Warning Blocking Protecting • Over 2 Mio manual edits per month • A lot of tedious work • Vandalism is not detected in time Stefan HeindorfVandalism Detection in Wikidata
  • 19. Research Question How to detect damaging changes to crowdsourced knowledge bases? 5Stefan HeindorfVandalism Detection in Wikidata
  • 20. Our Approach Vandalism Detection in Wikidata Stefan Heindorf 6
  • 21. Our Approach 1. Label Dataset  Vandalism Corpus [SIGIR’15] Vandalism Detection in Wikidata Stefan Heindorf 6
  • 22. Our Approach 1. Label Dataset  Vandalism Corpus [SIGIR’15] 2. Study Vandalism Characteristics  47 Features Vandalism Detection in Wikidata Stefan Heindorf 6
  • 23. Our Approach 1. Label Dataset  Vandalism Corpus [SIGIR’15] 2. Study Vandalism Characteristics  47 Features 3. Experiment with ML  Multiple-Instance Learning Vandalism Detection in Wikidata Stefan Heindorf 6
  • 24. Our Approach 1. Label Dataset  Vandalism Corpus [SIGIR’15] 2. Study Vandalism Characteristics  47 Features 3. Experiment with ML  Multiple-Instance Learning 4. Compare with state of the art  2 Baselines Vandalism Detection in Wikidata Stefan Heindorf 6
  • 26. Corpus [SIGIR ’15] Revisions over time 7Month
  • 27. Corpus [SIGIR ’15] Revisions over time 7Month
  • 28. Corpus [SIGIR ’15] Revisions over time 7Month 103,000 vandalism revisions
  • 29. Corpus [SIGIR ’15] Revisions over time 7Month 103,000 vandalism revisions 24 million manual revisions
  • 30. Corpus [SIGIR ’15] Revisions over time 7Month 103,000 vandalism revisions 24 million manual revisions  0.4% vandalism
  • 31. Corpus [SIGIR ’15] Revisions over time 7 Item head (1.3% vandalism) Month 103,000 vandalism revisions 24 million manual revisions  0.4% vandalism
  • 32. Corpus [SIGIR ’15] Revisions over time 7 Item head (1.3% vandalism) Item body (0.2% vandalism) Month 103,000 vandalism revisions 24 million manual revisions  0.4% vandalism
  • 33. Corpus [SIGIR ’15] Revisions over time 7 Item head (1.3% vandalism) Item body (0.2% vandalism) Training Month 103,000 vandalism revisions 24 million manual revisions  0.4% vandalism
  • 34. Corpus [SIGIR ’15] Revisions over time 7 Item head (1.3% vandalism) Item body (0.2% vandalism) Training Validation Month 103,000 vandalism revisions 24 million manual revisions  0.4% vandalism
  • 35. Corpus [SIGIR ’15] Revisions over time 7 Item head (1.3% vandalism) Item body (0.2% vandalism) Training TestValidation Month 103,000 vandalism revisions 24 million manual revisions  0.4% vandalism
  • 36. Content Features 11 Character features (e.g., lowerCaseRatio, digitRatio) 9 Word features (e.g., badWordRatio) 4 Sentence features (e.g., commentSitelinkSimilarity) 3 Statement features (e.g., propertyFrequency) Context Features 10 User features (e.g., userCountry) 2 Item features (e.g., logItemFrequency) 8 Revision features (e.g., revisionTag, revisionLanguage) Features (47 in total) Stefan Heindorf 8Vandalism Detection in Wikidata
  • 37. Features (47 in total) Stefan Heindorf 8Vandalism Detection in Wikidata revisionTag
  • 38. Features (47 in total) Stefan Heindorf 8Vandalism Detection in Wikidata revisionTag Vand. Total Prob. Rev. with tags 52 T 8,619 T 0.60% By abuse filter 49 T 122 T 39.90% By editing tools 3 T 8,496 T 0.03% Rev. w/o tags 52 T 15,386 T 0.34% revisionTag
  • 39. Features (47 in total) Stefan Heindorf 8Vandalism Detection in Wikidata revisionTag Vand. Total Prob. Rev. with tags 52 T 8,619 T 0.60% By abuse filter 49 T 122 T 39.90% By editing tools 3 T 8,496 T 0.03% Rev. w/o tags 52 T 15,386 T 0.34% revisionTag
  • 40. Features (47 in total) Stefan Heindorf 8Vandalism Detection in Wikidata revisionTag Vand. Total Prob. Rev. with tags 52 T 8,619 T 0.60% By abuse filter 49 T 122 T 39.90% By editing tools 3 T 8,496 T 0.03% Rev. w/o tags 52 T 15,386 T 0.34% revisionTag
  • 41. Features (47 in total) Stefan Heindorf 8Vandalism Detection in Wikidata revisionTag Vand. Total Prob. Rev. with tags 52 T 8,619 T 0.60% By abuse filter 49 T 122 T 39.90% By editing tools 3 T 8,496 T 0.03% Rev. w/o tags 52 T 15,386 T 0.34% revisionTag
  • 42. Multiple-Instance Learning Vandalism Detection in Wikidata Stefan Heindorf 9
  • 43. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation Vandalism Detection in Wikidata Stefan Heindorf 9
  • 44. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation Vandalism Detection in Wikidata Stefan Heindorf 9 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha) 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
  • 45. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation Vandalism Detection in Wikidata Stefan Heindorf 9 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha) 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
  • 46. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation Vandalism Detection in Wikidata Stefan Heindorf 9 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha) 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported Session 1 Session 2
  • 47. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation • Idea: Apply Multiple-Instance Learning Vandalism Detection in Wikidata Stefan Heindorf 9 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha) 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported Session 1 Session 2
  • 48. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation • Idea: Apply Multiple-Instance Learning Vandalism Detection in Wikidata Stefan Heindorf 9 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha) 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported Session 1 Session 2
  • 49. Multiple-Instance Learning • Observation: Vandalism seldom occurs in isolation • Idea: Apply Multiple-Instance Learning Vandalism Detection in Wikidata Stefan Heindorf 9 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha) 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported Session 1 Session 2
  • 50. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector 10Vandalism Detection in Wikidata Stefan Heindorf
  • 51. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter 10Vandalism Detection in Wikidata Stefan Heindorf
  • 52. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10Vandalism Detection in Wikidata Stefan Heindorf
  • 53. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf Test Dataset (0.2% vandalism)
  • 54. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf FILTER Test Dataset (0.2% vandalism)
  • 55. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf ORES FILTER Test Dataset (0.2% vandalism)
  • 56. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf ORES FILTER Test Dataset (0.2% vandalism)
  • 57. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf ORES FILTER Test Dataset (0.2% vandalism)
  • 58. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf ORES FILTER PR-AUC: 0.491 ROC-AUC: 0.991 Test Dataset (0.2% vandalism)
  • 59. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf  Detect and revert 30% vandalism fully automatically ORES FILTER Test Dataset (0.2% vandalism)
  • 60. WDVD vs. Baselines • WDVD (our approach) Wikidata Vandalism Detector • FILTER (baseline) Wikidata Abuse Filter • ORES (baseline) Objective Revision Evaluation Service 10 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Precision Recall Vandalism Detection in Wikidata Stefan Heindorf  Detect and revert 30% vandalism fully automatically ORES FILTER • Reduce workload by factor 10 (precision 2% instead of 0.2%)  Still find 98.8% of all vandalism Test Dataset (0.2% vandalism)
  • 61. Conclusion and Outlook Stefan Heindorf 11Vandalism Detection in Wikidata
  • 62. Conclusion and Outlook Conclusion • Vandalism: Concentration on item heads (currently) • Features: Content & Context • Model: Multiple-Instance • PR-AUC: 0.491 • ROC-AUC: 0.991 Stefan Heindorf 11Vandalism Detection in Wikidata
  • 63. Conclusion and Outlook Conclusion • Vandalism: Concentration on item heads (currently) • Features: Content & Context • Model: Multiple-Instance • PR-AUC: 0.491 • ROC-AUC: 0.991 Stefan Heindorf 11Vandalism Detection in Wikidata Code + Data: http://www.heindorf.me/ wdvd.html
  • 64. Conclusion and Outlook Conclusion • Vandalism: Concentration on item heads (currently) • Features: Content & Context • Model: Multiple-Instance • PR-AUC: 0.491 • ROC-AUC: 0.991 Outlook • Goal: Better detection (on item bodies) • Idea: Double-check with other sources Stefan Heindorf 11Vandalism Detection in Wikidata Code + Data: http://www.heindorf.me/ wdvd.html
  • 65. Conclusion and Outlook Conclusion • Vandalism: Concentration on item heads (currently) • Features: Content & Context • Model: Multiple-Instance • PR-AUC: 0.491 • ROC-AUC: 0.991 Outlook • Goal: Better detection (on item bodies) • Idea: Double-check with other sources Stefan Heindorf 11Vandalism Detection in Wikidata Code + Data: http://www.heindorf.me/ wdvd.html Join the competition: Vandalism Detection @WSDM Cup 2017 http://www.wsdm-cup-2017.org/
  • 66. Conclusion and Outlook Conclusion • Vandalism: Concentration on item heads (currently) • Features: Content & Context • Model: Multiple-Instance • PR-AUC: 0.491 • ROC-AUC: 0.991 Outlook • Goal: Better detection (on item bodies) • Idea: Double-check with other sources Acknowledgement • German Research Foundation (DFG) • SIGIR Student Travel Grant Stefan Heindorf 11Vandalism Detection in Wikidata Code + Data: http://www.heindorf.me/ wdvd.html Join the competition: Vandalism Detection @WSDM Cup 2017 http://www.wsdm-cup-2017.org/
  • 67. Conclusion and Outlook Conclusion • Vandalism: Concentration on item heads (currently) • Features: Content & Context • Model: Multiple-Instance • PR-AUC: 0.491 • ROC-AUC: 0.991 Outlook • Goal: Better detection (on item bodies) • Idea: Double-check with other sources Acknowledgement • German Research Foundation (DFG) • SIGIR Student Travel Grant Stefan Heindorf 11Vandalism Detection in Wikidata Code + Data: http://www.heindorf.me/ wdvd.html Join the competition: Vandalism Detection @WSDM Cup 2017 http://www.wsdm-cup-2017.org/ Thank you!