Vandalism Detection in Wikidata
Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels
CIKM 2016
October 25, 2016
Motivation
Vandalism Detection in Wikidata Stefan Heindorf 2
Item head
Item body
Revisions (Feb 22, 2013; May 13, 2013; May 30, 2013)
Why is it a problem?
Patrolling, Reverting, Warning, Blocking, Protecting
• Over 2 million manual edits per month
• Much tedious work
• Vandalism is not detected in time
Research Question
How to detect damaging changes to crowdsourced knowledge bases?
Our Approach
1. Label Dataset → Vandalism Corpus [SIGIR ’15]
2. Study Vandalism Characteristics → 47 Features
3. Experiment with ML → Multiple-Instance Learning
4. Compare with the State of the Art → 2 Baselines
Corpus [SIGIR ’15]
Figure: revisions per month over time, with the time span divided into training, validation, and test periods
• 24 million manual revisions
• 103,000 vandalism revisions → 0.4% vandalism
  • Item head: 1.3% vandalism
  • Item body: 0.2% vandalism
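Two numbers on this slide drive the rest of the talk: the extreme class imbalance (103,000 / 24 million ≈ 0.4%) and the split of the time axis into training, validation, and test periods. The following is a minimal sketch of both, assuming a chronological split; the `Revision` type and the cutoff dates are hypothetical, since the actual split boundaries are not stated here.

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class Revision(NamedTuple):
    # Hypothetical minimal record; the real corpus has many more fields.
    timestamp: datetime
    is_vandalism: bool


def vandalism_ratio(vandalism: int, total: int) -> float:
    """Fraction of revisions labeled as vandalism."""
    return vandalism / total


def temporal_split(
    revisions: List[Revision], validation_start: datetime, test_start: datetime
) -> Tuple[List[Revision], List[Revision], List[Revision]]:
    """Split revisions chronologically into training, validation, and test sets.

    Everything before validation_start is training data, so no revision from a
    later period can leak into training.
    """
    ordered = sorted(revisions, key=lambda r: r.timestamp)
    train = [r for r in ordered if r.timestamp < validation_start]
    validation = [r for r in ordered if validation_start <= r.timestamp < test_start]
    test = [r for r in ordered if r.timestamp >= test_start]
    return train, validation, test


# Class imbalance of the corpus: 103,000 vandalism among 24 million revisions.
print(f"{vandalism_ratio(103_000, 24_000_000):.2%}")  # prints 0.43%
```

Splitting by time rather than at random mirrors deployment: a detector is trained on past edits and applied to future ones.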
Features (47 in total)
Content Features (27)
11 Character features (e.g., lowerCaseRatio, digitRatio)
9 Word features (e.g., badWordRatio)
4 Sentence features (e.g., commentSitelinkSimilarity)
3 Statement features (e.g., propertyFrequency)
Context Features (20)
10 User features (e.g., userCountry)
2 Item features (e.g., logItemFrequency)
8 Revision features (e.g., revisionTag, revisionLanguage)
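To make the character and word features concrete, here is a sketch of plausible definitions for lowerCaseRatio, digitRatio, and badWordRatio. These exact formulas are assumptions (the slide names the features but does not define them), and `BAD_WORDS` is a hypothetical stand-in list, not the one WDVD uses.

```python
import re
from typing import Set


def lower_case_ratio(text: str) -> float:
    """Share of lower-case letters among all letters (0.0 if there are none)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.islower() for c in letters) / len(letters)


def digit_ratio(text: str) -> float:
    """Share of digit characters among all characters (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(c.isdigit() for c in text) / len(text)


# Hypothetical stand-in list; the bad-word list used by WDVD is not given here.
BAD_WORDS: Set[str] = {"stupid", "idiot", "lol"}


def bad_word_ratio(text: str, bad_words: Set[str] = BAD_WORDS) -> float:
    """Share of word tokens that appear in the bad-word list."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return sum(w in bad_words for w in words) / len(words)
```

Each function maps the text of an edit to a number in [0, 1], so features from different edits are directly comparable regardless of text length.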
Example feature: revisionTag (T = thousand revisions)
revisionTag                 Vandalism    Total       Prob.
Revisions with tags         52 T          8,619 T     0.60%
  by abuse filter           49 T            122 T    39.90%
  by editing tools           3 T          8,496 T     0.03%
Revisions without tags      52 T         15,386 T     0.34%
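The "Prob." column is the empirical probability of vandalism given a tag value, i.e. vandalism count divided by total count. The sketch below recomputes it from the table and adds a "lift" relative to the overall base rate, which shows why the abuse-filter tag is such a strong signal. The lift measure is an illustrative addition, not a quantity reported on the slide.

```python
# (vandalism, total) in thousands of revisions, copied from the revisionTag table.
REVISION_TAG_COUNTS = {
    "Rev. with tags": (52, 8_619),
    "By abuse filter": (49, 122),
    "By editing tools": (3, 8_496),
    "Rev. w/o tags": (52, 15_386),
}


def vandalism_probability(vandalism: float, total: float) -> float:
    """Empirical P(vandalism | tag value) = vandalism count / total count."""
    return vandalism / total


def lift(p_given_tag: float, base_rate: float) -> float:
    """How many times more likely vandalism is under this tag than overall."""
    return p_given_tag / base_rate


# Overall base rate from the same table: all vandalism / all revisions.
BASE_RATE = (52 + 52) / (8_619 + 15_386)

for tag, (vand, total) in REVISION_TAG_COUNTS.items():
    p = vandalism_probability(vand, total)
    print(f"{tag:<17} P={p:7.2%}  lift={lift(p, BASE_RATE):6.1f}")
```

Because the counts are rounded to thousands, the recomputed values differ slightly from the table (about 40.2% instead of 39.90% for the abuse filter, about 0.04% instead of 0.03% for editing tools).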
Multiple-Instance Learning
• Observation: Vandalism seldom occurs in isolation
• Idea: Apply Multiple-Instance Learning
Example edit history (newest first), forming two sessions: the two 22:35 edits by 184.19.64.111 and the 12:05 edit by MatmaBot
22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha)
22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):))
12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
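The multiple-instance idea treats a session as a bag of revisions: if one edit in a session looks like vandalism, its neighbors become suspicious too. Below is one simple MIL-style aggregation, offered as a sketch and not necessarily the exact WDVD variant. It assumes a session is a maximal run of consecutive edits by the same user on the same item, and that each revision already has a score from a per-revision classifier. The `Edit` type, revision IDs, and scores are hypothetical; "Q76" is the Wikidata ID of Barack Obama, assumed here from the label in the example.

```python
from itertools import groupby
from statistics import mean
from typing import Dict, List, NamedTuple


class Edit(NamedTuple):
    rev_id: int
    item: str
    user: str
    score: float  # instance-level vandalism score from a per-revision classifier


def sessions(edits: List[Edit]) -> List[List[Edit]]:
    """Group chronologically ordered edits into sessions: maximal runs of
    consecutive edits by the same user on the same item (assumed definition)."""
    return [list(run) for _, run in groupby(edits, key=lambda e: (e.item, e.user))]


def session_scores(edits: List[Edit]) -> Dict[int, float]:
    """Assign every revision the mean instance score of its session (bag)."""
    result: Dict[int, float] = {}
    for bag in sessions(edits):
        bag_score = mean(e.score for e in bag)
        for e in bag:
            result[e.rev_id] = bag_score
    return result


# The example history, oldest first, with made-up instance scores.
history = [
    Edit(1, "Q76", "MatmaBot", 0.05),       # Polish description import
    Edit(2, "Q76", "184.19.64.111", 0.4),   # label change, ambiguous on its own
    Edit(3, "Q76", "184.19.64.111", 0.9),   # emoticon alias, clearly suspicious
]
print(session_scores(history))
```

With these scores, the ambiguous label change inherits suspicion from the emoticon alias in the same session (0.4 rises to 0.65), while the bot edit keeps its low score.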
WDVD vs. Baselines
• WDVD (our approach): Wikidata Vandalism Detector
• FILTER (baseline): Wikidata Abuse Filter
• ORES (baseline): Objective Revision Evaluation Service
Figure: precision-recall curves of WDVD, ORES, and FILTER on the test dataset (0.2% vandalism); x-axis: recall, y-axis: precision (both from 0 to 1)
WDVD: PR-AUC 0.491, ROC-AUC 0.991
→ Detect and revert 30% of vandalism fully automatically
• Reduce workload by a factor of 10 (precision 2% instead of 0.2%)
→ Still find 98.8% of all vandalism
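The sketch below shows what the two reported metrics measure and where the "factor 10" comes from. ROC-AUC is computed as the probability that a random vandalism revision outscores a random regular one; PR-AUC is approximated by average precision, which is one common estimator and may differ from how WDVD's 0.491 was computed. The toy scores are made up. The workload arithmetic uses only the slide's numbers: with precision p, recall r, and base rate b, flagged edits are r·b·N / p, so reviewers see a fraction r·b / p of all edits.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC-AUC as the probability that a random positive (vandalism) scores higher
    than a random negative; ties count half. O(P*N), fine for small demos."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: List[float], labels: List[int]) -> float:
    """PR-AUC estimated as average precision: the mean of the precision values
    at the rank of each positive (ties broken by sort order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def review_fraction(recall: float, precision: float, base_rate: float) -> float:
    """Share of all edits a patroller reviews when checking only flagged edits:
    flagged = true positives / precision = recall * base_rate * N / precision."""
    return recall * base_rate / precision


scores = [0.9, 0.8, 0.7, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels), average_precision(scores, labels))  # both roughly 0.833 (= 5/6)
print(review_fraction(recall=0.988, precision=0.02, base_rate=0.002))  # about 0.099
```

Reviewing about a tenth of all edits while still catching 98.8% of vandalism is the "factor 10" on the slide. For real evaluations, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities efficiently.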
Conclusion and Outlook
Conclusion
• Vandalism: currently concentrated on item heads
• Features: content and context
• Model: multiple-instance learning
• PR-AUC: 0.491
• ROC-AUC: 0.991
Outlook
• Goal: better detection (on item bodies)
• Idea: double-check with other sources
Acknowledgement
• German Research Foundation (DFG)
• SIGIR Student Travel Grant
Code + Data: http://www.heindorf.me/wdvd.html
Join the competition: Vandalism Detection @ WSDM Cup 2017
http://www.wsdm-cup-2017.org/
Thank you!

Vandalism Detection in Wikidata (CIKM '16 Best Paper)

  • 1. Vandalism Detection in Wikidata Stefan Heindorf1, Martin Potthast2, Benno Stein2, Gregor Engels1 CIKM 2016 October 25, 2016 1 2
  • 2. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 3. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 4. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 5. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 6. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 7. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 8. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 9. Motivation Vandalism Detection in Wikidata Stefan Heindorf 2
  • 11. 3Stefan HeindorfVandalism Detection in Wikidata Item head
  • 12. 3Stefan HeindorfVandalism Detection in Wikidata Item head Item body
  • 13. 3Stefan HeindorfVandalism Detection in Wikidata Item head Item body Revisions
  • 14. 3Stefan HeindorfVandalism Detection in Wikidata (Feb 22, 2013) (May 13, 2013) (May 30, 2013) Item head Item body Revisions
  • 15. 3Stefan HeindorfVandalism Detection in Wikidata (Feb 22, 2013) (May 13, 2013) (May 30, 2013) Item head Item body Revisions
  • 16. 3Stefan HeindorfVandalism Detection in Wikidata (Feb 22, 2013) (May 13, 2013) (May 30, 2013) Item head Item body Revisions
  • 17. 3Stefan HeindorfVandalism Detection in Wikidata Item head Item body Revisions
  • 18. Why is it a problem? 4 Patrolling Reverting Warning Blocking Protecting • Over 2 Mio manual edits per month • A lot of tedious work • Vandalism is not detected in time Stefan HeindorfVandalism Detection in Wikidata
  • 19. Research Question: How to detect damaging changes to crowdsourced knowledge bases?
  • 24. Our Approach: 1. Label dataset → Vandalism Corpus [SIGIR’15]; 2. Study vandalism characteristics → 47 features; 3. Experiment with ML → multiple-instance learning; 4. Compare with the state of the art → 2 baselines
  • 35. Corpus [SIGIR ’15]: revisions over time, by month (chart). 103,000 vandalism revisions among 24 million manual revisions → 0.4% vandalism. Item head: 1.3% vandalism; item body: 0.2% vandalism. The timeline is divided into training, validation, and test periods.
  • 36. Features (47 in total). Content features (27): 11 character features (e.g., lowerCaseRatio, digitRatio), 9 word features (e.g., badWordRatio), 4 sentence features (e.g., commentSitelinkSimilarity), 3 statement features (e.g., propertyFrequency). Context features (20): 10 user features (e.g., userCountry), 2 item features (e.g., logItemFrequency), 8 revision features (e.g., revisionTag, revisionLanguage).
  • 38. Features (47 in total): revisionTag. Vandalism / total revisions (vandalism probability), T = thousand revisions: Rev. with tags: 52 T / 8,619 T (0.60%); by abuse filter: 49 T / 122 T (39.90%); by editing tools: 3 T / 8,496 T (0.03%). Rev. without tags: 52 T / 15,386 T (0.34%).
  • 49. Multiple-Instance Learning • Observation: vandalism seldom occurs in isolation • Idea: apply multiple-instance learning. Example revision history (newest first), grouped into sessions: Session 1: 22:35, 11 September 2013 184.19.64.111 (talk) . . (Changed English label: Barack Obama Aloha); 22:35, 11 September 2013 184.19.64.111 (talk) . . (Added English alias: Lulu:):):):):):):)) Session 2: 12:05, 11 September 2013 MatmaBot (talk | contribs) . . (Changed Polish description: imported
  • 58. WDVD vs. Baselines • WDVD (our approach): Wikidata Vandalism Detector • FILTER (baseline): Wikidata Abuse Filter • ORES (baseline): Objective Revision Evaluation Service. [Figure: precision-recall curves of WDVD, ORES, and FILTER on the test dataset (0.2% vandalism)] WDVD: PR-AUC 0.491, ROC-AUC 0.991
  • 60. WDVD vs. Baselines, test dataset (0.2% vandalism): → Detect and revert 30% of vandalism fully automatically • Reduce the workload by a factor of 10 (precision 2% instead of 0.2%) → still find 98.8% of all vandalism
  • 67. Conclusion and Outlook. Conclusion: • Vandalism: concentrated on item heads (currently) • Features: content & context • Model: multiple-instance learning • PR-AUC: 0.491 • ROC-AUC: 0.991. Outlook: • Goal: better detection (on item bodies) • Idea: double-check with other sources. Acknowledgement: • German Research Foundation (DFG) • SIGIR Student Travel Grant. Code + data: http://www.heindorf.me/wdvd.html • Join the competition: Vandalism Detection @ WSDM Cup 2017, http://www.wsdm-cup-2017.org/ • Thank you!
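Slide 36 names lowerCaseRatio and digitRatio as character features but does not define them. The sketch below assumes the simplest reading: the share of characters in the edited text that are lowercase letters or decimal digits. The function names and the empty-string convention are hypothetical, and the WDVD code linked on slide 67 may normalize differently.

```python
def lower_case_ratio(text: str) -> float:
    """Share of characters that are lowercase letters (assumed definition)."""
    if not text:
        return 0.0  # hypothetical convention for empty edits
    return sum(ch.islower() for ch in text) / len(text)


def digit_ratio(text: str) -> float:
    """Share of characters that are decimal digits (assumed definition)."""
    if not text:
        return 0.0
    return sum(ch.isdigit() for ch in text) / len(text)


# The vandalized English label from slide 49.
label = "Barack Obama Aloha"
print(round(lower_case_ratio(label), 3), digit_ratio(label))
```

Each feature maps one revision's text to a single number, so all 47 features together form one fixed-length vector per revision for the classifier.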
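The probability column on slide 38 is the share of vandalism within each group of revisions. The sketch below recomputes it from the rounded counts shown on the slide (in thousands of revisions). Because the slide was presumably computed from exact counts, the recomputed values can differ slightly; for example, 49/122 gives about 40.2% rather than 39.90%. The dictionary and function names are illustrative only.

```python
# Counts from slide 38 as (vandalism, total), in thousands of revisions.
REVISION_TAG_COUNTS = {
    "with tags": (52, 8_619),
    "by abuse filter": (49, 122),
    "by editing tools": (3, 8_496),
    "without tags": (52, 15_386),
}


def vandalism_probability(vandalism: int, total: int) -> float:
    """Empirical probability that a revision in a group is vandalism."""
    return vandalism / total


for group, (vand, total) in REVISION_TAG_COUNTS.items():
    print(f"{group:>17}: {vandalism_probability(vand, total):.2%}")
```

The abuse-filter and editing-tool groups add up to the tagged total (49 T + 3 T = 52 T). At roughly 40% versus 0.34% for untagged revisions, this gap suggests why revisionTag is singled out among the context features.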
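Slide 49 motivates multiple-instance learning with sessions, that is, consecutive edits by one user on one item. The sketch below shows one simple way to realize that idea. It assumes a bag is a maximal run of consecutive revisions of an item by the same user, and that a bag's score is the mean of per-revision classifier scores. The Revision fields, the scores, and the averaging rule are assumptions for illustration, not necessarily what WDVD does.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import mean


@dataclass(frozen=True)
class Revision:
    revision_id: int
    item: str
    user: str
    minute: int  # time of the edit in minutes; any monotonic time unit works


def sessions(revisions: list[Revision]) -> list[list[Revision]]:
    """Bags for MIL: maximal runs of consecutive revisions of one item by one user."""
    # Order each item's history chronologically; sorted() is stable for equal times.
    ordered = sorted(revisions, key=lambda r: (r.item, r.minute))
    return [list(run) for _, run in groupby(ordered, key=lambda r: (r.item, r.user))]


def bag_scores(revisions: list[Revision], instance_scores: dict[int, float]) -> dict[int, float]:
    """Give every revision the mean per-revision score of the session it belongs to."""
    result: dict[int, float] = {}
    for bag in sessions(revisions):
        score = mean(instance_scores[r.revision_id] for r in bag)
        for r in bag:
            result[r.revision_id] = score
    return result


# The history from slide 49 (minutes after midnight, 11 September 2013).
history = [
    Revision(1, "Barack Obama", "184.19.64.111", 22 * 60 + 35),  # changed English label
    Revision(2, "Barack Obama", "184.19.64.111", 22 * 60 + 35),  # added English alias
    Revision(3, "Barack Obama", "MatmaBot", 12 * 60 + 5),        # changed Polish description
]
# Hypothetical per-revision scores from some single-revision classifier.
scores = bag_scores(history, {1: 0.75, 2: 0.25, 3: 0.125})
print(scores)
```

With averaging, the obvious label vandalism in session 1 raises the score of the alias edit made in the same session. The bot's edit in session 2 is scored on its own.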
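Slides 58 and 67 report WDVD's PR-AUC (0.491) and ROC-AUC (0.991). With only 0.2% vandalism, these two metrics can diverge sharply. ROC-AUC only measures how well vandalism is ranked above regular edits. Precision, by contrast, also counts the false positives, which vastly outnumber the few vandalism revisions. The sketch below computes ROC-AUC as a pairwise ranking probability and approximates PR-AUC by average precision, a common estimator. The deck does not say which estimator was used, so treat this as an assumption. For real data, scikit-learn's `roc_auc_score` and `average_precision_score` avoid the quadratic pair loop.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random vandalism revision scores above a random regular one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of the precision values measured at the rank of each vandalism revision."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Toy example: 1 = vandalism, 0 = regular revision.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores))
```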
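The workload claim on slide 60 follows from simple arithmetic. Patrolling every revision means a precision equal to the base rate (0.2%). If patrollers check only the revisions flagged by the classifier, the reviewed share of all revisions is recall × base rate ÷ precision. The sketch below plugs in the slide's operating point (precision 2%, recall 98.8%); the function name is illustrative.

```python
def review_fraction(base_rate: float, precision: float, recall: float) -> float:
    """Share of all revisions that must be reviewed when only flagged revisions are checked.

    True positives = recall * base_rate * N, and flagged = true positives / precision.
    """
    return recall * base_rate / precision


base_rate = 0.002  # 0.2% vandalism in the test dataset (slide 60)
precision = 0.02   # precision at the operating point on slide 60
recall = 0.988     # share of all vandalism still found at that point
fraction = review_fraction(base_rate, precision, recall)
print(f"review {fraction:.1%} of revisions, i.e. about 1/{1 / fraction:.1f} of the stream")
```

The result, about 9.9% of revisions, is roughly a tenth of the full stream, which matches the factor of 10 on the slide.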