Content Moderation Across Multiple Platforms
with Capsule Networks and Co-Training
Vani Agarwal
linkedin/in/vani-agarwal-04a02bbb/
@VaniAgarwal9 fb.com/vani.agarwal30
Dr. Arun Balaji (Chair)
Dr. Ponnurangam Kumaraguru (Co-chair)
Thesis Committee
◆ Dr. Rajiv Ratn Shah, IIIT Delhi
◆ Dr. Niharika Sachdeva, InfoEdge
◆ Dr. Arun Balaji Buduru, IIIT Delhi
◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
Demo
What is content moderation
Each platform has its own policies. If content does not meet the platform's guidelines, then a moderation action takes place.
Ref: 11
Why content moderation is necessary
Posts on different platforms
Challenges for content moderation
◆ Different platforms have different needs
◆ Huge amount of content
◆ The way in which content is displayed differs for each platform
Platforms struggling to moderate content
◆ Human moderators suffer PTSD
◆ High turnaround time
◆ High cost
Ref: 1
Research Aim
Given posts $P = \{p_1, p_2, \dots, p_k\}$ from domains $D = \{D_1, D_2, \dots, D_n\}$, find a subset of posts which should be flagged for moderation.
INPUT ($P$) → MODEL → OUTPUT: list of items to be moderated, $P'$, where $P' \subseteq P$
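The problem statement above can be sketched as a filtering step. This is only an illustration: `Post`, `flag_for_moderation`, the `score` callable, and the 0.5 threshold are hypothetical names and values that do not come from the thesis, and any trained classifier could stand in for `score`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Post:
    text: str
    domain: str  # e.g. "Twitter", "Reddit"


def flag_for_moderation(
    posts: List[Post],
    score: Callable[[Post], float],
    threshold: float = 0.5,
) -> List[Post]:
    """Return the subset P' of posts whose score reaches the threshold."""
    return [p for p in posts if score(p) >= threshold]


# Toy stand-in for a trained model (assumption): flags posts containing "attack".
def toy_score(post: Post) -> float:
    return 1.0 if "attack" in post.text.lower() else 0.0


posts = [
    Post("Nice picture of the sunset", "Reddit"),
    Post("This is a personal ATTACK on you", "Wikipedia"),
]
flagged = flag_for_moderation(posts, toy_score)
```

Because the output is built only from elements of the input list, the condition $P' \subseteq P$ holds by construction.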
Contributions
◆ Comparison of different methods across multiple platforms
◆ Capsule Networks for Content Moderation
◆ Co-training to understand domain adaptability
Outline
◆ Data Collection
◆ Comparison of Methods
◆ Capsule Networks
◆ Co-training for Domain Adaptation
◆ Conclusion
Data Collection
◆ Twitter, Quora, Wikipedia - public datasets
◆ Whisper - combination of a public dataset and website scraping
◆ Reddit - data collected from subreddits
Data Collection of Reddit
◆ Subreddit r/creepy
▶ violent content
▶ Weak labels
◆ Subreddit r/pics
▶ normal content
▶ Manually checked 200 posts to see whether they are problematic
Twitter
Reddit
Wikipedia
“hey punk dont be deleting my stuff,
you know nothing bout the harly drags
so stay out of my shit you stupid nerd,
punk fag female thats all u, bitch”
Quora
Whisper
Dataset Summary
Data collection strategy by platform:
◆ Twitter: 1. Tweets related to protests and riots [7]; 2. Tweets related to racism and sexism [1]
◆ Reddit: data collected from the subreddits r/creepy and r/pics
◆ Wikipedia: 1. Comments containing personal attacks [3]; 2. Toxic comments on talk pages [12]
◆ Whisper: hate-speech-related posts [4] and web scraping
◆ Quora: insincere questions asked on Quora [2]
Dataset Summary
Dataset     Positive (1)  Negative (0)  Total  Positive class %  Text / Image
Quora       817           12244         13061  6%                Text
Whisper     760           1720          2480   30%               Text
Wikipedia1  647           5146          5793   11%               Text
Wikipedia2  783           7195          7978   10%               Text
Twitter2    1200          2000          3200   37%               Text
Twitter1    3619          1052          4671   77%               Text + Image
Reddit      2073          2598          4671   44%               Text + Image
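The class balance in the table above can be recomputed from the raw counts. The sketch below is illustrative only; `counts`, `positive_share`, and `shares` are names chosen here, and the table itself appears to report whole percentages.

```python
# (positive, negative) counts copied from the dataset summary table.
counts = {
    "Quora": (817, 12244),
    "Whisper": (760, 1720),
    "Wikipedia1": (647, 5146),
    "Wikipedia2": (783, 7195),
    "Twitter2": (1200, 2000),
    "Twitter1": (3619, 1052),
    "Reddit": (2073, 2598),
}


def positive_share(positive: int, negative: int) -> float:
    """Percentage of examples that belong to the positive (flagged) class."""
    return 100.0 * positive / (positive + negative)


shares = {name: positive_share(p, n) for name, (p, n) in counts.items()}

for name, share in shares.items():
    print(f"{name:<11}{share:5.1f}%")
```

The positive class ranges from a small minority (Quora) to a clear majority (Twitter1), so plain accuracy would be hard to compare across datasets.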
Data Pre-processing
Tweet - "nice to see that the top trending post by suriya
#TamilNaduBandh #Saithan are located around TamilNadu"
Anonymized Tweet - "nice to see that the top trending post by
<NAME> are located around tamilnadu"
◆ Lowercase the text
◆ Remove hashtags, emoticons, and punctuation
◆ Anonymize named entities with a Named Entity Recognizer (e.g., "suriya" becomes <NAME>)
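A minimal sketch of these pre-processing steps follows. It is not the thesis pipeline: the `names` argument is a hypothetical stand-in for the output of a named entity recognizer, the emoticon pattern covers only a few ASCII emoticons, and the function name `preprocess` is chosen here for illustration.

```python
import re
import string

# A few whitespace-delimited ASCII emoticons such as :) ;-) :D (assumption:
# Unicode emoji would need separate handling).
EMOTICON = re.compile(r"(?<!\S)[:;=][-']?[()DPpOo/\\|](?!\S)")
HASHTAG = re.compile(r"#\w+")
PUNCTUATION = str.maketrans("", "", string.punctuation)


def preprocess(text, names):
    """Lowercase, drop hashtags/emoticons/punctuation, and mask known names.

    `names` is a set of lowercase tokens that a named entity recognizer
    would have tagged as person names (hypothetical stand-in).
    """
    text = HASHTAG.sub(" ", text)    # remove hashtags before stripping '#'
    text = EMOTICON.sub(" ", text)   # remove emoticons before stripping ':' etc.
    text = text.lower()
    text = text.translate(PUNCTUATION)
    tokens = ["<NAME>" if tok in names else tok for tok in text.split()]
    return " ".join(tokens)


tweet = ("nice to see that the top trending post by suriya "
         "#TamilNaduBandh #Saithan are located around TamilNadu")
anonymized = preprocess(tweet, names={"suriya"})
```

Hashtags and emoticons are removed before punctuation so that stripping `#` or `:` does not leave stray word fragments behind.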
Methods
Text models (method [reference]: label used in the results):
◆ Logistic Regression [5]: LR_machina
◆ Logistic Regression [6]: LR_Badjatiya
◆ Multi-Layer Perceptron [5]: MLP
◆ Gated Recurrent Unit [7]: GRU
◆ Long Short-Term Memory [7]: LSTM
◆ Convolutional Neural Network [6]: CNN
◆ Capsule Network: CapsNet
Fusion models:
◆ LSTM + (object + scene recognition): LstmFusion
◆ CapsNet + (object + scene recognition): CapsFusion
Capsule Network Intuition
◆ It is not a human face.
◆ Capsule networks understand spatial orientation.
◆ Max pooling loses information.
Ref: 10
Capsule Working
Ref: 10
Why CapsNet?
◆ Capsules output a vector rather than a scalar.
◆ Each capsule decides which features to pass to a higher-level capsule.
◆ Prominent features are passed from one capsule to the next using a routing protocol.
◆ This helps the network learn the semantic meaning of text data well.
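The routing idea in the bullets above can be illustrated with routing-by-agreement and the squash nonlinearity from the original capsule network formulation. Whether the thesis uses exactly this variant is an assumption; the function names, the three routing iterations, and the toy prediction vectors below are chosen only for illustration.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's prediction vector for upper capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v, c = [], []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes its output.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the upper capsule output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c


# Toy input: predictions agree for upper capsule 0 and disagree for capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.1], [0.0, -1.0]],
    [[0.9, 0.0], [1.0, 0.0]],
]
v, c = dynamic_routing(u_hat)
```

In this toy input the upper capsule whose incoming predictions agree ends up with the longer output vector, which is how a capsule signals that a feature is present.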
Capsule Network Architecture
Experimental Design
- Evaluation metrics
  - Average precision (area under the PR curve)
  - Macro F1
- 5-fold cross-validation
- Grid search over hyperparameters
- Train set: 80%
- Test set: 20%
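The two evaluation metrics can be written out directly. This is a plain-Python sketch for illustration, not the evaluation code used in the thesis; it computes macro F1 over the classes present in `y_true` and does not treat tied scores specially in average precision.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score when `cls` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = macro_f1(y_true, y_pred)
ap = average_precision(y_true, scores)
```

Macro F1 weights both classes equally regardless of their size, while average precision depends only on how the positives are ranked, not on a decision threshold.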
Text All Model Results
Performance on the Twitter2 dataset:
Method        Macro F1  Average Precision
GRU           0.6977    0.6983
LR_Badjatiya  0.7560    0.7260
LR_machina    0.6772    0.3805
CNN           0.6400    0.6960
LSTM          0.7076    0.7057
MLP           0.7300    0.6916
CapsNet       0.8254    0.7695
Text Model Results on all Datasets
Dataset     Method   Macro F1  Average Precision
Quora       CapsNet  0.6959    0.9269
Quora       LSTM     0.6731    0.6560
Reddit      CapsNet  0.7967    0.7373
Reddit      LSTM     0.7306    0.7321
Twitter1    CapsNet  0.7953    0.7695
Twitter1    LSTM     0.8635    0.6748
Twitter2    CapsNet  0.8254    0.7695
Twitter2    LSTM     0.7076    0.7057
Whisper     CapsNet  0.9856    0.9783
Whisper     LSTM     0.9816    0.9816
Wikipedia1  CapsNet  0.8361    0.9195
Wikipedia1  LSTM     0.7775    0.7413
Wikipedia2  CapsNet  0.8361    0.9195
Wikipedia2  LSTM     0.8098    0.7698
Average     CapsNet  0.8244    0.8600
Average     LSTM     0.7919    0.7516
Fusion Model Architecture
Fusion Model Results
Dataset   Method      Macro F1  Average Precision
Twitter1  LstmFusion  0.6711    0.8381
Twitter1  CapsFusion  0.6968    0.8613
Reddit    LstmFusion  0.7529    0.7566
Reddit    CapsFusion  0.8141    0.8149
Takeaways from Capsule Network Model
◆ Capsule networks perform better than LSTM by 10.54% in average precision.
◆ The CapsFusion model performs better than LstmFusion by 5.2% in average precision.
Error Analysis
◆ Manual analysis of 50 instances misclassified by LSTM but correctly classified by CapsNet
◆ Findings:
▶ False Positives by LSTM / correctly classified by CapsNet
“ :I think ``YOU RACIST CUNT`` qualifies as defamation. diff. I didn't edit your
post, that was someone else.”
▶ False Negatives by LSTM / correctly classified by CapsNet
“` I had no interest in getting ``under your skin``. It's you and your fellow admins
who got under my skin. So well done. It doesn't matter anymore. — `”
Qualitative Analysis
Text - “Where are the activists and foot soldiers when k'tak
bleeds in silence.”
Label - Positive
DeepSHAP[8] results
Qualitative Analysis
Text - “The proud hero of kashmir! The hero of freedom
struggle.”
Label - Negative
DeepSHAP results
Ref: [9]
Example of Domain Adaptation
Example of domain adaptation from the CODA paper (Chen et al.) [9] on reviews from different domains
Our Co-training for Domain Adaptation Algorithm
Co-training for Domain Adaptation
[Diagram]
Training: Twitter1 + 20% of a second domain (Reddit, Twitter2, Whisper, . . .) gives models M1, M2, . . ., M6.
Testing: each model is tested on the remaining 80% of its second domain (Reddit, Twitter2, Whisper, . . .).
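The training and testing layout above can be sketched as a data split. This illustrates only how the sets are assembled, not the co-training algorithm itself; `split_domain2`, `build_augmented_sets`, the fixed seed, and the toy data are assumptions made for the example.

```python
import random


def split_domain2(domain2, train_fraction=0.2, seed=0):
    """Shuffle Domain2 and split it into a small training part and a test part."""
    items = list(domain2)
    random.Random(seed).shuffle(items)
    cut = int(round(train_fraction * len(items)))
    return items[:cut], items[cut:]


def build_augmented_sets(domain1, domain2, train_fraction=0.2, seed=0):
    """Training set: all of Domain1 plus a fraction of Domain2.
    Test set: the remaining Domain2 examples."""
    d2_train, d2_test = split_domain2(domain2, train_fraction, seed)
    return list(domain1) + d2_train, d2_test


# Toy (text, label) pairs standing in for real posts.
domain1 = [(f"twitter post {i}", i % 2) for i in range(10)]
domain2 = [(f"reddit post {i}", i % 2) for i in range(100)]

train_set, test_set = build_augmented_sets(domain1, domain2)
```

Raising `train_fraction` moves more Domain2 examples into training, which is the quantity varied in a tradeoff analysis of this kind.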
Co-training for Domain Adaptation Results
Trained on Twitter1 (Domain1) with CapsNet and co-trained on each Domain2 below:
Co-trained on (Domain2)  Macro F1  Average Precision
Quora                    0.6739    0.6723
Reddit                   0.6690    0.5581
Twitter2                 0.6321    0.6232
Whisper                  0.9167    0.9201
Wikipedia1               0.7496    0.7480
Wikipedia2               0.7341    0.7457
Tradeoff Analysis
Domain1-Twitter1 and Domain2-Reddit
Co-training Results
◆ Augmenting Domain1 with only 20% of the Domain2 samples reduces performance by 17% compared to a model trained on 100% of the samples.
◆ As the percentage of Domain2 samples added to Domain1 increases, the model's performance improves.
◆ Therefore, if we only have a small amount of labeled data, co-training for domain adaptation is a viable option.
Conclusion
◆ We perform a multi-platform comparison of content moderation methods.
◆ Capsule Networks outperformed existing methods for content moderation.
◆ Co-training for domain adaptation is a cost-effective solution when little annotated data is available.
Challenges, Limitation, Future Work
◆ Some datasets use weak labels.
▶ Future work: test whether stronger labels perform better than weak labels.
◆ Different platforms have different styles of expressing content and different moderation policies.
▶ Co-training may not work if the policies of the platforms do not align.
◆ Collecting the Reddit dataset was challenging.
▶ Quarantined subreddits are no longer available.
◆ We plan to extend the work to video content on various platforms.
Acknowledgement
◆ Committee Members
◆ Indira Sen, GESIS
◆ Snehal Gupta, Asmit Kumar Singh, Shubham Singh
◆ Members of Precog
◆ Family and friends
References
[1] Twitter data - https://github.com/zeerakw/hatespeech
[2] Quora data - https://www.kaggle.com/c/quora-insincere-questions-classification/data
[3] Wikipedia data - https://figshare.com/articles/Wikipedia_Detox_Data/4054689
[4] Whisper data - https://github.com/Mainack/hatespeech-data-HT-2017
[5] ExMachina - https://arxiv.org/abs/1610.08914
[6] Badjatiya - https://arxiv.org/abs/1706.00188
[7] LSTM - http://precog.iiitd.edu.in/pubs/empowering-first-responders.pdf
[8] DeepSHAP - https://github.com/slundberg/shap
[9] Co-training for domain adaptation - https://papers.nips.cc/paper/4433-co-training-for-domain-adaptation.pdf
Thanks!
vani17068@iiitd.ac.in
@VaniAgarwal9

A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
 

Recently uploaded

Introduction to Heat Exchangers: Principle, Types and Applications
Introduction to Heat Exchangers: Principle, Types and ApplicationsIntroduction to Heat Exchangers: Principle, Types and Applications
Introduction to Heat Exchangers: Principle, Types and ApplicationsKineticEngineeringCo
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Prakhyath Rai
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)NareenAsad
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsMathias Magdowski
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxKarpagam Institute of Teechnology
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...archanaece3
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdfKamal Acharya
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxRashidFaridChishti
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New HorizonMorshed Ahmed Rahath
 
Circuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineeringCircuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineeringKanchhaTamang
 
Intelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsIntelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsSheetal Jain
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024EMMANUELLEFRANCEHELI
 
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5T.D. Shashikala
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.MdManikurRahman
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxalijaker017
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdfKamal Acharya
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxCHAIRMAN M
 
EMPLOYEE MANAGEMENT SYSTEM FINAL presentation
EMPLOYEE MANAGEMENT SYSTEM FINAL presentationEMPLOYEE MANAGEMENT SYSTEM FINAL presentation
EMPLOYEE MANAGEMENT SYSTEM FINAL presentationAmayJaiswal4
 
Theory for How to calculation capacitor bank
Theory for How to calculation capacitor bankTheory for How to calculation capacitor bank
Theory for How to calculation capacitor banktawat puangthong
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 

Recently uploaded (20)

Introduction to Heat Exchangers: Principle, Types and Applications
Introduction to Heat Exchangers: Principle, Types and ApplicationsIntroduction to Heat Exchangers: Principle, Types and Applications
Introduction to Heat Exchangers: Principle, Types and Applications
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptx
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdf
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docx
 
15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon15-Minute City: A Completely New Horizon
15-Minute City: A Completely New Horizon
 
Circuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineeringCircuit Breaker arc phenomenon.pdf engineering
Circuit Breaker arc phenomenon.pdf engineering
 
Intelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsIntelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent Acts
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
 
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptx
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
 
EMPLOYEE MANAGEMENT SYSTEM FINAL presentation
EMPLOYEE MANAGEMENT SYSTEM FINAL presentationEMPLOYEE MANAGEMENT SYSTEM FINAL presentation
EMPLOYEE MANAGEMENT SYSTEM FINAL presentation
 
Theory for How to calculation capacitor bank
Theory for How to calculation capacitor bankTheory for How to calculation capacitor bank
Theory for How to calculation capacitor bank
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 

Content Moderation Across Multiple Platforms with Capsule Networks and Co-Training

  • 1. Content Moderation Across Multiple Platforms with Capsule Networks and Co-Training
      Vani Agarwal (linkedin/in/vani-agarwal-04a02bbb/, @VaniAgarwal9, fb.com/vani.agarwal30)
      Dr. Arun Balaji (Chair), Dr. Ponnurangam Kumaraguru (Co-chair)
  • 2. Thesis Committee
      ◆ Dr. Rajiv Ratn Shah, IIIT Delhi
      ◆ Dr. Niharika Sachdeva, InfoEdge
      ◆ Dr. Arun Balaji Buduru, IIIT Delhi
      ◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
  • 4. What is content moderation (Ref: 11)
      Different platforms have their own policies; if content does not meet the guidelines, then a moderation action takes place.
  • 5. Why content moderation is necessary
  • 6. Posts on different platforms
  • 7. Challenges for content moderation
      ◆ Different platforms have different needs
      ◆ Huge amount of content
      ◆ The way in which content is displayed differs for each platform
  • 8. Platforms struggling to moderate content (Ref: 1)
      ◆ Human moderators suffer PTSD
      ◆ High turnaround time
      ◆ High cost
  • 9. Research Aim
      Given posts P = p1, p2, ..., pk from domains D = D1, D2, ..., Dn, find a subset of posts which should be flagged for moderation.
      ◆ INPUT: posts P
      ◆ MODEL
      ◆ OUTPUT: list of items to be moderated, P’ where P’ ⊆ P
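The research aim can be read as a filter over the input posts. The sketch below is only an illustration of that reading: the scoring function `toy_score` and the threshold of 0.5 are assumptions introduced here, not part of the slides, and stand in for whatever trained model produces the decision.

```python
from typing import Callable, List


def flag_for_moderation(
    posts: List[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Return the subset P' of posts whose score meets the threshold."""
    return [p for p in posts if score(p) >= threshold]


# Toy scorer standing in for a trained model (hypothetical, illustration only).
def toy_score(post: str) -> float:
    return 1.0 if "attack" in post.lower() else 0.0


posts = ["Nice photo", "This is a personal attack", "Good morning"]
flagged = flag_for_moderation(posts, toy_score)
```

By construction the returned list only ever contains items from the input, which is the P’ ⊆ P condition stated on the slide.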
  • 10. Contributions
      ◆ Comparison of different methods across multiple platforms
      ◆ Capsule Networks for Content Moderation
      ◆ Co-training to understand domain adaptability
  • 11. Outline
      ◆ Data Collection
      ◆ Comparison of Methods
      ◆ Capsule Networks
      ◆ Co-training for Domain Adaptation
      ◆ Conclusion
  • 12. Data Collection
      ◆ Twitter, Quora, Wikipedia - public datasets
      ◆ Whisper - combination of public dataset and website scraping
      ◆ Reddit - collected data from subreddits
  • 13. Data Collection of Reddit
      ◆ Subreddit r/creepy
          ▶ Violent content
          ▶ Weak labels
      ◆ Subreddit r/pics
          ▶ Normal content
          ▶ Manually checked 200 posts to see if they are problematic or not
  • 16. Wikipedia
      “hey punk dont be deleting my stuff, you know nothing bout the harly drags so stay out of my shit you stupid nerd, punk fag female thats all u, bitch”
  • 19. Dataset Summary: data collection strategy
      ◆ Twitter
          ▶ 1. Tweets related to protest, riots [7]
          ▶ 2. Tweets related to racism, sexism [1]
      ◆ Reddit
          ▶ Collected data from subreddits r/creepy and r/pics
      ◆ Wikipedia
          ▶ 1. Comments of personal attacks [3]
          ▶ 2. Toxic comments on talk page [12]
      ◆ Whisper
          ▶ Hate speech related posts [4] and web scraping
      ◆ Quora
          ▶ Insincere questions asked on Quora [2]
  • 20. Dataset Summary
      Dataset      Positive (1)   Negative (0)   Total   Positive class %   Text / Image
      Quora                 817          12244   13061                 6%   Text
      Whisper               760           1720    2480                30%   Text
      Wikipedia1            647           5146    5793                11%   Text
      Wikipedia2            783           7195    7978                10%   Text
      Twitter2             1200           2000    3200                37%   Text
      Twitter1             3619           1052    4671                77%   Text + Image
      Reddit               2073           2598    4671                44%   Text + Image
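The positive-class share in the dataset summary follows directly from the positive and negative counts. A small sketch of that arithmetic is given below, using only the counts from the slide. It reports one decimal place; the slide shows whole-number percentages, and whether those were rounded or truncated is not stated.

```python
# (positive, negative) counts per dataset, copied from the dataset summary slide.
counts = {
    "Quora": (817, 12244),
    "Whisper": (760, 1720),
    "Wikipedia1": (647, 5146),
    "Wikipedia2": (783, 7195),
    "Twitter2": (1200, 2000),
    "Twitter1": (3619, 1052),
    "Reddit": (2073, 2598),
}


def positive_share(positive: int, negative: int) -> float:
    """Percentage of positive examples, to one decimal place."""
    return round(100 * positive / (positive + negative), 1)


shares = {name: positive_share(pos, neg) for name, (pos, neg) in counts.items()}
totals = {name: pos + neg for name, (pos, neg) in counts.items()}

for name in counts:
    print(f"{name:<11} total={totals[name]:>6} positive={shares[name]:>5}%")
```

The shares range from roughly 6% (Quora) to roughly 77% (Twitter1), which is one reason the evaluation relies on Macro F1 and average precision rather than plain accuracy.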
  • 21. Data Pre-processing
      ◆ Steps
          ▶ Lower case
          ▶ Remove hashtags, emoticons, punctuation
          ▶ Named Entity Recognizer
      ◆ Tweet - "nice to see that the top trending post by suriya #TamilNaduBandh #Saithan are located around TamilNadu"
      ◆ Anonymized Tweet - "nice to see that the top trending post by <NAME> are located around tamilnadu"
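The pre-processing steps can be sketched with the standard library alone. This is a minimal illustration, not the pipeline used in the thesis: the slides do not say which Named Entity Recognizer was used, so the `person_names` argument is a hypothetical stand-in for the recognizer's output.

```python
import re
from typing import Iterable


def anonymize(text: str, person_names: Iterable[str]) -> str:
    """Lower-case, strip hashtags/emoticons/punctuation, mask person names.

    person_names is a stand-in for the output of a Named Entity Recognizer
    (an assumption for illustration; the actual recognizer is not specified).
    """
    names = {name.lower() for name in person_names}
    text = re.sub(r"#\w+", "", text)       # remove hashtags
    text = text.lower()                    # lower case
    text = re.sub(r"[^\w\s]", "", text)    # remove punctuation and emoticons
    tokens = ["<NAME>" if tok in names else tok for tok in text.split()]
    return " ".join(tokens)


tweet = ("nice to see that the top trending post by suriya "
         "#TamilNaduBandh #Saithan are located around TamilNadu")
print(anonymize(tweet, person_names=["Suriya"]))
```

Masking is done after lower-casing so that the `<NAME>` placeholder keeps its capitalisation, matching the anonymized tweet shown on the slide.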
  • 22. Outline
      ◆ Data Collection
      ◆ Comparison of Methods
      ◆ Capsule Networks
      ◆ Co-training for Domain Adaptation
      ◆ Conclusion
  • 23. Methods
      ◆ Text Models
          ▶ Logistic Regression [5] - LR_machina
          ▶ Logistic Regression [6] - LR_Badjatiya
          ▶ Multi Layer Perceptron [5] - MLP
          ▶ Gated Recurrent Unit [7] - GRU
          ▶ Long Short-Term Memory [7] - LSTM
          ▶ Convolutional Neural Network [6] - CNN
          ▶ Capsule Network - CapsNet
      ◆ Fusion Models
          ▶ LSTM + (Object + Scene recognition) - LstmFusion
          ▶ CapsNet + (Object + Scene recognition) - CapsFusion
  • 24. Outline
      ◆ Data Collection
      ◆ Comparison of Methods
      ◆ Capsule Networks
      ◆ Co-training for Domain Adaptation
      ◆ Conclusion
  • 25. Capsule Network Intuition (Ref: 10)
      ◆ It is not a human face.
      ◆ Capsule networks understand spatial orientation.
      ◆ Max Pooling loses information.
  • 27. Why CapsNet?
      ◆ Capsules output a vector.
      ◆ Each capsule decides which features to pass to a higher capsule.
      ◆ Prominent features are passed from one capsule to another using a routing protocol.
      ◆ This helps the model learn the semantic meaning of text data well.
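The slides do not spell out the routing protocol. The sketch below assumes the squashing nonlinearity and routing-by-agreement from the original capsule network paper (Sabour, Frosst and Hinton, 2017), written in plain Python for readability. The variable names and the three routing iterations are illustrative choices, not details taken from the thesis.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink a vector to length < 1 while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement.

    u_hat[i][j] is the prediction that lower capsule i makes for higher
    capsule j. Returns one output vector per higher capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    outputs: List[Vector] = []
    for _ in range(iterations):
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Raise the logit where a prediction agrees with the capsule output.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs


def length(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


# Two lower capsules agree about higher capsule 0 and disagree about capsule 1.
u_hat = [[[1.0, 0.0], [0.1, 0.0]],
         [[1.0, 0.0], [-0.1, 0.0]]]
v = route(u_hat)
```

In this toy input the predictions for capsule 0 reinforce each other while those for capsule 1 cancel, so the output vector of capsule 0 ends up longer. Vector length is what a capsule uses to signal how strongly a feature is present.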
  • 29. Experimental Design
      - Evaluation parameters used
          - Average Precision, or area under the PR curve
          - Macro F1
      - 5-fold cross-validation
      - Grid search on various hyper-parameters
      - Train set - 80%
      - Test set - 20%
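The two evaluation measures can be written out in a few lines. The sketch below is a plain-Python illustration for binary labels, not the evaluation code used in the thesis. The average precision here is the step-wise form (mean of the precision at each positive in the ranking) and assumes no tied scores.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    scores: List[float] = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean precision at the rank of each positive, highest score first."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


y_true = [1, 0, 1, 0]
f1 = macro_f1(y_true, [1, 1, 1, 0])
ap = average_precision(y_true, [0.9, 0.8, 0.7, 0.1])
```

Macro F1 weights both classes equally regardless of their size, which matters for imbalanced datasets, while average precision summarises the ranking quality across all thresholds.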
  • 30. Text Model Results: all methods (performance on the Twitter2 dataset)
      Method         Macro F1   Average Precision
      GRU              0.6977              0.6983
      LR_Badjatiya     0.7560              0.7260
      LR_machina       0.6772              0.3805
      CNN              0.6400              0.6960
      LSTM             0.7076              0.7057
      MLP              0.7300              0.6916
      CapsNet          0.8254              0.7695
  • 31. Text Model Results on all Datasets
      Dataset      Method    Macro F1   Average Precision
      Quora        CapsNet     0.6959              0.9269
      Quora        LSTM        0.6731              0.6560
      Reddit       CapsNet     0.7967              0.7373
      Reddit       LSTM        0.7306              0.7321
      Twitter1     CapsNet     0.7953              0.7695
      Twitter1     LSTM        0.8635              0.6748
      Twitter2     CapsNet     0.8254              0.7695
      Twitter2     LSTM        0.7076              0.7057
  • 32. Text Model Results on all Datasets
      Dataset      Method    Macro F1   Average Precision
      Whisper      CapsNet     0.9856              0.9783
      Whisper      LSTM        0.9816              0.9816
      Wikipedia1   CapsNet     0.8361              0.9195
      Wikipedia1   LSTM        0.7775              0.7413
      Wikipedia2   CapsNet     0.8361              0.9195
      Wikipedia2   LSTM        0.8098              0.7698
      Average      CapsNet     0.8244              0.8600
      Average      LSTM        0.7919              0.7516
  • 34. Fusion Model Results
      Dataset      Method       Macro F1   Average Precision
      Twitter1     LstmFusion     0.6711              0.8381
      Twitter1     CapsFusion     0.6968              0.8613
      Reddit       LstmFusion     0.7529              0.7566
      Reddit       CapsFusion     0.8141              0.8149
  • 35. Takeaways from Capsule Network Model
      ◆ Capsule networks outperform LSTM by 10.54% in average precision.
      ◆ The CapsFusion model outperforms LstmFusion by 5.2% in average precision.
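The slides do not state how the percentage gains were derived. The sketch below recomputes the gaps from the average-precision values reported in the result tables of slides 32 and 34. It is one possible reading, offered as an assumption: the mean of the per-dataset relative gains reproduces the 5.2% fusion figure, while the Average row for the text models gives an absolute gap of 0.1084, so the 10.54% figure may rest on a calculation or on values not shown in the slides.

```python
# Average precision from the "Average" row of the text-model results.
caps_ap, lstm_ap = 0.8600, 0.7516
abs_gap = caps_ap - lstm_ap          # absolute difference in average precision
rel_gap = abs_gap / lstm_ap          # difference relative to the LSTM baseline

# Average precision of the fusion models: (LstmFusion, CapsFusion).
fusion = {
    "Twitter1": (0.8381, 0.8613),
    "Reddit": (0.7566, 0.8149),
}
rel_gains = [(caps - lstm) / lstm for lstm, caps in fusion.values()]
mean_rel_gain = sum(rel_gains) / len(rel_gains)

print(f"text models: absolute gap {abs_gap * 100:.2f} points, "
      f"relative gap {rel_gap * 100:.1f}%")
print(f"fusion models: mean relative gain {mean_rel_gain * 100:.1f}%")
```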
  • 36. Error Analysis
      ◆ Manual analysis of 50 instances misclassified by LSTM but classified correctly by CapsNet
      ◆ Findings:
          ▶ False positives by LSTM / correctly classified by CapsNet
              “ :I think ``YOU RACIST CUNT`` qualifies as defamation. diff. I didn't edit your post, that was someone else.”
          ▶ False negatives by LSTM / correctly classified by CapsNet
              “` I had no interest in getting ``under your skin``. It's you and your fellow admins who got under my skin. So well done. It doesn't matter anymore. — `”
  • 37. Qualitative Analysis
      ◆ Text - “Where are the activists and foot soldiers when k'tak bleeds in silence.”
      ◆ Label - Positive
      ◆ DeepSHAP [8] results
  • 38. Qualitative Analysis
      ◆ Text - “The proud hero of kashmir! The hero of freedom struggle.”
      ◆ Label - Negative
      ◆ DeepSHAP results
  • 39. Outline
      ◆ Data Collection
      ◆ Comparison of Methods
      ◆ Capsule Networks
      ◆ Co-training for Domain Adaptation
      ◆ Conclusion
  • 41. Example of Domain Adaptation
      Example from the CODA paper (Chen et al. [9]) on reviews from different domains
  • 42. Our Co-training for Domain Adaptation Algorithm
  • 43. Co-training for Domain Adaptation
      ◆ Training: Twitter1 + 20% of a second domain (Reddit, Twitter2, Whisper, ...), giving models M1, M2, ..., M6
      ◆ Testing: the remaining 80% of the same second domain (Reddit, Twitter2, Whisper, ...)
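The training and testing setup in the diagram can be sketched as a data split. This covers only the augmentation step shown on the slide (Domain1 plus 20% of Domain2 for training, the other 80% of Domain2 for testing). It is not the full co-training algorithm, and the function name, seed and toy data are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label)


def augment_with_target(
    domain1: Sequence[Example],
    domain2: Sequence[Example],
    fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Train on all of domain1 plus a fraction of domain2; test on the rest."""
    rng = random.Random(seed)
    shuffled = list(domain2)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    train = list(domain1) + shuffled[:cut]
    test = shuffled[cut:]
    return train, test


# Toy stand-ins for two domains (hypothetical data).
twitter1 = [(f"tweet {i}", i % 2) for i in range(10)]
reddit = [(f"post {i}", i % 2) for i in range(10)]
train, test = augment_with_target(twitter1, reddit)
```

Repeating the call with each of the six other datasets as `domain2` would give the six train/test pairs behind models M1 to M6.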
  • 44. Co-training for Domain Adaptation Results
      Trained on (Domain1): Twitter1. Method: CapsNet.
      Co-trained on (Domain2)   Macro F1   Average Precision
      Quora                       0.6739              0.6723
      Reddit                      0.6690              0.5581
      Twitter2                    0.6321              0.6232
      Whisper                     0.9167              0.9201
      Wikipedia1                  0.7496              0.7480
      Wikipedia2                  0.7341              0.7457
  • 46. Co-training Results
      ◆ Augmenting with only 20% of the Domain2 samples reduces performance by 17% compared with a model trained on 100% of the samples.
      ◆ As the percentage of Domain2 samples added to Domain1 increases, the model's performance improves.
      ◆ Therefore, if only a small amount of labeled data is available, co-training for domain adaptation is a viable option.
  • 47. Outline
      ◆ Data Collection
      ◆ Comparison of Methods
      ◆ Capsule Networks
      ◆ Co-training for Domain Adaptation
      ◆ Conclusion
  • 48. Conclusion
      ◆ We perform a multi-platform comparison for content moderation.
      ◆ Capsule Networks outperformed existing methods for content moderation.
      ◆ Co-training for domain adaptation is a cost-effective solution for annotating data.
  • 49. Challenges, Limitations, Future Work
      ◆ Some datasets use weak labels.
          ▶ Future work: see if stronger labels perform better than weak labels.
      ◆ Different platforms have different styles of expressing content and also have different moderation policies.
          ▶ Co-training may not work if the policies of the platforms do not align.
      ◆ It was challenging to collect the Reddit dataset.
          ▶ Quarantined subreddits are no longer available.
      ◆ We plan to extend the work to video content on various platforms.
  • 50. Acknowledgement
      ◆ Committee Members
      ◆ Indira Sen, GESIS
      ◆ Snehal Gupta, Asmit Kumar Singh, Shubham Singh
      ◆ Members of Precog
      ◆ Family and friends
  • 51. References
      [1] Twitter data - https://github.com/zeerakw/hatespeech
      [2] Quora data - https://www.kaggle.com/c/quora-insincere-questions-classification/data
      [3] Wikipedia data - https://figshare.com/articles/Wikipedia_Detox_Data/4054689
      [4] Whisper data - https://github.com/Mainack/hatespeech-data-HT-2017
      [5] ExMachina - https://arxiv.org/abs/1610.08914
      [6] Badjatiya - https://arxiv.org/abs/1706.00188
  • 52. References
      [7] LSTM - http://precog.iiitd.edu.in/pubs/empowering-first-responders.pdf
      [8] DeepSHAP - https://github.com/slundberg/shap
      [9] Co-training for domain adaptation - https://papers.nips.cc/paper/4433-co-training-for-domain-adaptation.pdf