Sandra Sukarieh
Spam Spam…Spam!
Master’s Seminar 24 January 2020
Prof. Jilles Vreeken
This presentation has been identified by
experts to be avoiding MDL!
Viewer discretion is advised.
What Spam Spam...Spam?
[Image: Amazon.de]
FH: Best game ever! I love the pictures and the quality! RECOMMENDED!!
JF: I got it as a gift and I loooooove it <3
JV: I have never enjoyed a game like this one!
SS: This game is super with a super quality!
What Spam Spam...Spam?
• More than 20% of Yelp's reviews are misleading, and the share is growing
steadily; one-third of all consumer reviews on the Internet are
estimated to be misleading [Rayana and Akoglu 2015].
• Spammers are getting better at hiding themselves.
Has anyone noticed the Spam Spam...Spam?
Fake Reviews and Likes
• Liu et al. SPEC and SVM classification (EMNLP-CoNLL, 2007).
Suspicious Users
• Jiang et al. CatchSync (KDD, 2014).
Collusion Groups
• Cao et al. SynchroTrap (CCS, 2014).
• Beutel et al. CopyCatch (WWW, 2013).
• Xu et al. KNN and transaction history (CIKM, 2013).
Another way to deal with Spam Spam...Spam?
6 Jan 2020
8-9 Jan 2020
15-17 Dec 2019
Another way to deal with Spam Spam...Spam?
6 Jan 2020
8-9 Jan 2020
15-17 Jan 2020
FH: Best game ever! I love the pictures and the quality! RECOMMENDED!!
JF: I got it as a gift and I loooooove it <3
JV: I have never enjoyed a game like this one!
SS: This game is super with a super quality!
Spammy Spammy...Spammy… Time Intervals
• Not done before!
• Doesn't depend on assumptions that can be easily broken.
• Might help in catching smart spammers!
• Might help in catching one-time spamming campaigns!
• Further results (groups, users, products, reviews) can be reported from the detected intervals.
Spammy Spammy...Spammy… Time Intervals
$t$ is a time interval.
If $\mathrm{spamicity}_t(t) \ge \mu_t$, then $t$ is reported as a spammy time interval.
\[ \mathrm{spamicity}_t(t) = \frac{1}{3}\left[\psi(t) + \psi_{pairs}(t) + \psi_{prob}(t)\right] \]
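The rule above can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the function names, the example component values, and the threshold value are assumptions. Only the equal-weight average of the three components and the comparison against $\mu_t$ come from the slide.

```python
def interval_spamicity(weight: float, pairs: float, prob: float) -> float:
    """Equal-weight average of psi(t), psi_pairs(t) and psi_prob(t)."""
    return (weight + pairs + prob) / 3.0


def is_spammy_interval(weight: float, pairs: float, prob: float, mu_t: float) -> bool:
    """Report the interval when its spamicity reaches the threshold mu_t."""
    return interval_spamicity(weight, pairs, prob) >= mu_t


# Hypothetical component scores, each assumed to lie in [0, 1].
print(round(interval_spamicity(0.6, 0.9, 0.3), 2))        # 0.6
print(is_spammy_interval(0.6, 0.9, 0.3, mu_t=0.5))        # True
```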
Spammy Spammy...Spammy… Time Intervals
\[ \mathrm{spamicity}_t(t) = \frac{1}{3}\left[\psi(t) + \psi_{pairs}(t) + \psi_{prob}(t)\right] \]
• $\psi(t)$: the weight of a time interval.
• Represents the characteristics of the interval itself.
• Defined by three characteristics:
  ◦ Density.
  ◦ Users Ratio.
  ◦ Time Weight.
Time Interval Weight
[Figure: Time Interval Density, intervals $t_1$ and $t_2$]
[Figure: Time Interval Time Weight, 6 Jan 2020 vs. 6-8 Jan 2020]
Spammy Spammy...Spammy… Time Intervals
\[ \mathrm{spamicity}_t(t) = \frac{1}{3}\left[\psi(t) + \psi_{pairs}(t) + \psi_{prob}(t)\right] \]
• $\psi_{pairs}(t)$: the pairs score of a time interval.
• Represents the effect of what's happening in other intervals.
• Defined as the normalized sum of the following:
\[ \mathrm{score}_{(t,t')}(t) = \frac{|u \cap u'| \cdot \psi(t')}{|u \cup u'|} \]
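A small Python sketch of the pairs score follows. The slide does not say how the sum is normalized, so this sketch assumes a plain mean over all other intervals; that choice, the function names, the interval identifiers, and the example weights are all hypothetical. Here `users[t]` stands for the set of users who reviewed in interval `t`, and `weights[t]` for its weight $\psi(t)$.

```python
from typing import Dict, Set


def pair_score(users_t: Set[str], users_other: Set[str], weight_other: float) -> float:
    """|u ∩ u'| * psi(t') / |u ∪ u'| for one pair of intervals."""
    union = users_t | users_other
    if not union:
        return 0.0
    return len(users_t & users_other) * weight_other / len(union)


def pairs_score(t: str, users: Dict[str, Set[str]], weights: Dict[str, float]) -> float:
    """Assumed normalization: mean of pair_score over all other intervals."""
    others = [o for o in users if o != t]
    if not others:
        return 0.0
    total = sum(pair_score(users[t], users[o], weights[o]) for o in others)
    return total / len(others)


# Hypothetical data: t1 and t2 share two users, t3 shares none with t1.
users = {"t1": {"FH", "JF", "JV"}, "t2": {"FH", "JF", "SS"}, "t3": {"XY"}}
weights = {"t1": 0.8, "t2": 0.6, "t3": 0.4}
print(round(pairs_score("t1", users, weights), 3))  # 0.15
```

The overlap term is the Jaccard similarity of the two user sets, so an interval only inherits suspicion from intervals that share many of its reviewers.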
Spammy Spammy...Spammy… Time Intervals
\[ \mathrm{spamicity}_t(t) = \frac{1}{3}\left[\psi(t) + \psi_{pairs}(t) + \psi_{prob}(t)\right] \]
• $\psi_{prob}(t)$: the weighted probability of the interval content.
• $\mathrm{prob}(t \mid p)$: the probability of the interval content in the
distribution of the product's ratings.
• The lower the probability, the more spammy the interval is.
• Defined as follows:
\[ \psi_{prob}(t) = 1 - \mathrm{prob}(t \mid p) \]
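The slide does not specify how $\mathrm{prob}(t \mid p)$ is computed or how it is weighted. One possible reading, assumed here purely for illustration, is the multinomial probability of the interval's star ratings under the product's overall rating distribution. The function names and the example distribution below are hypothetical, and the weighting is not modeled.

```python
from collections import Counter
from math import factorial
from typing import Dict, List


def interval_probability(interval_ratings: List[int], rating_dist: Dict[int, float]) -> float:
    """Multinomial probability of the interval's rating counts (an assumed model)."""
    counts = Counter(interval_ratings)
    coefficient = factorial(len(interval_ratings))
    probability = 1.0
    for rating, count in counts.items():
        coefficient //= factorial(count)           # multinomial coefficient, stays integral
        probability *= rating_dist.get(rating, 0.0) ** count
    return coefficient * probability


def psi_prob(interval_ratings: List[int], rating_dist: Dict[int, float]) -> float:
    """psi_prob(t) = 1 - prob(t | p), as on the slide."""
    return 1.0 - interval_probability(interval_ratings, rating_dist)


# Hypothetical product whose ratings are spread out; the interval is all 5-star.
rating_dist = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.3}
print(round(psi_prob([5, 5, 5, 5], rating_dist), 4))  # 0.9919
```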
Spammy Spammy...Spammy… Time Intervals
If $\mathrm{spamicity}_t(t) \ge \mu_t$, then $t$ is reported as a spammy time interval.
\[ \mathrm{spamicity}_t(t) = \frac{1}{3}\left[\psi(t) + \psi_{pairs}(t) + \psi_{prob}(t)\right] \]
Do we need anything else to get the best possible results?
Spammy Spammy...Spammy… Time Intervals
[Figure: Reported intervals precision]
[Figure: Reported products precision (left) and recall (right)]
[Figure: Reported reviews precision (left) and recall (right)]
Spammy Spammy...Spammy… Time Intervals
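\[ \mathrm{spamicity}_t(t) \ge 0.5 \;\wedge\; \psi_{pairs}(t) \ge 75\% \]
\[ \mathrm{spamicity}_t(t) \ge 0.56 \;\vee\; \psi_{pairs}(t) \ge 85\% \;\vee\; \psi_{prob}(t) \le 10^{-3} \]
If $\mathrm{spamicity}_t(t) \ge \mu_t \;\vee\; \psi_{pairs}(t) \ge \mu_{pairs} \;\vee\; \psi_{prob}(t) \le \mu_{prob}$,
then $t$ is reported as a spammy time interval.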
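As a sketch, the disjunctive reporting rule translates directly into a Python predicate. The function and parameter names are assumptions; the comparison directions are copied from the slide as written, and the example thresholds are the values 0.56, 85% and $10^{-3}$ shown there.

```python
def report_interval(spamicity: float, pairs: float, prob: float,
                    mu_t: float, mu_pairs: float, mu_prob: float) -> bool:
    """Disjunctive rule: any one satisfied condition reports the interval."""
    return spamicity >= mu_t or pairs >= mu_pairs or prob <= mu_prob


# Hypothetical interval scores checked against the thresholds 0.56, 0.85 and 1e-3.
print(report_interval(0.40, 0.90, 0.50, mu_t=0.56, mu_pairs=0.85, mu_prob=1e-3))  # True
print(report_interval(0.40, 0.50, 0.50, mu_t=0.56, mu_pairs=0.85, mu_prob=1e-3))  # False
```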
Spammy Spammy...Spammy… Groups
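$g$ is a group.
If $\mathrm{spamicity}_g(g) \ge \mu_g$, then $g$ is reported as a spamming group.
\[ \mathrm{spamicity}_g(g) = \frac{1}{6}\left[\varphi_D(g) + (1 - \varphi_S(g)) + \varphi_P(g) + \varphi_S(g) + \varphi_{TW}(g) + \varphi_{CD}(g)\right] \]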
Spammy Spammy...Spammy… Groups
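\[ \mathrm{spamicity}_g(g) = \frac{1}{6}\left[\varphi_D(g) + (1 - \varphi_S(g)) + \varphi_P(g) + \varphi_S(g) + \varphi_{TW}(g) + \varphi_{CD}(g)\right] \]
The six terms, in order: Minimum Density, Maximum Sparsity, Products Count,
Size, Time Window, Co-reviewing Ratio.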
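A minimal Python sketch of the group score, assuming each of the six indicators has already been computed and scaled to [0, 1]. The parameter names are hypothetical and follow the slide's labels rather than its symbols; the sparsity indicator is assumed to be the one that enters as one minus its value.

```python
def group_spamicity(min_density: float, max_sparsity: float, products_count: float,
                    size: float, time_window: float, co_reviewing: float) -> float:
    """Equal-weight average of the six group indicators."""
    terms = [
        min_density,
        1.0 - max_sparsity,   # assumed: sparse groups are less suspicious
        products_count,
        size,
        time_window,
        co_reviewing,
    ]
    return sum(terms) / 6.0


# Hypothetical indicator values for one candidate group.
print(round(group_spamicity(0.8, 0.2, 0.5, 0.6, 0.9, 0.7), 3))  # 0.717
```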
Spammy Spammy...Spammy… Groups
Take the users of each reported interval?
Consider this set of users as a spamming group?
Just like that?
Oh… we can rank them using the group spam score!
That’s it?
NO!
Spammy Spammy...Spammy… Groups
Initial Candidate Groups
Set of users → Remove the least spammy user → Repeat until the score becomes worse
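The pruning loop for an initial candidate group can be sketched as follows. This is an illustration under assumptions: `group_score` and `user_score` are hypothetical callables standing in for the group spam score and a per-user spamminess measure, and the toy scoring used in the example is invented.

```python
from typing import Callable, FrozenSet, Set


def prune_group(users: Set[str],
                group_score: Callable[[FrozenSet[str]], float],
                user_score: Callable[[str], float]) -> FrozenSet[str]:
    """Drop the least spammy user repeatedly; stop once the group score gets worse."""
    current = frozenset(users)
    best = group_score(current)
    while len(current) > 1:
        least_spammy = min(current, key=user_score)
        candidate = current - {least_spammy}
        candidate_score = group_score(candidate)
        if candidate_score < best:       # the score became worse: keep the previous group
            break
        current, best = candidate, candidate_score
    return current


# Toy scoring: mean user score, discounted for groups with fewer than three members.
scores = {"FH": 0.9, "JF": 0.8, "JV": 0.7, "XY": 0.1}

def toy_group_score(group: FrozenSet[str]) -> float:
    mean = sum(scores[u] for u in group) / len(group)
    return mean * min(1.0, len(group) / 3.0)

print(sorted(prune_group(set(scores), toy_group_score, scores.get)))  # ['FH', 'JF', 'JV']
```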
Spammy Spammy...Spammy… Groups
• Initial groups are cliques in the user-user graph!
• We use the initial groups as blocks that can be merged to create
collusion spamming groups.
Merge the pair with the highest common users ratio → Backtrack in case the result has a low score → Repeat until no more possible merges
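The merge step can be sketched in Python as below. Several details are assumptions rather than something the slides state: the common users ratio is taken to be the Jaccard overlap of two groups, a "low score" means falling below a hypothetical `min_score`, and a backtracked pair is never retried.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Set

Group = FrozenSet[str]


def common_users_ratio(a: Group, b: Group) -> float:
    """Assumed definition: shared users divided by all users of both groups."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_groups(groups: Iterable[Group],
                 group_score: Callable[[Group], float],
                 min_score: float) -> Set[Group]:
    current: Set[Group] = set(groups)
    rejected = set()  # pairs whose merge was backtracked
    while True:
        candidates = [
            (common_users_ratio(a, b), a, b)
            for a, b in combinations(current, 2)
            if (a & b) and frozenset((a, b)) not in rejected
        ]
        if not candidates:               # no more possible merges
            return current
        _, a, b = max(candidates, key=lambda c: c[0])
        merged = a | b
        if group_score(merged) < min_score:
            rejected.add(frozenset((a, b)))   # backtrack: keep both groups as they were
        else:
            current -= {a, b}
            current.add(merged)


# Hypothetical initial groups; the toy score rejects groups larger than four users.
initial = [frozenset({"FH", "JF", "JV"}), frozenset({"JF", "JV", "SS"}), frozenset({"XY", "ZW"})]
result = merge_groups(initial, lambda g: 1.0 if len(g) <= 4 else 0.0, min_score=0.5)
print(sorted(sorted(g) for g in result))  # [['FH', 'JF', 'JV', 'SS'], ['XY', 'ZW']]
```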
Spammy Spammy...Spammy… Groups
If $\mathrm{spamicity}_g(g) \ge \mu_g$, then $g$ is reported as a spamming group.
[Figure: Reported groups precision]
[Figure: Reported groups recall (left) and F1-score (right)]
Spammy Spammy...Spammy… Groups
Precision of reported spammers before and after grouping
Before Grouping   After Grouping
0.430             0.941
0.722             1
0.792             0.984
0.208             0.762
Spammy Spammy...Spammy… Users
• Report users of the top-ranked intervals.
• Reported users are ranked based on a spamicity score of a user.
\[ \mathrm{spamicity}_u(u) = \begin{cases} \dfrac{\mathrm{intervalsRatio}(u) + 1}{2} & \text{if } u \text{ is a member of at least 1 group} \\ \mathrm{intervalsRatio}(u) & \text{otherwise} \end{cases} \]
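The piecewise user score is short enough to write out directly. In this sketch the function and parameter names are assumptions, and `intervals_ratio` is taken as an already computed value in [0, 1].

```python
def user_spamicity(intervals_ratio: float, in_group: bool) -> float:
    """Group members are pulled halfway towards 1; other users keep their ratio."""
    if in_group:
        return (intervals_ratio + 1.0) / 2.0
    return intervals_ratio


print(user_spamicity(0.5, in_group=True))   # 0.75
print(user_spamicity(0.5, in_group=False))  # 0.5
```

Under this formula a member of at least one reported group never scores below 0.5, so group members always rank at or above users with a ratio under 0.5.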
Spammed Spammed...Spammed… Products
• Report the products of the top-ranked intervals.
• Also report products that were co-reviewed by all members of a
reported collusion group.
Category   Before   After
Recall     0.625    0.813
F1-score   0.769    0.897
Reported products results after adding the additional targets
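The table gives recall and F1-score but not precision. Assuming the F1-score is the usual harmonic mean of precision and recall, the implied precision can be recovered from the two reported numbers. The helper below is a hypothetical illustration, and the results are only as exact as the three-digit rounding in the table.

```python
def precision_from_f1(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P; requires 2 * recall > f1."""
    return f1 * recall / (2.0 * recall - f1)


# Values copied from the products table: (recall, F1-score).
for label, recall, f1 in [("before", 0.625, 0.769), ("after", 0.813, 0.897)]:
    print(label, round(precision_from_f1(recall, f1), 2))
# before 1.0
# after 1.0
```

Under that assumption, both columns are consistent with a precision of about 1.0, so the gain in F1-score would come from the higher recall.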
Spammy Spammy...Spammy… Reviews
• Report the reviews of the top-ranked intervals.
• Also report reviews written by all members of a reported collusion group
for a product.
Category   Before   After
Recall     0.387    0.532
F1-score   0.558    0.695
Reported reviews results after adding the additional reviews
Conclusion
• Detecting suspicious time intervals is brand new and very helpful in detecting
spamming campaigns.
• The spamicity of an interval is based on:
  ◦ The interval characteristics (weight).
  ◦ The effect of other time intervals (pairs score).
  ◦ The weighted probability of the interval content.
• When having a set of suspicious time intervals, we can:
  ◦ Create collusion spamming groups and score them.
  ◦ Report individual users, ranked by a spamicity estimation.
  ◦ Report targeted products.
  ◦ Report spammy reviews.
What’s next?
• Check the results on real Amazon data files.
• Compare the solution with other methods (already found some!).
• Find a cool name for the algorithm!
• Finish “not before” the deadline!
Thank you!
References
• Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. Low-quality
product review detection in opinion summarization. Proceedings of the 2007 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning (EMNLP-CoNLL), pages 334–342, 2007.
• Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang.
CatchSync: catching synchronized behavior in large directed graphs. KDD ’14
Proceedings of the 20th ACM SIGKDD international conference on Knowledge
discovery and data mining, pages 941–950, 2014.
• Bimal Viswanath, M. Ahmad Bashir, Mark Crovella, Saikat Guha, Krishna P.
Gummadi, Balachander Krishnamurthy, and Alan Mislove. Towards detecting
anomalous user behavior in online social networks. Proceedings of the 23rd
USENIX Security Symposium (USENIX Security), pages 223–238, 2014.
• Qiang Cao, Xiaowei Yang, Jieqi Yu, and Christopher Palow. Uncovering large groups
of active malicious accounts in online social networks. CCS ’14 Proceedings of the
2014 ACM SIGSAC Conference on Computer and Communications Security, pages
477–488, 2014.
• Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos
Faloutsos. CopyCatch: stopping group attacks by spotting lockstep behavior in
social networks. WWW ’13 Proceedings of the 22nd international conference on
World Wide Web, pages 119–130, 2013.
• Zhen Xie and Sencun Zhu. GroupTie: toward hidden collusion group discovery in
app stores. WiSec ’14 Proceedings of the 2014 ACM conference on Security and
privacy in wireless and mobile networks, pages 153–164, 2014.
• Chang Xu, Jie Zhang, Kuiyu Chang, and Chong Long. Uncovering collusive
spammers in Chinese review websites. CIKM ’13 Proceedings of the 22nd ACM
international conference on Information & Knowledge Management, pages 979–988, 2013.
• Shebuti Rayana and Leman Akoglu. Collective opinion spam detection: Bridging
review networks and metadata. KDD ’15 Proceedings of the 21th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 985–994, 2015.

Master Thesis Seminar

  • 1. Sandra Sukarieh Spam Spam…Spam! Master’s Seminar 24 January 2020 Prof. Jilles Vreeken
  • 2. 2 This presentation has been identified by experts to be avoiding MDL ! Viewer discretion is advised.
  • 6. What Spam Spam...Spam? 6 FH: Best game ever! I love the pictures and the quality! RECOMMENDED!! JF: I got it as a gift and I loooooove it <3 JV: I have never enjoyed a game like this one! SS: This game is super with a super quality!
  • 7. What Spam Spam...Spam? 7  More than 20 % of Yelp’s reviews are of misleading content with steady growth and one-third of all consumer reviews on the Internet are estimated to be misleading [Rayana and Akoglu 2015].  Spammers are becoming smarter in hiding themselves.
  • 8. Has anyone noticed the Spam Spam...Spam? 8 Fake Reviews and Likes • Liu et al. SPEC and SVM classification (EMNLP-CoNLL, 2007). Suspicious Users • Jiang et al. CatchSync (KDD, 2014). Collusion Groups • Cao et al. SynchroTrap (CCS, 2014 ). • Beutel et al. CopyCatch (WWW, 2013). • Xu et al. KNN and transactions history (CIKM, 2013 ).
  • 9. Another way to deal with Spam Spam...Spam? 9 6 Jan 2020 8-9 Jan 2020 15-17 Dec 2019
  • 10. Another way to deal with Spam Spam...Spam? 10 6 Jan 2020 8-9 Jan 2020 15-17 Jan 2020 FH: Best game ever! I love the pictures and the quality! RECOMMENDED!! JF: I got it as a gift and I loooooove it <3 JV: I have never enjoyed a game like this one! SS: This game is super with a super quality!
  • 11. Spammy Spammy...Spammy… Time Intervals 11  Not done before!  Doesn't depend on assumptions that can be easily broken.  Might help in catching smart spammers!  Might help in catching one-time spamming campaigns!  Further results can be reported.
  • 12. Spammy Spammy...Spammy… Time Intervals 12 𝑡 is a time interval. If 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 ≥ 𝜇 𝑡 ⇒ 𝑡 is reported as a spammy time interval. 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 = 1 3 [ψ 𝑡 + ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 + ψ 𝑝𝑟𝑜𝑏 𝑡 ]
  • 13. Spammy Spammy...Spammy… Time Intervals 13 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 = 1 3 [ψ 𝑡 + ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 + ψ 𝑝𝑟𝑜𝑏 𝑡 ]  The weight of a time interval.  Represents the characteristics of the interval itself.  Defined by three characteristics:  Density.  Users Ratio.  Time Weight.
  • 14. Time Intervals Weight 14 𝒕 𝟏 𝒕 𝟐 Time Interval Density
  • 15. Time Intervals Weight 15 Time Interval Time Weight 6 Jan 2020 6 - 8 Jan 2020
  • 16. Spammy Spammy...Spammy… Time Intervals 16 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 = 1 3 [ψ 𝑡 + ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 + ψ 𝑝𝑟𝑜𝑏 𝑡 ]  The pairs score of a time interval.  Represents the effect of what’s happening in other intervals.  Defined as the normalized sum of the following: 𝑠𝑐𝑜𝑟𝑒(𝑡, 𝑡′) 𝑡 = 𝑢 ∩ 𝑢′ . ψ 𝑡′ |𝑢 ∪ 𝑢′|
  • 17. Spammy Spammy...Spammy… Time Intervals 17 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 = 1 3 [ψ 𝑡 + ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 + ψ 𝑝𝑟𝑜𝑏 𝑡 ]  The weighted probability of the interval content.  𝑝𝑟𝑜𝑏 𝑡 𝑝 : the probability of the interval content in the distribution of the products rates.  The less the probability, the more spammy the interval is.  Defined as following: ψ 𝑝𝑟𝑜𝑏 𝑡 = 1 − 𝑝𝑟𝑜𝑏(𝑡|𝑝)
  • 18. Spammy Spammy...Spammy… Time Intervals 18 If 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 ≥ 𝜇 𝑡 ⇒ 𝑡 is reported as a spammy time interval. 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 = 1 3 [ψ 𝑡 + ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 + ψ 𝑝𝑟𝑜𝑏 𝑡 ] Do we need anything else to get the best possible results?????????
  • 19. Spammy Spammy...Spammy… Time Intervals 19 Reported intervals precision
  • 20. Spammy Spammy...Spammy… Time Intervals 20 Reported products precision (left) and recall (right)
  • 21. Spammy Spammy...Spammy… Time Intervals 21 Reported reviews precision (left) and recall (right)
  • 22. Spammy Spammy...Spammy… Time Intervals 22 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 ≥ 0.5 ∧ ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 ≥ 75% 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 ≥ 0.56 ∨ ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 ≥ 85% ∨ ψ 𝑝𝑟𝑜𝑏 𝑡 ≤ 10−3 If 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦𝑡 𝑡 ≥ 𝜇 𝑡 ∨ ψ 𝑝𝑎𝑖𝑟𝑠 𝑡 ≥ 𝜇 𝑝𝑎𝑖𝑟𝑠 ∨ ψ 𝑝𝑟𝑜𝑏 𝑡 ≤ 𝜇 𝑝𝑟𝑜𝑏 ⇒ 𝑡 is reported as a spammy time interval.
  • 23. Spammy Spammy...Spammy… Groups 23 𝑔 is a group. If 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦 𝑔 𝑔 ≥ 𝜇 𝑔 ⇒ 𝑔 is reported as a spamming group. 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦 𝑔 𝑔 = 1 6 [φ 𝐷 𝑔 + 1 − φ 𝑆 𝑔 + φ 𝑃 𝑔 + φ 𝑆 𝑔 + φ 𝑇𝑊 𝑔 + φ 𝐶𝐷 𝑔 ]
  • 24. Spammy Spammy...Spammy… Groups 24 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦 𝑔 𝑔 = 1 6 [φ 𝐷 𝑔 + 1 − φ 𝑆 𝑔 + φ 𝑃 𝑔 + φ 𝑆 𝑔 + φ 𝑇𝑊 𝑔 + φ 𝐶𝐷 𝑔 ] Minimum Density Maximum Sparsity Products Count Size Time Window Co-reviewing Ratio
  • 25. Spammy Spammy...Spammy… Groups 25 Take the users of each reported interval ???????? Consider this set of users as a spamming group????????? Just like that???????????????????????????????? Oh… we can rank them using the group spam score! That’s it??????????????????????????????????????? NO!
  • 26. Spammy Spammy...Spammy… Groups 26 Initial Candidate Groups Repeat until the score becomes worse Remove the least spammy user Set of users
  • 27. Spammy Spammy...Spammy… Groups ▪ Initial groups are cliques in the user-user graph! ▪ We use the initial groups as blocks that can be merged to create collusion spamming groups. Merge the pair with the highest common-users ratio; backtrack in case the result has a low score; repeat until no more merges are possible.
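One way to read the merge procedure is the greedy loop below. It is a sketch under several assumptions: groups are non-empty, the "common users ratio" is the Jaccard overlap of two groups, "low score" means the merged group falls below a caller-supplied `min_score`, and "backtrack" means the merge is undone and that pair is not tried again. `merge_groups` and `common_ratio` are hypothetical names.

```python
from itertools import combinations


def common_ratio(a, b):
    """Shared users divided by all users of the two (non-empty) groups."""
    return len(a & b) / len(a | b)


def merge_groups(groups, group_score, min_score):
    """Repeatedly merge the most overlapping pair; reject merges that score too low."""
    groups = {frozenset(g) for g in groups}
    rejected = set()  # pairs whose merge was tried and undone
    while True:
        candidates = [
            (common_ratio(a, b), a, b)
            for a, b in combinations(groups, 2)
            if common_ratio(a, b) > 0 and frozenset((a, b)) not in rejected
        ]
        if not candidates:
            return groups
        _, a, b = max(candidates, key=lambda c: c[0])
        merged = a | b
        if group_score(merged) >= min_score:
            groups -= {a, b}
            groups.add(merged)
        else:
            rejected.add(frozenset((a, b)))
```

Each pass either reduces the number of groups or retires one pair, so the loop terminates once no overlapping, untried pair remains.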
  • 28. Spammy Spammy...Spammy… Groups If $\mathit{spamicity}_g(g) \ge \mu_g \Rightarrow g$ is reported as a spamming group. [Figure: reported groups precision]
  • 29. Spammy Spammy...Spammy… Groups [Figure: reported groups recall (left) and F1-score (right)]
  • 30. Spammy Spammy...Spammy… Groups Precision of reported spammers before and after grouping (Before Grouping, After Grouping): (0.430, 0.941); (0.722, 1); (0.792, 0.984); (0.208, 0.762)
  • 31. Spammy Spammy...Spammy… Users ▪ Report users of the top-ranked intervals. ▪ Reported users are ranked based on a spamicity score of a user. $\mathit{spamicity}_u(u) = \begin{cases} \frac{\mathit{intervalsRatio}(u) + 1}{2} & \text{if } u \text{ is a member of at least 1 group} \\ \mathit{intervalsRatio}(u) & \text{otherwise} \end{cases}$
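The user score is a simple piecewise function, sketched below. The slide does not define `intervalsRatio`; the example just treats it as a precomputed value in [0, 1] per user, and reads the garbled fraction as (intervalsRatio + 1) / 2. `rank_users` is a hypothetical helper for the ranking step.

```python
def spamicity_u(intervals_ratio, in_group):
    """Boost users who belong to at least one reported group toward 1."""
    if in_group:
        return (intervals_ratio + 1.0) / 2.0
    return intervals_ratio


def rank_users(ratios, grouped_users):
    """Return (user, score) pairs sorted from most to least spammy.

    `ratios` maps user -> intervalsRatio; `grouped_users` is the set of users
    that appear in at least one reported group.
    """
    scores = {u: spamicity_u(r, u in grouped_users) for u, r in ratios.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this form, group membership can lift a user with a modest interval ratio above an ungrouped user with a higher one, which matches the slide's intent of ranking group members as more suspicious.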
  • 32. Spammed Spammed...Spammed… Products ▪ Report the products of the top-ranked intervals. ▪ Report products that were co-reviewed by all members of a reported collusion group. Reported products results after adding the additional targets (Before, After): Recall 0.625, 0.813; F1-score 0.769, 0.897
  • 33. Spammy Spammy...Spammy… Reviews ▪ Report the reviews of the top-ranked intervals. ▪ Report reviews written by all members of a reported collusion group for a product. Reported reviews results after adding the additional reviews (Before, After): Recall 0.387, 0.532; F1-score 0.558, 0.695
  • 34. Conclusion ▪ Detecting suspicious time intervals is brand new and very helpful in detecting spamming campaigns. ▪ The spamicity of an interval is based on: the interval characteristics (weight); the effect of other time intervals (pairs score); the weighted probability of the interval content. ▪ Given a set of suspicious time intervals, we can: create collusion spamming groups and score them; report individual users, ranked by a spamicity estimate; report targeted products; report spammy reviews.
  • 35. What’s next? ▪ Check the results on real Amazon data files. ▪ Compare the solution with other methods (already found some!). ▪ Find a cool name for the algorithm! ▪ Finish “not before” the deadline!
  • 36. Thank you!
  • 37. References ▪ Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. Low-quality product review detection in opinion summarization. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 334–342, 2007. ▪ Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. CatchSync: catching synchronized behavior in large directed graphs. KDD ’14 Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 941–950, 2014. ▪ Bimal Viswanath, M. Ahmad Bashir, Mark Crovella, Saikat Guha, Krishna P. Gummadi, Balachander Krishnamurthy, and Alan Mislove. Towards detecting anomalous user behavior in online social networks. Proceedings of the 23rd USENIX Security Symposium (USENIX Security), pages 223–238, 2014. ▪ Qiang Cao, Xiaowei Yang, Jieqi Yu, and Christopher Palow. Uncovering large groups of active malicious accounts in online social networks. CCS ’14 Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 477–488, 2014.
  • 38. References ▪ Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. WWW ’13 Proceedings of the 22nd international conference on World Wide Web, pages 119–130, 2013. ▪ Zhen Xie and Sencun Zhu. GroupTie: toward hidden collusion group discovery in app stores. WiSec ’14 Proceedings of the 2014 ACM conference on Security and privacy in wireless and mobile networks, pages 153–164, 2014. ▪ Chang Xu, Jie Zhang, Kuiyu Chang, and Chong Long. Uncovering collusive spammers in Chinese review websites. CIKM ’13 Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 979–988, 2013. ▪ Shebuti Rayana and Leman Akoglu. Collective opinion spam detection: Bridging review networks and metadata. KDD ’15 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 985–994, 2015.