The document proposes a new method for detecting spam reviews by analyzing time intervals. It computes a "spamicity" score for each time interval from the characteristics of the interval itself, its relationships to other intervals, and the probability of its content. Intervals whose score exceeds a threshold are flagged as spammy. Users and reviews from spammy intervals are then grouped and ranked to identify spammers and their targeted products and reviews. The method improves on prior work, achieving higher precision and recall in detecting spam users, products, and reviews.
6. What Spam Spam...Spam?
FH: Best game ever! I love the pictures and the quality! RECOMMENDED!!
JF: I got it as a gift and I loooooove it <3
JV: I have never enjoyed a game like this one!
SS: This game is super with a super quality!
7. What Spam Spam...Spam?
More than 20% of Yelp's reviews contain misleading content, and the share is growing steadily; one-third of all consumer reviews on the Internet are estimated to be misleading [Rayana and Akoglu 2015].
Spammers are becoming smarter in hiding themselves.
8. Has anyone noticed the Spam Spam...Spam?
Fake Reviews and Likes
• Liu et al. SPEC and SVM classification (EMNLP-CoNLL, 2007).
Suspicious Users
• Jiang et al. CatchSync (KDD, 2014).
Collusion Groups
• Cao et al. SynchroTrap (CCS, 2014).
• Beutel et al. CopyCatch (WWW, 2013).
• Xu et al. KNN and transaction history (CIKM, 2013).
9. Another way to deal with Spam Spam...Spam?
6 Jan 2020
8-9 Jan 2020
15-17 Dec 2019
10. Another way to deal with Spam Spam...Spam?
6 Jan 2020
8-9 Jan 2020
15-17 Jan 2020
FH: Best game ever! I love the pictures and the quality! RECOMMENDED!!
JF: I got it as a gift and I loooooove it <3
JV: I have never enjoyed a game like this one!
SS: This game is super with a super quality!
11. Spammy Spammy...Spammy… Time Intervals
Not done before!
Doesn't depend on assumptions that can be easily broken.
Might help in catching smart spammers!
Might help in catching one-time spamming campaigns!
Further results (spamming groups, users, products, and reviews) can be reported.
12. Spammy Spammy...Spammy… Time Intervals
t is a time interval.
If spamicity_t(t) ≥ μ_t, then t is reported as a spammy time interval.
spamicity_t(t) = (1/3) [ψ(t) + ψ_pairs(t) + ψ_prob(t)]
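The interval test above can be sketched in a few lines. This is a minimal illustration, assuming the three component scores each lie in [0, 1]; the function names and the example threshold value are illustrative, not part of the original method.

```python
def interval_spamicity(psi: float, psi_pairs: float, psi_prob: float) -> float:
    """spamicity_t(t) = (1/3) [psi(t) + psi_pairs(t) + psi_prob(t)]."""
    return (psi + psi_pairs + psi_prob) / 3.0

def is_spammy(psi: float, psi_pairs: float, psi_prob: float, mu_t: float = 0.5) -> bool:
    # An interval is reported as spammy when its score reaches the threshold mu_t
    # (0.5 here is only a placeholder value).
    return interval_spamicity(psi, psi_pairs, psi_prob) >= mu_t
```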
13. Spammy Spammy...Spammy… Time Intervals
spamicity_t(t) = (1/3) [ψ(t) + ψ_pairs(t) + ψ_prob(t)]
ψ(t) is the weight of the time interval.
It represents the characteristics of the interval itself.
It is defined by three characteristics:
Density.
Users Ratio.
Time Weight.
16. Spammy Spammy...Spammy… Time Intervals
spamicity_t(t) = (1/3) [ψ(t) + ψ_pairs(t) + ψ_prob(t)]
ψ_pairs(t) is the pairs score of the time interval.
It represents the effect of what's happening in other intervals.
It is defined as the normalized sum, over all other intervals t′, of:
score(t, t′) = (|u ∩ u′| · ψ(t′)) / |u ∪ u′|
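A sketch of the pair score, assuming u and u′ are the user sets of the two intervals. Normalizing ψ_pairs by the number of other intervals is an assumption made here for illustration; the deck only says "normalized sum".

```python
def pair_score(users_t: set, users_t2: set, psi_t2: float) -> float:
    # score(t, t') = |u ∩ u'| · psi(t') / |u ∪ u'|: user overlap between the
    # two intervals, scaled by the other interval's weight.
    union = users_t | users_t2
    if not union:
        return 0.0
    return len(users_t & users_t2) * psi_t2 / len(union)

def psi_pairs(users_t: set, others: list) -> float:
    # others: (user_set, psi) for every other interval. Dividing by the
    # count of other intervals is an assumed normalization.
    if not others:
        return 0.0
    return sum(pair_score(users_t, u2, p2) for u2, p2 in others) / len(others)
```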
17. Spammy Spammy...Spammy… Time Intervals
spamicity_t(t) = (1/3) [ψ(t) + ψ_pairs(t) + ψ_prob(t)]
ψ_prob(t) is the weighted probability of the interval content.
prob(t|p): the probability of the interval content under the distribution of the product's ratings.
The lower the probability, the more spammy the interval is.
It is defined as follows:
ψ_prob(t) = 1 − prob(t|p)
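A hedged sketch of the content-probability score. The deck does not spell out how prob(t|p) is computed; here it is approximated, purely for illustration, as the average probability of the interval's ratings under the product's overall rating distribution.

```python
from collections import Counter

def psi_prob(interval_ratings: list, product_ratings: list) -> float:
    # psi_prob(t) = 1 - prob(t|p). The estimate of prob(t|p) below
    # (mean per-rating probability) is an assumption, not the paper's definition.
    dist = Counter(product_ratings)
    total = len(product_ratings)
    prob = sum(dist[r] / total for r in interval_ratings) / len(interval_ratings)
    return 1.0 - prob
```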
18. Spammy Spammy...Spammy… Time Intervals
If spamicity_t(t) ≥ μ_t, then t is reported as a spammy time interval.
spamicity_t(t) = (1/3) [ψ(t) + ψ_pairs(t) + ψ_prob(t)]
Do we need anything else to get the best possible results?
23. Spammy Spammy...Spammy… Groups
g is a group.
If spamicity_g(g) ≥ μ_g, then g is reported as a spamming group.
spamicity_g(g) = (1/6) [φ_D(g) + (1 − φ_S(g)) + φ_P(g) + φ_S(g) + φ_TW(g) + φ_CD(g)]
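The group score can be sketched as the average of the six characteristic scores. This assumes each φ lies in [0, 1] and that the (1 − φ) term is the sparsity characteristic, as the next slide's labels suggest; both are assumptions made for illustration.

```python
def group_spamicity(phi_density: float, phi_sparsity: float, phi_products: float,
                    phi_size: float, phi_time_window: float, phi_coreview: float) -> float:
    # spamicity_g(g): mean of six characteristic scores. Sparsity is
    # inverted (1 - phi), matching the one subtracted term in the formula.
    return (phi_density + (1.0 - phi_sparsity) + phi_products
            + phi_size + phi_time_window + phi_coreview) / 6.0
```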
24. Spammy Spammy...Spammy… Groups
spamicity_g(g) = (1/6) [φ_D(g) + (1 − φ_S(g)) + φ_P(g) + φ_S(g) + φ_TW(g) + φ_CD(g)]
The six terms correspond to six group characteristics:
Minimum Density.
Maximum Sparsity.
Products Count.
Size.
Time Window.
Co-reviewing Ratio.
25. Spammy Spammy...Spammy… Groups
Take the users of each reported interval?
Consider this set of users as a spamming group?
Just like that?
Oh… we can rank them using the group spam score!
That's it?
NO!
27. Spammy Spammy...Spammy… Groups
Initial groups are cliques in the user-user graph!
We use the initial groups as building blocks that can be merged to create collusion spamming groups.
Merge the pair with the highest common-users ratio.
Backtrack in case the result has a low score.
Repeat until no more merges are possible.
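The merge loop can be sketched as below. This is an assumed simplification: `score` stands in for the group spamicity machinery, `min_score` for its threshold, and "backtrack" is modeled as rejecting a merge whose result scores too low.

```python
def merge_groups(groups: list, score, min_score: float) -> list:
    """Greedily merge the pair of groups with the highest common-users ratio."""
    groups = [set(g) for g in groups]
    while True:
        best, best_ratio = None, 0.0
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                union = groups[i] | groups[j]
                ratio = len(groups[i] & groups[j]) / len(union)
                if ratio > best_ratio:
                    best, best_ratio = (i, j), ratio
        if best is None:
            break  # no more possible merges (no overlapping pairs left)
        i, j = best
        merged = groups[i] | groups[j]
        if score(merged) < min_score:
            break  # backtrack: the merged result scores too low, keep the parts
        del groups[j]
        groups[i] = merged
    return groups
```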
31. Spammy Spammy...Spammy… Users
Report users of the top-ranked intervals.
Reported users are ranked based on a spamicity score of a user.
spamicity_u(u) = (intervalsRatio(u) + 1) / 2, if u is a member of at least one group;
spamicity_u(u) = intervalsRatio(u), otherwise.
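The piecewise user score above, as a minimal sketch. It assumes intervalsRatio(u) is a value in [0, 1] (e.g. the fraction of reported intervals containing u); that interpretation is not spelled out in the deck.

```python
def user_spamicity(intervals_ratio: float, in_group: bool) -> float:
    # Membership in at least one reported collusion group lifts the
    # score halfway toward 1; otherwise the intervals ratio is used as-is.
    if in_group:
        return (intervals_ratio + 1.0) / 2.0
    return intervals_ratio
```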
32. Spammed Spammed...Spammed… Products
Report the products of the top-ranked intervals.
Also report products that were co-reviewed by all members of a reported collusion group.
Category | Before | After
Recall | 0.625 | 0.813
F1-score | 0.769 | 0.897
Reported products results after adding the additional targets.
33. Spammy Spammy...Spammy… Reviews
Report the reviews of the top-ranked intervals.
Also report reviews written by all members of a reported collusion group for a product.
Category | Before | After
Recall | 0.387 | 0.532
F1-score | 0.558 | 0.695
Reported reviews results after adding the additional reviews.
34. Conclusion
Detecting suspicious time intervals is brand new and very helpful in detecting spamming campaigns.
The spamicity of an interval is based on:
The interval's characteristics (weight).
The effect of other time intervals (pairs score).
The weighted probability of the interval's content.
When having a set of suspicious time intervals, we can:
Create collusion spamming groups and score them.
Report individual users, ranked by a spamicity estimation.
Report targeted products.
Report spammy reviews.
35. What's next?
Check the results on real Amazon data files.
Compare the solution with other methods (already found some!).
Find a cool name for the algorithm!
Finish “not before” the deadline!
36. Thank you!
37. References
Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. Low-quality
product review detection in opinion summarization. Proceedings of the 2007 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning (EMNLP-CoNLL), pages 334–342, 2007.
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang.
Catchsync: catching synchronized behavior in large directed graphs. KDD ’14
Proceedings of the 20th ACM SIGKDD international conference on Knowledge
discovery and data mining, pages 941–950, 2014.
Bimal Viswanath, M. Ahmad Bashir, Mark Crovella, Saikat Guha, Krishna P. Gummadi, Balachander Krishnamurthy, and Alan Mislove. Towards detecting anomalous user behavior in online social networks. Proceedings of the 23rd USENIX Security Symposium (USENIX Security), pages 223–238, 2014.
Qiang Cao, Xiaowei Yang, Jieqi Yu, and Christopher Palow. Uncovering large groups of active malicious accounts in online social networks. CCS '14 Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 477–488, 2014.
38. References
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos
Faloutsos. Copycatch: stopping group attacks by spotting lockstep behavior in
social networks. WWW ’13 Proceedings of the 22nd international conference on
World Wide Web , pages 119–130, 2013.
Zhen Xie and Sencun Zhu. Grouptie: toward hidden collusion group discovery in
app stores. WiSec ’14 Proceedings of the 2014 ACM conference on Security and
privacy in wireless and mobile networks, pages 153–164, 2014.
Chang Xu, Jie Zhang, Kuiyu Chang, and Chong Long. Uncovering collusive spammers in Chinese review websites. CIKM '13 Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 979–988, 2013.
Shebuti Rayana and Leman Akoglu. Collective opinion spam detection: Bridging review networks and metadata. KDD '15 Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 985–994, 2015.