Slides of my Master's thesis defense: SPRAP: Detecting Opinion Spam Campaigns in Online Rating Services - Exploratory Data Analysis Group - Universität des Saarlandes
2. Just a cool title? Or is something actually going wrong?
Amazon.de
I got it as a gift and I
loooooove it <3
This game is super with a
high quality!
I have never enjoyed a
game like this one!
Best game ever! I love the
pictures and the quality!
RECOMMENDED!!
3.
More than 20% of Yelp’s reviews are of misleading content
and one-third of all consumer reviews on the Internet are
estimated to be misleading [Rayana and Akoglu, 2015].
Spammers are becoming smarter at hiding themselves:
A deceptive mix of legitimate reviews to build trust and fake reviews to carry out the campaign.
They avoid the well-known spam patterns.
Not just a cool title! Something’s INDEED going wrong!
4. Has anyone noticed that?
Fake Reviews and Likes: Liu et al., SPEC and SVM classification (EMNLP-CoNLL, 2007)
Suspicious Users: Rayana and Akoglu, SPEAGLE (KDD, 2015)
Collusion Groups: Dhawan et al., DeFrauder (IJCAI, 2019)
5. Another way to deal with that? Maybe more robust?
Characteristics that cannot be avoided: a relatively short period, using the same accounts, and co-reviewing.
[Figure: log of # pairs co-reviewing 𝑛 products vs. # co-reviewed products]
6. Another way to deal with that? Maybe more robust?
6 Jan 2020
8-9 Jan 2020
15-17 Dec 2019
7. Another way to deal with that? Maybe more robust?
6 Jan 2020
8-9 Jan 2020
15-17 Dec 2019
Detecting spam time intervals
in which spam campaigns
temporally take place
Detecting collusion spam
groups who perform those
spam campaigns
8. How to do it?
Spam behavior is rare, and the majority of behavior is genuine.
Anomaly detection probabilistic model:
∃ p_r : x is spam ⇒ p_r(x) < some threshold
Detecting spam time intervals
in which spam campaigns
temporally take place
Detecting collusion spam
groups who perform those
spam campaigns
∃ p_T : t corresponds to a spam campaign ⇒ p_T(t) < 𝜇
∃ p_G : g is a collusion spam group ⇒ p_G(g) < 𝛿
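The thresholding rule above can be sketched with an empirical distribution over observed entities; the helper names and the frequency-based estimate are illustrative assumptions, not the thesis's implementation:

```python
from collections import Counter

def empirical_distribution(observations):
    """Estimate p from observed entities by relative frequency."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

def flag_anomalies(observations, threshold):
    """Flag entities whose empirical probability is below the threshold,
    following p(x) < threshold => x is a spam candidate."""
    p = empirical_distribution(observations)
    return {x for x in p if p[x] < threshold}

# Toy example: 'burst' behavior is rare among the observations.
obs = ["steady"] * 95 + ["burst"] * 5
print(flag_anomalies(obs, threshold=0.1))  # {'burst'}
```

The same rule instantiates p_T with threshold 𝜇 for intervals and p_G with threshold 𝛿 for groups.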
9. How to do it?
Both p_T and p_G are approximated by spamicity scores computed from spamicity indicators.
10. Intervals Spamicity Score
Spamicity indicators: Members Count; Harmonious Rates; Quick Attacks; Big Deviation from the Target's True Quality; Multiple Targets.
Interval characteristics (interval weight ψ(t)): Size s(t); Density d(t); Weighted Width w(t); Probability f(t); Pairs Score ψ_pairs(t).
These are averaged into one spamicity score spamicity(t).
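The averaging of indicator scores into a single spamicity value is simple enough to sketch; the indicator names and the plain (unweighted) mean are assumptions based on the slide:

```python
def interval_spamicity(scores):
    """Average the per-indicator scores (each assumed in [0, 1])
    into one spamicity score for a candidate interval."""
    return sum(scores.values()) / len(scores)

# Illustrative indicator values for one candidate interval t.
t_scores = {"size": 0.8, "density": 0.9, "weighted_width": 0.7,
            "probability": 0.95, "pairs_score": 0.85}
print(round(interval_spamicity(t_scores), 2))  # 0.84
```

The group score spamicity(g) on the next slide follows the same averaging pattern over its own indicators.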
11. Groups Spamicity Score
Spamicity indicators: Targeted Products; Members Count; # Reviewed Products NOT Common Between Members; Quick Attacks; Co-reviewing Targets; # Reviewed Products Common Between Members.
Group characteristics: Targets Count f_g(g); Size s(g); Sparsity sp(g); Time Window tw(g); Co-reviewing Ratio cr(g); Density d(g).
These are averaged into one spamicity score spamicity(g).
15. SPRAP – Top Ranked Intervals
Extracting intervals for each product q:
Sliding window approach: width ∈ [1, |timeline(q)|].
This space is huge and redundant ⇒ restrict width ∈ [1, 𝜏].
Intervals with a high spamicity score are reported:
if spamicity(t) ≥ 𝜇 ⇒ t is reported as a spam interval
What is 𝑃?
𝑃 is the empirical distribution of intervals:
Contains all valid intervals.
Contains intervals before further filtering.
Added intervals are merged to get wider entities.
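The sliding-window enumeration can be sketched as follows; the positional notion of "width" over a product's review timeline is an assumption for illustration:

```python
def extract_intervals(timeline, tau):
    """Enumerate candidate intervals of width 1..tau over a product's
    review timeline (here: a list of day indices), as in the
    sliding-window approach restricted to width in [1, tau]."""
    intervals = []
    days = sorted(set(timeline))
    for width in range(1, tau + 1):
        for i in range(len(days) - width + 1):
            intervals.append((days[i], days[i + width - 1]))
    return intervals

# Reviews on days 1, 2, 3 and 5; tau = 2 keeps the space small.
print(extract_intervals([1, 2, 3, 5], tau=2))
# [(1, 1), (2, 2), (3, 3), (5, 5), (1, 2), (2, 3), (3, 5)]
```

Without the 𝜏 restriction the enumeration would be quadratic in the timeline length, which is the "huge and redundant space" the slide refers to.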
16. SPRAP – Collusion Spam Groups
Creating all possible groups is infeasible.
We are not only after cliques in the user co-reviewing graph,
so we cannot use Maximum Cliques or MFIM.
We are only considering “valid groups”:
[Figure: example user co-reviewing graph with users u1–u7]
17. SPRAP – Collusion Spam Groups
Top Ranked Intervals → Initial Groups → Refined Groups → Collusion Spamming Groups
Initial Groups: taken directly from the Top Ranked Intervals.
Refined Groups: obtained after removing non-spammers.
Collusion Spamming Groups: the final reported groups after merging the refined groups (not necessarily cliques).
18. SPRAP – Collusion Spam Groups
6 Jan 2020
8-9 Jan 2020
15-17 Dec 2019
19. SPRAP – Collusion Spam Groups
𝑃 is the empirical distribution of valid groups, but:
The set of created groups is very small.
The majority of created groups is connected to spam campaigns.
Creating all valid groups is infeasible ⇒ sampling!
Straightforward sampling can lead to a lot of rejections ⇒ MCMC!
What is 𝑃?
A group is considered spam if spamicity(g) ≥ 𝛿.
20. SPRAP – Collusion Spam Groups
Normalization
Schaeffer [2010] dealt with a balanced random walk that:
reaches a uniform stationary distribution;
works on undirected, unweighted graphs.
p(v, w) =
  min(1/deg(v), 1/deg(w))                              if w ∈ neighbors(v)
  1 − Σ_{u ∈ neighbors(v)} min(1/deg(v), 1/deg(u))     if w = v
  0                                                    otherwise
However, our graph is the user co-reviewing graph and
we want to sample valid groups!
What is 𝑃?
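Schaeffer's balanced transition probability can be written down directly; the adjacency-dictionary representation is an assumption for illustration:

```python
def transition_prob(G, v, w):
    """Balanced random-walk transition probability (Schaeffer [2010]):
    yields a uniform stationary distribution on an undirected,
    unweighted graph. G maps each node to its set of neighbors."""
    deg = lambda u: len(G[u])
    if w in G[v]:
        return min(1 / deg(v), 1 / deg(w))
    if w == v:  # lazy self-loop absorbs the leftover probability mass
        return 1 - sum(min(1 / deg(v), 1 / deg(u)) for u in G[v])
    return 0.0

# Star graph: center 'c' with leaves 'a' and 'b'.
G = {"c": {"a", "b"}, "a": {"c"}, "b": {"c"}}
print(transition_prob(G, "c", "a"))  # 0.5
print(transition_prob(G, "a", "a"))  # 0.5 (leaf stays put half the time)
```

The self-loop term is what balances the walk: low-degree nodes linger, so every node is visited equally often in the long run.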
21. SPRAP – Collusion Spam Groups
Normalization
We define a Valid Groups Markov Chain:
States are valid groups.
We use the defined balanced random walk to sample valid
groups.
No need to build the whole chain before sampling.
We add a random jump with a small probability 𝜖.
What is 𝑃?
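One step of such a chain might look like the sketch below; the neighbor function, the toy states, and the Metropolis-style realization of the balanced walk are all assumptions for illustration:

```python
import random

def mcmc_step(group, neighbors, all_groups, eps, rng):
    """One hypothetical step of the valid-groups chain: with small
    probability eps jump uniformly to any valid group; otherwise
    propose a neighboring valid group and accept it with the ratio
    that realizes the balanced walk (min(1/deg(v), 1/deg(w)))."""
    if rng.random() < eps:
        return rng.choice(all_groups)  # the small random jump
    nbrs = neighbors(group)
    proposal = rng.choice(nbrs)
    accept = min(1.0, len(nbrs) / len(neighbors(proposal)))
    return proposal if rng.random() < accept else group

# Toy chain over three 'valid groups' arranged in a path: 0 - 1 - 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
get_neighbors = lambda s: adj[s]
rng = random.Random(0)
state = 0
for _ in range(100):
    state = mcmc_step(state, get_neighbors, [0, 1, 2], eps=0.05, rng=rng)
print(state in {0, 1, 2})  # True
```

Because states are generated lazily from the current group, the full chain never needs to be materialized, matching the slide's remark.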
22. SPRAP – Evaluation
Thresholds and Configurations
We estimate the best values of spamicity thresholds (𝜇, 𝛿)
by 5 repetitions of LOOCV.
We set the parameters as follows: 𝜇 = 0.4, 𝛿 = 0.6, 𝜏 = 3.
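Threshold selection can be sketched as a grid search scored on held-out labeled data sets; this is a simplified stand-in for the 5-repetition LOOCV on the slide, and the grid, the evaluator signature, and the dummy score are all assumptions:

```python
import itertools

def best_thresholds(datasets, evaluate):
    """Pick (mu, delta) maximizing the average score over labeled
    data sets. `evaluate(dataset, mu, delta)` is assumed to return
    e.g. an F1 score for one data set under the given thresholds."""
    grid = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 .. 0.9
    best, best_score = None, float("-inf")
    for mu, delta in itertools.product(grid, grid):
        score = sum(evaluate(d, mu, delta) for d in datasets) / len(datasets)
        if score > best_score:
            best, best_score = (mu, delta), score
    return best

# Dummy evaluator that peaks at the reported (0.4, 0.6).
evaluate = lambda d, mu, delta: -abs(mu - 0.4) - abs(delta - 0.6)
print(best_thresholds(["A", "B", "C"], evaluate))  # (0.4, 0.6)
```

With the dummy evaluator the search recovers (0.4, 0.6), the thresholds used throughout the evaluation.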
23. SPRAP – Evaluation
General Performance

| Data Set | Intervals (R / P) | Reviews (R / P) | Targets (R / P) | Spammers (R / P) | Grouped Spammers (R / P) |
| A | 1 / 1 | 1 / 1 | 1 / 1 | 1 / 1 | 1 / 1 |
| B | 1 / 1 | 1 / 1 | 1 / 1 | 1 / 1 | 1 / 1 |
| C | 0.926 / 0.978 | 0.925 / 0.985 | 0.889 / 1 | 0.755 / 0.952 | 0.755 / 0.976 |
| D | 0.991 / 0.946 | 1 / 0.962 | 1 / 0.92 | 1 / 0.914 | 1 / 0.955 |
| E | 1 / 0.95 | 0.997 / 0.974 | 1 / 0.963 | 1 / 0.922 | 0.972 / 0.972 |
| F | 0.986 / 0.939 | 0.994 / 0.969 | 1 / 0.895 | 0.989 / 0.869 | 0.989 / 0.989 |
| G | 1 / 0.965 | 1 / 0.979 | 1 / 0.964 | 1 / 0.89 | 1 / 0.946 |
| H | 1 / 1 | 1 / 1 | 1 / 1 | 1 / 1 | 0.938 / 1 |
27. SPRAP – Evaluation
Wide Dense Campaigns – Effects of 𝜏
[Table: the generated interval (01-09-2019, 08-09-2019) and the sub-intervals reported in 𝑻 and in 𝑰: (04-09, 04-09), (06-09, 06-09), (04-09, 06-09), (03-09, 05-09), (05-09, 06-09), (03-09, 04-09), (06-09, 08-09), (04-09, 05-09), (01-09, 03-09)]
Details of detecting a time interval of width 8 in data set H.
28. SPRAP – Evaluation
Comparison to SPEAGLE [Rayana and Akoglu, 2015]
SPEAGLE reports spammers, fake reviews, and targets.
SPEAGLE depends heavily on textual characteristics ⇒ we plant their labeled reviews in data set C, whose spammers are pure spammers.
| Algorithm | Reviews (R / P) | Spammers (R / P) | Targets (R / P) |
| SPRAP | 0.925 / 0.985 | 0.755 / 0.952 | 0.889 / 1 |
| SPEAGLE | 1 / 0.196 | 1 / 0.118 | 1 / 0.07 |
Results of SPRAP with 𝜇 = 0.4, 𝛿 = 0.6, 𝜏 = 3 against the best achieved recall and precision values for SPEAGLE.
29. SPRAP – Evaluation
Merging Groups and Comparison to DeFrauder [Dhawan et al., 2019]
DeFrauder detects collusion spam groups.
We compare the two methods on data set D, which has 6 planted collusion spam groups of a mixed nature.
| Algorithm | |𝑪| | 𝒔_𝒎𝒂𝒙(𝒈) | Spammers (R / P) | Targets (R / P) |
| SPRAP | 9 | 26 | 1 / 0.955 | 1 / 0.92 |
| DeFrauder | 126 | 5 | 0.709 / 0.329 | 1 / 0.383 |
Results of SPRAP with 𝜇 = 0.4, 𝛿 = 0.6, 𝜏 = 3 against DeFrauder.
30. SPRAP – Evaluation
Merging Groups and Comparison to DeFrauder [Dhawan et al., 2019]
| Group | All targets reviewed by all members | Reported as 1 group | Original in refined groups | Members | Reported Targets | FP Members |
| 𝑔1 | Yes | Yes | 7 | 3/3 | 7/7 | 0 |
| 𝑔2 | No | Yes | 9 | 4/4 | 9/9 | 0 |
| 𝑔3 | No | No, as 2 | 3 | 5/5 | 3/3 | 1 |
| 𝑔4 | No | No, as 3 | 6 | 12/12 | 5/5 | 0 |
| 𝑔5 | No | Yes | 15 | 15/15 | 10/10 | 1 |
| 𝑔6 | No | Yes | 8 | 25/25 | 5/5 | 1 |
Reported collusion groups of SPRAP for data set D.
31. SPRAP – Evaluation
Amazon Software data
Amazon Software data set:
Unlabeled.
Has 341931 reviews, 275374 users, and 28736 products.
| Reported Entities | Spam Intervals | Spam Groups | Spammers | Fake Reviews | Targets |
| Details | |𝐼| = 9606 | |𝐶| = 3797 | |𝑆| = 37883 | |𝑌| = 48043 | |𝑍| = 1066 |
| | - | 35.5% non-cliques | 37883 with score ≥ 0.5 | - | - |
| | - | 33374 members | - | - | - |
Further notes:
The longest reported time interval spans 71 days.
The biggest reported collusion spam group has 1139 members.
32. Conclusion
Detecting spam campaigns is not trivial due to:
Lack of ground truth.
Huge overlap between spam and genuine behavior.
Evolution of spammers and altering their techniques.
Spamicity scores that depend on a set of indicators can be
a good approximation of the optimal distribution to detect
different spam entities.
We presented SPRAP:
Detects different spam entities with very good accuracy.
Starts from locating spam time intervals.
Avoids easily broken assumptions.
What I did
33. Conclusion
Turning the solution into a full probabilistic anomaly
detection model.
Weighting the spamicity indicators differently to favor some over others (e.g., favoring groups with more targets).
Importance sampling of groups to include more “close-to-spam” groups.
What could be done
34. Thank you!
Special thanks to Prof. Vreeken, who gave me the opportunity to be a part of the amazing EDA group and supported me all along the way, and to Janis for his valuable assistance and help throughout the whole process.
I guess I have a Master’s degree now :D
35. References
Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. Low-quality product review
detection in opinion summarization. Proceedings of the 2007 Joint Conference on Empirical
Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-
CoNLL), pages 334–342, 2007.
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. Catchsync: catching
synchronized behavior in large directed graphs. KDD ’14 Proceedings of the 20th ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 941–950, 2014.
Bimal Viswanath, M. Ahmad Bashir, Mark Crovella, Saikat Guha, Krishna P. Gummadi, Balachander Krishnamurthy, and Alan Mislove. Towards detecting anomalous user behavior in online social networks. Proceedings of the 23rd USENIX Security Symposium (USENIX Security), pages 223–238, 2014.
Qiang Cao, Xiaowei Yang, Jieqi Yu, and Christopher Palow. Uncovering large groups of active malicious accounts in online social networks. CCS ’14 Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 477–488, 2014.
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos.
Copycatch: stopping group attacks by spotting lockstep behavior in social networks. WWW ’13
Proceedings of the 22nd international conference on World Wide Web , pages 119–130, 2013.
36. References
Zhen Xie and Sencun Zhu. Grouptie: toward hidden collusion group discovery in app stores. WiSec
’14 Proceedings of the 2014 ACM conference on Security and privacy in wireless and mobile
networks, pages 153–164, 2014.
Chang Xu, Jie Zhang, Kuiyu Chang, and Chong Long. Uncovering collusive spammers in chinese
review websites. CIKM ’13 Proceedings of the 22nd ACM international conference on Information &
Knowledge Management, pages 979–988, 2013.
Shebuti Rayana and Leman Akoglu. Collective opinion spam detection: Bridging review networks and metadata. KDD ’15 Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 985–994, 2015.
Sarthika Dhawan, Siva Charan Reddy Gangireddy, Shiv Kumar, and Tanmoy Chakraborty. Spotting
collective behaviour of online frauds in customer reviews. IJCAI-19, pages 245–251, 2019.
Satu Schaeffer. Scalable uniform graph sampling by local computation. SIAM J. Scientific Computing, 32:2937–2963, 2010. doi: 10.1137/080716086.
38. Appendix B
Refining Groups
A Group is considered spam if 𝑠𝑝𝑎𝑚𝑖𝑐𝑖𝑡𝑦 𝑔 ≥ δ.
Refining groups is done by removing the least-spammy user
in each iteration as long as the spamicity is increasing.
The least-spammy user is estimated based on:
intervals_ratio(u) = ( Σ_{t ∈ 𝐼} 𝟙{u ∈ 𝑈_t} ) / ( Σ_{t ∈ 𝑇} 𝟙{u ∈ 𝑈_t} )
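The refinement loop can be sketched as follows; the concrete spamicity and intervals_ratio functions in the toy example are assumptions for illustration:

```python
def refine_group(group, spamicity, intervals_ratio):
    """Repeatedly drop the least-spammy member (lowest intervals_ratio)
    as long as doing so strictly increases the group's spamicity."""
    group = set(group)
    while len(group) > 1:
        worst = min(group, key=intervals_ratio)
        smaller = group - {worst}
        if spamicity(smaller) > spamicity(group):
            group = smaller
        else:
            break
    return group

# Toy scores: 'c' barely appears in spam intervals and drags the group down.
spam = {"a", "b"}                               # assumed known spammers
ratio = {"a": 0.9, "b": 0.8, "c": 0.1}
spamicity = lambda g: len(g & spam) / len(g)    # fraction of spam members
print(refine_group({"a", "b", "c"}, spamicity, ratio.get))  # {'a', 'b'}
```

The loop stops as soon as removing another member no longer helps, so genuine members mistakenly pulled into an initial group are pruned away.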
39. Appendix B
Reporting Groups
Merging refined groups is done iteratively as long as the
spamicity of the resulted group is preserved.
In each iteration we merge the pair with the highest common-users ratio.
Reported Collusion spam groups are not necessarily cliques
in the user co-reviewing graph, unlike the initial and the
refined ones.
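The merging step can be sketched as below; defining the common-users ratio as overlap over the smaller group, and checking "preserved" spamicity against the threshold 𝛿, are both assumptions for illustration:

```python
def merge_groups(groups, spamicity, delta):
    """Repeatedly merge the pair of refined groups with the highest
    common-users ratio, as long as the merged group still counts
    as spam (spamicity >= delta)."""
    groups = [set(g) for g in groups]
    while len(groups) > 1:
        # Pick the pair with the highest overlap ratio.
        ratio, i, j = max(
            (len(a & b) / min(len(a), len(b)), i, j)
            for i, a in enumerate(groups)
            for j, b in enumerate(groups) if i < j
        )
        merged = groups[i] | groups[j]
        if ratio == 0 or spamicity(merged) < delta:
            break  # nothing left worth merging
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append(merged)
    return groups

# Overlapping groups merge; the disjoint one stays separate.
out = merge_groups([{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}],
                   spamicity=lambda g: 1.0, delta=0.6)
print(sorted(map(sorted, out)))  # [['a', 'b', 'c', 'd'], ['x', 'y']]
```

Merged groups may span users who never co-reviewed pairwise, which is why the final reported groups need not be cliques in the co-reviewing graph.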
46. Appendix F
Amazon Software data
The highest-ranked interval 𝑡 𝑚𝑎𝑥:
Spamicity score = 0.987.
An up-voting campaign with 17 high ratings over 2 days.
Low probability, since the target has many reviews with ratings in {1, 2, 3}.
The highest-ranked collusion group 𝑔 𝑚𝑎𝑥:
Spamicity score = 0.89.
16 users giving 5-star reviews to one target 𝑞 during 2 days.
Majority of members only reviewed 𝑞.
Corresponding initial group has 27 members.