Spam Detection Using ML
The problem
Fake reviews
- Many e-commerce product reviews are fake
- A user could be paid to post a review
- Someone could post a fake review to malign the company
Malicious Tweets
- Tweets could contain links to potentially harmful websites
- Could be a phishing tweet
False Information
- Spreading false information against a person
- Or a company
- Or to provoke someone
Existing Systems
Linguistic Based
- Humans can understand language but machines can’t
- Used in search engines to predict next words
- Two n-gram types: unigrams and bigrams
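As an illustration of the unigram and bigram features mentioned above, here is a minimal sketch using only the Python standard library. The `tokenize` and `ngrams` helpers and the sample review are assumptions made for this example, not part of any existing system described in the slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample review, used only for illustration.
review = "Great product, great price, would buy again"
tokens = tokenize(review)

unigram_counts = Counter(ngrams(tokens, 1))  # single words
bigram_counts = Counter(ngrams(tokens, 2))   # adjacent word pairs
```

Counts like these can serve as input features to a classifier; repeated phrases across many reviews tend to show up as unusually frequent bigrams.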
Behavior Based
- Based on metadata
- The user creates a set of rules
- The rules might need updating over time
- User-dependent; human intervention is needed
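A rule-based check on review metadata could look roughly like the sketch below. The `ReviewMeta` fields, the rule names, and every threshold are hypothetical and chosen only to show why such rules need manual upkeep.

```python
from dataclasses import dataclass


@dataclass
class ReviewMeta:
    """Hypothetical metadata attached to one review."""
    user_id: str
    reviews_today: int      # reviews the user posted on the same day
    account_age_days: int   # age of the account when the review was posted
    rating: int             # star rating, 1 to 5


# Hand-written rules; the thresholds are illustrative assumptions
# and would have to be revised by a person as spam behavior changes.
RULES = [
    ("too_many_reviews_today", lambda m: m.reviews_today > 5),
    ("new_account_extreme_rating",
     lambda m: m.account_age_days < 2 and m.rating in (1, 5)),
]


def triggered_rules(meta):
    """Return the names of all rules that the review's metadata triggers."""
    return [name for name, rule in RULES if rule(meta)]
```

For example, `triggered_rules(ReviewMeta("u1", 8, 1, 5))` flags both rules, while a long-standing account posting a single 4-star review triggers none.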
Graph Based
- Integrates the data into a single graph representation
- Runs graph-based anomaly detection algorithms
- Not reliable, and detecting false reviews remains challenging
Solution
1. Behavioral Based Features
a. Early Time Frame
b. Threshold Rating Deviation
2. Linguistic Based Features
a. Ratio of First-Person Pronouns and Exclamation Sentences
3. Behavioral Based Features
a. Burstiness of reviews by one person
b. Average of a user’s negative ratio given to different businesses
4. Linguistic Based Features
a. Average Content Similarity and Maximum Content Similarity
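The features listed above could be computed along the following lines. This is a sketch under stated assumptions: the slides give no formulas, so the window lengths, the deviation threshold, the pronoun list, and the use of bag-of-words cosine similarity are all illustrative choices, and every function name is hypothetical.

```python
import math
import re
from collections import Counter
from datetime import date
from itertools import combinations

# Assumed list of first-person pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def words(text):
    """Lowercase word tokens of a text."""
    return re.findall(r"[a-z']+", text.lower())


def early_time_frame(review_date, first_review_date, window_days=210):
    """Score close to 1 for reviews posted soon after the first review
    of the product; 0 outside the (assumed) window."""
    delta = (review_date - first_review_date).days
    return 1.0 - delta / window_days if 0 <= delta < window_days else 0.0


def exceeds_rating_deviation(rating, product_avg, threshold=0.5, scale=4.0):
    """True if the rating deviates from the product average by more than
    the (assumed) threshold, normalized by the 1-to-5 rating range."""
    return abs(rating - product_avg) / scale > threshold


def first_person_ratio(text):
    """Share of word tokens that are first-person pronouns."""
    toks = words(text)
    return sum(t in FIRST_PERSON for t in toks) / len(toks) if toks else 0.0


def exclamation_ratio(text):
    """Share of sentences that end with an exclamation mark."""
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    if not sentences:
        return 0.0
    return sum(s.rstrip().endswith("!") for s in sentences) / len(sentences)


def burstiness(review_dates, window_days=28):
    """Score close to 1 when all of one user's reviews fall in a short span;
    0 when they are spread over the (assumed) window or longer."""
    if len(review_dates) < 2:
        return 0.0
    span = (max(review_dates) - min(review_dates)).days
    return 1.0 - span / window_days if span < window_days else 0.0


def negative_ratio(ratings, negative_max=2):
    """Share of a user's ratings that are negative (assumed: 1 or 2 stars)."""
    return sum(r <= negative_max for r in ratings) / len(ratings) if ratings else 0.0


def cosine(a, b):
    """Cosine similarity of two texts over bag-of-words counts."""
    ca, cb = Counter(words(a)), Counter(words(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def content_similarity(reviews):
    """Average and maximum pairwise similarity of one user's reviews."""
    sims = [cosine(a, b) for a, b in combinations(reviews, 2)]
    if not sims:
        return 0.0, 0.0
    return sum(sims) / len(sims), max(sims)
```

Each function returns a single number (or a pair, for content similarity), so the results can be concatenated into one feature vector per review or per reviewer and passed to a classifier.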
Implementation
Flow diagram
Advantages
Linguistic Based and Graph Based
- All data obtained is accurate.
- Spam features are provided as built-in functions.
- Less human interaction.
- It is a statistics-based approach.
- Supports review-centric spam detection.
- Supports reviewer-centric spam detection.
Impact
Reduction in Spam
References
https://www.ijrte.org/wp-content/uploads/papers/v8i6/F1120038620.pdf
