Detection of Spam Tipping Behaviour on Foursquare

In this paper, we present what, to our knowledge, is the first effort to
identify and analyze different patterns of tip spamming activity in
Foursquare, with the goal of developing automatic tools to detect users
who post spam tips - tip spammers. A manual investigation of a real
dataset collected from Foursquare led us to identify four categories of
spamming behavior, viz. Advertising/Spam, Self-promotion, Abusive and Malicious. We then applied
machine learning techniques, together with a selected set of user, social,
and tip-content features associated with each user, to develop
automatic detection tools. Our experimental results indicate that we are
able to not only correctly distinguish legitimate users from tip
spammers with high accuracy (89.76%) but also correctly identify a large
fraction (at least 78.88%) of spammers in each identified category.

Published in: Technology, News & Politics

  1. Detection of Spam Tipping Behaviour on Foursquare
     Anupama Aggarwal¶, Prof P. Kumaraguru “PK”¶, Prof J. Almeida*
     ¶ Indraprastha Institute of Information Technology (IIIT-Delhi, India)
     * Universidade Federal de Minas Gerais (UFMG, Brazil)
  2. Foursquare 101
     ‣ Location Based Social Network
     ‣ 33 Million Users *
     ‣ 3.5 Billion checkins *
     ‣ 31% of mobile social media users use Foursquare *
     * As of January 2013
  3. Foursquare 101: Location Sharing OSN
     [Screenshot labels: Friends Activity; Your Last Checkin; Venue; Friends Suggestions; Venue Suggestions]
     ‣ Tip: Suggested Activity for a Venue
     ‣ A Tip can be Liked or Saved
  4. Spam Tips
     ‣ Advertising / Marketing
     ‣ Tips unrelated to Venue
     ‣ Scam / Phishing
  5. Spam according to Foursquare ToS
     ‣ Tips with links to websites selling software, realtor contact info, a listing for your business, or other promotion
     ‣ Tips with inappropriate language or negativity directed at another person
     ‣ Unauthorized or unsolicited advertising, junk
  6. Contributions
     ‣ Characterizing irregular user behaviour
        ‣ We observed different categories of spam users
        ‣ We characterize features distinguishing these spam users
     ‣ Automatic detection of spammers
        ‣ Distinguish between spam and legitimate Foursquare users
        ‣ Cluster spam users into different categories according to their behaviour
  7. Data Crawling
     ‣ 2,400,594 tips
     ‣ 613,298 users
  8. Observed Categories of Spam Users
     ‣ Marketing: These users post tips to promote and advertise a specific product / brand / venue / external URL
     ‣ Malicious: Such Foursquare users post external URLs in Tips which direct to spam / phishing / malware websites
     ‣ Abusive / Derogatory: These users try to deface or bad-mouth another person
     ‣ Self Promotion: These users try to draw attention to themselves
  9. Ground Truth Data
     Annotation Portal
     ‣ 2,000 Legitimate users
     ‣ 1,900 Spammers
  10. Features used to detect Spammers
      ‣ User Attributes
         ‣ Properties of the Foursquare user profile and his checkins
      ‣ Social Attributes
         ‣ Friends network of the Foursquare user under inspection
      ‣ Content Attributes
         ‣ Details about Tips posted by the Foursquare user
  11. Features used
      Category            χ2 rank  Feature
      User Attributes     1        Number of Tips
      User Attributes     3        Ratio of Check-ins and Tips
      User Attributes     4        Number of Check-ins
      User Attributes     5        Number of Badges
      User Attributes     11       Number of Mayorships
      User Attributes     12       Ratio of Check-ins and Badges
      User Attributes     15       Number of Photos posted
      Social Attributes   6        Number of Friends
      Content Attributes  2        Similarity score of Tips
      Content Attributes  7        Number of URLs posted
      Content Attributes  8        Average number of words in Tips
      Content Attributes  9        Average number of characters in Tips
      Content Attributes  10       Ratio of number of likes and number of Tips
      Content Attributes  13       Average number of spam words in Tips
      Content Attributes  14       Average number of phone-numbers posted in Tips
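The χ2 rank column orders features by how strongly they are associated with the spam / legitimate label. The slides do not say how the statistic was computed, so the following is only a minimal sketch under one assumption: each feature is binarized (for example "posts URLs: yes/no") and scored with a 2x2 chi-squared statistic. The function names, feature names, and toy data are hypothetical.

```python
def chi2_binary(feature_on, is_spam):
    """Chi-squared statistic of a 2x2 table: binarized feature vs. spam label."""
    n = len(is_spam)
    # Observed counts for each (feature value, label) cell.
    observed = {(f, s): 0 for f in (0, 1) for s in (0, 1)}
    for f, s in zip(feature_on, is_spam):
        observed[(int(f), int(s))] += 1
    stat = 0.0
    for f in (0, 1):
        for s in (0, 1):
            row_total = observed[(f, 0)] + observed[(f, 1)]
            col_total = observed[(0, s)] + observed[(1, s)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns
                stat += (observed[(f, s)] - expected) ** 2 / expected
    return stat


def rank_features(columns, is_spam):
    """Return feature names sorted from most to least associated with the label."""
    scores = {name: chi2_binary(values, is_spam) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): 4 spammers followed by 4 legitimate users.
is_spam = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "many_badges": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
    "many_tips": [1, 1, 1, 0, 1, 0, 0, 0],    # weakly related
    "posts_urls": [1, 1, 1, 1, 0, 0, 0, 0],   # perfectly related
}
ranking = rank_features(columns, is_spam)
```

On this toy data the perfectly separating feature ranks first and the unrelated one last; real features would first need a discretization step, which the slides leave unspecified.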
  12. Few Observations
      ‣ Spammers post same/similar Tips on multiple venues
      ‣ A large fraction of spam Tips contain URLs
      ‣ Spam Tips may also have phone numbers
      ‣ Legitimate users have more Friends
      ‣ Spammers have very few Friends but large number of Tips
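Several of these observations map directly onto the content attributes listed earlier. The sketch below shows one way such per-user features could be computed from raw tip text. It is illustrative only: the slides do not define the "similarity score", so average pairwise Jaccard word overlap is assumed here, and the URL and phone-number regular expressions are rough, hypothetical patterns.

```python
import re
from itertools import combinations

# Rough, illustrative patterns (assumptions, not the authors' rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\-\s().]{7,}\d")


def jaccard(a, b):
    """Word-set overlap between two tips, in [0, 1]."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def content_features(tips):
    """Compute a few content attributes for one user's list of tip strings."""
    n = len(tips)
    if n == 0:
        return {"num_urls": 0, "avg_words": 0.0, "avg_chars": 0.0,
                "avg_phone_numbers": 0.0, "tip_similarity": 0.0}
    pairs = list(combinations(tips, 2))
    similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "num_urls": sum(len(URL_RE.findall(t)) for t in tips),
        "avg_words": sum(len(t.split()) for t in tips) / n,
        "avg_chars": sum(len(t) for t in tips) / n,
        "avg_phone_numbers": sum(len(PHONE_RE.findall(t)) for t in tips) / n,
        "tip_similarity": similarity,
    }


# Hypothetical users: one repeats the same promotional tip, one writes varied tips.
spam_like = content_features([
    "Cheap loans call 555-123-4567 http://example.com",
    "Cheap loans call 555-123-4567 http://example.com",
])
legit_like = content_features([
    "Great espresso and friendly staff",
    "Try the window seat at sunset",
])
```

A user who repeats one promotional tip across venues gets a similarity of 1.0 and non-zero URL and phone counts, while varied, venue-specific tips score near zero on all three.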
  13. Relation b/w Tips and Checkins
      [Figure: Tips plotted against Check-ins; annotated region "Irregular User Behaviour"]
  14. Tips Distribution
      [Figure panels: Legitimate users; Spammers]
  15. Classification Results
      Algorithm      Precision (Spam)  Precision (Safe)  Recall (Spam)  Recall (Safe)  Accuracy
      KNN            83.2%             86.6%             86.3%          83.5%          84.89%
      Decision Tree  88.1%             89.2%             88.3%          85.8%          89.53%
      Random Forest  89.3%             90.2%             88.3%          90.3%          89.76%
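For readers unfamiliar with the column headings, the sketch below shows how per-class precision, per-class recall, and overall accuracy follow from the four cells of a binary confusion matrix, treating "spam" as the positive class. The counts are invented for illustration and do not reproduce the figures in the table; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, fn, tn):
    """Per-class precision/recall and accuracy, with 'spam' as the positive class.

    tp: spammers predicted spam      fn: spammers predicted safe
    fp: legitimate predicted spam    tn: legitimate predicted safe
    """
    return {
        "precision_spam": tp / (tp + fp),
        "precision_safe": tn / (tn + fn),
        "recall_spam": tp / (tp + fn),
        "recall_safe": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Invented counts: 100 spammers and 100 legitimate users.
metrics = binary_metrics(tp=90, fp=20, fn=10, tn=80)
```

With these toy counts, recall on spam is 0.90, recall on safe users is 0.80, and accuracy is 0.85.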
  16. Detection of Spam Classes
      ‣ Expectation-Maximization (EM) clustering
      ‣ Spammers Categories
         ‣ Advertising / Marketing
         ‣ Self Promotion
         ‣ Abusive
         ‣ Malicious
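The slides name EM clustering but give no detail on the mixture model or the features fed to it. As a rough illustration of the technique only, here is EM for a one-dimensional Gaussian mixture in plain Python. The single feature (imagine "URLs per tip"), the initial means, and the data points are all hypothetical; the actual work presumably clustered on many features at once.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; return means, variances, weights, labels."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)  # keep variance strictly positive
    # Hard assignment: most responsible component per point.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


# Hypothetical feature values forming two well-separated groups.
xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights, labels = em_gmm_1d(xs, init_means=[0.0, 6.0])
```

The resulting clusters are unlabeled; mapping each cluster to a category such as Advertising or Abusive requires comparing cluster membership against the annotated ground truth.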
  17. Detection of Spam Classes
      ‣ Clustering Accuracy for spammer categories
      Category        Accuracy
      Advertising     88.23%
      Self-Promotion  87.23%
      Abusive         78.88%
      Malicious       0%
  18. Conclusion
      ‣ Analyzed spammers' behaviour on Foursquare
      ‣ We obtained an accuracy of 89.76% with a Random Forest classifier to distinguish spammers from legitimate users
      ‣ We classified the spammers into four broad categories
      ‣ We were able to detect users belonging to the Advertising, Self-promotion and Abusive categories with an accuracy of 88.23%, 87.23% and 78.88%, respectively
  19. Future Work
      ‣ Refine our methodology by use of other classification algorithms
      ‣ Use multiclass classification to detect users in any of the spam categories
      ‣ Correlation of content and the URLs posted by different users can help us in identifying several spam campaigns on Foursquare
  20. Thank You! Questions?
  21. For any further information, please write to pk@iiitd.ac.in
      precog.iiitd.edu.in
