
Detecting Fake Engagement on Instagram

Instagram has become a growing platform for users to share media reflecting their interests, such as food, travel and fashion. It is also heavily used by marketers and influencers to reach potential audiences by advertising their content. The number of likes received on posts reflects the social reputation of users, and in some cases social media influencers with a large reach are compensated by marketers to promote products. This has led users to artificially bolster the likes they receive in order to project an inflated social worth. Our analysis of over 20,000 likes spanning 18,091 posts from 1000 users reveals that fake engagement on Instagram has distinct features. In this study, we build an automated mechanism to detect fake likes on Instagram and estimate the true reach of a user. We achieve a high precision of 92% in detecting fake likes. We further validate our model with an in-the-wild study in which we label 1,34,021 like instances and identify 8,845 previously unknown fake likes.

Published in: Engineering

  1. Detecting Fake Engagement on Instagram - Indira Sen (linkedin/in/indira-sen-8a6068140, @drealcharbar, fb.com/indira.sen.31) - Dr. Ponnurangam Kumaraguru (chair)
  2. Thesis Committee - Dr. Anwitaman Datta, NTU Singapore - Mr. Nitendra Rajput, InfoEdge - Dr. Ponnurangam Kumaraguru, IIIT Delhi (Chair)
  3. Likes on Instagram (example post: 3,363 likes)
  4. Likes on Instagram (example post: 1,008 likes)
  5. Why is Engagement Important on Instagram?
  6. Why Fake Likes? - ‘Influencers’ are compensated based on engagement: likes and comments - Incentive to artificially inflate engagement metrics by purchasing likes through like markets or like-back networks - Inflated like counts fool potential brands or advertisers into hiring ‘unworthy’ influencers
  7. Motivation - Influencer marketing: $1B industry - Fake influencers landed deals over $500
  8. Core Thesis Question - How do we automatically detect fraudulent likes on Instagram? - Organic likes: likers who engage with content; genuine reach - Inorganic likes: likers bought from marketplaces; artificial reach - Understanding properties of genuine liking behaviour B : {b1 , b2 , …, bn } - Reducing the effect of likes which do not match B
  9. Thesis Outline - Research Aim - Data Collection - Analysis of Fake Likes - Machine Learning Classifier to Detect Fake Likes - Estimating Reach of Users - Conclusion
  10. What is a Like Instance? - Given a poster S whose post p has been liked by liker L, we define a like instance as the tuple (L, p, S)
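The (L, p, S) tuple can be written down as a small record type. This is only an illustrative sketch: the field names and the example identifiers are assumptions, not names used in the thesis.

```python
from typing import NamedTuple


class LikeInstance(NamedTuple):
    """One like, in the same order as the slide's (L, p, S) tuple."""
    liker: str   # L: account that gave the like
    post: str    # p: identifier of the liked post
    poster: str  # S: account that published the post


# Hypothetical identifiers, for illustration only.
instance = LikeInstance(liker="user_a", post="post_123", poster="user_b")
L, p, S = instance  # unpacks in the same order as the slide's tuple
```

Keeping the poster in the record, even though it is implied by the post, makes it easy to compute liker-poster features without a separate lookup.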
  11. Research Aim - Find the features of liker L, post p and poster S that determine the probability of liker L genuinely liking that particular post p. - Identify the true reach of a poster by determining the fake likes received on the posted content.
  12. Possible Reasons for Genuine Liking - Homepage: followees’ posts - Explore: Instagram’s recommendations - Likes of followees
  13. Possible Reasons for Genuine Liking - Explore: based on photos you liked; based on people you follow; similar to accounts you interact with
  14. Possible Reasons For Genuine Liking - Poster is a followee - Poster is a followee of a followee - Topical interests in common
  15. How to get Fake Likes - Marketplaces - Like Back collusion networks - Link Farming hashtags - Bots
  16. Architecture Diagram - Inputs: 1) Liker meta and last 18 posts 2) Poster meta and last 18 posts 3) Post meta - Training data: Fake Likes and Other Likes, converted to features - Machine Learning Model labels random unknown likes as Fake or Not Fake (diagram labels: 1 - α, α)
  17. Data Collection: Fake Likes (pipeline diagram) - Purchased fake likes → Honeypot → victim? → Fake Likes 1: likes given by honeypot victims - Fake Likes 2: likes on videos with views = 0 - Instagram Featured users → Snowball Sample to 1M → Random sample of 500 → Honeypot victim? not → Other Likes
  18. Data Collection: Fake Likes - Honeypots to trap fake likers bought through a service - If a user falls for the honeypot, then we monitor their liking behaviour
  19. Data Collection: Fake Likes (same pipeline diagram as slide 17)
  20. Data Collection: Other Likes (same pipeline diagram as slide 17)
  21. Data Collection: Other Likes - Randomly sample 500 users from 1M users who are not honeypot victims
        Type    #Likes   #Posts   #Likers   #Posters
        Fake    10,417    8,408       500      7,715
        Other   11,810   11,644       500      7,631
  22. Thesis Outline - Research Aim - Data Collection - Analysis of Fake Likes - Machine Learning Classifier to Detect Fake Likes - Estimating Reach of Users - Conclusion
  23. Understanding Fake Likes - Hypotheses indicative of fake liking behaviour - Validate with a two-sample KS test - Network effect: liker is a follower of the poster; liker is a follower of a follower of the poster
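The two-sample KS (Kolmogorov-Smirnov) statistic is the largest gap between the empirical cumulative distributions of two samples, here a feature measured on fake likes versus other likes. The sketch below computes only that statistic in plain Python; the sample values are made up for illustration, and the function name is an assumption. A library routine such as `scipy.stats.ks_2samp` would also return a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest absolute difference
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up feature values for illustration, not data from the study.
fake = [0.70, 0.72, 0.75, 0.80, 0.90]
other = [0.30, 0.40, 0.50, 0.55, 0.78]
d = ks_statistic(fake, other)
```

A statistic near 1 means the feature separates the two groups almost completely; a value near 0 means the distributions are nearly identical.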
  24. Liker is Follower of Poster - Green edges: liker relationship - Red edges: liker-follower relationship - Other likes have a higher proportion of follower-likers (graph panels: Other Likes, Fake Likes)
  25. Network Effects - 90% of fake like instances have only .25 of followee likes (chart labels: 90%, 56%)
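The two network-effect features (liker follows the poster; liker follows someone who follows the poster) can be sketched as lookups in a follow graph. The dictionary representation, the function name and the account names are assumptions made for this example; the slides do not say how the graph was stored.

```python
def network_features(liker, poster, following):
    """Return the two network-effect flags for one like instance.

    `following` maps each account to the set of accounts it follows.
    """
    liker_follows = following.get(liker, set())
    # Direct edge: the liker follows the poster.
    is_follower = poster in liker_follows
    # Two-hop edge: the liker follows someone who follows the poster.
    is_follower_of_follower = any(
        poster in following.get(mid, set()) for mid in liker_follows
    )
    return is_follower, is_follower_of_follower


# Hypothetical follow graph, for illustration only.
following = {
    "alice": {"bob"},
    "bob": {"carol"},
    "dave": set(),
}
```

In this toy graph, a like from `dave` on a post by `carol` has neither connection, which is the pattern the slides associate with fake likes.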
  26. Interest Overlap - A user will like a post if she shares topical interests with the post - Affinity: the lower the affinity, the higher the overlap
  27. Extracting Topics - Bio, post text and post image - Wikification and DenseCap for images
  28. Extracting Topics - Bio, post text and post image - Wikification and DenseCap for images (example shown: image topics, post caption topics)
  29. Interest Overlap - A user will like a post if she shares topical interests with the post - Affinity is non-commutative
  30. Affinity - Affinity outperforms Jaccard distance in terms of discernibility - Post image topics are strong indicators of genuine liking
  31. - Our metric is able to capture semantic relationships between entities, compared to other traditional distance metrics - 90% of other likes have an average affinity of 0.5 - 90% of fake likes have an average affinity of 0.74
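For reference, the Jaccard distance used as the baseline can be sketched as follows. The slides do not define the affinity metric itself, so it is not reproduced here; the topic sets are invented for illustration.

```python
def jaccard_distance(topics_a, topics_b):
    """1 minus (size of intersection / size of union) of two topic sets."""
    a, b = set(topics_a), set(topics_b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Invented topic sets, for illustration only.
liker_topics = {"food", "travel", "coffee"}
post_topics = {"food", "pizza"}
distance = jaccard_distance(liker_topics, post_topics)
```

Jaccard distance is symmetric and counts only exact matches, so "pizza" and "food" contribute nothing to each other. That is the kind of semantic relationship the slides say the affinity metric captures, and the symmetry contrasts with affinity being non-commutative.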
  32. Other Features - Celebrities tend to get more likes (engagement) - Genuine likers will keep coming back: repeated likers - Link Farming hashtags: #like4like, #l4l, #like2follow - Topical hashtags - Posting activity of liker (Badri et al., CIKM’16) and poster - Profile picture of liker: egghead profiles (cheap to create)
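The link-farming hashtag feature can be sketched as a simple caption scan. Only the three tags named on the slide are included; the feature names and the example caption are assumptions for illustration.

```python
import re

# The three tags named on the slide; a real list would likely be longer.
LINK_FARMING_TAGS = {"like4like", "l4l", "like2follow"}
HASHTAG = re.compile(r"#(\w+)")


def hashtag_features(caption):
    """Count hashtags in a caption and flag link-farming ones."""
    tags = [tag.lower() for tag in HASHTAG.findall(caption)]
    farming = [tag for tag in tags if tag in LINK_FARMING_TAGS]
    return {
        "n_hashtags": len(tags),
        "n_link_farming": len(farming),
        "has_link_farming": bool(farming),
    }


# Hypothetical caption, for illustration only.
features = hashtag_features("Sunset in Goa #travel #L4L #like4like")
```

Lower-casing the tags before matching means `#L4L` and `#l4l` are treated as the same hashtag.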
  33. Automatic Detection of Fake Likes - Using the features described and a set of ML classifiers - Fake likes : Other likes ratio → 1:2 - SVM with an RBF kernel gives the best performance
  34. Classification Model - Performance - Manually look at 100 false negatives and find that 70 of them had high topical overlap - Liker interest set was small: affinity metric limitation
        Class   Precision   Recall   F1-score
        0       0.93        0.96     0.945
        1       0.895       0.825    0.86
        total   0.92        0.925    0.92
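The per-class F1-scores in the table follow from the precision and recall columns through the usual harmonic-mean formula. The short check below recomputes them; the function name is chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the two classes in the table.
reported = {"0": (0.93, 0.96, 0.945), "1": (0.895, 0.825, 0.86)}
recomputed = {
    label: f1_score(p, r) for label, (p, r, _) in reported.items()
}
```

Both recomputed values fall within 0.005 of the reported ones, so the per-class rows of the table are internally consistent.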
  35. In the Wild Experiment - Random 1,34,669 like instances - Categorize posts into: food, fashion, outdoors, merchandise, people, gadgets, pets, captioned - We find 8,557 fake likes - Manually analyze 100 of these and find 78 to be fake
  36. Thesis Outline - Research Aim - Data Collection - Analysis of Fake Likes - Machine Learning Classifier to Detect Fake Likes - Estimating Reach of Users - Conclusion
  37. Reach Estimation - Enable advertisers to make better decisions - Reduce the effect of fake likes a poster may have received - Measure deviation in reach
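The slides do not state how deviation in reach is computed. One simple reading, assumed here purely for illustration, is to subtract the likes flagged as fake from a poster's total and report the flagged fraction as the deviation. The function name and the numbers are hypothetical.

```python
def reach_deviation(total_likes, fake_likes):
    """Return (estimated genuine likes, fraction of likes flagged fake).

    Assumed definition for illustration; the thesis may use another.
    """
    if not 0 <= fake_likes <= total_likes:
        raise ValueError("fake_likes must be between 0 and total_likes")
    genuine = total_likes - fake_likes
    deviation = fake_likes / total_likes if total_likes else 0.0
    return genuine, deviation


# Hypothetical poster: 200 likes received, 50 flagged as fake.
genuine, deviation = reach_deviation(200, 50)
```

Under this reading, a poster with no flagged likes has a deviation of 0, and a higher value means the visible like count overstates genuine reach by more.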
  38. Who receives fake likes? - Users posting about merchandise, outdoors (including travel posts) and people (posts containing faces) have the highest deviation from the projected reach.
  39. Who receives fake likes? - Merchandise, outdoors (including travel posts) and people - Most posters do not have high deviation, while some users have very high deviation
  40. Do Popular Users have more Fake Likes? - No, users with lower follower counts, who may be trying to gain a following, have higher deviation - ‘Micro Influencers’ have higher deviation
  41. Conclusion - Automated method to detect fake like instances - Performs well at identifying unseen fake likes on Instagram - Find the true reach of a user - Helps advertisers and brands identify users with genuine, meaningful reach
  42. Challenges, Limitations and Future Work - Availability of labeled data, approximations using honeypot - Data collection constraints, integrate network features - Improve affinity, improve precision (dynamic features) - Fine-grained topical recommendations for brands and advertisers
  43. Acknowledgement - Anupama Aggarwal, PhD Scholar, IIIT Delhi - Committee members - Srishti Gupta, Divyansh Agarwal, Neha Jawalkar, Sonu Gupta, Kushagra Bhargava - Siddharth Singh, Shiven Mian - Members of Precog - Family and friends
  44. References - https://instamacro.com/ - http://nymag.com/selectall/2017/08/fake-instagram-account-earns-sponsored-influencer-money.html - http://www.independent.co.uk/life-style/gadgets-and-tech/social-media-experiment-fake-instagram-accounts-make-money-influencer-star-blogger-mediakix-a7887836.html
  45. Thanks! Any questions? You can find me at: indira15021@iiitd.ac.in, pk@iiitd.ac.in
