Detecting Fake Profiles On Online Matrimony

In a diverse country like India, socio-economic factors such as religion, caste, language and income, along with common physical and professional factors, play a vital role in the search for a spouse. With the surge in Internet connectivity, online matrimonial websites have become hugely popular for catering to such needs. Most users registered on these portals genuinely intend to find a life partner; however, the portals also attract a few people with no such intention. Such users are known as fake/spam profiles. They cause a bad user experience as well as revenue loss for the online matrimony business. In this thesis we present an approach to identify such users using machine learning techniques. Because large sets of labelled examples of fake/suspicious users are not available, we frame the problem as anomaly detection and use an autoencoder, which is widely used for that purpose. We capture each user's behaviour, profile information and edit history to classify the profile as genuine or fake. We treat this as a reconstruction task: the autoencoder is trained on the features of a set of genuine profiles. At prediction time, it shows a small reconstruction error for genuine profiles and a very high reconstruction error for fake users, which is how they are detected. The proposed system achieves 91.76% accuracy with 90.2% recall for the fake class. To the best of our knowledge, this is the first study to detect fake/spam user profiles in the online matrimony domain.

Published in: Engineering

Detecting Fake Profiles On Online Matrimony

  1. 1. Detecting Fake Profiles On Online Matrimony Vaibhav Garg Dr. Ponnurangam Kumaraguru (Chair) linkedin.com/in/vaibhav-garg-0a708899 facebook.com/in/vaibhav.garg.104203 @rk_check
  2. 2. 2 Thesis Committee ◆ Dr. Arun Balaji Buduru, IIIT Delhi ◆ Dr. Siddhartha Asthana, United Health Group (Optum) ◆ Dr. Ponnurangam Kumaraguru, IIIT Delhi
  3. 3. 3
  4. 4. Core Thesis Question: How can fake profiles on online matrimony be detected automatically?
  5. 5. Demo * Due to the company's privacy policy, we cannot give a demo on the company's actual portal.
  6. 6. Outline ◆ About Online Matrimony ◆ About the Data ◆ Characteristics of a fake profile ◆ Using only Behaviour Trends ◆ Using Behavior, Edit and Profile Information ◆ Incorporating Community features ◆ Feature Engineering: Proposed Full length feature vector ◆ Final Results ◆ Conclusion 6
  7. 7. 7 Register Suggested View Profile Start Conversation
  9. 9. About the Data ◆ To dig into the problem, we chose the use case of India’s leading matrimony website. ◆ Ground truth: 5,40,737 genuine profiles and very few fake profiles. ◆ Categorical attributes in the data: age, body type, caste, city, country, education, height, income, manglik, marital status, mother tongue, occupation, religion.
  10. 10. Categorical Data
      Attribute | Number of Categories | Different Categories
      Caste | 470 | Hindu: Arora, Hindu: Aggarwal, Hindu: Brahmin etc.
      Height | 37 | 5’0, 5’1, 5’2, 5’3 etc.
      Income | 25 | Rs. 0 - 1 Lakh, Rs 1-2 Lakh etc.
      Mother Tongue | 42 | Telugu, Bengali, Hindi-Delhi etc.
      Occupation | 69 | Doctor, Analyst, IT-Engineer etc.
  11. 11. Categorical Data
      Attribute | Number of Categories | Different Categories
      Religion | 10 | Hindu, Muslim, Christian etc.
      Body Type | 4 | Slim, Average, Athletic, Heavy
      Country | 214 | India, Afghanistan, Australia etc.
      City | 3683 | Delhi, UP, Ahmedabad etc.
      Manglik | 2 | Manglik, Non-Manglik
  12. 12. Categorical Data
      Attribute | Number of Categories | Different Categories
      Marital Status | 4 | Never Married, Divorcee, Separated and Widowed
      Education | 53 | B.A, B.Com, B.Tech etc.
  14. 14. Behaviour Heterogeneity [Figure: a Genuine Profile and a Fake Profile, each linked to the categories it contacts, labelled C1 to C8]
  15. 15. 15 Inconsistent Edits Edit Done After 4 Days of Registration
  16. 16. 16 Profile Inconsistency
  18. 18. 18 Behavioural Trend for Caste Attribute Experimented on 100 fake and 100 genuine profiles belonging to Aggarwal Community
  19. 19. 19 Behavioural Trend for Marital Status Attribute Experimented on 100 fake and 100 genuine profiles belonging to Non Married Community
  20. 20. Static Windows: User’s First 8 Days of Activity (Day 0 Activity, Day 1 Activity, ..., Day 6 Activity, Day 7 Activity)
      First 12 hours of Day 0: 0th window
      Last 12 hours of Day 0: 1st window
      ...
      First 12 hours of Day 7: 15th window
      Last 12 hours of Day 7: 16th window
  21. 21. 21 Static Windows and Feature Generation
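The static-window idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(timestamp, category)` event format, and the per-window category shares are hypothetical and not taken from the slides. The sketch numbers windows from 0, so eight days of 12-hour windows yield indices 0 to 15.

```python
from collections import Counter

WINDOW_SECONDS = 12 * 60 * 60   # one static window covers 12 hours
NUM_WINDOWS = 16                # 8 days x 2 windows per day (assumed 0-based indexing)


def static_window_index(registered_at, event_at):
    """Return the 0-based window of an event, or None if it falls outside the first 8 days."""
    elapsed = event_at - registered_at
    if elapsed < 0:
        return None
    index = int(elapsed // WINDOW_SECONDS)
    return index if index < NUM_WINDOWS else None


def window_category_shares(registered_at, events):
    """events: iterable of (timestamp_seconds, category) pairs (hypothetical format).

    Returns one dict per window mapping category -> share of that window's events.
    Windows without activity get an empty dict.
    """
    counts = [Counter() for _ in range(NUM_WINDOWS)]
    for timestamp, category in events:
        index = static_window_index(registered_at, timestamp)
        if index is not None:
            counts[index][category] += 1
    shares = []
    for counter in counts:
        total = sum(counter.values())
        shares.append({c: n / total for c, n in counter.items()} if total else {})
    return shares
```

Fixed windows like these require the full observation period to elapse before a profile can be scored, which matches the drawback the deck notes for the behaviour-only approach.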
  22. 22. 22 Which Model to Choose ?
  23. 23. Model Architecture 23 Output Features
  24. 24. Offline Results on Behaviour Features
      Confusion Matrix
                   | Predicted Fake | Predicted Clean
      Actual Fake  | 2953           | 852
      Actual Clean | 168            | 17799
      The above results were obtained on 3805 fake profiles and 17967 clean profiles.
      Drawback: the user has to be 8 days old on the portal to be scrutinized through this approach.
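The slide reports only the raw confusion matrix. As a small worked example, the usual precision, recall and accuracy for the fake class can be derived from those four counts; the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Precision, recall and accuracy for the positive (fake) class."""
    precision = tp / (tp + fp)            # flagged profiles that were really fake
    recall = tp / (tp + fn)               # fake profiles that were flagged
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, accuracy


# Counts from the confusion matrix on the slide above.
precision, recall, accuracy = binary_metrics(tp=2953, fn=852, fp=168, tn=17799)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
# prints "precision=0.946 recall=0.776 accuracy=0.953"
```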
  25. 25. LIVE Results: True Positives
  26. 26. LIVE Results: False Negatives. Edit and profile features need to be incorporated!
  28. 28. 28 Edit Summary for Mother Tongue Attribute Experimented on 100 fake and 100 genuine profiles which registered with Hindi-UP category
  29. 29. 29 Edit Summary for Income Attribute Experimented on 100 fake and 100 genuine profiles which registered with Rs 5-7.5 Lakh category
  30. 30. Concept of Dynamic Windows
      User’s active lifetime on portal = T seconds
      User’s total initiates = N
      If we select number of windows = W:
      0th window: time period of the first N/W initiates
      1st window: time period of the next N/W initiates
      ...
      Last window: time period of the last N/W initiates
  31. 31. Feature Designing ◆ Profile Features: one-hot vector of profile attributes ◆ Behaviour Features: in dynamic time windows, each feature stores the proportion of initiates sent to a particular category of an attribute ◆ Edit Features: in dynamic time windows, each feature stores the proportion of time the user has spent in a particular category of an attribute ◆ Other Raw Features: in each window, we also store the total interests sent and the time duration of that window.
  32. 32. Feature Designing: 0th window + ... + Nth window
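The dynamic-window behaviour features described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names and the `(timestamp, category)` input format are hypothetical, and a window's duration is assumed to be the gap between its first and last initiate.

```python
from collections import Counter


def dynamic_windows(initiates, num_windows):
    """Split time-ordered initiates into num_windows chunks of (nearly) equal count.

    initiates: list of (timestamp_seconds, category) pairs (hypothetical format).
    """
    ordered = sorted(initiates, key=lambda item: item[0])
    n = len(ordered)
    bounds = [(k * n) // num_windows for k in range(num_windows + 1)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(num_windows)]


def window_features(window, categories):
    """Per-window features: share of initiates per category, total initiates, duration."""
    counts = Counter(category for _, category in window)
    total = len(window)
    shares = [counts[c] / total if total else 0.0 for c in categories]
    duration = window[-1][0] - window[0][0] if total else 0
    return shares + [total, duration]


def behaviour_vector(initiates, categories, num_windows):
    """Concatenate the per-window features from the 0th to the last window."""
    vector = []
    for window in dynamic_windows(initiates, num_windows):
        vector.extend(window_features(window, categories))
    return vector
```

Because each window holds roughly N/W initiates rather than a fixed time span, a vector can be built for any user with activity, without waiting for a set number of days.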
  33. 33. Experimenting with the Number of Dynamic Windows
      No. of Windows  | Precision | Recall | Accuracy
      Using 5 windows | 0.170     | 0.510  | 0.8830
      Using 4 windows | 0.192     | 0.635  | 0.8891
      Using 3 windows | 0.230     | 0.780  | 0.8977
      Using 2 windows | 0.242     | 0.804  | 0.8975
      Using 1 window  | 0.266     | 0.866  | 0.8972
  34. 34. Feature Selection on Best Model
      Method                         | Precision | Recall | Accuracy
      Best Model                     | 0.266     | 0.866  | 0.8972
      Best Model + Feature Selection | 0.269     | 0.894  | 0.9083
      Criteria Used = [(Entropy for fake) - (Entropy for clean)] / (Entropy for fake)
      Precision is still low!
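The slide gives the selection criterion only as a ratio of entropies. A rough sketch, assuming the entropy is the Shannon entropy of a feature's discrete values within each class (the deck does not say how it is computed, or how the resulting scores are turned into a selection), might look like this. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_criterion(fake_values, clean_values):
    """(H_fake - H_clean) / H_fake for one feature; returns 0.0 when H_fake is zero."""
    h_fake = shannon_entropy(fake_values)
    h_clean = shannon_entropy(clean_values)
    if h_fake == 0:
        return 0.0  # assumed convention to avoid division by zero
    return (h_fake - h_clean) / h_fake
```

Continuous features would need to be binned before this sketch applies.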
  36. 36. Affinity Features along with Behaviour Features ◆ An affinity score between two categories i and j is the likelihood that a person in category i sends interests to a user in category j ◆ When combined with behaviour features, affinity scores allow a comparison between how a user is expected to behave and how he/she actually behaves on the platform
  37. 37. 37 Affinity Features
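One way to read the affinity definition above is as a conditional share estimated from the interest log: of all interests sent by users in category i, what fraction went to users in category j. The sketch below takes that reading. The estimator, the `(sender_category, receiver_category)` log format, and the gap measure used to compare expected and actual behaviour are all assumptions, not details from the slides.

```python
from collections import Counter, defaultdict


def affinity_scores(interests):
    """interests: iterable of (sender_category, receiver_category) pairs (hypothetical format).

    Returns affinity[i][j] = share of interests sent by category i that went to category j.
    """
    pair_counts = defaultdict(Counter)
    for sender_category, receiver_category in interests:
        pair_counts[sender_category][receiver_category] += 1
    affinity = {}
    for i, counter in pair_counts.items():
        total = sum(counter.values())
        affinity[i] = {j: n / total for j, n in counter.items()}
    return affinity


def expected_vs_actual_gap(user_category, user_shares, affinity):
    """Sum of absolute differences between the community's expected shares
    and the shares this user actually sent (an assumed comparison measure)."""
    expected = affinity.get(user_category, {})
    categories = set(expected) | set(user_shares)
    return sum(abs(expected.get(c, 0.0) - user_shares.get(c, 0.0)) for c in categories)
```

A user whose initiates follow the community's usual pattern gets a gap near 0, while one who targets categories the community rarely contacts gets a larger value.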
  39. 39. Proposed Full-length Feature Vector = Profile Features + Behaviour Features in Time Windows + Affinity Features + Edit Features in Time Windows
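Assembling the full-length vector amounts to one-hot encoding the profile attributes and concatenating the other feature groups after them. The sketch below shows that step only. The function names, argument layout and the order of the groups are illustrative assumptions.

```python
def one_hot(value, categories):
    """One-hot encode a single categorical value against a fixed category list."""
    return [1.0 if value == category else 0.0 for category in categories]


def full_feature_vector(profile, attribute_categories, behaviour, affinity, edit):
    """Concatenate profile, behaviour, affinity and edit features into one flat list.

    profile: dict attribute -> value; attribute_categories: dict attribute -> category list.
    behaviour, affinity, edit: already-computed numeric feature lists.
    """
    vector = []
    for attribute, categories in attribute_categories.items():
        vector.extend(one_hot(profile.get(attribute), categories))
    return vector + list(behaviour) + list(affinity) + list(edit)


example = full_feature_vector(
    profile={"manglik": "Non-Manglik", "body_type": "Slim"},
    attribute_categories={
        "manglik": ["Manglik", "Non-Manglik"],
        "body_type": ["Slim", "Average", "Athletic", "Heavy"],
    },
    behaviour=[0.5, 0.5],
    affinity=[0.2],
    edit=[1.0],
)
```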
  40. 40. 40 Final Model Architecture
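The abstract describes detection as a reconstruction task: an autoencoder trained on genuine profiles reconstructs them with small error and fake profiles with large error. The sketch below covers only the decision rule applied to an input and its reconstruction; the autoencoder itself is not shown. The choice of mean squared error and the quantile-based threshold are assumptions, since the deck does not state how the cut-off is chosen.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between a feature vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(genuine_errors, quantile=0.95):
    """Pick a cut-off so that roughly `quantile` of genuine profiles fall at or below it."""
    ordered = sorted(genuine_errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_fake(x, x_hat, threshold):
    """Flag a profile when its reconstruction error exceeds the threshold."""
    return reconstruction_error(x, x_hat) > threshold
```

Raising the quantile flags fewer genuine users at the cost of missing more fake ones, which is the precision/recall trade-off reported in the results slides.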
  42. 42. Final Results
      Method                          | Precision | Recall | Accuracy
      Proposed Features + Autoencoder | 0.341     | 0.902  | 0.9176
      The product team had asked for 25% precision at 60% recall!
  44. 44. Conclusion ◆ We first studied the distinction in behaviour, profile and edit pattern between genuine and fake users ◆ We incorporated these characteristics in the form of features using dynamic time windows. ◆ We then trained the autoencoder model to detect fake profiles on online matrimony. 44
  45. 45. 45 Real World Impact Week 1 Week 2
  46. 46. 46 Real World Impact Week 3 Week 4
  47. 47. Limitations and Future Work ◆ More training samples for the autoencoder could improve generalisation. ◆ We detected fake profiles using categorical attributes only; text spamming can be explored.
  48. 48. Acknowledgement ◆ Committee Members ◆ Hunny, Adhish from InfoEdge India Ltd. ◆ Members of Precog family ◆ Family and friends 48
  49. 49. References
      ◆ https://timesofindia.indiatimes.com/city/hyderabad/nigerian-held-for-matrimonial-fraud-in-hyderabad/articleshow/66939563.cms
      ◆ https://www.hindustantimes.com/mumbai-news/woman-creates-fake-profile-on-matrimony-site-cheats-mumbai-man-of-rs23-lakh/story-KHLj4zPWI8UGv31YM5A8tK.html
      ◆ https://timesofindia.indiatimes.com/city/mangaluru/online-matrimony-frauds-on-the-rise-in-mangaluru/articleshow/66102334.cms
      ◆ https://timesofindia.indiatimes.com/city/pune/matrimonial-fraud-on-the-rise-more-than-50-cases-registered-this-year/articleshow/60049950.cms
      ◆ https://dl.acm.org/citation.cfm?id=2689747
      ◆ https://link.springer.com/book/10.1007%2F978-3-319-20466-6
      ◆ https://dl.acm.org/citation.cfm?id=3106489
  50. 50. Thanks! vaibhav17064@iiitd.ac.in 50
