Adversarial IR - Social spam recognition

Spam recognition in social systems using supervised machine learning.

  1. Adversarial IR - Social Spamming
     Nicola Miotto, Unipd - Computer Science, January 22, 2011
  2. Outline
     1 Introduction: Spam; Adversarial IR
     2 Tag-spam detection in Social Bookmarking systems: Problem description; Features; Classification
     3 YouTube Video Spamming: Problem description; Features; Classification
     4 Conclusions
     5 References
  3. Introduction
  4. Spam - History
     • 1970: the BBC broadcasts the Spam sketch by Monty Python's Flying Circus, from which the current meaning of the term derives;
     • 1978: an advertising message is sent to 393 ARPANET users, the earliest documented spam;
     • 1990s: "Make Money Fast" messages flood many newsgroups, the first association of the term spam with an IT-related field;
     • 1998: the New Oxford Dictionary of English adds a new definition of spam: "Irrelevant or inappropriate messages sent on the Internet to a large number of newsgroups or users."
  5. Spam - Fields
     • E-mail
     • Instant messaging: messaging spam
     • Web search: spamdexing
     • Social systems: social spam
     • And so on...
  6. Spam - Spammer
     Earn money on the web! Services such as Google AdSense or Heyos let users place automatically generated ads on their web pages and earn money from clicks and page impressions.
     Legal advertiser:
     • builds websites on which to place content-related ads;
     • improves the PageRank of the website for the relevant keywords;
     • tries to lead potential customers to his websites.
  7. Spam - Spammer
     Spammer:
     • website contents are used only to attract users and improve the PageRank;
     • no discrimination between interested and uninterested users;
     • automatic spam-network generation programs:
       - find the relevant keywords (e.g. via AdWords);
       - register domain names containing those keywords;
       - create complete websites with fake content built around those keywords;
       - link the generated websites together to improve their PageRank.
  8. Spam - Social Spamming
     Spam campaigns directed at social network users:
     • social bookmarking systems: Delicious;
     • video social networks: YouTube;
     • general-purpose social networks: Facebook;
     • and so on...
  9. Spam - Social Spamming
     Features:
     • lots of user-related information;
     • easier to target a specific demographic segment;
     • (usually) cheaper.
     Adopted solution (most of the time): "Report abuse" → a generic solution, but less effective than ad-hoc ones.
  10. Spam - Consequences
      • users hijacked towards areas outside their information needs;
      • unfair competition with legal advertisers;
      • information poisoning due to spam noise.
  11. Adversarial IR - Definition
      Adversarial: "Assumes competing parties trying to affect the outcome of a system (system could be an algorithm, a market, etc)"
      Adversarial IR: "Information retrieval, ranking, or classification system affected by multiple parties acting in their own interest"
  12. Adversarial IR - AIRWeb
      AIRWeb: Adversarial Information Retrieval on the Web
      • annual workshop on Adversarial IR;
      • researchers and industry practitioners gather to present and discuss advances in the state of the art of Adversarial IR;
      • first workshop in 2005 (Japan).
  13. Discussed techniques
      AIRWeb papers 42: social spam recognition techniques discussed during the AIRWeb workshops.
      Supervised machine learning 42:
      1 Feature modelling
      2 Training dataset retrieval
      3 Machine learning (e.g. SVM)
      4 Result evaluation
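Step 3 of the pipeline above names the SVM as the learner. A minimal sketch of that step, assuming labels in {+1, -1} (+1 = spammer) and plain lists of numeric feature vectors: it trains a linear SVM with the Pegasos stochastic subgradient method, a solver chosen here for brevity rather than the one used in the AIRWeb papers. The function names, `lam`, `iterations` and the toy data are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, iterations=1000, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss). X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (which, in this simple sketch, is regularized too)."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        x_i, y_i = data[i], y[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y_i * sum(wj * xj for wj, xj in zip(w, x_i))
        # Regularization: shrink every weight.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss subgradient: move towards y_i * x_i only on a margin violation.
        if margin < 1.0:
            w = [wj + eta * y_i * xj for wj, xj in zip(w, x_i)]
    return w


def predict(w, x):
    """+1 (spammer) if the decision value is non-negative, else -1."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, hypothetical 2-feature vectors (+1 = spammer, -1 = legal user).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

In a real setting the feature vectors would come from step 1 and the labels from step 2; a library solver would normally replace this loop.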
  14. Tag-spam detection in Social Bookmarking systems
  15. Problem description - Tag-spam
      Social bookmarking system:
      • users can associate meta-information (tags) with resources (links);
      • one or more words can be associated with any resource.
      Advertiser: social tagging, i.e. posting links to his website tagged with content-related keywords.
      Spammer: the most "famous" keywords (e.g. music) are used to tag unrelated websites (e.g. his spam websites).
  16. Figure: Delicious.com screenshot (2011)
  17. Figure: Example of tag spam on Delicious.com (2008)
  18. Problem description - Folksonomy
      A data structure representing a social tagging system: a hypergraph connecting users, resources and tags.
      Symbols: u ∈ U, U set of users; r ∈ R, R set of resources; t ∈ T, T set of tags;
      $\mathit{post} = \{(u, r, t_1), \dots, (u, r, t_n)\} = \{(u, r, (t_1, \dots, t_n))\}$
      $F = \{\mathit{post}_1, \dots, \mathit{post}_n\}$
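The folksonomy definition above maps directly onto a set of (user, resource, tag) triples. A minimal in-memory sketch, assuming a post arrives as a user, a resource and a list of tags; the class and method names are hypothetical, and the linear scans favour clarity over speed.

```python
from collections import defaultdict


class Folksonomy:
    """Stores F as a set of (user, resource, tag) triples, i.e. the
    hyperedges of the user-resource-tag hypergraph."""

    def __init__(self):
        self.triples = set()

    def add_post(self, user, resource, tags):
        # A post (u, r, (t1, ..., tn)) expands into the n triples (u, r, ti).
        for tag in tags:
            self.triples.add((user, resource, tag))

    def tags(self, user, resource):
        """T(u, r) = {t : (u, r, t) in F}."""
        return {t for (u, r, t) in self.triples if u == user and r == resource}

    def users_with_tag(self, tag):
        """U_t = {u : there is an r with (u, r, t) in F}."""
        return {u for (u, _, t) in self.triples if t == tag}

    def posts(self):
        """Regroups the triples into posts: {(u, r): set of tags}."""
        grouped = defaultdict(set)
        for u, r, t in self.triples:
            grouped[(u, r)].add(t)
        return dict(grouped)


f = Folksonomy()
f.add_post("alice", "http://example.org/jazz", ["music", "jazz"])
f.add_post("spammer1", "http://spam.example", ["music", "free"])
print(f.users_with_tag("music"))
```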
  19. Figure: Folksonomy graphical representation example
  20. Features - Tag based
      Which tags do spammers use? TagSpam:
      $U_t = \{u : \exists r : (u, r, t) \in F\}$, the users who used tag t
      $S_t \subseteq U_t$, the users in $U_t$ identified as spammers
      $\Pr(t) = |S_t| / |U_t|$
      $T(u, r) = \{t : (u, r, t) \in F\}$
      $f_{\mathrm{TagSpam}}(u, r) = \frac{1}{|T(u, r)|} \sum_{t \in T(u, r)} \Pr(t)$
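The TagSpam formulas above translate into two short functions. A sketch, assuming the folksonomy is a list of (user, resource, tag) triples and `spammers` is the set of users already labelled as spammers (e.g. in the training set); the function names are hypothetical.

```python
def tag_spam_prob(triples, spammers):
    """Pr(t) = |S_t| / |U_t| for every tag t appearing in the triples."""
    users_by_tag = {}
    for u, _, t in triples:
        users_by_tag.setdefault(t, set()).add(u)  # builds U_t
    return {t: len(users & spammers) / len(users) for t, users in users_by_tag.items()}


def f_tag_spam(triples, spammers, user, resource):
    """f_TagSpam(u, r): average of Pr(t) over T(u, r)."""
    pr = tag_spam_prob(triples, spammers)
    tags = {t for (u, r, t) in triples if u == user and r == resource}
    if not tags:
        return 0.0  # assumption: a post without tags scores 0
    return sum(pr[t] for t in tags) / len(tags)


triples = [
    ("spammer1", "r1", "music"), ("spammer1", "r1", "free"),
    ("alice", "r2", "music"), ("alice", "r2", "jazz"),
]
spammers = {"spammer1"}
print(f_tag_spam(triples, spammers, "spammer1", "r1"))
```

Here Pr(music) = 1/2, Pr(free) = 1 and Pr(jazz) = 0, so the spammer's post averages 0.75 and Alice's 0.25.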
  21. Features - Tag based
      Is there a semantic relationship between tags? TagBlur:
      $\sigma(t_1, t_2) \in [0, 1]$, normalized similarity between tags $t_1$ and $t_2$
      Z = number of tag pairs in T(u, r)
      $f_{\mathrm{TagBlur}}(u, r) = \frac{1}{Z} \sum_{t_1 \neq t_2 \in T(u, r)} \left[ \frac{1}{\sigma(t_1, t_2) + \epsilon} - \frac{1}{1 + \epsilon} \right]$, where $\epsilon$ is a small positive constant that keeps the first term finite when $\sigma(t_1, t_2) = 0$
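A sketch of the TagBlur average above, assuming the similarity σ is supplied as a function (the slide does not say how it is computed) and a hypothetical default ε = 1e-3. It sums over unordered pairs, which gives the same average as ordered pairs when σ is symmetric; the toy similarity table is invented for illustration.

```python
from itertools import combinations


def f_tag_blur(tags, sigma, eps=1e-3):
    """TagBlur: average over tag pairs of 1/(sigma + eps) - 1/(1 + eps).
    0 for perfectly related tags, large for mutually unrelated tags."""
    pairs = list(combinations(sorted(tags), 2))
    if not pairs:
        return 0.0  # assumption: fewer than two tags means no blur
    total = sum(1.0 / (sigma(t1, t2) + eps) - 1.0 / (1.0 + eps) for t1, t2 in pairs)
    return total / len(pairs)  # Z = number of tag pairs


# Hypothetical similarity table; unknown pairs count as unrelated.
SIM = {frozenset({"jazz", "music"}): 0.9}


def sigma(a, b):
    return SIM.get(frozenset({a, b}), 0.0)


print(f_tag_blur({"jazz", "music"}, sigma), f_tag_blur({"music", "casino"}, sigma))
```

A spam post that tags an unrelated page with "music" next to its own keywords therefore gets a much higher blur than a coherent post.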
  22. Features - Resource based I
      DomFP:
      • spammers use programs to generate pages → spam pages share the same content;
      • the fingerprints of some spam pages are known;
      • compute the likelihood that r is spam by comparing the fingerprint of r to the known ones.
      NumAds:
      • spammers usually just offer lots of ads;
      • example: count the occurrences of googlesyndication.com in the HTML code of the resource.
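The two resource features above can be sketched as follows. NumAds is the literal count described on the slide; for DomFP the slide does not name a fingerprinting method, so this sketch swaps in word shingles compared by Jaccard similarity, taking the best match against known spam fingerprints as the likelihood. The function names and the shingle length `k` are hypothetical.

```python
import re


def num_ads(html):
    """NumAds: occurrences of googlesyndication.com in the page source."""
    return html.lower().count("googlesyndication.com")


def shingles(html, k=4):
    """Hypothetical fingerprint: the set of k-word shingles of the page text."""
    text = re.sub(r"<[^>]+>", " ", html)  # drop HTML tags
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def dom_fp(html, known_spam_fps, k=4):
    """DomFP (sketch): highest Jaccard similarity in [0, 1] between the
    page fingerprint and any known spam fingerprint."""
    fp = shingles(html, k)
    best = 0.0
    for spam_fp in known_spam_fps:
        union = fp | spam_fp
        if union:
            best = max(best, len(fp & spam_fp) / len(union))
    return best


spam_page = "<html><body>Cheap pills best price buy now cheap pills</body></html>"
known = [shingles(spam_page)]
print(dom_fp(spam_page, known), dom_fp("<p>A review of jazz records from 1959</p>", known))
```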
  23. Features - Resource based
      Plagiarism:
      • spammers usually copy content from high-ranked websites;
      • compare the contents of r to other web pages.
      ValidLinks:
      • spammer websites are frequently taken down;
      • many invalid links posted by u imply a greater likelihood that u is a spammer.
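ValidLinks can be sketched as the fraction of a user's posted links that still resolve. The checker below is a hypothetical choice (an HTTP HEAD request with the standard library, any error counting as invalid); it is passed in as a parameter so it can be replaced, e.g. by a cached crawler lookup.

```python
import urllib.request


def link_is_valid(url, timeout=5.0):
    """Hypothetical checker: valid if a HEAD request returns a status below 400."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):  # HTTP/network errors, malformed URLs
        return False


def valid_links(posted_urls, is_valid=link_is_valid):
    """ValidLinks: fraction of the user's posted links that are still reachable.
    Lower values suggest a greater likelihood that the user is a spammer."""
    if not posted_urls:
        return 1.0  # assumption: a user without links is not penalized
    return sum(1 for url in posted_urls if is_valid(url)) / len(posted_urls)


# Offline usage with a stub checker instead of real HTTP requests.
print(valid_links(["a", "b", "c", "d"], is_valid=lambda u: u != "d"))
```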
  24. Classification - Training dataset
      BibSonomy.org: public dataset of 27,000 users and their posts; hand-made classification → 25,000 spammers and 2,000 legal users.
      Classification: binary, into either spammer or non-spammer.
  25. Classification - Results
      Each row adds the named feature to those of the rows above.
      Features     | SVM Accuracy | SVM FP | SVM F1 | AdaBoost Accuracy | AdaBoost FP | AdaBoost F1
      -------------|--------------|--------|--------|-------------------|-------------|------------
      TagSpam      | 95.82%       | .061   | .957   | 94.66%            | .048        | .943
      + TagBlur    | 96.75%       | .048   | .966   | 96.06%            | .044        | .958
      + DomFP      | 96.75%       | .048   | .966   | 96.06%            | .044        | .958
      + ValidLinks | 96.52%       | .048   | .964   | 96.75%            | .026        | .965
      + NumAds     | 96.52%       | .048   | .964   | 97.22%            | .026        | .970
      + Plagiarism | 96.75%       | .048   | .966   | 98.38%            | 0.22        | .983
  26. YouTube Video Spamming
  27. Description - YouTube video spam
      Video response: a user answers a video with another, related video.
      Spammer: a user who answers with unrelated videos.
      Reasons: increasing video popularity; marketing campaigns; pornography distribution; system poisoning.
      Issue: automatic content-based spam recognition is hard to implement.
  28. Description - Techniques
      Content-based recognition (video content analysis):
      • requires too many computational resources;
      • the idea of spam in a video is hard to generalize, unless the video has textual content.
      Video and user relationship analysis:
      • lots of publicly available information;
      • spammers have specific social features (they are "lonely");
      • user behaviour towards spammers can be analysed automatically.
  29. Features - User-based
      For each user: # posted videos, # friends, # watched videos, # favourite videos, # video responses, # responded videos, # subscriptions, # subscribers.
  30. Features - Video-based
      Two categories per user: all posted videos; video responses only.
      7 attributes for each: # views, duration, # votes, # comments, # favourites, # YouTube honours, # external links.
      Total and average of each attribute in each category: 2 × 7 × 2 = 28 features in total.
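The 28 video-based features above (2 categories × 7 attributes × total and average) can be sketched as below. The dictionary field names, including the `is_response` flag, are hypothetical stand-ins for whatever the crawler stores; the user-based features of the previous slide are plain counts and need no computation.

```python
ATTRIBUTES = ["views", "duration", "votes", "comments",
              "favourites", "honours", "external_links"]


def video_features(videos):
    """28 video-based features for one user.

    videos: list of dicts holding the 7 ATTRIBUTES plus a boolean 'is_response'.
    Returns total and average per attribute for two categories:
    all posted videos and video responses only.
    """
    categories = {
        "all": videos,
        "responses": [v for v in videos if v["is_response"]],
    }
    features = {}
    for cat, vids in categories.items():
        for attr in ATTRIBUTES:
            total = sum(v[attr] for v in vids)
            features[f"{cat}_{attr}_total"] = total
            features[f"{cat}_{attr}_avg"] = total / len(vids) if vids else 0.0
    return features


videos = [
    {"views": 100, "duration": 60, "votes": 4, "comments": 2, "favourites": 1,
     "honours": 0, "external_links": 0, "is_response": False},
    {"views": 50, "duration": 30, "votes": 2, "comments": 0, "favourites": 0,
     "honours": 0, "external_links": 3, "is_response": True},
]
feats = video_features(videos)
print(len(feats), feats["all_views_avg"])
```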
  31. Features - Social network
      Based on the video response user graph, a directed graph (X, Y):
      • each user is a node in the graph;
      • (x1, x2) is a directed edge from x1 ∈ X to x2 ∈ Y if x1 responded to a video of x2.
      Analysis:
      • in/out degree of each user;
      • assortativity: degree(n) / avg(degree(neighbours(n)));
      • UserRank: depends on the quantity and quality of incoming links;
      • clustering coefficient, betweenness, reciprocity.
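Several of the graph features above can be computed from the edge list alone. A sketch under stated assumptions: duplicate edges are collapsed, "degree" in the assortativity ratio means in-degree plus out-degree over the union of in- and out-neighbours, reciprocity is the fraction of a user's out-neighbours who also responded to that user, and UserRank is approximated by a PageRank power iteration (damping 0.85). Clustering coefficient and betweenness are left out.

```python
from collections import defaultdict


def graph_features(edges, damping=0.85, iterations=50):
    """Per-user features of the video response user graph.
    edges: iterable of (x1, x2) meaning x1 responded to a video of x2."""
    edges = set(edges)  # assumption: collapse duplicate responses
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for a, b in edges:
        out_nbrs[a].add(b)
        in_nbrs[b].add(a)
    nodes = sorted(set(out_nbrs) | set(in_nbrs))
    degree = {n: len(out_nbrs[n]) + len(in_nbrs[n]) for n in nodes}

    # UserRank sketch: PageRank power iteration; dangling rank is spread evenly.
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out_nbrs[n])
        rank = {
            n: (1 - damping) / count
            + damping * (sum(rank[m] / len(out_nbrs[m]) for m in in_nbrs[n]) + dangling / count)
            for n in nodes
        }

    features = {}
    for n in nodes:
        nbrs = out_nbrs[n] | in_nbrs[n]
        avg_nbr_degree = sum(degree[m] for m in nbrs) / len(nbrs)
        features[n] = {
            "in_degree": len(in_nbrs[n]),
            "out_degree": len(out_nbrs[n]),
            "assortativity": degree[n] / avg_nbr_degree,
            "reciprocity": (len(out_nbrs[n] & in_nbrs[n]) / len(out_nbrs[n])
                            if out_nbrs[n] else 0.0),
            "userrank": rank[n],
        }
    return features


# c responds to a, but nobody responds to c: a "lonely" user.
g = graph_features([("a", "b"), ("b", "a"), ("c", "a")])
print(g["c"])
```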
  32. Classification - Dataset
      Data crawling: starting from the top-100 most responded videos, the connected data concerning video responses, responded videos and users are retrieved.
      Hand-made classification: each user with at least one video response unrelated to the responded video is classified as a spammer.
      Test set: 473 legal users + 119 spammers = 592 users.
  33. Classification - Training
      Support Vector Machine with 5-fold cross-validation.
      Adopted features: user-based; video-based; social network; all together.
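The 5-fold cross-validation above can be sketched as an index splitter. With only 119 spammers among 592 users, this sketch stratifies the folds so each keeps roughly the same spammer ratio; stratification is an assumption, since the slide only says "5-fold". The function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yields (train_indices, test_indices) for k folds, dealing each class
    round-robin across folds so class proportions stay balanced."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


labels = [1] * 119 + [0] * 473  # 1 = spammer, 0 = legal user
splits = list(stratified_k_fold(labels))
print([(len(test), sum(labels[i] for i in test)) for _, test in splits])
```

Each (train, test) pair would be used to fit the SVM on the training indices and score it on the test indices; the five scores are then averaged.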
  34. Classification - Results
      Columns: feature set used (User = user-based, Video = video-based, SN = social network, All = all together).
      Measure  | User  | Video | SN    | All
      ---------|-------|-------|-------|------
      TP       | 0.054 | 0.426 | 0.375 | 0.439
      TN       | 0.998 | 0.922 | 1     | 0.981
      FP       | 0.002 | 0.078 | 0     | 0.019
      FN       | 0.946 | 0.574 | 0.625 | 0.561
      Accuracy | 0.821 | 0.821 | 0.874 | 0.870
      F        | 0.094 | 0.484 | 0.540 | 0.558
      TP = fraction of spammers correctly classified as spammers; FP = fraction of legal users classified as spammers.
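The rates in the table above are normalized per class (TP + FN = 1 and TN + FP = 1 in each column). A sketch of how such measures follow from raw confusion counts, with spammers as the positive class; it assumes the F column is the F1 measure of the spammer class, and the counts in the example are hypothetical, not taken from the papers.

```python
def evaluate(tp, fn, tn, fp):
    """Per-class rates, accuracy and F1 from raw confusion-matrix counts."""
    spammers, legal = tp + fn, tn + fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / spammers if spammers else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "TP": recall,                            # spammers classified as spammers
        "FN": fn / spammers if spammers else 0.0,
        "TN": tn / legal if legal else 0.0,      # legal users classified as legal
        "FP": fp / legal if legal else 0.0,      # legal users classified as spammers
        "Accuracy": (tp + tn) / (spammers + legal),
        "F": f1,
    }


# Hypothetical counts: 100 spammers (50 caught), 400 legal users (10 flagged).
print(evaluate(tp=50, fn=50, tn=390, fp=10))
```

Because accuracy is dominated by the larger legal-user class, a classifier that flags almost nobody (the User column) can still reach high accuracy with a very low F.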
  35. Conclusions
  36. Conclusions - Classifications
      Tag-spam recognition: accuracy > 98%, false positives < 2%.
      YouTube video spam recognition: true positives ≈ 44%, false positives < 2%.
  37. Conclusions
      Pros:
      • few legal users classified as spammers;
      • tag-spam recognition finds most of the spammers;
      • datasets built from publicly available information.
      Cons:
      • social systems are already poisoned by spam;
      • hand-made classification of training examples.
  38. References I
      Brian D. Davison, The Potential for Research and Development in Adversarial Information Retrieval, Computer Science and Engr., Lehigh University, Cambridge, 2009, available at http://airweb.cse.lehigh.edu/2009/slides/Davison-AIRWeb2009-Keynote.pdf.
      Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, and Gerd Stumme, Evaluating similarity measures for emergent semantics of social tagging, In Proc. 18th Intl. WWW Conf., 2009, available at http://www2009.org/proceedings/pdf/p641.pdf.
      Benjamin Markines, Ciro Cattuto, Filippo Menczer, Social Spam Detection, AIRWeb '09, April 21, 2009, Madrid, Spain, available at http://airweb.cse.lehigh.edu/2009/papers/p41-markines.pdf.
  39. References II
      Fabricio Benevenuto, Tiago Rodrigues, Virgilio Almeida, Jussara Almeida, Chao Zhang, Keith Ross, Identifying Video Spammers in Online Social Networks, AIRWeb '08, April 22, 2008, Beijing, China, available at http://airweb.cse.lehigh.edu/2008/submissions/benevenuto_2008_spam_video.pdf.
