When socialbots attack: Modeling susceptibility of users in online social networks
Claudia Wagner, Silvia Mitter, Christian Körner, Markus Strohmaier
Lyon, 16.4.2012
What are socialbots?
A socialbot is a piece of software that controls a user account in an online social network and passes itself off as a human being.
Danger of socialbots
• Social engineering: gaining access to secure objects by exploiting human psychology rather than using hacking techniques
• Harvesting private user data such as email addresses, phone numbers, and other personal data that have monetary value
• Spreading misinformation: Ratkiewicz et al. describe the use of Twitter bots to run smear campaigns during the 2010 U.S. midterm elections.
J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, S. Patil, A. Flammini, and F. Menczer. Truthy: mapping the spread of astroturf in microblog streams. In Proceedings of the 20th international conference companion on World wide web, WWW 11, pages
Danger of socialbots
• Snowball effects: Boshmaf et al. show that Facebook can be infiltrated by socialbots sending friend requests.
• 102 socialbots, 6 weeks, 3,517 friend requests and 2,079 infections
• Average reported acceptance rate: 59.1%, rising to as much as 80% depending on how many mutual friends the socialbots had with the infiltrated users
Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network. In Proceedings of the 27th Annual Computer Security Applications Conference, page 93. ACM Press, Dec 2011.
Experimental Setup
• How likely will she be infected by a bot?
• Whom shall we protect to avoid large-scale infiltration due to snowball effects?
• Who is a bot? Whom shall we eliminate? Is she a bot?
src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
Experimental Setup
Two-stage approach
• Predict infections (binary classification task): Who is susceptible to bot attacks, i.e. who gets infected?
• Predict infection level (regression task): How susceptible is a user, i.e. how often does a user interact with bots?
Dataset: Social Bot Challenge 2011
Social Bot Challenge 2011
• Competition organized by Tim Hwang
• The aim was to develop socialbots that persuade 500 randomly selected Twitter users (targets) to interact with them
• Targets have a topic in common: cats
• Teams got points if targets replied to, mentioned, retweeted or followed their lead bot
• Teams had 14 days in which to develop their socialbots
• The game started on Jan 23rd 2011 (day 1) and ended on Feb 5th 2011 (day 14)
• On the 30th of January (day 8) the teams were allowed to update their codebase
Feature Engineering
How likely will this user become infected?
User: Network, Behavior, Content
Network Features
• 3 directed networks: follow, retweet and interaction (retweet, reply, mention and follow) network
• Hub and Authority Score (HITS)
   A node with a high authority score has many incoming edges from nodes with a high hub score
   A node with a high hub score has many outgoing edges to nodes with a high authority score
• In-degree and Out-degree
• Clustering Coefficient: number of actual links between the neighbors of a node divided by the number of possible links between them
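The network features above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the edge-list input format, the function names, the fixed iteration count, and the choice to ignore edge direction when computing the clustering coefficient are all assumptions made here.

```python
from collections import Counter, defaultdict
from math import sqrt

def degrees(edges):
    """In-degree and out-degree per node for directed (source, target) edges."""
    out_deg = Counter(s for s, _ in edges)
    in_deg = Counter(t for _, t in edges)
    return in_deg, out_deg

def hits(edges, iterations=50):
    """Hub and authority scores via simple power iteration (assumed setup)."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of nodes linking to this node.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of the authority scores of nodes this node links to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += auth[t]
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

def clustering_coefficient(edges, node):
    """Actual links between a node's neighbors divided by possible links.

    Assumption: edge direction is ignored for this feature.
    """
    neighbors = defaultdict(set)
    for s, t in edges:
        if s != t:
            neighbors[s].add(t)
            neighbors[t].add(s)
    nb = neighbors[node]
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for a in nb for b in nb if a < b and b in neighbors[a])
    return links / (k * (k - 1) / 2)

# Toy follow network: a, b and d all follow c; a also follows b.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
toy_hub, toy_auth = hits(toy_edges)
```

In the toy network, c is followed by everyone and therefore ends up with the highest authority score, while a, which follows both b and c, ends up with the highest hub score.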
Behavioral Features
• Informational Coverage
• Conversational Coverage
• Question Coverage
• Social Diversity
• Informational Diversity
• Temporal Diversity
• Lexical Diversity
• Topical Diversity
C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures from social awareness streams. In Proc. of the Semantic Search 2010 Workshop, April 2010.
Linguistic Features
• LIWC uses a word count strategy, searching for over 2300 words
• Words have previously been categorized into over 70 linguistic dimensions:
   standard language categories (e.g., articles, prepositions, pronouns including first person singular, first person plural, etc.)
   psychological processes (e.g., positive and negative emotion categories, cognitive processes such as use of causation words, self-discrepancies)
   relativity-related words (e.g., time, verb tense, motion, space)
   traditional content dimensions (e.g., sex, death, home, occupation)
J. Pennebaker, M. Mehl, and K. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual review of psychology, 54(1):547-577, 2003.
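A word count strategy of this kind can be sketched as follows. The two-category dictionary below is a made-up stand-in built from example words mentioned in these slides; the real LIWC lexicon is much larger and is not reproduced here. The function name, the tokenizer, and the normalization by token count are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary, NOT the LIWC lexicon.
TOY_CATEGORIES = {
    "negation": {"not", "never", "no"},
    "death": {"bury", "kill"},
}

def category_shares(text, categories=TOY_CATEGORIES):
    """Share of tokens in the text that fall into each word category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {name: counts[name] / total for name in categories}

shares = category_shares("I will never kill my cat, no")
```

Here the sentence has seven tokens, two of which are negation words and one of which is a death-related word.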
Feature Computation
• For all targets we computed the features using all tweets they authored during the challenge (up to the point in time where they became infected) and a snapshot of the follow network recorded on the 26th of January (day 4)
• We only used targets which became susceptible on day 7 or later
• Features do not contain any future information (such as tweets or social relations which were created after a user became infected)
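The "no future information" rule can be illustrated with a small filter. The tweet record layout (a dict with a created_at timestamp), the function name, and the example dates are hypothetical; the slides do not describe the data format.

```python
from datetime import datetime

def tweets_before_infection(tweets, infected_at):
    """Keep only tweets authored before the user's first bot interaction.

    infected_at is None for users who never interacted with a bot; in that
    case all tweets from the observation period are kept.
    """
    if infected_at is None:
        return list(tweets)
    return [t for t in tweets if t["created_at"] < infected_at]

# Hypothetical example: the user is infected on day 8 (30 January 2011).
example_tweets = [
    {"text": "my cat is asleep", "created_at": datetime(2011, 1, 25, 9, 0)},
    {"text": "no, not again", "created_at": datetime(2011, 1, 29, 18, 30)},
    {"text": "@bot nice cat!", "created_at": datetime(2011, 1, 30, 12, 0)},
]
usable = tweets_before_infection(example_tweets, datetime(2011, 1, 30, 12, 0))
```

Features would then be computed from the filtered tweets only, so nothing created at or after the infection leaks into the model.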
Predict Infections
• Binary classification of users into susceptible and non-susceptible
• Train 6 classifiers
• 97 features
• Compare classifiers via 10-fold cross-validation
• Balanced dataset
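The evaluation protocol can be sketched without any particular classifier. The slides do not say how the dataset was balanced, so random downsampling of the majority class is an assumption here, as are the function names, the seeds, and the toy label counts.

```python
import random

def balance(labels, seed=0):
    """Indices of a class-balanced subset (assumed: downsample the majority)."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(positives), len(negatives))
    return sorted(rng.sample(positives, n) + rng.sample(negatives, n))

def k_fold(indices, k=10, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Toy labels: 20 susceptible (1) and 50 non-susceptible (0) users.
toy_labels = [1] * 20 + [0] * 50
balanced_idx = balance(toy_labels)
splits = list(k_fold(balanced_idx, k=10))
```

Each classifier would be trained on the train indices and scored on the test indices of every split, and the ten scores averaged to compare classifiers.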
Predict Level of Infection
• Which factors are correlated with users' susceptibility score?
• The susceptibility score counts the number of interactions between a target and any lead bot
• Method: regression trees, which can handle strongly nonlinear relationships with high-order interactions and different variable types
• Fit the model to 75% of the susceptible users
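To show what a regression tree does at each node, here is a sketch of a single split search that minimizes the summed squared error of the two child nodes. This is the CART-style criterion, chosen only for illustration; the slides do not state which tree algorithm or splitting criterion was used, and the function names and toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets):
    """Return (feature_index, threshold, error) of the best binary split.

    A row goes left if row[feature] < threshold, otherwise right.
    """
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[feature] < threshold]
            right = [y for row, y in zip(rows, targets) if row[feature] >= threshold]
            if not left or not right:
                continue  # a split must send users to both sides
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (feature, threshold, error)
    return best

# Toy data: two features per user, target = number of bot interactions.
toy_rows = [[0.1, 5], [0.2, 1], [0.6, 4], [0.7, 2]]
toy_scores = [1, 1, 3, 3]
split = best_split(toy_rows, toy_scores)
```

A full tree applies the same search recursively to the left and right subsets until a stopping rule is met, and predicts the mean target of the leaf a user falls into.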
Predicting Levels of Susceptibility
Users who
• use more negation words (e.g. not, never, no),
• tweet more regularly (i.e. have a high temporal balance),
• use more words related to the topic death (e.g. bury, coffin, kill)
tend to interact more often with bots
[Figure: regression tree. Node 1 splits on negemo (< 0.40068 / >= 0.40068), node 2 on temp_bal (< 0.37025 / >= 0.37025), node 3 on death (< −0.16389 / >= −0.16389). Leaf nodes: Node 4 (n = 25), Node 5 (n = 7), Node 6 (n = 9), Node 7 (n = 15).]
Predicting Levels of Susceptibility
• Rank correlation of hold-out users given their real susceptibility level and their predicted susceptibility level (Kendall τ up to 0.45)
• Goodness of fit (R² up to 0.3)
• Potential reasons: the dataset is too small (we only had 81 susceptible users; 61% of them had level 1, 17% had level 2, 10% had level 3, and very few users had more than 3 interactions)
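Both evaluation measures are easy to compute by hand. The sketch below uses the tie-corrected tau-b variant of Kendall's τ, which is an assumption: the slides do not say which variant was used, although susceptibility levels contain many ties. Function names and example values are illustrative only.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall rank correlation with tie correction (tau-b), O(n^2) pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both rankings: excluded from both factors
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

# Hypothetical real vs. predicted susceptibility levels of hold-out users.
tau = kendall_tau_b([1, 1, 2, 3], [1, 2, 2, 3])
```

A model that always predicts the mean level gets R² = 0, which puts the reported value of up to 0.3 into perspective.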
Summary & Conclusions
• Approach to identify susceptible users
• Features of all three types contributed to the identification
• Users are more likely to be susceptible if
   they are emotional Meformers
   they use Twitter mainly for communicating
   their communications are not focused on a small circle of friends
   they are social and active (i.e., interact with many others)
Summary & Conclusions
• Active Twitter users are more susceptible
   They are more likely to see the messages/requests of socialbots
   But we expected that frequent Twitter use would give them some skill in distinguishing socialbots from humans
• Predicting users' susceptibility score is difficult
   More data and further experiments are required
Future Work
• Repeating the experiments on larger datasets
• Taxonomy of socialbot strategies
   Massive numbers of con-messages (brute force)
   Manipulation of messages through false retweets (changing pro- to con-messages)
   Diverting attention by adding con-hashtags to pro-hashtags
• Susceptibility of users to different strategies
Emotional Meformers who are active, communicative and social are more likely to be infected
THANK YOU
firstname.lastname@example.org
http://claudiawagner.info
src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/