27. Identify the user:
Advanced tracking
Passive data: headers, plugins, browser, OS
JS: screen resolution, custom resource detection via Plugins API
(e.g. printers via PDF, fonts via Flash, etc.)
Track ID
Cookies, Flash cookies (allow cross-domain references),
HTML5 storage, Silverlight
Java: own download cache, applets can read embedded resource streams
Future? Apps and games in social networks.
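A minimal sketch of how the passive and JS-collected attributes above could be folded into a single tracking identifier. The attribute names and the choice of SHA-256 are assumptions made for illustration; the slides do not say how any particular tracker combines them.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a stable hex digest for a set of browser attributes."""
    # Sort the keys so the same attributes always serialize the same way,
    # regardless of the order in which they were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical visitor: every value here is made up for the example.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "accept_language": "en-US,en;q=0.8",
    "plugins": ["Flash", "Java", "PDF viewer"],
    "screen": "1920x1080x24",
    "fonts": ["Arial", "DejaVu Sans"],
}

print(fingerprint(visitor))
```

Changing any single attribute (a new plugin, a different screen) yields a different digest, which is why such identifiers are usually paired with a stored track ID such as a cookie.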
36. Tweets to someone
After some testing and feature-selection
algorithms:
numberOfVias
tweetsToSomeone
tweetsWithLink
followingFollowers
friendFollowerRatio
tweetsKnownReceiver
tweetsUnknownReceiver
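A rough sketch of how features with these names might be computed from a profile and its most recent tweets. The slide only lists the names, so every definition below is an assumption (for instance, `numberOfVias` is read here as the number of distinct client sources), and the `profile` and `tweets` dictionaries are hypothetical stand-ins for whatever the API returns.

```python
import re

MENTION_AT_START = re.compile(r"^@(\w+)")
LINK = re.compile(r"https?://\S+")


def extract_features(profile, tweets, window=20):
    """Compute per-account features from the newest `window` tweets.

    Assumes `tweets` is ordered newest first and that each tweet is a dict
    with "text" and "source" keys (hypothetical layout).
    """
    recent = tweets[:window]
    friends = {name.lower() for name in profile["friends"]}
    followers = {name.lower() for name in profile["followers"]}
    known_accounts = friends | followers

    features = {
        # Assumed meaning: distinct clients ("via web", "via API", ...).
        "numberOfVias": len({t["source"] for t in recent}),
        "tweetsToSomeone": 0,
        "tweetsWithLink": 0,
        # Assumed meaning: followers that the account follows back.
        "followingFollowers": len(friends & followers),
        "friendFollowerRatio": len(friends) / max(len(followers), 1),
        "tweetsKnownReceiver": 0,
        "tweetsUnknownReceiver": 0,
    }
    for tweet in recent:
        text = tweet["text"]
        if LINK.search(text):
            features["tweetsWithLink"] += 1
        match = MENTION_AT_START.match(text)
        if match:
            features["tweetsToSomeone"] += 1
            if match.group(1).lower() in known_accounts:
                features["tweetsKnownReceiver"] += 1
            else:
                features["tweetsUnknownReceiver"] += 1
    return features


# Made-up account that follows many people and mentions strangers.
profile = {"friends": ["alice", "bob", "carol", "dave"], "followers": ["alice"]}
tweets = [
    {"text": "@gamer42 check this out http://example.com/x", "source": "API"},
    {"text": "@alice see you later", "source": "web"},
    {"text": "so hot today", "source": "web"},
]
features = extract_features(profile, tweets)
print(features)
```

The resulting dictionary can be turned into a feature vector for whichever classifier is being evaluated.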
40. Avoiding semantic analysis
• if its do you me your my do it my be find is but on are its rt
that was
• I a me at get out your they on rt if I get rt can a
• u you rt find in I that that your my my find one you so is is
my you this but get all a one its it
• they with its your get me of I
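Messages like the ones above are made almost entirely of very common words, which suggests one simple counter-check: measure how much of a tweet is filler. The sketch below is only an illustration; the word list is taken from the sample messages on this slide and the 0.9 threshold is an arbitrary assumption, not a value from the experiment.

```python
# Common words seen in the sample messages above (assumed list).
COMMON_WORDS = frozenset(
    "a all are at be but can do find get i if in is it its me my of on one "
    "out rt so that they this u was with you your".split()
)


def common_word_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that are common words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in COMMON_WORDS for token in tokens) / len(tokens)


def looks_like_filler(text: str, threshold: float = 0.9) -> bool:
    """Flag a message made almost entirely of common words."""
    return common_word_ratio(text) >= threshold


print(common_word_ratio("they with its your get me of I"))
print(common_word_ratio("Mmmm hot chocolate with cream"))
```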
49. Top tweets sent
1800 different tweets
• Mmmm hot chocolate with cream
• Beyonce looks so hot in her new ad
• So Hot
• Spain !! Too hot
• hot summer
• a hot bubble bath is much needed
• Tea water supposed to hot ya now
• Air conditioner-laying on the bed-naked-relax-heaven! So hot tonight!
• playing piano and guitar r the only things i can do right in life does
this make me hot enough for a boyfriend yet
• Austin mahone is just like another justin beiber..he is hot tho!
55. Conclusions
It is relatively easy to find anomalies
Bots are there for different reasons, mostly fraud-related
Machine learning: lots of resources!
59. Thank you
Questions?
Vicente Díaz @trompi
Senior Security Analyst,
Global Research and Analysis
Editor's Notes
Today I'm going to talk about fraud on Twitter and machine learning. The historical problem with ML, a field of AI, is expectations. In our collective imagination we envision The Terminator, The Matrix and Ghost in the Shell. What I want to do with this presentation is show how AI can be used in a much simpler way for the daily problems that we, as researchers, face every day. Using AI in security is not new, but for some reason I think we underuse it. I hope that after this talk everybody will be more interested in this topic and will learn how to use it on a regular basis.
I just want to stress that we often use these techniques in security for very interesting stuff. But it always looks like something big and difficult to apply to more mundane problems. That's where I hope this talk can help everybody, through an example.
OK, so let's start. I've already said we will apply machine learning to detect fraud. In this case, we detect fraud on Twitter. Why Twitter? I don't think it is necessary to stress why Twitter is relevant these days; here you can see some numbers about how big it is. But Twitter also has some other interesting features for a researcher: all the data is public and easily accessible, and information about profiles is public and easy to obtain through a convenient API (note that this changed last year and now you have to use OAuth). Also, Twitter messages are short, which helps in case you want to analyze the contents.
So what's the problem with Twitter? Where is the fraud? One of the problems is the level of spam that social networks are reaching. Playing with big numbers is never easy and we only have a partial view of the big picture, but from our data we see that the level of email spam started decreasing a couple of years ago. At the same time, the level of spam in social networks increased. The reasons are understandable: we learned how to detect spam in email messages and we ignore it. We also have a lot of software doing the filtering for us. But in social networks spammers get a better ROI, as people still get confused and open everything that is sent to them. Protection mechanisms are also not so well established. I remember reading something about how old email spammers were moving to buying legitimate ads in social networks because the ROI was bigger.
We have some data from Twitter about their levels of spam. This is always a bit tricky, because their figures are what they detect, so we may think that either they improved their detection mechanisms or they failed to detect new stuff. No fresh data is available yet, but we can see that Twitter started to get serious about this. Still, the problem exists, and keep in mind that 1% of 175 million tweets is 1.75 million spam messages a day! About the media, this is what I meant when I said that nobody has all the data. Still, why send this spam? Are people buying Viagra through Twitter? We will talk a bit about this later.
Well, we have many examples of spam being sent on Twitter, but not so many of malware. Why not? Spam is still a grey area. We will see that in the examples later, but many times it is very difficult to say whether a campaign is malicious or not, or legal or not, so it's not easy for the social network to shut down the account. But when it comes to malware distribution, everything is clear: security researchers and the AV industry are quick to investigate and shut everything down. We should understand that there are basically two techniques for this: creating new bots or hijacking existing accounts. Both methods have pros and cons for attackers and researchers.
Hacked accounts may have many uses, but one of them is to get more accounts!
Well, what else do we have on Twitter? Anyone ever heard of what is called digital marketing? Basically it consists of a bunch of people whose work is creating strange ratios based on followers, trends and stuff, and showing them to their bosses. One of their most used tools is Twitter. They try to get as many followers as possible to spread the company's message. But they also know how important the influence of other people in social networks is for promoting their message. In the past it was the marketing guy going to Wikipedia to change what other people said about the company; to their shame, we could see that through the IPs they used. Nowadays a lot of digital marketing companies promote their message using fake profiles in social networks, like in this example. In this case accounts are not hijacked, as this would mean big trouble for the company behind it. Also, these profiles may not be against the terms of service of the social network, but they are totally against its interests: the networks want real people, and that's why they have lately been asking for complementary data (such as the telephone number, in Google's case) and shutting down fake profiles (in Facebook's case).
Some accounts may be abused by hacktivists as well, just like when a website is defaced. These cases are rarer and the accounts are hijacked in a more selective and unique way, so this is not very interesting from a global perspective. Still, the impact may be very significant if the hijacked account is not quickly detected as malicious!
We have seen why it is interesting for attackers both to create fake accounts and to hijack legitimate ones. So, how to create trouble on Twitter? There are different methods for malicious activity. Basically attackers can create new profiles or hack existing ones. For this last method they can: steal the account, as with any victim of any malware; brute-force the account; delete the hash from Twitter.
So let me show you some examples and details on how this works with a real campaign.
In this case it all started last summer when I sent a tweet about Battlefield and got a reply from this nice girl, which usually never happens to me. I immediately got suspicious and started looking into it. I discovered several other profiles, basically all of them nice girls sending messages like the one I received to guys like me. These messages were on different topics: Xbox, iPhone, MacBook Pro, Victoria's Secret, etc. Here you can see a collage I created with some of the profile pictures I found.
One of the most interesting things to notice is that all of these were one-shot bots. The lifespan of almost half of them was less than 45 minutes! Another interesting thing is that these bots were doing some semantic analysis of their victims. That's a real improvement for Twitter bots compared to email spam bots, which have no knowledge of the victim. Here you can try to get your victim's attention by offering him something related to his interests.
Well, basically, after some redirections (the first one through a fake blog) you landed on a page like this one where, depending on your IP and your ZIP code, you were asked for your email to play a lottery. This is not the typical Viagra website, so I was wondering: how was the spammer making money here? (Explain the campaign on the following slides and how the ad industry works.)
So let me say a few words on what is happening with our privacy these days and how tracking works.
Flash cookies are used to rewrite cookies. From the "How Unique Is Your Web Browser?" study we get 86% unique fingerprints. Sometimes plugins are used to bypass the content of blocked sites. JS code simulates user interaction to bypass third-party cookie restrictions. Related to this, cross-domain postMessage support is used to pass cookies between coordinating sites and store them in localStorage! Initiatives such as Do Not Track (an HTTP header) are being completely ignored.
The same happens at an enterprise level: do we know who else Google gives access to our data? It is a multibillion industry; data is cross-checked with real-life information and finally sold. To whom? What about governments?
OK, so once we know how it works, I wanted to check whether it was possible to detect all these nasty malicious campaigns, so I decided to run my own experiment.
There is also a surprisingly good result for detecting hacked accounts. In this case I believe the reason is that many of the features are related to the tweets sent and, in my experiment, I only considered the last 20 tweets sent, so it's a limited time window.
So we have seen that the key is choosing the right features for machine learning to be effective and detect the malicious profiles. Keep in mind this is an experiment, and in real life we might find that our training subset does not cover all possible variables. However, creators of fake profiles know this too and try to avoid being detected. Many features can easily be forged to avoid detection; that's why derived features, such as the ones that have to do with relationships with other profiles, are more solid, but they can still be evaded. Let's take a look at what attackers are doing in order to avoid detection.
Creating neighborhoods, however, has some implicit risk: it is easy to shut down the whole group once it is detected. I haven't yet seen hacked accounts being used to create relationships with bots in order to make campaigns harder to detect; maybe we will see this in the future.
Usually bots use the same messages and URLs. URLs are difficult to search for because of shorteners, but in some cases all of them use the same phrases. This way you can locate them. As you can see here, it is common for them to reuse the same profile picture many times; this is another way to detect them. You can just search on Twitter (using the API or the website).
In this example you can see how reusing profile pictures is another trick. You can also see that they use the same profile description here, so it is quite easy to find many other profiles of this campaign just by looking for the email they use in the campaign.
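As a small illustration of this idea, the sketch below groups already-collected profiles by a reused picture or a reused description. The profile dictionaries and their field names are hypothetical, and hashing the raw picture bytes is an assumption that only catches byte-identical images; a resized or re-encoded copy would need a perceptual hash instead.

```python
import hashlib
from collections import defaultdict


def picture_key(profile):
    """Digest of the raw picture bytes (exact duplicates only)."""
    return hashlib.sha256(profile["picture"]).hexdigest()


def description_key(profile):
    """Description with case and whitespace differences removed."""
    return " ".join(profile["description"].lower().split())


def clusters(profiles, key, min_size=2):
    """Group profile names that share the same key value."""
    groups = defaultdict(list)
    for profile in profiles:
        groups[key(profile)].append(profile["name"])
    return sorted(sorted(names) for names in groups.values() if len(names) >= min_size)


# Made-up profiles: two bots sharing a picture and a contact email.
profiles = [
    {"name": "bot_a", "picture": b"\x89PNG-1", "description": "Contact me:  x@example.com"},
    {"name": "bot_b", "picture": b"\x89PNG-1", "description": "contact me: x@example.com"},
    {"name": "human", "picture": b"\x89PNG-2", "description": "Security researcher"},
]

print(clusters(profiles, picture_key))
print(clusters(profiles, description_key))
```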
We can also see how the non-deleted bots are reused in different campaigns, changing some parameters to adjust to the new one and also to avoid detection. There are more pictures, but I got tired... Also, some other accounts were suspended during this time. Another hint for suspicious profiles is in the name of the profile itself. You can see how all of them here follow a given pattern.
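One way to look for such naming patterns without knowing them in advance is to reduce each name to its "shape" and count the shapes. This is just a sketch: the shape encoding is an assumption of mine, and the sample names are invented, not taken from the campaign.

```python
from collections import Counter


def name_shape(name: str) -> str:
    """Collapse a name into runs of A (upper), a (lower) and 9 (digit)."""
    shape = []
    for ch in name:
        if ch.isdigit():
            symbol = "9"
        elif ch.isupper():
            symbol = "A"
        elif ch.islower():
            symbol = "a"
        else:
            symbol = ch  # keep punctuation such as "_" as-is
        # Only record a symbol when it differs from the previous one.
        if not shape or shape[-1] != symbol:
            shape.append(symbol)
    return "".join(shape)


# Invented screen names for the example.
names = ["Jenny4821", "Sarah0012", "Megan7730", "trompi"]
shape_counts = Counter(name_shape(name) for name in names)
print(shape_counts.most_common())
```

A shape shared by an unusually large number of accounts is only a hint, and it is most useful combined with the other signals (pictures, descriptions, messages).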
We see that, when analyzing these profiles, they are extremely easy to detect. Still, they survive thanks to brute force.
Do you see any pattern? 1800 tweets include the word "hot".
This is one of my favourite sites to use to find stuff