Thanks for joining! The webinar is about 30 minutes long. Questions will be answered during the session. Please submit your questions using the chat window.
My name is Orion Cassetto, and I am the Sr. Product Marketing Manager at Incapsula.
Prior to Incapsula, I held product marketing positions at Imperva and Armorize Technologies.
My experience is in web application security and Software-as-a-Service (SaaS) solutions.
Today we will be talking about: an overview of bot technology; how bots are used for hacking and denial-of-service attacks; the impact of content scraping on websites; and suggestions for bot detection and mitigation.
Let’s begin by discussing what a bot is, and what it isn’t. A bot, as it pertains to the internet, isn’t a time-travelling cyborg assassin sent back in time to kill Sarah Connor. All jokes aside, a bot is a software program which performs some task or function over the internet.
They usually perform simple tasks in a highly repetitive and rapid manner, producing results at a speed unobtainable by humans.
Bots are responsible for many small jobs that amount to critical tasks that we take for granted such as populating search engine results.
Bots usually visit websites in regular patterns and do things like checking if websites are online, measuring their speed, and fetching content. They can also be used to scan websites to find security vulnerabilities, which we will talk more about later.
Based on research by the Incapsula team, bots now make up as much as 61% of website traffic. While much of this traffic is legitimate, it would be naive to assume that helping facilitate a better internet is all they are up to. Roughly 50% of the automated traffic we analyzed was malicious.
We’ve talked about some of the great things that automated clients on the internet are responsible for like populating search engine results, powering APIs or application programming interfaces, and finding security flaws in our website code. But what about the bad bots? The malicious ones? What exactly are they up to and how does it affect websites?
Bad bots are responsible for a host of malicious activity including: site scraping to steal website content, comment spam which you commonly see on blogs and forums, fraud, and even web application attacks.
By blocking bad bots, website owners can significantly improve the security posture of their website. It is important to keep in mind that blocking good bots would be very disadvantageous to websites, and thus care should be taken to create an ecosystem that is both bot friendly and also free of malicious automated clients.
Over the last two decades, bots have evolved from simple scripts with minimal capabilities to complex programs which are sometimes able to convince websites and their security precautions that they are humans.
This makes them a powerful tool for hackers looking to bypass security protocols that would identify a less sophisticated bot.
Another example of how far bots have evolved is that of fake Googlebots.
A recent study published by the Incapsula labs found that an average website is visited by Googlebots 187 times per day, and each visit averages 4 pages.
Of these Googlebot visits, 1 in 24 visits will be from a fake Googlebot.
Why would a hacker create a fake Googlebot? Many websites are designed to permit Googlebots into areas of a website which other bots may not normally be able to access. It’s somewhat akin to the bot version of a fake ID, or a backstage pass.
As you might expect, imposter Googlebots are typically up to no good. They usually perform malicious tasks like attacking websites, performing marketing intelligence, stealing webpage content, posting comment spam, and a host of other unwanted activities.
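One practical defense is worth sketching here: a claimed Googlebot can be verified with a reverse DNS lookup followed by a confirming forward lookup, which is the method Google itself recommends. The Python sketch below is a minimal illustration of that check; the function and constant names are my own, not from any particular product.

```python
import socket

# Domains that real Google crawlers resolve to.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(hostname: str) -> bool:
    """Return True if the hostname belongs to one of Google's crawler domains."""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Verify a claimed Googlebot: reverse-resolve the IP, check the domain,
    then forward-resolve the hostname so a faked PTR record alone won't pass."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not is_google_hostname(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]    # forward check
    except OSError:
        return False
```

A visitor whose User-Agent says "Googlebot" but whose IP fails this round trip is an imposter and can be safely blocked.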
Now that we’ve spent some time reviewing the basics of bots, I want to shift the focus of our discussion to how they are used for hacking.
If you’ve spent any amount of time on blog sites or forums, you’ll likely have noticed suspicious-looking posts for sneakers, designer bags, Viagra, Cialis, etc. [click]
This is comment spam, and it is typically put there by purpose-built bots which seek out websites which accept user comments and are not designed to defend against submissions made by automated clients.
[click] Comment spam, while more of a nuisance than anything else, does have several negative effects on websites. From the user point of view these posts are annoying and result in a worse website viewing experience. They can also direct visitors off to potentially malicious sites where they may be infected with malware. From the website operator point of view they drive traffic away from their websites, can link to competitors’ websites, and are burdensome to identify and clean off of comment sections.
Another type of automated attack typically performed by bots is click fraud. Click fraud is the act of illegitimately clicking on pay-per-click ads. Click fraud is an insidious, and commonly overlooked, weapon which usually manifests itself in two forms: click fraud as performed by competitors of advertisers, or by competitors of ad publishers (by publisher I mean the website showing the ad).
When performed by competitors of advertisers, bots are created which click on ads at a high rate of speed, thus forcing the advertisers to pay for fake clicks which are never seen by humans.
When performed by competitors of publishers, click fraud seeks to make it appear that the publisher has written a bot to click on their own ads. This would generally be in breach of contract with the ad networks and result in being banned as a publisher. For websites dependent on ad revenue this can be devastating.
Another major bot-related security issue is search engine optimization (or SEO, as it’s known) referral spam. An excellent case study for SEO referral spam is that of Semalt, which happened earlier this year.
Semalt is a Ukraine-based “SEO” company which recently launched an enormous referral spam campaign. The campaign utilized a network of some 290,000 malware-infected computers (also known as a botnet) to crawl the internet looking for vulnerable targets and then attack them. [Click]
Once a victim was found, the botnet visited them with a fake referral source. These referral sources belong to websites that Semalt was paid to improve search engine rankings for. Referral links are one of the criteria which Google uses to evaluate search engine rankings. When Google crawls the victim websites, it will notice all of these fake referral links in the public logs of these websites and then increase the SEO ranking of Semalt’s “clients”.
Why does that matter for you or any website owner? This referral spam needs to be identified and blocked because the presence of fake SEO referrals can cause long-term damage to your website’s search engine results and can result in complete blacklisting or removal from page results.
Being blacklisted from Google search results would clearly have a large negative impact on your website.
Another rising bot-related threat is the DDoS attack. DDoS stands for Distributed Denial of Service, and it is a type of attack where hundreds or thousands of infected computers band together into a single weapon, referred to as a “botnet”. This botnet is then used to attack a single target with the goal of overwhelming the network or server it is using, thus creating a website outage. DDoS attacks are quickly becoming a favorite weapon for attackers because they are relatively cheap to perform and difficult to defend against.
One interesting campaign that happened earlier this year, around February and March, targeted high-profile SaaS companies such as Meetup and Basecamp. These SaaS companies have built successful online applications that can scale to support millions of users and deliver huge amounts of content. Still, all of these companies, and many more, were brought down with DDoS attacks.
It is frequently the case that DDoS attackers will demand a ransom of a small amount of money, like a couple hundred dollars, in exchange for ending the attack and restoring the website’s availability. Although the dollar amount requested may be small, these attacks are typically large enough to bring down any company that does not have an active DDoS mitigation solution in place.
Let’s take a look at how DDoS attacks work, and the role bots play in them.
This network diagram shows an example of traffic flowing under normal conditions. Website visitors are routed across the internet, through a customer’s internet service provider, and to the destination website. Data is then sent back along this route to the website visitor. DDoS attacks interrupt this flow by overwhelming an internet connection or internet-connected device.
A common type of DDoS attack called a volumetric attack does this by banding together hundreds of thousands of infected computers into a botnet (short for bot network), then using this botnet to attack a single target. On the way to the target website, the volume of this bot-generated traffic becomes so immense that it cannot fit through the internet connection the website owner has purchased from its ISP. The result is that no legitimate web traffic will be able to use this conduit, and thus the website will appear offline until the attack subsides.
[Click] If website availability is important to you, then DDoS protection should be too
Any application without a DDoS mitigation strategy is at risk. DDoS mitigation is tricky to deal with because the volume and complexity of the attacks require specialized tools or services to mitigate them.
Bots can also be used as powerful reconnaissance tools for hackers. Web vulnerability scanners are programs which use special bots to crawl through a website and find security flaws.
Typically these tools are used by website owners on their own websites with the goal of finding and fixing vulnerabilities in their applications.
In the hands of hackers these tools turn into a weapon which is pointed at websites which they do not own. Combined with a web crawler, hackers can create tools which are able to trawl the internet looking for vulnerable websites.
Now that we know that hackers can use automated tools to find websites with vulnerabilities, you might be asking what the chances are that your website or application has such a vulnerability. According to a report by Cenzic, a leading vulnerability scanner vendor, 96% of today’s web apps have vulnerabilities and 13% of websites can be compromised automatically.
We’ve talked about several nefarious things that bots do to compromise the security of victim websites. One such activity is so widespread it warrants its own discussion: site scraping. Over the next few slides we will explore what site scraping is, how it works, and why it’s a problem for website owners.
The most common type of scraping is called site scraping. The goal of this activity is to copy or steal webpage content for use elsewhere. This repurposing of content may or may not be approved by the website owner. Typically bots do this by crawling a website, accessing the source code of the website, and then parsing it to extract the key pieces of data they want. After obtaining content, they typically post it elsewhere on the internet.
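As a rough illustration of that “parse the source, extract the key pieces” step, here is a minimal Python sketch using only the standard library. A real scraper bot would first fetch the HTML over HTTP; the page content below is made up for the example.

```python
from html.parser import HTMLParser

class HeadlineScraper(HTMLParser):
    """Collects the text of every <h2> element -- a stand-in for the
    'key pieces of data' a scraper bot pulls out of page source."""
    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headlines.append(data.strip())

# A made-up page standing in for a victim site's source code.
page = "<html><body><h2>First Story</h2><p>...</p><h2>Second Story</h2></body></html>"
scraper = HeadlineScraper()
scraper.feed(page)
print(scraper.headlines)  # ['First Story', 'Second Story']
```

Loop this over every URL a crawler discovers and you have the core of a content-scraping bot, which is exactly why it is so cheap to run at scale.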
A more advanced type of scraping is database scraping. Conceptually this is similar to site scraping, except that hackers will create a bot which interacts with a victim’s application to retrieve data from its database. Think about a website such as an insurance quoting website. A bot could be created which would try all possible combinations in an application to obtain quote prices for all scenarios.
For example, it could tell the application it was a 25-year-old male trying to get a quote for a Honda, then for a Toyota, then a Ferrari. Each time it would get a different result back from the application. Given enough tries, it could be possible to obtain entire datasets. Clearly, with the number of permutations available in this scenario, a bot would be preferable to a human. [Click] Database scraping can be used to steal intellectual property, price lists, customer lists, underwriting, etc.
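The enumeration idea can be sketched in a few lines of Python. The `get_quote` function below is a hypothetical stand-in for the victim’s quoting application; a real bot would submit one HTTP request per combination and parse the response instead.

```python
from itertools import product

def get_quote(age, gender, car):
    """Hypothetical stand-in for the insurance site's quote endpoint."""
    base = {"Honda": 80, "Toyota": 90, "Ferrari": 400}[car]
    return base + (25 - min(age, 25)) * 5  # toy surcharge for young drivers

ages = [25, 30]
genders = ["male", "female"]
cars = ["Honda", "Toyota", "Ferrari"]

# Walk every combination of inputs to reconstruct the whole rate table.
stolen = {
    (age, gender, car): get_quote(age, gender, car)
    for age, gender, car in product(ages, genders, cars)
}
print(len(stolen))  # 12 combinations -> the entire (toy) dataset
```

With only a handful of parameters the combinations multiply quickly, which is why this work is always given to a bot rather than a human.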
Scraping isn’t always malicious. There are many cases where the goal for data owners is to simply propagate data to as many people as possible. For example, many government websites provide data which is intended to be consumed by the general public. This data is frequently available over APIs but sometimes scrapers must be employed to gather that data.
Another example of sites which may be powered by bots includes aggregation websites such as travel sites, hotel booking websites, concert ticket websites, and many others. Bots which distribute content from these sites, whether they obtain this content via an API or by scraping, tend to drive traffic toward the data owners’ websites. In this case bots may function as a critical part of their business model.
Site scraping can be a powerful tool. In the right hands, it helps automate the gathering and spread of information. In the wrong hands, it can lead to the theft of intellectual property or an unfair competitive edge.
Consider the case of a rental car agency: if one company created a bot which regularly checked the prices of its competitor and slightly undercut them at every price point, it would have a competitive advantage. This lower price would appear in all aggregator sites which compare both companies, and would likely result in more car rental conversions and higher search engine rankings.
When considering what to do with bots, it’s important to fully assess the impact of a specific bot before deciding whether or not to allow it to access your website. Does this automated client add or subtract value to your business? Is it driving traffic toward your website, or away from it?
We’ve spent time talking about what bots are, how they are used for hacking and site scraping, now we will begin to discuss how to identify and mitigate them.
The most effective way to identify bots is to use a specialized tool. Bot mitigation tools typically employ one or more of the following approaches: static inspection of requests and headers, progressive challenges, and behavioral analysis.
All of this information is then combined to determine whether or not a website visitor is human, and to classify it by visiting purpose.
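As a toy example of the static approach, a few header checks can be expressed directly. The keyword list and rules below are illustrative assumptions, not any real product’s logic; production tools combine many more signals (IP reputation, JavaScript challenges, behavior over time).

```python
# Hypothetical blocklist of automation tool signatures.
BAD_AGENT_KEYWORDS = ("scrapy", "curl", "python-requests", "masscan")

def looks_like_bad_bot(headers: dict) -> bool:
    """Flag a request as bot-like based on static header inspection alone."""
    agent = headers.get("User-Agent", "").lower()
    if not agent:                          # real browsers always send a User-Agent
        return True
    if any(k in agent for k in BAD_AGENT_KEYWORDS):
        return True
    if "Accept-Language" not in headers:   # browsers send this; many bots don't
        return True
    return False
```

Static checks like these are cheap but easy to evade, which is why they are only the first layer before challenges and behavioral analysis.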
A common thought is that robots.txt can be used to protect against bad bots. Let’s look into what robots.txt is, and what it’s capable of. First off, robots.txt is a list of rules which bots visiting a website are supposed to obey. Legitimate bots, including search engines like Google, carry these orders out. Bad bots, on the other hand, ignore the rules. [click]
If you knew the specific user agent of a bot, you could use robots.txt to block it, but for the most part robots.txt is not a good tool for blocking malicious bots. Instead it should be thought of as a tool to dictate what the good bots on your website are doing.
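Python’s standard library even ships the parser that well-behaved bots use to honor these rules, which makes the point concrete: robots.txt is an honor system. A compliant client checks `can_fetch` before requesting a page; a bad bot simply never asks. The rules file below is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: Googlebot may crawl everything except /private/,
# and every other bot is asked to stay out entirely.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "/index.html"))    # True
print(rp.can_fetch("SomeScraper", "/index.html"))  # False -- but only if it obeys
```

Nothing here enforces anything: a scraper that ignores the file fetches whatever it likes, which is exactly why robots.txt is a coordination tool for good bots rather than a defense against bad ones.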
In conclusion, bad bots are responsible for a large number of serious security threats to websites. Website operators can greatly enhance their site’s security posture by analyzing traffic for bots, identifying malicious clients, and blocking them while maintaining site access for good bots. The easiest way to do this is through the use of third-party tools, which are commonly available either as standalone products or as part of other solutions such as web application firewalls or application delivery controllers.
In closing I want to tell you briefly how Incapsula can help.
Incapsula is a cloud-based service that secures and accelerates websites. It works by using DNS redirection to route website traffic through the Incapsula network.
Once traffic is flowing through Incapsula, malicious traffic and bad bots are blocked, and legitimate traffic is accelerated. This leads to a more secure, faster loading website.
For a free trial of Incapsula, visit us at www.incapsula.com
Understanding Web Bots and How They Hurt Your Business
Orion Cassetto, Sr. Product Marketing Manager, Incapsula
• Thanks for joining!
• The webinar is about 30 minutes long
• Questions will be answered after the session
• Please submit your questions using the chat window, or tweet them to @orionevolution
Incapsula, Inc. / Proprietary and Confidential. All Rights Reserved.
Speaker Bio – Orion Cassetto
• Sr. Product Marketing Manager for Incapsula
• Previously held product marketing positions at Imperva and Armorize
• Experienced in web app security and SaaS security solutions
• Holds degrees in Asian Studies and Chinese Language from Washington
• An overview of Bot technology
• How bots are used for Hacking and Denial of Service Attacks
• The Impact of Content Scraping on Websites
• Suggestions for Bot detection and Mitigation
What is an Internet Bot?
• A bot is a software program that runs automated tasks over the internet
• They typically perform simple, repetitive tasks
• They are able to operate at a higher rate of speed than humans can achieve
Popular Legitimate Uses for Web Bots
Bots tend to visit websites in regular cycles, performing tasks such as:
• Search Engine Crawling
• Website Health Monitoring
• Fetching Web Content
• Web Vulnerability Scanning
• Operating APIs (Application Programming Interfaces)
Automated Clients are the Majority of Web Traffic
Over 61% of all website traffic is non-human.
Roughly half of that is malicious.
The Impact of Bots on Website Security
Good Bots:
• Search Engine Crawling
• Website Health Monitoring
• Fetching Content
• Powering APIs
Bad Bots:
• Site Scraping
• Comment Spam
• SEO Spam
Evolution of Bots
• Bots are increasingly able to imitate browser and human behavior
• Browser-based bots which live inside of infected browsers are becoming more sophisticated
Imposter Google Bots are on the Rise
Googlebots visit websites an average of 187 times per day
1 in 24 of them is fake
Imposter Google Bots are on the Rise
Google Imposter Bots by Activity Type
How bots are used for Hacking
Bots and Comment Spam
• What is Comment Spam?
> Posts in comment sections on websites allegedly linking to:
- Streams of popular TV shows
- Cheap Shoes
- Designer bags, etc.
• How bots are involved
> Bots are used to automatically find victim sites and insert spam
• Why it matters
> Comment spam is frequently responsible for
- Worse user experiences
- Lower website conversions (links usually exit your site)
- Malware distribution (infecting your visitors)
Bots and Click Fraud
• What is click fraud?
> When a person or automated script imitates a legitimate user of a web browser clicking on a pay-per-click ad
• How bots are involved
> Bots are created which can click on ads at a rate unachievable by humans
• Click fraud can be used as a weapon for
- Competitors of advertisers
- Competitors of publishers
SEO Referral Spam
What is it?
1. Semalt is a Ukrainian search engine optimization (SEO) “company”
2. They used malware to hijack computers and create a giant botnet
3. This Botnet visits sites across the internet with fake referral sources
What damage could this cause your website?
• Long term SEO Damage to your website’s rankings
• Complete search engine result page blacklisting and removal
Bots for Distributed Denial Of Service (DDoS) Attacks
• DDoS attacks are attacks where many infected computers band together to attack a single target
• These attacks exhaust network connections and server resources, causing website outages
How DDoS Attacks Impact Site Availability
• DDoS attacks make your website completely inaccessible
• If website availability is important to you, then DDoS protection should be too
• Any application without a DDoS mitigation strategy is at risk
Bots as Website Reconnaissance
• Website Vulnerability Scanners
> Powered by bots
> Crawl websites searching for security flaws
> Typically used by website owners
> Provide operators with a list of website vulnerabilities
> Can also be used by Hackers
Websites Have Many Vulnerabilities
96% of web applications have vulnerabilities
Sources: Cenzic, Inc. – Feb. 2014, Incapsula, Inc. –2013
13% of websites can be compromised automatically
The Impact Of Site Scraping Bots
Types of Scraping - Site Scraping
• Site Scraping is when a bot visits a website to copy or steal content
• It is usually done by reading and parsing web page source code
[Diagram: Your Code and Your Content copied from Your Site to Their Site]
Types of Scraping - Database Scraping
• Database Scraping is when bots enter all possible parameters into an application to retrieve content from a database
> Example of a car insurance site
- Male, 25, Honda $X / Month
- Male, 25, Toyota $Y / Month
- Male, 25, Ferrari $Z / Month
• Can be used to steal intellectual property, underwriting, price lists, customer lists, etc.
[Diagram: bot pulling data from Your Site and Your DB]
Sanctioned Uses for Site Scraping
• Obtaining or Distributing Public information
> Weather data
> Government data
> Economic data
• Aggregator Sites
> Travel Sites
> Shopping Aggregators
> Hotel booking
> Concert Tickets
How Site Scraping Can Hurt Your Business
• Site Scraping can lead to IP theft or Competitive Disadvantage
Randy's Rental Car: $30/day, $35/day, $45/day, $50/day, $65/day, $85/day
Competitor Rental Car: $29/day, $34/day, $44/day, $49/day, $64/day, $84/day
Identifying and Mitigating Bots
Inspecting Website Traffic for Bots
• Static approach:
> Structure of web requests
> Header information
> Visitor browser agent info
• Progressive Challenge approach
• Behavioral Approach
> Order and frequency of requests
> Interaction between clients and servers
What about using Robots.txt ?!?
• What is robots.txt?
> It is a list of rules for the bots visiting your website
• Can’t I use it to block bad bots?
> In theory, yes. In reality, no.
Bad bots ignore the rules!
Identify and Block Bad Bots
• Implement a solution which can block bad bots to prevent
> Comment Spam
> Site Scraping
> Vulnerability Scanning
> Automated SEO Poisoning
• Maintain site access for good Bots
• Bot Mitigation can be
> Standalone service
> Part of other tools like WAFs or application delivery controllers
Website Security and Performance in Minutes with a Simple DNS Change
By routing website traffic through the Incapsula network,
malicious traffic is blocked, and legitimate traffic is accelerated.
[Diagram: traffic routed through the Incapsula Network to Your Website]
For a Free Trial of Incapsula visit us at www.incapsula.com
Please send follow up questions to firstname.lastname@example.org