When aggressive scrapers caused slowdowns on iCruise.com, Antoine Zammit, VP of technology at its parent company WMPH Vacations, said enough was enough.
Distil Networks is a bot detection and mitigation specialist. It works with some of travel’s biggest names such as Sabre, Skyscanner, Amadeus and Lufthansa as well as specialist operators of scale, such as WMPH.
In a tnooz workshop held this week, Elias Terman, Vice President of Marketing at Distil Networks, gives a data-driven overview of the current state of the bad bot landscape, the recent shift of bad bot activity to mobile, and new bot-driven scams such as spinning.
Antoine Zammit goes on to present a case study outlining how badly bots were hammering his websites and the many benefits that using Distil to beat the scrapers brought to the business, including more leads, better conversions, improved site speed and a better experience for customers and partners.
3. The Open Web Application Security Project (OWASP) is an important standards body in the application security community. Its annual Top 10 threats list is the basis for many web application security programs. It is now expanding its scope to include automated threats: bots.
100% OF OWASP AUTOMATED THREATS (BOTS) TARGET TRAVEL INDUSTRY
By subset of threats (name: defining characteristics)
ACCOUNT CREDENTIALS
Account Aggregation: Use by an intermediary application that collects together multiple accounts and interacts on their behalf
Account Creation: Create multiple accounts for subsequent misuse
Credential Cracking: Identify valid login credentials by trying different values for usernames and/or passwords
Credential Stuffing: Mass login attempts to verify the validity of stolen username/password pairs
PAYMENT CARDHOLDER DATA
Carding: Multiple payment authorisation attempts used to verify the validity of bulk stolen payment card data
Card Cracking: Identify missing start/expiry dates and security codes for stolen payment card data by trying different values
Cashing Out: Buy goods or obtain cash utilising validated stolen payment card or other user account data
VULNERABILITY IDENTIFICATION
Footprinting: Probe and explore application to identify its constituents and properties
Vulnerability Scanning: Crawl and fuzz application to identify weaknesses and possible vulnerabilities
Fingerprinting: Elicit information about the supporting software and framework types and versions
OTHER
Ad Fraud: False clicks and fraudulent display of web placed advertisements
CAPTCHA Bypass: Solve anti-automation tests
Denial of Service: Target resources of the application and database servers, or individual user accounts, to achieve denial of service
Expediting: Perform actions to hasten progress of usually slow, tedious or time-consuming actions
Scalping: Obtain limited-availability and/or preferred goods/services by unfair methods
Scraping: Collect application content and/or other data for use elsewhere
Skewing: Repeated link clicks, page requests or form submissions intended to alter some metric
Sniping: Last minute bid or offer for goods or services
Spamming: Malicious or questionable information addition that appears in public or private content, databases or user messages
Token Cracking: Mass enumeration of coupon numbers, voucher codes, discount tokens, etc.
This work is licensed under the Creative Commons Share-Alike License for OWASP Automated Threat Handbook Web Applications by Distil Networks
4. Agenda
The bad bot landscape
How bad bots impact the travel industry
➔ Web/screen scraping and spinning (hoarding)
➔ Increased GDS pull costs
➔ Decreased SEO, slowdowns, and downtime
➔ Account takeover, credit card fraud, and points fraud
➔ Skewed conversion metrics and look-to-book ratios
WMPH Vacations Case Study
Q&A
5. Advanced Persistent Bots
● Basic scripts running in command line
● Headless browsers, advanced scripts
● Cycle IPs and user agents
● Real browser automation, malware
Advanced persistent bots (APBs): 75% of bad bot traffic
6. More Bad Bots Claim to Be Mobile
The number of bad bots claiming to be mobile browsers jumped 42.78% in 2016.
7. Mobile App Tools Used by Bot Operators
Mobile Device Farms: Testing systems that mimic human users on mobile devices (e.g., AWS Device Farm, Google Firebase Test Lab)
Mobile Device Emulators: Mobile device emulators that mimic human users
Debugging Software: Debugging software used for tampering with SDKs/reverse engineering the app
8. About Distil Networks
Industry Expertise
● Invented the category
● The recognized leader
● 70 airline customers
The Most Effective Technology
● Wider: Web, API, and Mobile
● Deeper: Catch more bots
● Smarter: Without impacting users
Vigilant and Dedicated Partner
● Not A Solution, Your Solution
● Unprecedented access
● An extension of your team
Bot Defense as Adaptable and Vigilant as the Threat Itself
Travel Industry Leaders Rely on Distil...
9. Poll Question: True or False?
You have good visibility and control over unwanted website traffic and transactions.
12. Who Is Behind Web Scraping?
Competitors
Content Theft
Competitive Intel
Price Scraping
Aggregators
Start-ups
Unauthorized Middlemen
Hackers / Fraudsters
Content for Fake Pages
Search Engines: Google, Bing, Yahoo, Baidu
13. What Kind of Data is Being Scraped?
Customer data
Pricing info
Editorial content
GDS API pulls
SEO strategies
Booking engine inputs
14. Spinning (Hoarding) by Unauthorized Middlemen
Middlemen use mobile device emulators to continuously hold seats in the airline booking engine without buying them
They resell the seats on a secondary market once a buyer is found
Monetary damage:
➔ Empty seats on planes
➔ Loss of add-on sales like upgrades, travel insurance, etc. (about $20 to $40 of additional revenue per sale for airlines*)
Airline customer use case: Spinning via mobile app emulators
* Source: http://www.eyefortravel.com/mobile-and-technology/scraping-single-biggest-threat-travel-industry
15. Application Denial of Service
OWASP AUTOMATED THREAT: DENIAL OF SERVICE
Denial of Service Bot Sophistication
16. DDoS vs. Application Denial of Service
Application Denial of Service
● Attacks the application directly
● Hard to spot because it won’t show up as an anomaly on your firewall and may not impact the load balancer
DDoS
● Attacks the ISP hosting your application
● Easier to spot because it floods upstream infrastructure to the point where packets never arrive at the web server
18. Bad Bots Love Login Pages
OWASP AUTOMATED THREATS: CREDENTIAL CRACKING, CREDENTIAL STUFFING
Account Takeover Bot Sophistication
19. How Credential Stuffing Works
Over 1 billion username/password combinations exist in the wild
Credential stuffing exploits our propensity to reuse passwords across multiple sites.
20. Account Based Fraud
OWASP AUTOMATED THREATS: CARDING, CARD CRACKING, CASHING OUT
Account Exploitation Bot Sophistication
21. Travel Rewards Fraud
Dark Web listings that indicate typical price ranges for airline and hotel loyalty accounts:
Airline loyalty accounts: $3.20 to $208
Hotel loyalty accounts: $1.50 to $45
Source: http://blog.cxloyalty.com/the-cost-of-loyalty-accounts-on-the-dark-web-how-to-protect-members
72 percent of loyalty program managers say they have experienced an instance of loyalty program fraud firsthand
22. Skewed Analytics and Look-to-Book Ratios
OWASP AUTOMATED THREAT: SKEWING
Sophistication level of bots that skew analytics
23. Sophisticated Bots Appear as Human in Analytic Data
53% of bots are able to load external assets (e.g., JavaScript)
These bots skew marketing tools such as Google Analytics, A/B testing, and conversion tracking
24. Skewed Analytics Leads to Misinformed Business Decisions
Inaccurate analytic data results in:
Poor funnel analysis & optimization
Poor conversion rates
Inaccurate KPI tracking
Skewed look-to-book ratios
Difficulty in planning server expansion
25. Poll Question: The bad bot problem I'm most concerned about:
A. Web scraping
B. Account-based fraud
C. Skewed analytics / look-to-book
D. Slowdowns and downtime
26. About WMPH Vacations
At a Glance
Founded 2004 / 140 employees
More than 600,000 clients booked
9 corporate brands
30 websites
Award-Winning Mobile App
Reservation systems serve both direct customers and 45 agents
Private label solutions
27. WMPH Technology Stack
30 different web properties
Mobile iCruise App for iOS & Android
Standardized web application stack
Employee Intranet
10 Virtual Servers on AWS
Cloud-based Phone System using 8x8 technology
Entire company is now over 90% cloud-based
API calls into everything from small cruise lines to large Global Distribution Systems
28. WMPH Bot Challenges
Bad Bot Challenges
Aggressive web scraping caused site slowdowns
API scraping almost took a cruise partner offline
Constant barrage of SQL injection attack attempts caused lots of noise in logs
Spam on cruise inquiry forms polluted backend systems
Bots skewed conversion metrics
29. Tried Several Approaches to Solve the Problem...
Put CAPTCHAs on Forms
➔ Creates a poor user experience
➔ Defeated by advanced bots
➔ Defeated by CAPTCHA farms
➔ Reduces conversion rates
Looked for Patterns
➔ Bots appear human in logs
➔ Labor intensive
➔ Distributed attacks hard to pinpoint
➔ Reactive in nature
Blocked IPs in AWS ELB
➔ Defeated by distributed IP attacks
➔ Defeated by low and slow crawlers
➔ Defeated by peer-to-peer / proxies
➔ Reactive in nature
30. WMPH Vacations Selection Criteria
Bot Detection and Mitigation Solution Requirements
Block web scrapers without impacting human visitors or good bots like Googlebot
Increase website availability and speed
Simple setup
Little or no maintenance; “self-optimizing” solution
Protect APIs powering our websites and mobile apps
31. Protect our web and mobile API servers
Web
➔ Fingerprint device
➔ Verify browser
➔ Verify device
➔ Verify human
Prevent scrapers from hitting our APIs through our website or by going directly to our API servers
Mobile
➔ Verify mobile device ID
➔ Verify mobile app
➔ Verify device
➔ Verify human
Stop bot operators (using mobile device farms, device emulators, etc.) from accessing the API servers that power our mobile apps
32. WMPH Results with Distil
40% improvement in response times; no slowdowns since deploying Distil
Improved partner relationships
Leads up 100% – No more spam – Only serving CAPTCHAs to bots
Conversion rates up 22%
Self-tuning, proactive approach saving 20 hours per month
Protecting login of company intranet
37. Best Practices and Lessons Learned
IT and marketing need to partner on solving the bad bot problem.
Review the Distil logs daily.
Blacklist aggressive bot IP addresses.
Report aggressive IPs to their respective ISPs. Follow up, and follow up, and follow up.
Distil support will give you a list of URLs being hit by the bad bots. This will help you determine what they are trying to do.
Don’t whitelist your office IP right away.
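Part of the daily log review recommended above can be scripted. The sketch below assumes a hypothetical export of requests already classified as bad bots, as (ip, url) pairs; it is not the Distil log format, which is not described here. It counts requests per IP and per URL so that the most aggressive IPs (candidates for blacklisting and ISP reports) and the most targeted URLs stand out.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record shape: (client_ip, requested_url) for requests that have
# already been classified as bad bots. Real bot-mitigation exports may differ.
BotRequest = Tuple[str, str]


def top_offenders(
    requests: Iterable[BotRequest], limit: int = 10
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return (busiest bot IPs, most-hit URLs), each sorted by request count."""
    ip_counts: Counter = Counter()
    url_counts: Counter = Counter()
    for ip, url in requests:
        ip_counts[ip] += 1
        url_counts[url] += 1
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return ip_counts.most_common(limit), url_counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        ("203.0.113.7", "/cruises/search"),
        ("203.0.113.7", "/cruises/search"),
        ("203.0.113.7", "/api/prices"),
        ("198.51.100.2", "/api/prices"),
    ]
    ips, urls = top_offenders(sample, limit=2)
    print(ips)   # [('203.0.113.7', 3), ('198.51.100.2', 1)]
    print(urls)  # [('/cruises/search', 2), ('/api/prices', 2)]
```

The URL ranking is the piece that helps you work out what the bots are after. Raw per-IP counts will understate bots that cycle IPs.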
About 20% of website traffic is made up of bad bots.
It’s this 20% that causes the majority of problems on websites, as described in OWASP’s Automated Threat Handbook, an 80-page document that we’ve condensed into a single slide you can reference after the webinar.
All these threats impact the travel industry.
But we’re going to spend most of our time talking about the impact of web scraping, account-based fraud, and skewed analytics and look-to-book ratios.
Then Antoine will take you through how he won the battle against bad bots.
Of that 20% of bad bot traffic, about 75% is what we would call advanced persistent bots, or roughly 15% of all website traffic.
These bots mimic human behavior, load JavaScript, cycle through IPs and user agents, and hide behind proxy and peer-to-peer networks.
They slip through the cracks of most tools.
But it’s not just traffic from desktops and laptops that you have to worry about. We’re seeing a big increase, 43%, in bots identifying themselves as mobile browsers.
And it’s not just your website you have to worry about. We’re seeing an increase in the use of mobile app tools that the bad guys use to attack the APIs that power mobile apps.
For example: mobile device farms, mobile device emulators that masquerade as real mobile applications, and debugging software used to reverse engineer mobile SDKs.
Rob
The goal here was to learn if our thesis was correct and discover the gaps in our knowledge so they could be addressed
97 out of 100 sites are scraped, making scraping the most prevalent automated threat there is.
Scraping resources just a click away...
Anyone with basic computer skills can get into the game
Inexpensive relative to the value of the content they steal
Difficult or impossible to prosecute
Customer data - Itineraries, contact information
Pricing info - prices, availability, vendors
Editorial content - unique articles about destinations, venues, etc.
Incentive packages - bundled deals, which brands typically use to overcome scrapers, can themselves be scraped and mimicked
User reviews - think travel advisor reviews
Keyword placement
SEO optimization
A third (33%) of all websites experienced unexpected spikes in bad bot traffic that can lead to slowdowns or downtime. We’ve defined such a spike as an event that equates to at least three times the 30-day rolling daily average of bad bot traffic.
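The spike definition above translates directly into a check over a daily series. The minimal sketch below uses the stated thresholds (30 days, 3×). It also makes one assumption of its own: the rolling average is taken over the 30 days *before* the day being tested. The input series is made up for illustration.

```python
from typing import List, Sequence

WINDOW_DAYS = 30      # 30-day rolling window from the definition above
SPIKE_MULTIPLIER = 3  # "at least three times" the rolling daily average


def find_spikes(daily_bad_bot_requests: Sequence[int]) -> List[int]:
    """Return indexes of days whose bad-bot volume is >= 3x the trailing 30-day average.

    Assumption: the baseline averages the 30 days *before* the day being tested,
    so a spike cannot inflate its own baseline. Days without a full 30-day history
    are not tested, and a zero baseline is skipped rather than treated as a spike.
    """
    spikes: List[int] = []
    window_sum = sum(daily_bad_bot_requests[:WINDOW_DAYS])
    for day in range(WINDOW_DAYS, len(daily_bad_bot_requests)):
        baseline = window_sum / WINDOW_DAYS
        if baseline > 0 and daily_bad_bot_requests[day] >= SPIKE_MULTIPLIER * baseline:
            spikes.append(day)
        # Slide the window forward one day: add today, drop the oldest day.
        window_sum += daily_bad_bot_requests[day] - daily_bad_bot_requests[day - WINDOW_DAYS]
    return spikes


if __name__ == "__main__":
    # Made-up series: 30 quiet days at 1,000 bad-bot requests, then 3,000, then 1,200.
    history = [1000] * 30 + [3000, 1200]
    print(find_spikes(history))  # [30]
```

Keeping a running window sum avoids re-summing 30 values for every day. Including the current day in its own baseline would be a different (also defensible) reading of the definition.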
With a volumetric DDoS attack, the website is flooded, preventing access to its services. It's a layer 3 attack and easy to spot; it can flood your upstream infrastructure to the point where the packets never arrive at the web server.
In contrast, an application denial of service event occurs when bots programmatically abuse the business logic of your website. This happens at layer 7, so you won't notice it on your firewall and your load balancer will be just fine. It's the web application and backend that keel over.
96% of websites have credential bots on their login page.
Credential cracking, also known as a brute-force or dictionary attack, is when bots cycle through a dictionary of likely passwords to guess the right one. This is why you’ll often see sites that require passwords with upper- and lowercase letters, numbers and special characters.
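The logic behind complexity rules is arithmetic. An exhaustive guess over every password of length n drawn from an alphabet of k characters has up to k^n candidates. The sketch below compares lowercase-only passwords with passwords drawn from letters, digits and ASCII punctuation. The length of 8 is an illustrative assumption. This bounds only exhaustive brute force. A dictionary attack that tries common passwords first can succeed far sooner, and credential stuffing bypasses complexity rules entirely.

```python
import string


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters over an alphabet."""
    return alphabet_size ** length


LOWERCASE_ONLY = len(string.ascii_lowercase)  # 26 characters
# Upper- and lowercase letters, digits, and ASCII punctuation: 52 + 10 + 32 = 94.
MIXED_CLASSES = len(string.ascii_letters + string.digits + string.punctuation)

if __name__ == "__main__":
    length = 8  # illustrative password length
    print(f"{keyspace(LOWERCASE_ONLY, length):,}")  # 208,827,064,576
    print(f"{keyspace(MIXED_CLASSES, length):,}")   # 6,095,689,385,410,816
```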
Credential stuffing is when the bad guys take a batch of stolen usernames and passwords and try them out across multiple sites. They’re exploiting our propensity to reuse passwords across sites.
Step 1 is to procure a list of username and password combinations. These can easily be found in underground forums on the dark web; a billion of these username and password combos were stolen from Yahoo alone. Step 2 uses a bot to cycle through the list looking for matches, meaning credentials that will get you into an account. Step 2 can be repeated on any number of sites. Step 3 is to discard the credentials that didn’t work for a given site, leaving only verified credentials. Step 4 is to either sell the verified list on the dark web or use the credentials to pilfer the accounts.
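From the defender's side, step 2 of this process tends to leave a recognizable trace. One source tries many different usernames, and most attempts fail, because only a small fraction of a stolen list is valid on any given site. The sketch below flags sources matching that pattern. It is a simplified illustrative heuristic, not Distil's detection method. The `LoginAttempt` record and both thresholds are hypothetical. Because advanced bots cycle IPs, a grouping key based only on IP address would miss distributed attacks.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class LoginAttempt:
    source: str      # hypothetical grouping key, e.g. an IP address or device fingerprint
    username: str
    succeeded: bool


# Hypothetical thresholds; real values would need tuning against your own traffic.
MIN_DISTINCT_USERNAMES = 20
MIN_FAILURE_RATE = 0.9


def flag_stuffing_sources(attempts: Iterable[LoginAttempt]) -> List[str]:
    """Return sources that tried many distinct usernames with a high failure rate."""
    usernames: Dict[str, Set[str]] = defaultdict(set)
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for attempt in attempts:
        usernames[attempt.source].add(attempt.username)
        totals[attempt.source] += 1
        if not attempt.succeeded:
            failures[attempt.source] += 1
    flagged = []
    for source, names in usernames.items():
        failure_rate = failures[source] / totals[source]
        if len(names) >= MIN_DISTINCT_USERNAMES and failure_rate >= MIN_FAILURE_RATE:
            flagged.append(source)
    return sorted(flagged)
```

A human who mistypes their own password fails repeatedly on one username, so the distinct-username count is what separates that case from a bot working through a list.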
Once inside, the bad guys use bots to verify stolen payment card data in bulk, identify missing expiry dates and security codes, purchase goods, or transfer loyalty points.
Lastly, let’s talk about skewed analytics. We see the type of bots that skew analytics on 94% of sites we monitor.
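The damage to look-to-book ratios follows from simple division. Bot searches that never convert raise the number of looks without adding bookings. The sketch below assumes look-to-book is computed as searches divided by bookings; definitions vary between companies. The figures are made up purely to show the arithmetic. It also assumes bot searches can be separated from human ones, which is exactly what sophisticated bots make hard.

```python
def look_to_book(looks: int, books: int) -> float:
    """Searches (looks) per completed booking."""
    if books <= 0:
        raise ValueError("look-to-book is undefined without bookings")
    return looks / books


def search_conversion(looks: int, books: int) -> float:
    """Fraction of searches that end in a booking (the inverse view)."""
    if looks <= 0:
        raise ValueError("conversion is undefined without searches")
    return books / looks


if __name__ == "__main__":
    # Made-up figures: 100,000 human searches and 1,000 bookings, plus
    # 150,000 bot searches that never book.
    human_looks, bot_looks, bookings = 100_000, 150_000, 1_000
    print(look_to_book(human_looks, bookings))                            # 100.0
    print(look_to_book(human_looks + bot_looks, bookings))                # 250.0
    print(f"{search_conversion(human_looks, bookings):.1%}")              # 1.0%
    print(f"{search_conversion(human_looks + bot_looks, bookings):.1%}")  # 0.4%
```

In this made-up case, the same 1,000 bookings look like a 0.4% conversion instead of 1.0%. That is the kind of distortion that misleads funnel analysis and capacity planning.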
Talk about why we have 30 sites, mobile sites being converted to responsive.
For Antoine:
How much of the solution runs on its own, versus what you have to manage, versus services from Distil Networks?
For Elias: