Helping the Internet to scale since 1998
Paweł Kuśmierski, Senior Engineer, Lead
System Operations, Akamai Krakow
©2013 AKAMAI | FASTER FORWARD™
What’s Akamai?
• Founded at MIT in 1998 by Prof. Tom Leighton and Danny Lewin
• Akamai has the world’s most distributed Internet platform (over 150,000 servers deployed in 2,000 locations in 92 countries)
• The Akamai Intelligent Platform is a leading cloud platform, delivering between 15% and 30% of worldwide web traffic
• Accelerating Daily Traffic of:
  • 10+ Tbps
  • 20+ million hits per second
  • 2+ trillion deliveries per day
  • 30+ petabytes/day
  • 10+ million concurrent streams
Who do we serve?
• The top 30 media & entertainment companies
• All 20 top global eCommerce sites
• 7 of the top 10 world banks
• 9 of the top 10 largest newspapers
• 9 of the top 10 social media sites
• 6 of the top 7 computer manufacturers
• All of the top anti-virus companies
What’s the idea?
• Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
• ACMS: Akamai Configuration Management System
• Query (various publications, Scaling a Monitoring Infrastructure for the Akamai Network)
http://www.akamai.com/html/perspectives/techpubs.html
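The first reference above introduced consistent hashing. The Python below is a minimal textbook sketch of the technique, not Akamai's production code: MD5 as the hash function and 100 virtual points per server are illustrative assumptions, and server names such as `edge-a` are hypothetical. Each server is placed at several points on a hash ring, and an object belongs to the first server point clockwise from the object's own hash.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on a 2**128 ring (MD5 used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Textbook consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, points_per_server: int = 100):
        self.points_per_server = points_per_server
        self._points = []   # sorted ring positions of all virtual nodes
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Place the server at several pseudo-random points to even out load.
        for i in range(self.points_per_server):
            point = ring_position(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server: str) -> None:
        for i in range(self.points_per_server):
            point = ring_position(f"{server}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key: str) -> str:
        # The owner is the first virtual node clockwise from the key,
        # wrapping around to the start of the ring.
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a fourth server moves only roughly a quarter of the keys, and every key that moves lands on the new server; a plain `hash(key) % N` scheme would instead remap most keys when N changes.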
Why and how is Akamai helping the Internet to scale?
The Internet wasn’t designed for the ways in which we use it today.
• No single network dominates Internet traffic: the largest controls less than 5% of access traffic.
Trouble:
• Outages (cable cuts, de-peering)
• Congestion (packet loss)
• Lack of scalability
• Slow adaptability (IPv6 first proposed in 1998)
• Lack of security
10,000-foot view of Akamai
Akamai Cloud Optimization
The User Always Connects to a Nearby Akamai Server
Challenges with Cloud Adoption
Cloud servers reside in big data centers, farther away from the end user, resulting in decreased performance and security.
[Diagram: End User, Akamai Edge Servers, Cloud Datacenter]
Cloud Optimization: Route Selection
Problem 1: The route to the datacenter may perform poorly.
[Diagram: End User and Cloud Datacenter; the direct routes between them are marked X]
Cloud Optimization: Route Selection
Problem 1: The route to the datacenter may perform poorly.
Solution: Akamai SureRoute optimizes the route.
[Diagram: End User, Akamai Edge Servers, Cloud Datacenter; the poorly performing direct route is marked X]
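The talk does not describe how SureRoute chooses a path. As a hedged illustration of the general overlay-routing idea only (reach the origin either directly or through an intermediate edge server, and use whichever candidate currently performs best), the Python below scores measured candidate paths and picks the cheapest. The `PathProbe` type, the hop names, the sample measurements and the RTT / (1 - loss) score are all hypothetical; the score simply inflates RTT by the expected number of transmissions per packet under independent loss, which is a crude model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathProbe:
    """One measured candidate path to the origin (hypothetical measurements)."""
    hops: tuple      # e.g. ("edge-krk", "origin") or ("edge-krk", "edge-fra", "origin")
    rtt_ms: float    # measured round-trip time in milliseconds
    loss: float      # measured packet-loss fraction, 0.0 <= loss < 1.0


def expected_cost(probe: PathProbe) -> float:
    # Crude score: RTT multiplied by the expected number of transmissions
    # per packet, 1 / (1 - loss). Real systems may weigh things differently.
    return probe.rtt_ms / (1.0 - probe.loss)


def pick_route(probes):
    """Return the candidate path with the lowest expected cost."""
    return min(probes, key=expected_cost)
```

With a lossy direct path (120 ms, 30% loss) and a relayed path (140 ms, 1% loss), the relayed path wins even though its raw RTT is higher, which is the kind of situation the packet-loss chart below illustrates.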
Akamai SureRoute Makes a Big Difference
Packet loss into India after MidEast cable cut
[Chart: packet loss (0% to 50%) from Jan 25 to Feb 18, Generic Internet vs. Akamai]
Cloud Optimization: Communication Protocol
Problem 2: Many round trips are needed for the initial large download.
Solution: Akamai Communication Protocol
[Diagram: End User, Akamai Edge Servers, Cloud Datacenter]
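Why a large first download costs many round trips can be estimated with TCP slow start: a new connection begins with a small congestion window and roughly doubles it every round trip. The Python below is a back-of-the-envelope sketch under idealized assumptions: no loss, no slow-start threshold or receive-window cap, 1460-byte segments, an initial window of 10 segments (the RFC 6928 value; older stacks used fewer), and connection setup not counted. It does not model the Akamai protocol itself.

```python
import math


def slow_start_round_trips(size_bytes: int, mss: int = 1460, init_cwnd: int = 10) -> int:
    """Round trips to deliver size_bytes under idealized, loss-free slow start.

    Assumptions (illustrative only): the window doubles every RTT, there is no
    ssthresh or receive-window cap, and connection setup is not counted.
    """
    segments = math.ceil(size_bytes / mss)
    sent, cwnd, rtts = 0, init_cwnd, 0
    while sent < segments:
        sent += cwnd   # one full window is delivered per round trip
        cwnd *= 2      # slow start doubles the window
        rtts += 1
    return rtts
```

For a 1 MB object (685 segments) this gives 7 round trips with an initial window of 10, or 8 with a window of 4; over a 100 ms path that is roughly 0.7 to 0.8 s of transfer time alone. Shortening each round trip or avoiding repeated slow starts is therefore where the gains come from.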
Attacks on Akamai Customers
• Typical Attack Size: 3-10 Gbps
• Large Attack Size: 100-200 Gbps
• Attacks originate from all geographies and move between geographies during the attack
[Chart: Number of Attacks per year, 2009 to 2011 (axis 0 to 600)]
Attack Methods (Source: TrustWave, 2010, Web Hacking Incident Database)
• Denial of Service (DoS): 32%
• SQL Injection (SQLi): 21%
• Cross-Site Scripting (XSS): 9%
• Brute Force: 4%
• Cross-Site Request Forgery (CSRF): 4%
• Process Automation: 4%
• Known Vulnerability: 4%
• Misconfiguration: 3%
• Stolen Credentials: 1%
• Banking Trojan: 1%
• Predictable Resource Location: 1%
• Content Spoofing: 1%
• Abuse of Functionality: 1%
• DNS Hijacking: 1%
• Malware: 1%
• Insufficient Authentication: 1%
• OS Commanding: 1%
• Unknown: 10%
The Threat is Varied & Easier to Launch
• 74% of companies experienced one or more DDoS attacks in the past year.
• 31% of these attacks resulted in service disruption.
• New attack tools such as Low Orbit Ion Cannon: users download the tool, insert the target URL or IP, and press GO!
Web Application With a Perimeter Defense
[Diagram: End User and (Cloud) Datacenters; Akamai Traffic vs. Origin Traffic on a scale of 1 to 10,000; labeled COVERED]
July 4th to 7th, 2009 DDoS Attack
400,000 Korean Bots Attack Key U.S. Government Web Sites

Customer (PROTECTED)         Peak Traffic   Times Above Normal Traffic
---------------------------  -------------  --------------------------
U.S. Government Customer 1   124 Gbps       598x
U.S. Government Customer 2   32 Gbps        369x
U.S. Government Customer 3   9 Gbps         39x
U.S. Government Customer 4   9 Gbps         19x
U.S. Government Customer 5   2 Gbps         9x
U.S. Government Customer 6   1.9 Gbps       6x
July 4th to 7th, 2009 DDoS Attack
400,000 Korean Bots Attack Key U.S. Government Web Sites
[Chart: attack size (Gbps, 0 to 125) and unique IPs over time, with Spike 1, Spike 2 and Spike 3 marked]
July 5, 2009:
• 16:00 Customer notified
• 20:00 Attack grows rapidly
• 21:00 Akamai identifies sources
• 23:00 Mitigation measures engaged
• 23:50 Peak pageviews
Under the hood
Configuration change deployments
• Syntax check
• File liveness checks
• Check number of objects changing
• Deploy to a subset
• Check for machine liveness (do we have a representative sample?)
• Check for relative change in machine liveness
• Check for service health
• Check for relative changes in response-code percentages
• Check for self-suspension
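The checklist above reads like a staged rollout guarded by automatic gates. The Python below is a hypothetical sketch of only the post-subset gates (representative sample, relative change in machine liveness, relative change in error response codes); the `FleetStats` fields, the `subset_gate` name and every threshold are illustrative assumptions, not ACMS internals. The key design choice it mirrors is comparing the subset against the rest of the fleet rather than against absolute limits.

```python
from dataclasses import dataclass


@dataclass
class FleetStats:
    """Aggregated health numbers for a group of machines (hypothetical fields)."""
    machines_total: int
    machines_alive: int
    error_responses: int   # e.g. responses with 5xx status codes
    total_responses: int

    @property
    def liveness(self) -> float:
        return self.machines_alive / self.machines_total if self.machines_total else 0.0

    @property
    def error_rate(self) -> float:
        return self.error_responses / self.total_responses if self.total_responses else 0.0


def subset_gate(canary: FleetStats, baseline: FleetStats,
                min_machines: int = 50,
                max_liveness_drop: float = 0.02,
                max_error_increase: float = 0.005):
    """Decide whether a config deployed to a subset may roll out further.

    Returns (ok, reason). All thresholds are illustrative defaults.
    """
    if canary.machines_alive < min_machines:
        return False, "subset too small to be representative"
    if baseline.liveness - canary.liveness > max_liveness_drop:
        return False, "relative drop in machine liveness"
    if canary.error_rate - baseline.error_rate > max_error_increase:
        return False, "relative increase in error response codes"
    return True, "ok"
```

A rollout driver would call a gate like this after each stage and stop (or roll back) on the first `False`.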
OK, but how?
• Various web infrastructure services
• Over 150,000 machines
• Over 1 million distributed components
• Over 1000 autonomous systems
• 24/7/365 operation
• Failures, usage changes
• Massive, real-time monitoring
Query
• Distributed data collection
• Aggregation at several hundred points
• SQL-style interface
A Sample Query
SELECT
    c.continent_name,
    SUM(l.hits) hits
FROM
    load_info l,
    region_data r,
    continent_data c
WHERE
    l.georegion = r.id AND
    r.continent = c.continent
GROUP BY
    c.continent_name
ORDER BY
    hits DESC;

c.continent_name      hits
---------------- ---------
North America    4,620,551
Europe           3,392,102
South America      655,175
Asia               552,258
Africa             106,781
Oceania             39,905
Antarctica             135
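Query's real tables are not public, but the sample query itself is ordinary SQL. To experiment with it locally, the Python below loads a few made-up rows into in-memory SQLite tables named after those in the query. The column layouts are guessed from the joins and all numbers are invented, so the output is not Akamai data.

```python
import sqlite3

# Hypothetical schemas inferred from the sample query's joins.
SCHEMA = """
CREATE TABLE continent_data (continent INTEGER PRIMARY KEY, continent_name TEXT);
CREATE TABLE region_data    (id INTEGER PRIMARY KEY, continent INTEGER);
CREATE TABLE load_info      (georegion INTEGER, hits INTEGER);
"""

# The sample query, unchanged apart from whitespace.
QUERY = """
SELECT c.continent_name, SUM(l.hits) hits
FROM load_info l, region_data r, continent_data c
WHERE l.georegion = r.id AND r.continent = c.continent
GROUP BY c.continent_name
ORDER BY hits DESC;
"""


def run_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO continent_data VALUES (?, ?)",
                     [(1, "Europe"), (2, "Asia")])
    conn.executemany("INSERT INTO region_data VALUES (?, ?)",
                     [(10, 1), (11, 1), (20, 2)])
    # Made-up load rows: (georegion, hits)
    conn.executemany("INSERT INTO load_info VALUES (?, ?)",
                     [(10, 300), (11, 200), (20, 450), (20, 150)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(run_sample())  # [('Asia', 600), ('Europe', 500)]
```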
Query at the Edge
• Each machine collects its own data
• Many processes may publish
• Snapshots every two minutes
Cluster proxies
• Collect data for the whole cluster
• Include themselves
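A rough Python sketch of the first two levels of this hierarchy, assuming (for illustration only) that a snapshot is a dict from table name to a list of row dicts: each machine merges what its local processes published, and a cluster proxy unions its peers' snapshots together with its own. The function names and the row format are hypothetical, and the two-minute snapshot schedule is left out.

```python
from collections import defaultdict


def machine_snapshot(machine, publications):
    """Merge rows published by local processes into one snapshot.

    publications: iterable of (table_name, row_dict) pairs (hypothetical format).
    Every row is tagged with the machine that produced it.
    """
    snapshot = defaultdict(list)
    for table, row in publications:
        snapshot[table].append({**row, "machine": machine})
    return dict(snapshot)


def cluster_snapshot(proxy_snapshot, peer_snapshots):
    """Union per-machine tables for a whole cluster, the proxy's own data included."""
    merged = defaultdict(list)
    for snap in [proxy_snapshot, *peer_snapshots]:
        for table, rows in snap.items():
            merged[table].extend(rows)
    return dict(merged)
```

A top-level aggregator could apply the same union step to cluster snapshots to build network-wide tables.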
Top-Level Aggregators
• Collect data for the whole network
• Snapshots every two minutes
• Static tables for data that doesn’t change much
SQL parsers
• Get tables from 1 TLA
• Only get the ones we need
• Answer queries based on them
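"Only get the ones we need" implies that an SQL parser first works out which tables a query touches. The Python below is a deliberately naive, hypothetical sketch: a regular expression that only understands the comma-join `FROM a x, b y, ...` style of the sample query, plus a fetch step that copies just those tables from one TLA's snapshot (modeled here as a plain dict). A real implementation would need a proper SQL grammar.

```python
import re

# Naive: captures the text between FROM and the next WHERE/GROUP/ORDER/;/end.
# Handles only comma-separated "table alias" lists like the sample query.
_FROM_RE = re.compile(r"\bFROM\b(.*?)(?:\bWHERE\b|\bGROUP\b|\bORDER\b|;|$)",
                      re.IGNORECASE | re.DOTALL)


def tables_needed(sql: str) -> set:
    match = _FROM_RE.search(sql)
    if not match:
        return set()
    names = set()
    for item in match.group(1).split(","):
        parts = item.split()
        if parts:
            names.add(parts[0])   # the table name; an alias may follow it
    return names


def fetch_from_tla(tla_tables: dict, sql: str) -> dict:
    """Copy only the tables the query mentions from one TLA's snapshot."""
    needed = tables_needed(sql)
    missing = needed - set(tla_tables)
    if missing:
        raise KeyError(f"TLA lacks tables: {sorted(missing)}")
    return {name: tla_tables[name] for name in needed}
```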
Aggregator Sets
• Span different parts of the network
• Designated for different purposes
• Several replicated TLAs & SQLs
• Combined TLA/SQLs
• Shared hostnames
• Help meet reliability guarantees
• Help tolerate faults & keep localized
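One way replicated TLA/SQL machines behind a shared hostname can help meet reliability guarantees is client-side failover. The Python below is a hypothetical sketch, not the actual Query client: `send` stands in for whatever transport is used, replica addresses are invented, and replicas are tried in random order so that load also spreads across the set.

```python
import random


class ReplicaUnavailable(Exception):
    """Raised by the (hypothetical) transport when a replica cannot answer."""


def query_aggregator_set(replicas, send, sql, rng=None):
    """Send sql to one replica of an aggregator set, failing over on errors.

    replicas: addresses that share one hostname (hypothetical).
    send: callable(address, sql) -> result; raises ReplicaUnavailable on failure.
    """
    rng = rng or random.Random()
    order = list(replicas)
    rng.shuffle(order)          # spread load across the replicas
    errors = []
    for address in order:
        try:
            return send(address, sql)
        except ReplicaUnavailable as exc:
            errors.append((address, exc))   # remember the failure, try the next one
    raise ReplicaUnavailable(f"all {len(order)} replicas failed: {errors}")
```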
Scale
• Several hundred TLAs, SQLs, TLA/SQLs
• Thousands of queries per minute
• Tens of GB in the system
• Up to 16 GB per TLA (and growing fast), driven by:
  • Internet usage
  • Network growth
  • Customer growth
  • Data/customer
  • More queries
• Age of data typically a few minutes
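To put "up to 16 GB per TLA" together with the two-minute snapshot interval, here is a back-of-the-envelope Python calculation. It assumes the worst case that a TLA's full 16 GB (taken as 10^9 bytes per GB) were re-sent every two minutes; that assumption is ours, and sending only changes or compressing the data would lower the figure.

```python
def refresh_bandwidth(table_bytes: float, period_s: float) -> tuple:
    """Bytes/s and Gbit/s needed to re-send table_bytes once per period_s."""
    bytes_per_s = table_bytes / period_s
    return bytes_per_s, bytes_per_s * 8 / 1e9


# Worst case (assumption): a full 16 GB TLA snapshot re-sent every 2 minutes.
bps, gbit = refresh_bandwidth(16e9, 120)
print(f"{bps / 1e6:.0f} MB/s, {gbit:.2f} Gbit/s")  # prints: 133 MB/s, 1.07 Gbit/s
```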
[Three panels, each captioned "Result: 2-100X compression"]
Download the Akamai Internet Visualization app from the Apple App Store
Thanks!
Paweł Kuśmierski, pkusmier@akamai.com

Atmosphere 2014: Helping the Internet to scale since 1998 - Paweł Kuśmierski
