Atmosphere 2014: Helping the Internet to scale since 1998 - Paweł Kuśmierski

Akamai runs a network of 150,000 servers distributed among 2,000 locations in 92 countries. It constantly outputs terabits per second, accounting for between 15% and 30% of the Internet’s web traffic. The talk covers the principles of operation of Akamai’s Intelligent Platform and aspects of monitoring and managing consistent configuration at such scale. The speaker shares interesting technical details and general ideas behind the scalability and performance of the Akamai network.

Paweł Kuśmierski is a Senior Engineer and Lead of Akamai’s System Operations in Krakow, Poland. He is responsible for operational oversight of the Internet Mapping and Distributed Storage systems. In the past he interned at Google’s Mountain View office as a Software Engineer. He lives with his wife and three-year-old son in Krakow. Occasionally he finds time to fly sailplanes and build electronic devices.

  1. 1. Helping the Internet to scale since 1998
     Paweł Kuśmierski, Senior Engineer, Lead System Operations, Akamai Krakow
  2. 2. ©2013 AKAMAI | FASTER FORWARD TM
     What’s Akamai?
     • Founded at MIT in 1998 by Prof. Tom Leighton and Danny Lewin
     • Akamai has the world’s most distributed Internet platform (over 150,000 servers, deployed in 2,000 locations in 92 countries)
     • The Akamai Intelligent Platform is a leading cloud platform, delivering between 15% and 30% of worldwide web traffic
     • Accelerating daily traffic of:
       • 10+ Tbps
       • 20+ million hits per second
       • 2+ trillion deliveries per day
       • 30+ petabytes per day
       • 10+ million concurrent streams
  3. 3. Who do we serve?
     • The top 30 media & entertainment companies
     • All 20 top global eCommerce sites
     • 7 of the top 10 world banks
     • 9 of the top 10 largest newspapers
     • 9 out of 10 top social media sites
     • 6 of the top 7 computer manufacturers
     • All of the top anti-virus companies
  4. 4. What’s the idea?
     • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
     • ACMS: Akamai Configuration Management System
     • Query (various publications, including Scaling a Monitoring Infrastructure for the Akamai Network)
     http://www.akamai.com/html/perspectives/techpubs.html
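The first publication on slide 4 concerns consistent hashing. As a rough illustration of that idea only (not Akamai’s implementation), the Python sketch below places servers on a hash ring using virtual nodes. The `HashRing` class, the `edge-*` server names, the URL-like keys, and the choice of SHA-256 are all assumptions made for this example.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a ring position (first 8 bytes of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual nodes per server
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            p = _point(f"{server}#{i}")
            if p in self._owner:  # vanishingly unlikely collision: skip it
                continue
            self._owner[p] = server
            bisect.insort(self._points, p)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        """Return the server owning the first ring position after the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]


# Assumed example data: three edge servers and 1000 object names.
ring = HashRing(["edge-a", "edge-b", "edge-c"])
keys = [f"/img/{i}.png" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("edge-c")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, removing `edge-c` remaps only the keys that `edge-c` owned; every other key keeps its server, which is the property that makes the scheme attractive for caches whose membership changes.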
  5. 5. Why and how is Akamai helping the Internet to scale?
     The Internet wasn’t designed for the ways in which we use it today.
     • No single network dominates Internet traffic; the largest controls less than 5% of the access traffic.
     Trouble:
     • Outages (cable cuts, de-peering)
     • Congestion (packet loss)
     • Lack of scalability
     • Slow adaptability (IPv6 first proposed in 1998)
     • Lack of security
  6. 6. 10,000-foot view of Akamai
  7. 7. Akamai Cloud Optimization
     The User Always Connects to a Nearby Akamai Server
     Challenges with Cloud Adoption: cloud servers reside in big data centers, farther away from the end user, resulting in decreased performance and security.
     [Diagram: End User, Akamai Edge Servers, Cloud Datacenter]
  8. 8. Cloud Optimization: Route Selection
     Problem 1: Route to datacenter may perform poorly
     [Diagram: End User, Cloud Datacenter]
  9. 9. Cloud Optimization: Route Selection
     Problem 1: Route to datacenter may perform poorly
     Solution: Akamai SureRoute to optimize route
     [Diagram: End User, Akamai Edge Servers, Cloud Datacenter]
  10. 10. Akamai SureRoute Makes a Big Difference
     Packet loss into India after MidEast cable cut
     [Chart: packet loss (0% to 50%) from Jan 25 to Feb 18; series: Generic Internet, Akamai]
  11. 11. Cloud Optimization: Communication Protocol
     Problem 2: Many round trips for initial large download
     Solution: Akamai Communication Protocol
     [Diagram: End User, Akamai Edge Servers, Cloud Datacenter]
  12. 12. Attacks on Akamai Customers
     • Typical attack size: 3-10 Gbps
     • Large attack size: 100-200 Gbps
     • Attacks originate from all geographies and move between geographies during the attack
     [Chart: number of attacks (0 to 600) per year, 2009 to 2011]
  13. 13. The Threat is Varied & Easier to Launch
     Attack Methods (Source: TrustWave - 2010 - Web Hacking Incident Database)
     • Denial of Service (DoS): 32%
     • SQL Injection (SQLi): 21%
     • Cross-Site Scripting (XSS): 9%
     • Brute Force: 4%
     • Cross-Site Request Forgery (CSRF): 4%
     • Process Automation: 4%
     • Known Vulnerability: 4%
     • Misconfiguration: 3%
     • Stolen Credentials: 1%
     • Banking Trojan: 1%
     • Predictable Resource Location: 1%
     • Content Spoofing: 1%
     • Abuse of Functionality: 1%
     • DNS Hijacking: 1%
     • Malware: 1%
     • Insufficient Authentication: 1%
     • OS Commanding: 1%
     • Unknown: 10%
     74% of companies experienced one or more DDoS attacks in the past year. 31% of these attacks resulted in service disruption.
     New attack tools such as Low Orbit Ion Cannon: users download the tool, insert the target URL or IP and press GO!
  14. 14. Web Application With a Perimeter Defense
     [Diagram labels: End User, (Cloud) Datacenters, Origin Traffic, Akamai Traffic, COVERED; scale 1, 10, 100, 1000, 10000]
  15. 15. July 4th to 7th, 2009 DDoS Attack
     400,000 Korean Bots Attack Key U.S. Government Web Sites

     Customer (PROTECTED)         Times above normal traffic  Peak traffic
     ---------------------------  --------------------------  ------------
     U.S. Government Customer 1   598x                        124 Gbps
     U.S. Government Customer 2   369x                        32 Gbps
     U.S. Government Customer 3   39x                         9 Gbps
     U.S. Government Customer 4   19x                         9 Gbps
     U.S. Government Customer 5   9x                          2 Gbps
     U.S. Government Customer 6   6x                          1.9 Gbps
  16. 16. July 4th to 7th, 2009 DDoS Attack
     400,000 Korean Bots Attack Key U.S. Government Web Sites
     Timeline, July 5, 2009:
     • 16:00 Customer notified
     • 20:00 Attack grows rapidly
     • 21:00 Akamai identifies sources
     • 23:00 Mitigation measures engaged
     • 23:50 Peak pageviews
     [Chart: attack size in Gbps (25 to 125) over time, with Spike 1, Spike 2, Spike 3 and unique IPs]
  17. 17. Under the hood
  18. 18. Configuration change deployments
     • Syntax check
     • File liveness checks
     • Check number of objects changing
     • Deploy to a subset
     • Check for machine liveness (do we have a representative sample?)
     • Check for relative change in machine liveness
     • Check for service health
     • Check relative changes in response codes %
     • Check for self-suspension
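The post-deployment checks on slide 18 can be sketched as a comparison of the canary subset before and after a change. This is a minimal Python illustration, not Akamai’s tooling: the `Snapshot` fields, the `canary_problems` function, and all threshold values are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Health of the canary subset at one point in time (hypothetical shape)."""
    machines_total: int
    machines_live: int
    requests: int
    errors_5xx: int
    self_suspended: int = 0

    @property
    def live_fraction(self) -> float:
        return self.machines_live / self.machines_total if self.machines_total else 0.0

    @property
    def error_rate(self) -> float:
        return self.errors_5xx / self.requests if self.requests else 0.0


def canary_problems(before, after, min_live=50, max_live_drop=0.02, max_error_rise=0.01):
    """Return reasons to halt the rollout; an empty list means 'continue'.

    All thresholds are made-up defaults for illustration.
    """
    problems = []
    if after.machines_live < min_live:
        problems.append("sample too small")    # representative sample?
    if before.live_fraction - after.live_fraction > max_live_drop:
        problems.append("liveness dropped")    # relative change in liveness
    if after.error_rate - before.error_rate > max_error_rise:
        problems.append("error rate rose")     # relative change in response codes
    if after.self_suspended > before.self_suspended:
        problems.append("self-suspension")     # machines took themselves out
    return problems


baseline = Snapshot(machines_total=100, machines_live=99, requests=10_000, errors_5xx=20)
healthy = Snapshot(machines_total=100, machines_live=99, requests=10_000, errors_5xx=25)
broken = Snapshot(machines_total=100, machines_live=90, requests=10_000,
                  errors_5xx=500, self_suspended=3)

print(canary_problems(baseline, healthy))
print(canary_problems(baseline, broken))
```

Comparing against a baseline, rather than against fixed absolute limits, mirrors the slide’s emphasis on relative changes: a subset that was already slightly unhealthy should not block every deployment.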
  19. 19. OK, but how?
     • Various web infrastructure services
     • Over 150,000 machines
     • Over 1 million distributed components
     • Over 1000 autonomous systems
     • 24/7/365 operation
     • Failures, usage changes
     • Massive, real-time monitoring
  20. 20. Query
     • Distributed data collection
     • Aggregation at several hundred points
     • SQL-style interface
  21. 21. A Sample Query

     SELECT c.continent_name, SUM(l.hits) hits
     FROM load_info l, region_data r, continent_data c
     WHERE l.georegion=r.id AND r.continent=c.continent
     GROUP BY c.continent_name
     ORDER BY hits DESC;

     c.continent_name  hits
     ----------------  ---------
     North America     4,620,551
     Europe            3,392,102
     South America       655,175
     Asia                552,258
     Africa              106,781
     Oceania              39,905
     Antarctica              135
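To see the shape of the sample query on slide 21 in action, the Python sketch below runs the same statement against an in-memory SQLite database. SQLite merely stands in for Akamai’s Query system here; the column types and all inserted rows are invented toy data, and only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only the table and column names appear on the slide.
conn.executescript("""
CREATE TABLE load_info (georegion INTEGER, hits INTEGER);
CREATE TABLE region_data (id INTEGER, continent INTEGER);
CREATE TABLE continent_data (continent INTEGER, continent_name TEXT);

-- Toy data, invented for this example.
INSERT INTO continent_data VALUES (1, 'Europe'), (2, 'Asia');
INSERT INTO region_data VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO load_info VALUES (10, 300), (11, 200), (20, 150), (10, 50);
""")

# The query text from the slide, unchanged apart from line breaks.
QUERY = """
SELECT c.continent_name, SUM(l.hits) hits
FROM load_info l, region_data r, continent_data c
WHERE l.georegion=r.id AND r.continent=c.continent
GROUP BY c.continent_name
ORDER BY hits DESC;
"""

rows = conn.execute(QUERY).fetchall()
for continent_name, hits in rows:
    print(f"{continent_name:<16} {hits:>9,}")
```

Each `load_info` row is mapped to a region, each region to a continent, and the hits are summed per continent, so the toy data yields 550 hits for Europe and 150 for Asia.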
  22. 22. Query at the Edge
     • Each machine collects its own data
     • Many processes may publish
     • Snapshots every two minutes
  23. 23. Cluster proxies
     • Collect data for the whole cluster
     • Include themselves
  24. 24. Top-Level Aggregators
     • Collect data for the whole network
     • Snapshots every two minutes
     • Static tables for data that doesn’t change much
  25. 25. SQL parsers
     • Get tables from 1 TLA
     • Only get the ones we need
     • Answer queries based on them
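Slides 22 to 25 describe a collection hierarchy: machines publish rows, cluster proxies gather them per cluster, top-level aggregators (TLAs) gather them for the network, and SQL parsers answer queries from those tables. The Python sketch below models that flow in the simplest possible way; the function names, the dictionary row format, and the machine and cluster names are assumptions for illustration, not the real Query design.

```python
from collections import Counter


def cluster_proxy(machine_tables):
    """Merge the rows published by each machine in one cluster.

    `machine_tables` maps a machine name to the rows it published; the
    proxy's own rows are expected to be included like any other machine's.
    """
    merged = []
    for machine, rows in machine_tables.items():
        merged.extend({"machine": machine, **row} for row in rows)
    return merged


def top_level_aggregator(cluster_tables):
    """Merge the per-cluster tables into one network-wide table."""
    merged = []
    for cluster, rows in cluster_tables.items():
        merged.extend({"cluster": cluster, **row} for row in rows)
    return merged


def hits_by_region(table):
    """Stand-in for an SQL parser answering a SUM ... GROUP BY query."""
    totals = Counter()
    for row in table:
        totals[row["georegion"]] += row["hits"]
    return dict(totals)


# Invented snapshot: two clusters, three machines.
clusters = {
    "krk": cluster_proxy({
        "m1": [{"georegion": "EU", "hits": 5}],
        "m2": [{"georegion": "EU", "hits": 7}],
    }),
    "bos": cluster_proxy({
        "m3": [{"georegion": "NA", "hits": 11}],
    }),
}
network = top_level_aggregator(clusters)
print(hits_by_region(network))
```

The sketch keeps every published row and defers summing to query time, on the assumption that the aggregation layers only merge tables while the SQL layer does the arithmetic.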
  26. 26. Aggregator Sets
     • Span different parts of the network
     • Designated for different purposes
     • Several replicated TLAs & SQLs
     • Combined TLA/SQLs
     • Shared hostnames
     • Help meet reliability guarantees
     • Help tolerate faults & keep localized
  27. 27. Scale
     • Several hundred TLAs, SQLs, TLA/SQLs
     • Thousands of queries per minute
     • Tens of GB in the system
     • Up to 16 GB per TLA (and growing fast)
       • Internet usage
       • Network growth
       • Customer growth
       • Data/customer
       • More queries
     • Age of data typically a few minutes
  28. 28. Result: 2-100X compression
     Download the Akamai Internet Visualization app in the Apple store
  29. 29. Thanks!
     Paweł Kuśmierski, pkusmier@akamai.com
