How a Web Accelerator Accelerates Your Site / Alexander Krizhanovsky (Tempesta Technologies)

In this talk I will explain what a Web accelerator is, also known as a reverse proxy or a frontend. As the name suggests, it speeds up a site. But how exactly does it do that? What kinds of accelerators exist? What can they do, and what can they not? What is distinctive about each solution? I will try to cover them in both depth and breadth.

I will also talk about one more open source Web accelerator, Tempesta FW. The project is unique in that it is a hybrid of a Web accelerator and a firewall, developed specifically for processing and filtering large volumes of HTTP traffic. Its main use cases are protection against application-layer DDoS attacks and simply delivering large volumes of HTTP traffic at low hardware cost.

- What a Web accelerator is, why it was invented, and how to tell when you need one;
- Typical reverse proxy functionality and how it differs from a Web server;
- A brief mention of SSL accelerators;
- A deep look at HTTP: how it controls caching and proxying, what can be cached and what cannot;
- A comparison of the most popular accelerators (Nginx, Varnish, Apache Traffic Server, Apache HTTPD, Squid) by features and internals;
- Why yet another Web accelerator, Tempesta FW, is needed, and how it differs from the others.

Published in: Engineering

  1. 1. Web-acceleration Technologies Alexander Krizhanovsky Tempesta Technologies, Inc. ak@tempesta-tech.com
  2. 2. Who am I? CEO & CTO at NatSys Lab & Tempesta Technologies Tempesta Technologies (Seattle, WA) ● Subsidiary of NatSys Lab. developing Tempesta FW – the first and only hybrid of HTTP accelerator and firewall for DDoS mitigation & WAF NatSys Lab (Moscow, Russia) ● Custom software development in: • high performance network traffic processing • databases
  3. 3. Web-content Acceleration Web-framework caching (e.g. Django caching) => whole site, pages, compiler objects, templates, any data Downstream caching (RFC 7234, e.g. mod_cache): reduces origin server requests (thundering herd) => whole site, pages ● forward proxy cache (e.g. Squid, ATS) ● reverse proxy (Web-accelerator) cache (e.g. Squid, Varnish etc.) SSL acceleration Private caching (Web-browser cache) ...eAccelerator, xslcache etc.
  4. 4. Web-acceleration
  5. 5. Web-caching (how Web-accelerator accelerates your site)
  6. 6. To Cache static (e.g. video, images, CSS, HTML) some dynamic Negative results (e.g. 404) Permanent redirects Incomplete results (206, RFC 7233 Range Requests) Methods: GET, POST, whatever GET /script?action=delete – this is your responsibility (but some servers don't cache URIs w/ arguments)
  7. 7. Not to Cache Responses to Authenticated requests Unsafe methods (RFC 7231 4.2.1) (safe methods: GET, HEAD, OPTIONS, TRACE) Explicit no-cache directive Set-Cookie (?)
  8. 8. Cache POST? Idempotent POST (e.g. web-search) – just like GET Non-idempotent POST (e.g. blog comment) – cache response for following GET RFC 7234 4.4: URI must be invalidated
  9. 9. Cache Cookies? Varnish, Nginx, ATS don't cache responses w/ Set-Cookie by default mod_cache and Squid do cache responses w/ Set-Cookie by default RFC 7234: Note that the Set-Cookie response header field [RFC6265] does not inhibit caching; a cacheable response with a Set-Cookie header field can be (and often is) used to satisfy subsequent requests to caches. Servers who wish to control caching of these responses are encouraged to emit appropriate Cache- Control response header fields.
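The "To Cache" / "Not to Cache" rules from slides 6, 7 and 9 can be sketched as a single predicate. This is a deliberately conservative illustration, not the policy of any particular proxy: the function name, the status set and the `cache_set_cookie` switch are assumptions made for the example, header names are assumed to be lowercased already, and `no-cache` is treated as "do not store" even though RFC 7234 allows storing with revalidation.

```python
# Hypothetical cacheability check; all names here are illustrative.
CACHEABLE_METHODS = {"GET", "HEAD"}
# 200 OK, 206 partial content, 301 permanent redirect, 404 negative result.
CACHEABLE_STATUSES = {200, 206, 301, 404}


def is_cacheable(method, status, request_headers, response_headers,
                 cache_set_cookie=False):
    """Return True if a shared cache may store the response.

    Header dictionaries are assumed to use lowercase names.
    """
    if method not in CACHEABLE_METHODS:
        return False
    if status not in CACHEABLE_STATUSES:
        return False
    # Responses to authenticated requests are skipped entirely here.
    if "authorization" in request_headers:
        return False
    cache_control = response_headers.get("cache-control", "")
    directives = {d.strip().split("=")[0].lower()
                  for d in cache_control.split(",")}
    if directives & {"no-store", "no-cache", "private"}:
        return False
    # Proxies differ on Set-Cookie by default, hence the switch.
    if "set-cookie" in response_headers and not cache_set_cookie:
        return False
    return True
```

With `cache_set_cookie=False` the sketch follows the Varnish/Nginx/ATS default from slide 9; passing `True` mimics the mod_cache/Squid default.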
  10. 10. Cache Entries Freshness RFC 7234: freshness_lifetime > current_age Freshness calculation: ● Last-Modified – when a resource was modified at origin server ● Date – response generation timestamp ● Age – the age the object has been in proxy cache ● Expires – when a cache entry expires Revalidation: Conditional requests (RFC 7232, e.g. If-Modified-Since) Background activity or on-request job
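The freshness test on slide 10 (`freshness_lifetime > current_age`) can be sketched roughly as follows. This is a simplified reading of RFC 7234 section 4.2, not a complete implementation: it ignores `s-maxage`, heuristic freshness from Last-Modified and the response-delay correction, and it assumes lowercase header names and timezone-aware timestamps. The function names and the `response_time` parameter are chosen for the example.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds the response stays fresh, or None if nothing says so."""
    match = re.search(r"(?:^|,)\s*max-age=(\d+)",
                      headers.get("cache-control", ""))
    if match:
        return float(match.group(1))
    if "expires" in headers and "date" in headers:
        expires = parsedate_to_datetime(headers["expires"])
        date = parsedate_to_datetime(headers["date"])
        return (expires - date).total_seconds()
    return None


def current_age(headers, response_time, now):
    """Age of the entry in seconds.

    response_time is when the cache received the response; both it and
    now must be timezone-aware datetimes.
    """
    date = parsedate_to_datetime(headers["date"])
    apparent_age = max(0.0, (response_time - date).total_seconds())
    age_value = float(headers.get("age", 0))
    initial_age = max(apparent_age, age_value)
    resident_time = (now - response_time).total_seconds()
    return initial_age + resident_time


def is_fresh(headers, response_time, now):
    lifetime = freshness_lifetime(headers)
    if lifetime is None:
        return False
    return lifetime > current_age(headers, response_time, now)
```

When `is_fresh` returns False, the cache would normally revalidate with a conditional request (for example If-Modified-Since) instead of serving the stored copy.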
  11. 11. Stale Cache Entries Sometimes is OK, e.g. Nginx: proxy_cache_use_stale Expired responses Invalidated by unsafe methods Error responses for the URI Timeout Etc.
  12. 12. Cache-Control A cache MUST obey the requirements of the Cache-Control directives Freshness and staleness control Explicit cache/no-cache Private (browser vs proxy) caching – not privacy! Pragma: no-cache
  13. 13. Vary (secondary keys say hello to databases) Accept-Language – return localized version of page (no need for /en/index.html) User-Agent – mobile vs desktop (bad!) Accept-Encoding – don't send compressed page if browser doesn't understand it Request headers normalization is required!
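Why normalization matters for Vary: `gzip, deflate`, `deflate,gzip` and `GZIP;q=0.8` all ask for the same thing, but as raw strings they would create three separate cache variants. A minimal sketch of normalizing Accept-Encoding before it is used as a secondary key is shown below. The function name and the preference order (`br`, then `gzip`) are assumptions for illustration, and the parser handles only the simple `token;q=value` form.

```python
def normalize_accept_encoding(value, preferred=("br", "gzip")):
    """Collapse an Accept-Encoding header into one canonical token."""
    accepted = set()
    for item in value.split(","):
        token, _, params = item.strip().partition(";")
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # q=0 means the client explicitly refuses this encoding.
        if quality > 0:
            accepted.add(token.strip().lower())
    for encoding in preferred:
        if encoding in accepted:
            return encoding
    return "identity"
```

The cache then stores at most one variant per canonical token instead of one per distinct header string.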
  14. 14. Buffering vs Streaming Buffering ● Seems everyone by default ● Performance degradation on large messages ● 200 means Ok, not incomplete response Streaming ● Tengine (patched Nginx) w/ proxy_request_buffering & fastcgi_request_buffering ● More performance, but 200 doesn't mean full response
  15. 15. Cache Storage Plain files (Nginx, Squid, Apache HTTPD) ● Meta-data in RAM ● Filesystem database ● Easy to manage Database (Apache Traffic Server, Tempesta FW) ● Faster access Persistency (experimental in Varnish, upcoming in Tempesta FW) ● no real consistency
  16. 16. Cache Storage: mmap(2) Alistair Wooldrige, “BBC Digital Media Distribution: How we improved throughput by 4x”, http://www.bbc.co.uk/blogs/internet/entries/17d22fb8-cea2-49d5-be14-86e7 48 CPUs, 512GB RAM, 8TB SSD
  17. 17. Cache Key Primary key: URI path + Host POST key: URI path + Host + body Secondary (Vary) key: any headers E.g. Nginx custom cache key: proxy_cache_key "$request_uri|$request_body"
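The three kinds of key on slide 17 can be combined in one helper, sketched here under a few assumptions: the function name and the use of SHA-256 are illustrative, request headers are passed with lowercase names, and the body is hashed rather than embedded so that large POST bodies do not bloat the key.

```python
import hashlib


def cache_key(method, host, uri, headers, body=b"", vary=()):
    """Build a cache key from Host + URI, POST body and Vary headers.

    headers is assumed to map lowercase header names to values; vary
    lists the header names taken from the response's Vary field.
    """
    parts = [host.lower(), uri]
    if method == "POST":
        # POST key: URI path + Host + body.
        parts.append(hashlib.sha256(body).hexdigest())
    # Secondary (Vary) key: the selected request headers, in a stable order.
    for name in sorted(h.lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Header values should be normalized before they reach this function, otherwise equivalent requests end up under different keys.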
  18. 18. Cache Purging $ curl -X PURGE <URL> Not RFC-defined Squid, Varnish, Nginx (by wildcard) Use case: 1. Update some resource at upstream (POST can invalidate an entry) 2. Send PURGE & GET requests to the cache 3. Now cache is up to date
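The purge workflow from slide 18 can be modelled with a toy in-memory cache. This only illustrates the idea of wildcard purging. The class and method names are hypothetical, and real proxies implement PURGE in their own ways because the method is not defined by any RFC.

```python
import fnmatch


class TinyCache:
    """Toy URL-keyed cache with wildcard purge (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def put(self, url, body):
        self._entries[url] = body

    def get(self, url):
        # None signals a miss: the proxy would fetch from upstream.
        return self._entries.get(url)

    def purge(self, pattern):
        """Drop every entry whose URL matches the shell-style pattern."""
        doomed = [url for url in self._entries
                  if fnmatch.fnmatchcase(url, pattern)]
        for url in doomed:
            del self._entries[url]
        return len(doomed)
```

After the resource is updated upstream, `purge()` removes the old copy, and the next GET misses and repopulates the cache with the new version.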
  19. 19. Cache Busting No access to Web-accelerator or Web-server E.g. force users to use a new version of CSS or Ad? <?php $css_ver="1.1"; ?> <link rel="stylesheet" href="my.css?v=<?php echo $css_ver; ?>">
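A common variation on the manual version number in slide 19 is to derive the version from the file content, so the URL changes exactly when the file does. This is a different technique from the slide's hand-maintained `$css_ver`, sketched here with a hypothetical helper name and an arbitrary 8-character digest length.

```python
import hashlib


def busted_url(path, content, digest_len=8):
    """Append a content-derived version parameter to a resource URL."""
    version = hashlib.sha256(content).hexdigest()[:digest_len]
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}v={version}"
```

Note the caveat from slide 6: some servers do not cache URIs with arguments, so a versioned file name may be safer than a query parameter.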
  20. 20. IO & multitasking Bryan Call, “Choosing A Proxy Server”, ApacheCon 2014 ATS Nginx Squid Varnish Apache HTTPD Tempesta Threads X per-session! X Events X X X partial X ~ Processes X X X CPU locality X
  21. 21. Sessions vs Linux RFS RFS: Receive Flow Steering, linux/Documentation/networking/scaling.txt
  22. 22. Logging ATS, Nginx, Squid, Apache HTTPD: write(2) Varnish: logs in shared memory → varnishlog
  23. 23. SSL Accelerators SSL termination: Nginx, HAProxy, stud, bud Varnish – no SSL by design Solaris KSSL – kernel SSL termination (up to 30% performance boost) Tempesta SSL – kernel SSL termination (upcoming) Intel QuickAssist Technology (QAT) – crypto & compression acceleration ● Xeon e5-2600v2 (Ivy Bridge), 89xx chipset ● OpenSSL & zLib patches + user-space library
  24. 24. Tempesta FW: Challenges Normal Web-servers deliver content There are a lot of bad guys in modern Internet There are also good guys filtering bad guys out
  25. 25. Good Guys: WAF Technologies: XHTML, WSDL, Machine learning, Regexps Platforms: Nginx, Apache Traffic Server etc.
  26. 26. WAF: Performance Issues Nginx + regexps [Chart: actual RPS vs. target RPS for HyperScan, vanilla Nginx and PCRE-JIT; data labels: 4168, 4133, 6855]
  27. 27. WAF: Performance Issues 15K HTTP RPS on 24 cores (but >100KRPS would be nice)
  28. 28. WAF: Acceleration Again
  29. 29. Good Guys: Anti-DDoS CDN Technologies: DPI or Firewall + Machine Learning Platforms: Nginx
  30. 30. Application Layer DDoS Service from Cache Rate limit Nginx 22us 23us Fail2Ban: write to the log, parse the log, write to the log, parse the log…
  31. 31. Application Layer DDoS Service from Cache Rate limit Nginx 22us 23us Fail2Ban: write to the log, parse the log, write to the log, parse the log… - really in 21st century?!
  32. 32. Web-accelerator Capabilities Nginx, Varnish, Apache Traffic Server, Squid, Apache HTTPD etc. ● cache static Web-content ● load balancing ● rewrite URLs, ACL, Geo, filtering etc. ● C10K Kernel-mode Web-accelerators: TUX, kHTTPd ● basically the same sockets and threads ● zero-copy
  33. 33. Web-accelerator Capabilities Nginx, Varnish, Apache Traffic Server, Squid, Apache HTTPD etc. ● cache static Web-content ● load balancing ● rewrite URLs, ACL, Geo, filtering? etc. ● C10K – is it a problem for bot-net? SSL? ● what about tons of 'GET / HTTP/1.0\n\n'? Kernel-mode Web-accelerators: TUX, kHTTPd ● basically the same sockets and threads ● zero-copy → sendfile() - not needed
  34. 34. Tempesta FW: Yet Another Web-accelerator First and only hybrid of HTTP accelerator and FireWall FireWall: layer 3 (IP) – layer 7 (HTTP) filter FrameWork: high performance and flexible platform to build intelligent DDoS mitigation systems and Web Application Firewalls (WAF) Directly embedded into Linux TCP/IP stack NUMA-aware x86-64 cache conscious Web-cache on huge pages This is Open Source (GPLv2)
  35. 35. Frang: HTTP DoS Rate limits ● request_rate, request_burst ● connection_rate, connection_burst ● concurrent_connections Slow HTTP ● client_header_timeout, client_body_timeout ● http_header_cnt ● http_header_chunk_cnt, http_body_chunk_cnt
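To make the rate/burst pair concrete, here is a generic per-client sliding-window limiter. It is not Frang's algorithm, and the exact semantics of `request_rate` and `request_burst` in Tempesta FW are not given on the slide. The sketch assumes "rate" means requests per second and "burst" means requests within a short sub-window (0.125 s here, an arbitrary choice); the class name and parameters are hypothetical.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Per-client limiter with a 1 s rate window and a short burst window."""

    def __init__(self, rate, burst, burst_window=0.125):
        self.rate = rate
        self.burst = burst
        self.burst_window = burst_window
        self._seen = defaultdict(deque)  # client -> request timestamps

    def allow(self, client, now):
        """Return True and record the request if it fits both limits."""
        stamps = self._seen[client]
        # Forget requests older than the one-second rate window.
        while stamps and now - stamps[0] >= 1.0:
            stamps.popleft()
        recent = sum(1 for t in stamps if now - t < self.burst_window)
        if len(stamps) >= self.rate or recent >= self.burst:
            return False
        stamps.append(now)
        return True
```

Keeping the decision in memory on the request path is the point of the contrast with the Fail2Ban log-parsing loop mentioned on the earlier slide.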
  36. 36. Frang: WAF Length limits ● http_uri_len ● http_field_len ● http_body_len Content validation ● http_ct_required ● http_ct_vals ● http_methods
  37. 37. Frang: Filtering
  38. 38. Frang: Filtering
  39. 39. Frang: Filtering
  40. 40. Frang: Filtering
  41. 41. Load Balancing Dynamic reconnections Configurable number of upstream connections Schedulers ● HTTP (server groups) – Wildcards, full match, prefix – Method, URI, Host – Other headers ● Round-Robin (inside server group) ● Rendezvous hashing (inside server group)
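Rendezvous (highest-random-weight) hashing, listed among the schedulers above, can be sketched in a few lines. This shows the general technique only, not Tempesta FW's implementation; the function name, the key format and the use of SHA-256 are assumptions for the example.

```python
import hashlib


def pick_server(key, servers):
    """Choose the server with the highest hash score for this key."""

    def score(server):
        digest = hashlib.sha256(f"{server}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)
```

Because every (server, key) pair is scored independently, removing a server only remaps the keys that were assigned to it; all other keys keep their server, which keeps per-server caches warm.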
  42. 42. Configuration Example srv_group static { # sched=round-robin server 10.10.0.1:8080; server [fc00::2]:8081; } srv_group dynamic sched=hash { server 10.10.0.3:8080; # conns_n = 4 server [fc00::4]:8081 conns_n=8; } srv_group black_hole { } sched_http_rules { match black_hole hdr_raw prefix "X-Bad:"; match static uri prefix "/static/"; match dynamic * * *; }
  43. 43. Sticky Cookie User identification Enforce: HTTP 302 redirect sticky name=__tfw_user_id__ enforce;
  44. 44. Prerequisites SSE 4.2 (“sse4_2” in /proc/cpuinfo) Huge pages (“pse” in /proc/cpuinfo) Custom Linux kernel (KVM or dedicated server)
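The CPU prerequisites can be checked by looking for the two flags in `/proc/cpuinfo`. A small helper for that, with a hypothetical function name, might look like this. It takes the file contents as a string so it can be tried on any machine.

```python
def missing_cpu_flags(cpuinfo_text, required=("sse4_2", "pse")):
    """Return the required CPU flags absent from /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            flags.update(value.split())
    return [flag for flag in required if flag not in flags]
```

On a Linux host it could be called as `missing_cpu_flags(open("/proc/cpuinfo").read())`; an empty list means both flags are present.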
  45. 45. Build Kernel $ git clone https://github.com/tempesta-tech/linux-4.1-tfw.git $ cd linux-4.1-tfw $ make && make modules && make modules_install && make install $ reboot
  46. 46. Build & Run Tempesta $ git clone https://github.com/natsys/tempesta.git $ cd tempesta && make $ cat > etc/tempesta_fw.conf server 127.0.0.1:8080; # upstream cache 1; # cache sharding ^D $ ./scripts/tempesta.sh --start
  47. 47. Thanks! Web site: http://tempesta-tech.com (Powered by Tempesta since 06.06.16) Availability: https://github.com/tempesta-tech/tempesta Blog: http://natsys-lab.blogspot.com Contact: ak@tempesta-tech.com We are hiring!
