Tempesta FW - a Framework and Firewall for WAF and DDoS Mitigation, Alexander Krizhanovsky (NatSys Lab)

Alexander Krizhanovsky's talk at HighLoad++ 2014.

Published in: Internet

  1. Tempesta FW: a FrameWork and FireWall for HTTP DDoS mitigation and WAF. Alexander Krizhanovsky, NatSys Lab., ak@natsys-lab.com
  2. What Is Tempesta FW?
     ● FireWall: a layer 3 (IP) to layer 7 (HTTP) filter
     ● FrameWork: a high-performance, flexible platform for building intelligent DDoS mitigation systems and Web Application Firewalls (WAF)
     ● First and only hybrid of an HTTP accelerator and a firewall
     ● Directly embedded into the Linux TCP/IP stack
     ● Open source (GPLv2)
  3. Why? It is all about application-layer (HTTP) DDoS:
     ● sometimes very small HTTP requests
     ● sometimes very short-lived TCP connections
     ● requests outnumber responses
     ● a lot of concurrent connections
     ● need access to all network layers, e.g. for Slow HTTP:
       • how many TCP segments are in a request?
       • what are the delays between the segments?
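The two Slow HTTP questions above (segment count and delays between segments) can be tracked with a small per-request accumulator. This is only a minimal sketch: `req_stat`, `req_stat_on_segment`, `req_is_slow` and both thresholds are hypothetical names and values chosen for illustration, not Tempesta FW code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request statistics; not a Tempesta FW structure. */
struct req_stat {
    uint32_t segments;   /* TCP segments seen for this request */
    uint64_t last_ms;    /* arrival time of the previous segment */
    uint64_t max_gap_ms; /* largest delay between two segments */
};

/* Record one TCP segment that arrived at now_ms (monotonic clock). */
static void req_stat_on_segment(struct req_stat *st, uint64_t now_ms)
{
    assert(st->segments == 0 || now_ms >= st->last_ms);
    if (st->segments > 0) {
        uint64_t gap = now_ms - st->last_ms;

        if (gap > st->max_gap_ms)
            st->max_gap_ms = gap;
    }
    st->last_ms = now_ms;
    st->segments++;
}

/* A request looks "slow" if it is split into too many segments
 * or if any pause between segments exceeds the allowed gap. */
static bool req_is_slow(const struct req_stat *st,
                        uint32_t max_segments, uint64_t max_gap_ms)
{
    return st->segments > max_segments || st->max_gap_ms > max_gap_ms;
}
```

A caller would invoke `req_stat_on_segment()` from a per-segment hook and consult `req_is_slow()` before spending parser time on the request; suitable thresholds depend on the protected application.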
  4. Existing Solutions: How to Filter HTTP Requests?
     ● Modules in application HTTP servers
     ● Firewalls
     ● Deep Packet Inspection (DPI)
  5. Existing Solutions
     Deep Packet Inspection (DPI) is not an active TCP participant:
     ● can't accelerate content to relieve a defended Web resource under DDoS
     ● SSL termination is hard
     User-space HTTP accelerators are too slow due to context switches and copies, and are designed for old hardware.
     Firewalls work at low layers only (IP and partially TCP):
     ● rule generation for the application layer is messy (fail2ban etc.)
     ● no persistence for dynamic rules
  6. L7 DDoS Is About Performance: How to Accelerate a Web Application
     Diagram labels: DDoS mitigation; CDN; Filter (● DPI ● FireWall) + HTTP accelerator; Accelerator (● HTTP server)
  7. L7 DDoS Is About Performance: How to Accelerate a Web Application
     Diagram labels: DDoS mitigation; CDN; Filter (● DPI ● FireWall) + HTTP accelerator; Accelerator (● HTTP server)
     Annotations: Extra communications; Can be much faster
  8. What's Wrong With Traditional HTTP Servers: profile
     %        symbol name
     1.5719   ngx_http_parse_header_line
     1.0303   ngx_vslprintf
     0.6401   memcpy
     0.5807   recv
     0.5156   ngx_linux_sendfile_chain
     0.4990   ngx_http_limit_req_handler
  9. What's Wrong With Traditional HTTP Servers: syscalls
     epoll_wait(12, {{EPOLLIN, ....}}, 512, 500) = 1
     recvfrom(3, "GET / HTTP/1.1\r\nHost: ....", 1024, 0, NULL, NULL) = 327
     // parse HTTP
     write(11, "...limiting requests, excess...", 176) = 176
     writev(3, [{"HTTP/1.1 503 Service Temporarily Una....", 200}], 1) = 200
     sendfile(3, 7, [0], 383) = 383
     recvfrom(3, 0xa1bac0, 1024, 0, 0, 0) = -1 EAGAIN
     epoll_wait(12, {{EPOLLIN, ....}}, 512, 500) = 1
     recvfrom(3, "", 1024, 0, NULL, NULL) = 0
     close(3) = 0
  10. What's Wrong With Traditional HTTP Servers: In General
      User space on top of a monolithic OS kernel (an exokernel approach helps a lot):
      ● context switches
      ● copies
      ● no uniform access to information on all network layers
      Such servers are designed for old hardware and/or are oblivious to hardware features.
  11. Synchronous Sockets
      Reading from a socket in any context other than deferred interrupt context is asynchronous to the arrival of TCP segments.
      Synchronous sockets:
      ● process packets while they're hot in CPU caches
      ● no queues: do the work when the data is ready
  12. Faster HTTP Parser
      ● Switch-driven (widespread): poor C-cache usage and CPU intensive
      ● Table-driven (with possible compression): poor D-cache usage
      ● Hybrid State Machine (a combination of the two previous approaches)
      ● Direct jumps (Ragel)
      ● PCMPSTR (~strspn(3): very limited)
      Switch-driven parser, schematically:
          for (; *str_ptr; ++str_ptr)
              switch (state) {
              case 1:
                  switch (*str_ptr) {
                  case 'a': ... state = 1
                  case 'b': ... state = 2
                  }
              case 2: ...
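To contrast with the switch-driven loop above, here is a minimal sketch of the direct-jump (goto-driven) style for one simplified `Name: value\r\n` header line. Each label is a state and each transition is a direct `goto`, so no `switch (state)` dispatch runs per byte. The `hdr` structure and `parse_header_line` function are hypothetical; this is neither the nginx parser nor the Tempesta FW parser, and it ignores details such as trailing whitespace and input split across buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Result of parsing one "Name: value\r\n" header line. */
struct hdr {
    size_t name_len; /* bytes before ':' */
    size_t val_off;  /* offset of the first value byte */
    size_t val_len;  /* value bytes, excluding CRLF */
};

/* Returns bytes consumed (including CRLF), or 0 if the line is
 * malformed or incomplete. */
static size_t parse_header_line(const char *buf, size_t len, struct hdr *h)
{
    size_t i = 0;

    assert(buf && h);

st_name:
    if (i == len)
        return 0;
    if (buf[i] == ':') {
        if (i == 0)
            return 0;            /* empty header name */
        h->name_len = i++;
        goto st_ows;
    }
    if (buf[i] == '\r' || buf[i] == '\n' || buf[i] == ' ')
        return 0;
    i++;
    goto st_name;

st_ows:                          /* optional whitespace before the value */
    if (i == len)
        return 0;
    if (buf[i] == ' ' || buf[i] == '\t') {
        i++;
        goto st_ows;
    }
    h->val_off = i;
    goto st_value;

st_value:
    if (i == len)
        return 0;
    if (buf[i] == '\r') {
        h->val_len = i - h->val_off;
        i++;
        goto st_lf;
    }
    if (buf[i] == '\n')
        return 0;                /* bare LF is rejected in this sketch */
    i++;
    goto st_value;

st_lf:
    if (i == len || buf[i] != '\n')
        return 0;
    return i + 1;
}
```

For the input `Host: example.com\r\n` the function consumes 19 bytes and reports a 4-byte name and an 11-byte value starting at offset 6.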
  13. HTTP benchmark
                                      I7 (BPU!)   Core
      Classic HTTP parser:
        ngx_request_line              730ms       909ms
        ngx_header_line               422ms       583ms
        ngx_lw_header_line            428ms       661ms
        ngx_big_header_line           1725ms      1983ms
      HTTP Hybrid State Machine:
        hsm_header_line               553ms       433ms
      Table-driven Automaton:
        tbl_header_line               473ms       562ms
        tbl_big_header_line           840ms       1570ms
      Goto-driven Automaton:
        goto_request_line             470ms       747ms
        goto_opt_request_line         458ms       736ms
        goto_header_line              237ms       375ms
        goto_big_header_line          589ms       975ms
  14. Generic Finite State Machine (GFSM)
      Context switching between protocol FSMs, e.g. for ICAP:
      (1) HTTP FSM: receive and process the HTTP request;
      (2) ICAP FSM: the callback is called at a particular HTTP state, and the current HTTP FSM state is push()'ed onto the stack;
      (3) ICAP FSM: send the request to the ICAP server and get the results;
      (4) HTTP FSM: the callback is called at a particular ICAP state, and the stored HTTP FSM state is pop()'ed back.
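The push/pop switching in steps (1) to (4) can be sketched with a small stack of suspended state machines. All names here (`gfsm`, `fsm_frame`, `gfsm_push`, `gfsm_pop`, the stack depth of 4) are assumptions made for illustration and are not taken from the Tempesta FW sources.

```c
#include <assert.h>

#define GFSM_STACK_MAX 4

/* Hypothetical FSM identifiers; illustrative only. */
enum fsm_id { FSM_HTTP, FSM_ICAP };

struct fsm_frame {
    enum fsm_id fsm;
    int state;
};

struct gfsm {
    struct fsm_frame cur;                   /* FSM currently running */
    struct fsm_frame stack[GFSM_STACK_MAX]; /* suspended FSMs */
    int depth;
};

/* Suspend the current FSM and switch to another one (step 2).
 * Returns 0 on success, -1 if the stack is full. */
static int gfsm_push(struct gfsm *g, enum fsm_id next, int next_state)
{
    assert(g->depth >= 0 && g->depth <= GFSM_STACK_MAX);
    if (g->depth == GFSM_STACK_MAX)
        return -1;
    g->stack[g->depth++] = g->cur;
    g->cur.fsm = next;
    g->cur.state = next_state;
    return 0;
}

/* Finish the current FSM and resume the most recently suspended one
 * at the state it was in when it was pushed (step 4).
 * Returns 0 on success, -1 if nothing is suspended. */
static int gfsm_pop(struct gfsm *g)
{
    if (g->depth == 0)
        return -1;
    g->cur = g->stack[--g->depth];
    return 0;
}
```

An HTTP FSM in state 7 that calls `gfsm_push(&g, FSM_ICAP, 0)` hands control to the ICAP FSM; a later `gfsm_pop(&g)` restores the HTTP FSM in state 7 regardless of how far the ICAP FSM advanced.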
  15. Web-cache
      An mmap()'ed and mlock()'ed in-memory persistent database: no disk I/O (the size is limited, but it can be processed in softirq).
      Cache-conscious Burst Hash Trie:
      ● NUMA-aware: independent databases for each node (retrieved by less significant bits)
      ● Can be made lock-free
      ● Almost zero-copy (only NIC → disk)
      ● Suitable for storing fixed- and variable-size records
      ● Quick for large string keys (e.g. URI) as well as for integer keys
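The mmap()'ed and mlock()'ed database idea can be illustrated in user space with the standard POSIX calls. This is only an analogy under stated assumptions: the slide describes a database that can be processed in softirq, whereas `db_map` and `db_unmap` below are hypothetical user-space helpers, and the file path and size are up to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a database file into memory and try to pin it in RAM.
 * Returns the mapping, or NULL on failure. *locked reports whether
 * mlock() succeeded; it can fail under a low RLIMIT_MEMLOCK, in which
 * case the mapping is still usable but its pages may be evicted. */
static void *db_map(const char *path, size_t size, int *locked)
{
    void *p;
    int fd;

    assert(path && locked && size > 0);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED: stores reach the file, which gives persistence. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps the file referenced */
    if (p == MAP_FAILED)
        return NULL;
    *locked = (mlock(p, size) == 0);
    return p;
}

/* Flush dirty pages to the file, then drop the mapping. */
static int db_unmap(void *p, size_t size)
{
    if (msync(p, size, MS_SYNC) < 0)
        return -1;
    return munmap(p, size);
}
```

Once the pages are locked, lookups touch only memory, which is what lets a cache of this kind avoid disk I/O on the request path; data written through the mapping is still there after the file is mapped again.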
  16. Filtering
      Dynamic persistent rules with eviction (Tempesta DB).
      Set of callbacks on all network layers:
      ● classify_ipv{4,6}: called for each received IPv4/IPv6 client packet
      ● classify_tcp: called for each received TCP segment
      ● classify_conn_{estab,close}: a client connection is established/closed
      ● classify_tcp_timer_retrans: called on retransmissions to the client
      ● …and other TCP stuff
      ● and, of course, the HTTP processing phases
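One way to picture such a set of callbacks is a table of optional function pointers that a filtering module fills in. The hook names below follow the slide, but the signatures, the `verdict` enum, the dispatch helpers and the example module are all assumptions for illustration; the real Tempesta FW interface may differ.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { PASS, BLOCK };

/* Hypothetical hook table: names follow the slide, signatures are
 * assumed. A module leaves hooks it does not need set to NULL. */
struct classifier {
    enum verdict (*classify_tcp)(const unsigned char *seg, size_t len);
    enum verdict (*classify_conn_estab)(unsigned int client_addr);
};

/* Dispatch helpers: an unset hook lets the traffic pass. */
static enum verdict on_tcp_segment(const struct classifier *cls,
                                   const unsigned char *seg, size_t len)
{
    assert(cls);
    return cls->classify_tcp ? cls->classify_tcp(seg, len) : PASS;
}

static enum verdict on_conn_estab(const struct classifier *cls,
                                  unsigned int client_addr)
{
    assert(cls);
    return cls->classify_conn_estab
           ? cls->classify_conn_estab(client_addr) : PASS;
}

/* Example module: refuse connections from one banned address,
 * 192.0.2.1 from the documentation range, in host byte order. */
static enum verdict block_banned(unsigned int client_addr)
{
    return client_addr == 0xC0000201u ? BLOCK : PASS;
}
```

A module that only cares about new connections would register `block_banned` as `classify_conn_estab` and leave the other hooks unset, so per-segment processing pays nothing for it beyond a NULL check.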
  17. Benchmark (a bit outdated)
      ● 10-core Intel Xeon E7-4850 2.4GHz, 64GB RAM (one CPU with 10 cores)
      ● NIC RX and TX queues bound to CPU cores
      ● RFS enabled
      ● Nginx: 10 workers, multi_accept, sendfile, epoll, tcp_nopush and tcp_nodelay
  18. Features & TODO (by Mar 2015)
      ● Simple HTTP proxy, GFSM, classification hooks
      ● Load balancing
      ● Simple rate limiting module
      ● Web-cache: in progress
      ● Filtering: in progress
      ● Cluster failover: in progress
      ● SSL: TODO
      ● Advanced HTTP DoS and DDoS protection: TODO
  19. Thanks!
      Availability: https://github.com/natsys/tempesta
      Contact: ak@natsys-lab.com
