Writing a fast HTTP parser
Lisp Meetup #22
Eitaro Fukamachi
Thank you for coming.
I’m Eitaro Fukamachi 
@nitro_idiot fukamachi
(and 'web-application-developer 
'common-lisper)
We’re hiring! 
Tell @Rudolph_Miller.
fast-http 
• HTTP request/response parser 
• Written in portable Common Lisp 
• Fast 
• Chunked body parser
fast-http 
Benchmarked with SBCL 1.2.5 / GCC v6.0.0
Let me tell you why I had to write a fast HTTP parser.
Wookie is slower than Node.js
• Wookie is 2 times slower than Node.js.
• Profiling showed that “WOOKIE:READ-DATA” was pretty slow.
• It was only calling “http-parse”.
• “http-parse” is the HTTP parser Wookie uses.
The bottleneck was 
HTTP parsing.
Wookie is slower than Node.js 
• Node.js’s HTTP parser is “http-parser”.
• Written in C.
• A generalized version of Nginx’s HTTP parser.
• Is it possible to beat it with Common Lisp?
Today, I’m talking about what I did to write a fast Common Lisp program.
5 important things 
• Architecture 
• Reducing memory allocation 
• Choosing the right data types 
• Benchmark & Profile 
• Type declarations
A brief introduction to HTTP
An HTTP request looks like…
GET /media HTTP/1.1↵ 
Host: somewrite.jp↵ 
Connection: keep-alive↵ 
Accept: */*↵ 
↵
An HTTP request looks like…
First line:
GET /media HTTP/1.1↵
Headers:
Host: somewrite.jp↵
Connection: keep-alive↵
Accept: */*↵
↵
Body: (empty, in this case)
An HTTP request looks like…
GET /media HTTP/1.1↵
Host: somewrite.jp↵
Connection: keep-alive↵
Accept: */*↵
↵
• ↵ is CR + LF
• The headers end with two CRLFs in a row
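The layout above can be sketched in Common Lisp. This is only an illustration of "lines end with CRLF, headers end with CRLF CRLF"; the names `*crlf*`, `make-request-text` and `end-of-headers` are made up for this example and are not part of fast-http.

```lisp
;; Hypothetical helpers for illustration; not fast-http's API.
(defparameter *crlf*
  (coerce (list (code-char 13) (code-char 10)) 'string)
  "CR followed by LF.")

(defun make-request-text (first-line &rest header-lines)
  "Join FIRST-LINE and HEADER-LINES with CRLF and finish with an empty line."
  (with-output-to-string (out)
    (dolist (line (cons first-line header-lines))
      (write-string line out)
      (write-string *crlf* out))
    ;; The empty line (a second CRLF in a row) ends the headers.
    (write-string *crlf* out)))

(defun end-of-headers (text)
  "Return the index just past the CRLF CRLF that ends the headers, or NIL."
  (let ((pos (search (concatenate 'string *crlf* *crlf*) text)))
    (and pos (+ pos 4))))

(defparameter *request*
  (make-request-text "GET /media HTTP/1.1"
                     "Host: somewrite.jp"
                     "Connection: keep-alive"
                     "Accept: */*"))
```

Because this request has no body, `(end-of-headers *request*)` is equal to `(length *request*)`. For text that stops before the empty line, the function returns NIL.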
An HTTP response looks like…
HTTP/1.1 200 OK↵ 
Cache-Control: max-age=0↵ 
Content-Type: text/html↵ 
Date: Wed, 26 Nov 2014 04:52:55 GMT↵ 
↵ 
<html> 
…
An HTTP response looks like…
Status line:
HTTP/1.1 200 OK↵
Headers:
Cache-Control: max-age=0↵
Content-Type: text/html↵
Date: Wed, 26 Nov 2014 04:52:55 GMT↵
↵
Body:
<html>
…
HTTP is…
• A text-based protocol (not binary)
• Lines are terminated with CRLF
• Very lenient
  • Ignores multiple spaces
  • Allows header values to continue onto the next line
And, 
there’s another difficulty.
HTTP messages are 
sent over a network.
Which means, 
we need to think about 
long & incomplete 
HTTP messages.
There are 2 ways to solve this problem.
1. Stateful (http-parser)
http-parser (used in Node.js) 
• https://github.com/joyent/http-parser 
• Written in C 
• Ported from Nginx’s HTTP parser 
• Written as Node.js’s HTTP parser 
• Stateful
http-parser (used in Node.js)
for (p = data; p != data + len; p++) {
  …
  switch (parser->state) {
    case s_dead:
      …
    case s_start_req_or_res:
      …
    case s_res_or_resp_H:
      …
  }
}
http-parser (used in Node.js)
/* Process char by char */
for (p = data; p != data + len; p++) {
  …
  /* Do something for each state */
  switch (parser->state) {
    case s_dead:
      …
    case s_start_req_or_res:
      …
    case s_res_or_resp_H:
      …
  }
}
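To make the stateful idea concrete, here is a much smaller state machine in Common Lisp. It only detects the end of the headers, but like http-parser it looks at one character at a time and keeps its state in an object, so a message can arrive in pieces. The `parser` struct, the state names and `feed` are assumptions for this sketch, not http-parser's or fast-http's design.

```lisp
;; Hypothetical names for illustration only.
(defparameter *cr* (code-char 13))
(defparameter *lf* (code-char 10))

(defstruct parser
  ;; One of :text :cr :crlf :crlf-cr :done
  (state :text))

(defun next-state (state ch)
  "Return the state that follows STATE after reading the character CH."
  (let ((cr? (char= ch *cr*))
        (lf? (char= ch *lf*)))
    (ecase state
      (:text    (if cr? :cr :text))
      (:cr      (cond (lf? :crlf) (cr? :cr) (t :text)))
      (:crlf    (if cr? :crlf-cr :text))
      (:crlf-cr (cond (lf? :done) (cr? :cr) (t :text)))
      (:done    :done))))

(defun feed (parser chunk)
  "Consume CHUNK char by char. Return the index just past the end of the
headers if they end inside CHUNK, otherwise NIL. The state is kept in
PARSER, so the next chunk continues where this one stopped."
  (loop for i from 0 below (length chunk)
        for ch = (char chunk i)
        do (setf (parser-state parser)
                 (next-state (parser-state parser) ch))
        when (eq (parser-state parser) :done)
          return (1+ i)))
```

The chunks can be cut anywhere, even between CR and LF, because nothing has to be re-read: each call to `feed` resumes from the stored state.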
2. Stateless (PicoHTTPParser)
PicoHTTPParser (used in H2O) 
• https://github.com/h2o/picohttpparser 
• Written in C 
• Stateless 
• Reparses from the start when the data is incomplete
• Most HTTP requests are small
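The stateless approach can be sketched like this in Common Lisp: the parse function keeps nothing between calls, so the caller buffers the data and parses the whole buffer again each time a chunk arrives. `try-parse` and `receive-chunk` are invented names for this illustration and do not mirror PicoHTTPParser's C API. The sketch only looks for the end of the headers.

```lisp
;; Hypothetical names for illustration only.
(defparameter *header-end*
  (coerce (list (code-char 13) (code-char 10)
                (code-char 13) (code-char 10))
          'string)
  "CRLF CRLF, which ends the headers.")

(defun try-parse (buffer)
  "Scan the whole BUFFER from the start. Nothing is remembered between calls.
Returns :COMPLETE and the end-of-headers index, or :INCOMPLETE and NIL."
  (let ((pos (search *header-end* buffer)))
    (if pos
        (values :complete (+ pos (length *header-end*)))
        (values :incomplete nil))))

(defun receive-chunk (buffer chunk)
  "Append CHUNK to BUFFER and reparse everything.
Returns the status, the end-of-headers index (or NIL), and the new buffer."
  (let ((buffer (concatenate 'string buffer chunk)))
    (multiple-value-bind (status end) (try-parse buffer)
      (values status end buffer))))
```

Reparsing repeats work, but when most requests fit in a single read the second pass rarely happens. The `concatenate` here allocates a new buffer on every chunk, which keeps the sketch short but is not how a tuned parser would buffer data.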
And fast-http is…
fast-http is in the middle
• Does not track state for every character
• Sets state for every line
• This keeps the program simple
• And easy to optimize
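One way to picture "state for every line" is the sketch below. It is not fast-http's actual code. The `line-parser` struct, its states and `feed-lines` are assumptions made for this example. The parser only changes state at line boundaries and tells the caller how much input it consumed, so an unfinished line is simply offered again with the next chunk.

```lisp
;; Hypothetical names for illustration only.
(defparameter *crlf*
  (coerce (list (code-char 13) (code-char 10)) 'string))

(defstruct line-parser
  (state :first-line)   ; :first-line -> :headers -> :body
  (lines-seen 0))

(defun feed-lines (parser data &key (start 0) (end (length data)))
  "Process every complete CRLF-terminated line in DATA between START and END.
Return the index of the first character that was not consumed."
  (loop
    ;; Once the headers are done, the rest belongs to the body.
    (when (eq (line-parser-state parser) :body)
      (return start))
    (let ((eol (search *crlf* data :start2 start :end2 end)))
      ;; Incomplete line: stop here and wait for more data.
      (unless eol
        (return start))
      (ecase (line-parser-state parser)
        (:first-line
         (setf (line-parser-state parser) :headers))
        (:headers
         ;; An empty line ends the headers.
         (when (= eol start)
           (setf (line-parser-state parser) :body))))
      (incf (line-parser-lines-seen parser))
      (setf start (+ eol 2)))))
```

Only a partial line is ever re-read, and the state variable takes three values instead of one per character position, which leaves a simple inner loop to optimize.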
Memory allocation is slow
• (in general)
• Make sure not to allocate memory during processing
• Watch for functions that allocate: cons, make-instance, make-array…
• And functions that copy: subseq, append, copy-seq
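As a small illustration of avoiding `subseq`, the two functions below check whether a region of a buffer holds a given header name. Both names are made up for this sketch. The first copies the region before comparing. The second passes bounding indices to `string-equal`, which is standard Common Lisp and compares in place.

```lisp
;; Hypothetical names for illustration only.
(defun header-name-p/copy (name data start end)
  "Copying version: SUBSEQ makes a fresh string on every call."
  (string-equal name (subseq data start end)))

(defun header-name-p (name data start end)
  "In-place version: compare DATA between START and END without copying it."
  (string-equal name data :start2 start :end2 end))

;; Usage: both are true for the first four characters of this line.
;; (header-name-p/copy "host" "Host: somewrite.jp" 0 4)
;; (header-name-p      "host" "Host: somewrite.jp" 0 4)
```

Many sequence and string functions accept `:start` and `:end` style keywords, so passing indices around is often enough to keep a parser's inner loop free of temporary copies.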
Data types
• The wrong data type makes your program slow.
• List or vector?
• Hash table, structure, or class?
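The sketch below shows the kind of choice meant here. The `request` structure and `ascii-octets` are assumptions for illustration and are not fast-http's real definitions. A structure with typed slots has a fixed layout, and a specialized byte vector gives constant-time indexed access, whereas `nth` on a list has to walk the list.

```lisp
;; Hypothetical definitions for illustration only.
(defstruct request
  (method nil :type symbol)
  (path   ""  :type string)
  (major  1   :type fixnum)
  (minor  1   :type fixnum))

(defun ascii-octets (string)
  "Convert an ASCII STRING into a specialized vector of bytes."
  (map '(simple-array (unsigned-byte 8) (*)) #'char-code string))

;; Usage:
;; (request-path (make-request :method :get :path "/media"))
;; (aref (ascii-octets "GET") 0)   ; the byte for #\G
```

A hash table keyed by slot name or a CLOS class would also work, and they are more flexible, but each access then goes through hashing or generic dispatch. Whether that matters should be measured rather than assumed.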
Benchmarking is quite important
• “Don’t guess, measure!”
• Check whether your changes improve performance.
• Benchmarking also keeps you motivated.
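A minimal timing helper using only standard Common Lisp might look like this. `seconds-per-run` is a made-up name, and the approach (wall-clock time over many runs) is just one simple way to measure. The standard `time` macro is the quicker option at the REPL.

```lisp
;; Hypothetical helper for illustration only.
(defun seconds-per-run (thunk &key (runs 100000))
  "Call THUNK RUNS times and return the average wall-clock seconds per call."
  (let ((start (get-internal-real-time)))
    (dotimes (i runs)
      (funcall thunk))
    (float (/ (- (get-internal-real-time) start)
              internal-time-units-per-second
              runs))))

;; Usage: compare two candidate implementations before and after a change.
;; (seconds-per-run (lambda () (search "abc" "xxxxxxxxabc")))
;; (time (dotimes (i 100000) (search "abc" "xxxxxxxxabc")))
```

Running the same measurement before and after each change is what tells you whether the change was an improvement.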
Profiling
• SBCL has a built-in profiler
• (sb-profile:profile "FAST-HTTP" …)
• (sb-profile:report)
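A small session with the SBCL profiler could look like the following. `parse-digits` is a toy function invented for this example. The profiling forms are guarded with `#+sbcl` because `sb-profile` only exists in SBCL.

```lisp
;; Toy function to profile; hypothetical, for illustration only.
(defun parse-digits (string)
  "Parse a string of decimal digits into an integer."
  (let ((n 0))
    (loop for ch across string
          do (setf n (+ (* n 10) (digit-char-p ch))))
    n))

#+sbcl
(progn
  ;; Start collecting call counts and timings for PARSE-DIGITS.
  (sb-profile:profile parse-digits)
  (dotimes (i 10000)
    (parse-digits "12345"))
  ;; Print the collected statistics to standard output.
  (sb-profile:report)
  ;; Stop profiling the function again.
  (sb-profile:unprofile parse-digits))
```

Passing a package name string instead of function names, as on the slide, profiles the functions of that package.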
Type declaration
• Common Lisp has (optional) type declarations
• (declare (type <type> <variable symbol>))
• It’s a hint for your Lisp compiler
• (declare (optimize (speed 3) (safety 0)))
• It’s your wish to your Lisp compiler
See also: “Cより高速なCommon Lispコードを書く” (“Writing Common Lisp code that is faster than C”)
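Here is a sketch of what such declarations look like on a real function. `parse-status-code` is a hypothetical example, not a fast-http function. It reads three ASCII digits from a byte vector and deliberately does no validation. It uses `(safety 1)` rather than `(safety 0)` so that wrong argument types are still caught.

```lisp
;; Hypothetical function for illustration only.
(defun parse-status-code (octets start)
  "Read three ASCII digits from OCTETS beginning at START.
Assumes the three bytes really are digits; nothing is validated."
  (declare (type (simple-array (unsigned-byte 8) (*)) octets)
           (type fixnum start)
           (optimize (speed 3) (safety 1)))
  (let ((code 0))
    (declare (type fixnum code))
    (loop for i of-type fixnum from start below (+ start 3)
          for digit of-type fixnum = (- (aref octets i) 48) ; 48 is #\0
          do (setf code (+ (* code 10) digit)))
    code))

;; Usage: the status code of "HTTP/1.1 200 OK" starts at index 9.
;; (parse-status-code
;;  (map '(simple-array (unsigned-byte 8) (*)) #'char-code "HTTP/1.1 200 OK")
;;  9)
```

With the array and integer types declared, the compiler is free to use specialized array access and fixnum arithmetic instead of generic operations. How much it actually does depends on the implementation.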
(safety 0)
• (safety 0) means “don’t check types & array indexes at run time”.
• Fast & unsafe (like C)
• Is fixnum enough?
• What do you do when someone passes a bignum to the function?
(safety 0)
• fast-http has 2 layers
• Low-level API
  • (speed 3) (safety 0)
• High-level API (safer)
  • Checks the variable types
  • (speed 3) (safety 2)
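The two-layer idea can be sketched as follows. The functions `%count-crlf` and `count-crlf` are invented for this example and are not fast-http's API. The inner function trusts its arguments and is compiled with `(safety 0)`. The outer function validates everything first, so the unsafe code is only ever reached with arguments of the right type and range.

```lisp
;; Hypothetical functions for illustration only.
(defun %count-crlf (octets start end)
  "Low-level layer: no checks. Callers must pass valid arguments."
  (declare (type (simple-array (unsigned-byte 8) (*)) octets)
           (type fixnum start end)
           (optimize (speed 3) (safety 0)))
  (let ((count 0))
    (declare (type fixnum count))
    (loop for i of-type fixnum from start below (1- end)
          when (and (= (aref octets i) 13)        ; CR
                    (= (aref octets (1+ i)) 10))  ; LF
            do (incf count))
    count))

(defun count-crlf (octets &key (start 0) end)
  "High-level layer: check the arguments, then call the fast version."
  (declare (optimize (speed 3) (safety 2)))
  (check-type octets (simple-array (unsigned-byte 8) (*)))
  (let ((end (or end (length octets))))
    (check-type start fixnum)
    (check-type end fixnum)
    (assert (<= 0 start end (length octets)))
    (%count-crlf octets start end)))
```

Passing a string or an out-of-range index to `count-crlf` signals an error instead of reading arbitrary memory, which is the question raised on the previous slide about what happens when someone passes an unexpected value.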
Attitude
Attitude 
• Write carefully. 
• It’s possible to beat a C program
• (if the program is complicated enough) 
• Don’t give up easily 
• Safety is more important than speed
Thanks.
EITARO FUKAMACHI 
8arrow.org 
@nitro_idiot fukamachi
