Based on a True Story
• NOT AN AD!
• Qrator: distributed network
  ● Custom TCP/IP at the bottom
  ● Custom management protocol at the top
  ● Interacting with plenty of Web servers and Web browsers on a daily basis
  ● 2 years of continuous debug^W Product Improvement™
Issue #1
• Message delivery is unreliable in TCP: there's no guarantee of when (or whether) a message will arrive
• Timeouts!
• Limit all resources, including time
• No action is itself an action
Timeouts
• Between recvfrom() calls
• Between requests
• Request timeout
• Lifetime of a session
• Lifetime of %OBJECTNAME%
• Long polling may be a bad idea
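
A minimal sketch of two of the timeouts above (the gap between reads and the lifetime of a request), assuming a plain blocking Python socket. The name `recv_with_deadline` and its parameters are hypothetical and used only for illustration; real limits depend on the protocol.

```python
import socket
import time

def recv_with_deadline(sock, nbytes, idle_timeout, total_timeout):
    """Read up to nbytes, bounding both the idle gap and the total time.

    Returns the bytes received (fewer than nbytes if the peer closes);
    raises TimeoutError when the per-read idle timeout or the overall
    deadline expires.
    """
    deadline = time.monotonic() + total_timeout
    chunks = []
    received = 0
    while received < nbytes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("request lifetime exceeded")
        # Never wait longer than the idle limit or the time left overall.
        sock.settimeout(min(idle_timeout, remaining))
        try:
            chunk = sock.recv(nbytes - received)
        except socket.timeout:
            raise TimeoutError("peer was silent for too long")
        if not chunk:
            break  # peer closed the connection
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

The overall deadline matters as much as the idle timeout: a peer that sends one byte just before every idle timeout expires would otherwise hold the connection open indefinitely.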
Ex. 1
• Slowloris (Apache): DoS
  ● (not distributed, just denial of service)
• Slow HTTP POST
  ● Apache, IIS, Lighttpd: DoS
  ● Nginx: DDoS with a botnet
Ex. 2
• 12 rpm AJAX page update
  ● Backup script switched the server off
Content-Length
– Limit resources for all actions
– A custom protocol should define limits on the input length
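
One way a custom protocol might enforce an input length limit, sketched under assumptions: the framing (a 4-byte big-endian length followed by the payload), the `MAX_MESSAGE_LEN` value, and the names `parse_frame` and `MessageTooLong` are all hypothetical and not taken from any real protocol.

```python
import struct

MAX_MESSAGE_LEN = 64 * 1024  # hypothetical limit; pick one that fits the protocol

class MessageTooLong(Exception):
    """Raised when a frame declares a payload larger than the limit."""

def parse_frame(buffer):
    """Parse one frame: 4-byte big-endian length, then the payload.

    Returns (payload, rest) when a full frame is buffered, or
    (None, buffer) when more bytes are needed. An oversized frame is
    rejected as soon as its header is readable, before any of its
    body has to be buffered.
    """
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack("!I", buffer[:4])
    if length > MAX_MESSAGE_LEN:
        raise MessageTooLong(
            f"declared length {length} exceeds {MAX_MESSAGE_LEN}"
        )
    if len(buffer) < 4 + length:
        return None, buffer
    return buffer[4:4 + length], buffer[4 + length:]
```

Checking the declared length before reading the body is the point: the receiver decides how much memory a peer may claim, not the peer.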
errno(3)
– The connection may be closed for no good reason
– Check errno after recvfrom(), sendto(), etc.
  ● ENOMEM
  ● ECONNRESET
  ● EANYTHING
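
In Python the same check is made on the `errno` attribute of the raised `OSError`. The sketch below is illustrative only; the function name `recv_checked` and its status strings are hypothetical.

```python
import errno
import socket

def recv_checked(sock, nbytes):
    """Call recv() and report how the call ended instead of assuming success.

    Returns (status, data), where status is "ok", "closed", "reset",
    or "error:<ERRNO NAME>".
    """
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        if exc.errno == errno.ECONNRESET:
            return "reset", b""
        # Any other errno is reported by its symbolic name when known.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return "error:" + name, b""
    if not data:
        return "closed", b""  # orderly shutdown by the peer
    return "ok", data
```

Keeping "reset" apart from "closed" lets the caller treat an aborted transfer as a failure rather than as a normal end of stream.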
Ex. 3
● Internet Explorer: ECONNRESET means successful connection termination
  – Download status is being ignored
  – Content-Length is being ignored
Optimization
– Text-based protocols are convenient to debug
  ● And you will debug
    – Maybe even in production
– Using a binary protocol is often a premature optimization
  ● BSON, Google Protocol Buffers
Optimization
● TCP socket options:
  – TCP_NODELAY: disables Nagle's algorithm
    ● Speedup with small portions of data
  – TCP_CORK (Linux): multiple portions of data in a single TCP segment
    – "socket corking"
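
Both options are set with `setsockopt()` at the `IPPROTO_TCP` level. A small sketch using Python's `socket` module follows; the helper names `set_nodelay` and `set_cork` are hypothetical, and `TCP_CORK` is only looked up at run time because the constant exists on Linux only.

```python
import socket

def set_nodelay(sock, enabled=True):
    """Disable (or re-enable) Nagle's algorithm on a TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)

def set_cork(sock, enabled):
    """Cork or uncork a TCP socket; returns False where TCP_CORK is missing."""
    cork = getattr(socket, "TCP_CORK", None)  # Linux-only constant
    if cork is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1 if enabled else 0)
    return True
```

A typical corking pattern would be `set_cork(sock, True)`, then writing the header and the body, then `set_cork(sock, False)` to flush whatever is still queued.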