SlideShare a Scribd company logo
IPVS for Docker Containers
Production-level load balancing and request routing without spending a single penny.
What is IPVS
• Stands for IP Virtual Server. Built on top of Netfilter and works in kernel space. No userland copying of network
packets involved.

• It’s in the mainline Linux Kernel since 2.4.x. Surprisingly, it’s one of the technologies which successfully pumps
your stuff through world’s networks for more than 15 years and almost nobody ever heard of it.

• Used in many world-scale companies such as Google, Facebook, LinkedIn, Dropbox, GitHub, Alibaba, Yandex
and so on. During my time in Yandex, we used to route millions of requests per second through a few IPVS
boxes without a sweat.

• Tested with fire, water and brass trombones (untranslatable Russian proverb).

• Provides flexible, configurable load-balancing and request routing done in kernel space – it plugs in even before
PCAP layer – making it bloody fucking fast.

• As all kernel tech, it looks like a incomprehensible magical artifact from outer space, but bear with me – it’s
actually very simple to use.
And why didn’t I hear about it before?
What is IPVS
• Supports TCP, SCTP and UDP (both v4 and v6), works on L4 in kernel space.

• Load balancing using weighted RR, LC, replicated locality-based LC (LBLCR), destination and source-
based hashing, shortest expected delay (SED) and more via plugins:

‐ LBLCR: «sticky» least-connections, picks a backend and sends all jobs to it until it’s full, then picks
the next one. If all backends are full, replicates one with the least jobs and adds it to the service
backend set.

• Supports persistent connections for SSL, FTP and One Packet Mode for connectionless protocols such
as DNS – will trigger scheduling for each packet.

• Supports request forwarding with standard Masquerading (DNAT), Tunneling (IPIP) or Direct Routing (DR).
Supports Netfilter Marks (FWMarks) for virtual service aggregation, e.g. http and https.

• Built-in cluster synchronization daemon – only need to configure one router box, other boxes will keep
their configurations up-to-date automatically.
And what can it do if you say it’s so awesome?
What is IPVS
• DNAT is your ordinary NAT. It will replace each packet’s destination IP with the chosen backend IP and put the packet
back into the networking stack.

‐ Pros: you don’t need to do anything to configure it.

‐ Cons: responses will be a martian packets in your network. But you can use a local reverse proxy to workaround
this or point the default gateway on backends to load balancer’s IP address for IPVS to take care of it.

• DR (or DSR: direct server response) is the fastest forwarding method in the world.

‐ Pros: speed. It will be as fast as possible given your network conditions. In fact, it will be so fast that you will be
able to pump more traffic through your balancers than the interface capacity they have.

‐ Cons: all backends and routers should be in the same L2 segment, need to configure NoARP devices.

• IPIP is a tun/tap alternative to DR. Like IPSEC, but without encryption.

‐ Pros: it can be routed anywhere and won’t trigger martian packets.
‐ Cons: lower MSS. Also, you need to configure NoARP tun devices on all backend boxes, like in DR.
And a little bit more about all these weird acronyms.
What is IPVS
And a little bit more about all these weird acronyms.
IPIP
Encapsulates IP
Routable anywhere
DNAT
Rewrites DST IP
Same L4
DSR
Rewrites DST MAC
Same L2
What is IPVS
• Works by replacing the job’s destination MAC with the chosen backend MAC and then putting
the packet back into the INPUT queue. Once again: the requirement for this mode is that all
backends should be in the same L2 network segment.

• The response will travel directly to the requesting client, bypassing the router box. This
essentially allows to cut router box traffic by more than 50% since responses are usually heavier
than requests.

• This also allows you to load balance more traffic than the load balancer interface capacity
since with a proper setup it will only load balance TCP SYNs.

• It’s a little bit tricky to set up, but if done well is indistinguishable from magic (as well as any
sufficiently advanced technology).

• Fun fact: this is how video streaming services survive and avoid bankruptcy!
And a little bit more about DR since it’s awesome!
I don’t need this
• If you have more than one instance of your application, you need load balancing, otherwise you
won’t be able to distribute requests among those instances.

• If you have more than one application or more than one version deployed at the same time in
production, you need request routing. Typically, only one version is deployed to production at the
same time, and one testing version is waiting in line.

• Imagine what you can do if you could deploy 142 different versions of your application to
production:

‐ A/B testing: route 50% to version A, route 50% to version B.

‐ Intelligent routing: if requester’s cookie is set to «NSA», route to version «HoneyPot».

‐ Experiments: randomly route 1% of users to experimental version «EvenMorePointlessAds»
and stick them to that version.
And why would I load balance and route anything at all?
I don’t need this
• In modern world, instances come and go as they may, sometimes multiple times per second. This rate
will only go up in the future.

• Instances migrate across physical nodes. Instances might be stopped when no load or low load is in
effect. New instances might appear to accommodate the growing request rate and be torn down later.

• Neither nginx (for free) nor haproxy allows for online configuration. Many hacks exist which
essentially regenerate the configuration file on the routing box and then cycle the reverse proxy, but
it’s neither sustainable nor mainstream.

• Neither hipache (written in node.js) nor vulcand (under development) allows for non-RR balancing
methods.

• The fastest userland proxy to date – HAProxy – is ~40% slower than IPVS in DR mode according to
many publicly available benchmark results.
Also, my nginx (haproxy, third option) setup works fine, get off the stage please!
I don’t need this
• Lucky you!

• But running stuff in the cloud is not an option in many circumstances:

‐ Performance and/or reliability critical applications.

‐ Exotic hardware or software requirements. CG or CUDA farms, BSD environments and so on.

‐ Clouds are a commodity in US and Europe, but in some countries with a thriving IT culture AWS
is more of a fairy tale (e.g. Vietnam).

‐ Security considerations and vendor lock-in.

• Cloud-based load balancers are surprisingly dumb, for example AWS ELB doesn’t really allow you to
configure anything except real servers and some basic stuff like SSL and Health Checks.

• After a certain point, AWS/GCE becomes quite expensive, unless you’re Netflix.
And I run my stuff in the cloud, it takes care of everything – my work is perpetual siesta.
What is IPVS, again
• Make sure you have a properly configured kernel (usually you do):

‐ cat /boot/config-$(uname -r) | grep IP_VS
• Install IPVS CLI to verify that it’s healthy and up:

‐ sudo apt-get install ipvsadm
‐ sudo ipvsadm -l
• Command-line tool allows you to do everything out there with IPVS, but it’s the 21st century and CLIs are only
good to show off your keyboard-fu ;)

• That’s why I’ve coded a dumb but loyal REST API daemon that talks to kernel and configures IPVS for you. In
Go, because, you know.

• You can use it to add and remove virtual services and backends in runtime and get metrics about existing virtual
services. It’s totally open source and free but might not work (as everything open source and free).
And how do I use it now since it sounds amazing!
GORB
• It’s here: https://github.com/kobolog/gorb

• Based on native Go netlink library – https://github.com/tehnerd/gnl2go. The library itself talks directly to the
Kernel and already supports the majority of IPVS operations. This library is a port of Facebook’s gnl2py – https://
github.com/facebook/gnlpy

• It exposes a very straightforward JSON-based REST API:

‐ PUT or DELETE /service/<vsID> – create or remove a virtual service.

‐ PUT or DELETE /service/<vsID>/<rsID> – add or remove a backend for a virtual service

‐ GET /service/<vsID> – get virtual service metrics (incl. health checks).

‐ GET /service/<vsID>/<rsID> – get backend configuration.

‐ PATCH /service/<vsID>/<rsID> – update backend weight and other parameters without restarting
anything.
Go Routing and Balancing.
GORB
• Every time you spin up a new container, it can be magically automatically registered with
GORB, so that your application’s clients could reach it via the balancer-provided endpoint.

• GORB will automatically do TCP checks (if it’s a TCP service) on your container and inhibit all
traffic coming to it if, for some reason, it disappeared without gracefully de-registering first:
network outage, oom-killer or whatnot.

• HTTP checks are also available, if required – e.g. your app can have a /health endpoint
which will respond 200 only if all downstream dependencies are okay as well.

• Essentially, the only thing you need to do is to start a tiny little daemon on Docker Engine
boxes that will listen for events on Events API and send commands to GORB servers.

• More checks can be added in the future, e.g. Consul, ZooKeeper or etcd!
And why is it cool for Docker Containers.
GORB
kobolog@bulldozer:~$ docker-machine create -d virtualbox gorb
kobolog@bulldozer:~$ docker-machine ssh gorb sudo modprobe ip_vs
kobolog@bulldozer:~$ eval $(docker-machine env gorb)
kobolog@bulldozer:~$ docker build -t gorb src/gorb
kobolog@bulldozer:~$ docker run -d —net=host --privileged gorb -f -i eth1
kobolog@bulldozer:~$ docker build -t gorb-link src/gorb/gorb-docker-link
kobolog@bulldozer:~$ docker run -d --net=host -v /var/run/docker.sock:/var/run/
docker.sock gorb-link -r $(docker-machine ip default):4672 -i eth1
kobolog@bulldozer:~$ docker run -d -p 80 nginx (repeat 4 times)
kobolog@bulldozer:~$ curl -i http://$(docker-machine ip gorb):80
And how do I use it? Live demo or GTFO!
A few words about BGP
• Once you have a few router boxes to avoid having a SPOF, you might wonder how clients would find out which one to
use. One option could be DNS RR, where you put them all behind one domain name and let DNS resolvers to do their job.

• But there’s a better way – BGP host routes, also known as anycast routing:

‐ It’s a protocol used by network routers to agree on network topologies.

‐ The idea behind anycast is that each router box announces the same IP address on the network via BGP host
route advertisements.

‐ When a client hits this IP address, it’s automatically routed to one of the GORB boxes based on configured routing
metrics.

‐ You don’t need any special hardware – it’s already built into your routers.

‐ You can do this using one of the BGP packages, such as Bird or Quagga. Notable ecosystem projects in this area:
Project Calico.

‐ Also can be done with IPv6 anycast, but I’ve never seen it implemented.
Black belt in networking is not complete without a few words about BGP
GORB
• No, you cannot blame me, but I’m here to help and answer questions!

• IPVS is production ready since I was in college.

• GORB is not production ready and is not tested in production, but since it’s just a configuration daemon, it won’t actually
affect your traffic, I hope.

• Here are some nice numbers to think about:

‐ 1 GBPS line rate uses 1% CPU in DR mode.

‐ Can utilize 40G/100G interface.

‐ Costs you €0.

‐ Typical enterprise hardware load-balancer – €25,000.

• Requires generic hardware, software, tools – also easy to learn. Ask your Ops whether they want to read a 1000-page
vendor manual or a 1000-word manpage.

• Also, no SNMP involved. Amazing and unbelievable!
Is it stable? Is it production-ready? Can I blame you if it doesn’t work?
This guy on the stage
• You shouldn’t. In fact, don’t trust anybody on this stage – get your hands dirty and verify
everything yourself. Although last month I turned 30, so I’m trustworthy!

• I’ve been building distributed systems and networking for more than 7 years in companies with
worldwide networks and millions of users, multiple datacenters and other impressive things.

• These distributed systems and networks are still operational and serve billions of user requests
each day.

• I’m the author of IPVS-based routing and load balancing framework for Yandex’s very own open-
source platform called Cocaine: https://github.com/cocaine.

• As of now, I’m a Senior Infrastructure Software Engineer in Uber in NYC, continuing my self-
proclaimed crusade to make magic infrastructure look less magical and more useful for humans.
Who the hell are you and why should I believe a Russian?
Gracias, Cataluña
• This guy:

‐ Twitter (it’s boring and mostly in Russian): @kobolog

‐ Questions when I’m not around: me@kobology.ru

‐ My name is Andrey Sibiryov.

• IPVS: http://www.linuxvirtualserver.org/software/ipvs.html
• GORB sources: https://github.com/kobolog/gorb
• Bird BGP: http://bird.network.cz
• Stage version of this slide deck: http://bit.ly/1S1A3cT
• Also, it’s right about time to ask your questions!
Some links, more links, some HR and questions!

More Related Content

What's hot

Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
Brendan Gregg
 
OVN operationalization at scale at eBay
OVN operationalization at scale at eBayOVN operationalization at scale at eBay
OVN operationalization at scale at eBay
Aliasgar Ginwala
 
Kernel load-balancing for Docker containers using IPVS
Kernel load-balancing for Docker containers using IPVSKernel load-balancing for Docker containers using IPVS
Kernel load-balancing for Docker containers using IPVS
Docker, Inc.
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?
Pradeep Kumar
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
James Denton
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in Cilium
ScyllaDB
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
Adrian Huang
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
Adrian Huang
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
Ray Jenkins
 
OVN 設定サンプル | OVN config example 2015/12/27
OVN 設定サンプル | OVN config example 2015/12/27OVN 設定サンプル | OVN config example 2015/12/27
OVN 設定サンプル | OVN config example 2015/12/27
Kentaro Ebisawa
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
Saju Madhavan
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
Brendan Gregg
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch
YongKi Kim
 
Linux Linux Traffic Control
Linux Linux Traffic ControlLinux Linux Traffic Control
Linux Linux Traffic Control
SUSE Labs Taipei
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
Alexei Starovoitov
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
ShapeBlue
 
Reverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelReverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux Kernel
Adrian Huang
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
Brendan Gregg
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction Performance
LF Events
 

What's hot (20)

Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
OVN operationalization at scale at eBay
OVN operationalization at scale at eBayOVN operationalization at scale at eBay
OVN operationalization at scale at eBay
 
Kernel load-balancing for Docker containers using IPVS
Kernel load-balancing for Docker containers using IPVSKernel load-balancing for Docker containers using IPVS
Kernel load-balancing for Docker containers using IPVS
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in Cilium
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
OVN 設定サンプル | OVN config example 2015/12/27
OVN 設定サンプル | OVN config example 2015/12/27OVN 設定サンプル | OVN config example 2015/12/27
OVN 設定サンプル | OVN config example 2015/12/27
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch
 
Linux Linux Traffic Control
Linux Linux Traffic ControlLinux Linux Traffic Control
Linux Linux Traffic Control
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 
Reverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux KernelReverse Mapping (rmap) in Linux Kernel
Reverse Mapping (rmap) in Linux Kernel
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction Performance
 

Similar to [En] IPVS for Docker Containers

The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...
OVHcloud
 
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. GrayOVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
harryvanhaaren
 
Rapid IPv6 Deployment for ISP Networks
Rapid IPv6 Deployment for ISP NetworksRapid IPv6 Deployment for ISP Networks
Rapid IPv6 Deployment for ISP Networks
Skeeve Stevens
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013Jun Park
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
Chiradeep Vittal
 
[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'
[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'
[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'
OpenStack Korea Community
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie Carr
Cumulus Networks
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
Redge Technologies
 
NFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS Fabric
NFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS FabricNFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS Fabric
NFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS Fabricozkan01
 
High Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing CommunityHigh Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing Community
6WIND
 
Kubernetes
KubernetesKubernetes
Kubernetes
Anastasios Gogos
 
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Igalia
 
Pluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and DockerPluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and Docker
Bob Killen
 
VMworld 2013: Real-world Deployment Scenarios for VMware NSX
VMworld 2013: Real-world Deployment Scenarios for VMware NSX VMworld 2013: Real-world Deployment Scenarios for VMware NSX
VMworld 2013: Real-world Deployment Scenarios for VMware NSX
VMworld
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
Evan McGee
 
Learning series fundamentals of Networking and Medical Imaging
Learning series fundamentals of Networking and Medical ImagingLearning series fundamentals of Networking and Medical Imaging
Learning series fundamentals of Networking and Medical Imaging
Ryan Furlough, BSCPE CPAS
 
DPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al SandersDPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al Sanders
Jim St. Leger
 

Similar to [En] IPVS for Docker Containers (20)

The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...
 
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. GrayOVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
 
Rapid IPv6 Deployment for ISP Networks
Rapid IPv6 Deployment for ISP NetworksRapid IPv6 Deployment for ISP Networks
Rapid IPv6 Deployment for ISP Networks
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environment
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
 
[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'
[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'
[OpenStack Day in Korea 2015] Track 2-3 - 오픈스택 클라우드에 최적화된 네트워크 가상화 '누아지(Nuage)'
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie Carr
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
NFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS Fabric
NFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS FabricNFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS Fabric
NFV SDN Summit March 2014 D1 07 kireeti_kompella Native MPLS Fabric
 
High Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing CommunityHigh Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing Community
 
Kubernetes
KubernetesKubernetes
Kubernetes
 
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
 
3hows
3hows3hows
3hows
 
Pluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and DockerPluggable Infrastructure with CI/CD and Docker
Pluggable Infrastructure with CI/CD and Docker
 
VMworld 2013: Real-world Deployment Scenarios for VMware NSX
VMworld 2013: Real-world Deployment Scenarios for VMware NSX VMworld 2013: Real-world Deployment Scenarios for VMware NSX
VMworld 2013: Real-world Deployment Scenarios for VMware NSX
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
 
Learning series fundamentals of Networking and Medical Imaging
Learning series fundamentals of Networking and Medical ImagingLearning series fundamentals of Networking and Medical Imaging
Learning series fundamentals of Networking and Medical Imaging
 
DPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al SandersDPDK Summit 2015 - HP - Al Sanders
DPDK Summit 2015 - HP - Al Sanders
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

[En] IPVS for Docker Containers

  • 1. IPVS for Docker Containers Production-level load balancing and request routing without spending a single penny.
  • 2. What is IPVS • Stands for IP Virtual Server. Built on top of Netfilter and works in kernel space. No userland copying of network packets involved. • It’s in the mainline Linux Kernel since 2.4.x. Surprisingly, it’s one of the technologies which successfully pumps your stuff through world’s networks for more than 15 years and almost nobody ever heard of it. • Used in many world-scale companies such as Google, Facebook, LinkedIn, Dropbox, GitHub, Alibaba, Yandex and so on. During my time in Yandex, we used to route millions of requests per second through a few IPVS boxes without a sweat. • Tested with fire, water and brass trombones (untranslatable Russian proverb). • Provides flexible, configurable load-balancing and request routing done in kernel space – it plugs in even before PCAP layer – making it bloody fucking fast. • As all kernel tech, it looks like a incomprehensible magical artifact from outer space, but bear with me – it’s actually very simple to use. And why didn’t I hear about it before?
  • 3. What is IPVS • Supports TCP, SCTP and UDP (both v4 and v6), works on L4 in kernel space. • Load balancing using weighted RR, LC, replicated locality-based LC (LBLCR), destination and source- based hashing, shortest expected delay (SED) and more via plugins: ‐ LBLCR: «sticky» least-connections, picks a backend and sends all jobs to it until it’s full, then picks the next one. If all backends are full, replicates one with the least jobs and adds it to the service backend set. • Supports persistent connections for SSL, FTP and One Packet Mode for connectionless protocols such as DNS – will trigger scheduling for each packet. • Supports request forwarding with standard Masquerading (DNAT), Tunneling (IPIP) or Direct Routing (DR). Supports Netfilter Marks (FWMarks) for virtual service aggregation, e.g. http and https. • Built-in cluster synchronization daemon – only need to configure one router box, other boxes will keep their configurations up-to-date automatically. And what can it do if you say it’s so awesome?
  • 4. What is IPVS • DNAT is your ordinary NAT. It will replace each packet’s destination IP with the chosen backend IP and put the packet back into the networking stack. ‐ Pros: you don’t need to do anything to configure it. ‐ Cons: responses will be a martian packets in your network. But you can use a local reverse proxy to workaround this or point the default gateway on backends to load balancer’s IP address for IPVS to take care of it. • DR (or DSR: direct server response) is the fastest forwarding method in the world. ‐ Pros: speed. It will be as fast as possible given your network conditions. In fact, it will be so fast that you will be able to pump more traffic through your balancers than the interface capacity they have. ‐ Cons: all backends and routers should be in the same L2 segment, need to configure NoARP devices. • IPIP is a tun/tap alternative to DR. Like IPSEC, but without encryption. ‐ Pros: it can be routed anywhere and won’t trigger martian packets. ‐ Cons: lower MSS. Also, you need to configure NoARP tun devices on all backend boxes, like in DR. And a little bit more about all these weird acronyms.
  • 5. What is IPVS And a little bit more about all these weird acronyms. IPIP Encapsulates IP Routable anywhere DNAT Rewrites DST IP Same L4 DSR Rewrites DST MAC Same L2
  • 6. What is IPVS • Works by replacing the job’s destination MAC with the chosen backend MAC and then putting the packet back into the INPUT queue. Once again: the requirement for this mode is that all backends should be in the same L2 network segment. • The response will travel directly to the requesting client, bypassing the router box. This essentially allows to cut router box traffic by more than 50% since responses are usually heavier than requests. • This also allows you to load balance more traffic than the load balancer interface capacity since with a proper setup it will only load balance TCP SYNs. • It’s a little bit tricky to set up, but if done well is indistinguishable from magic (as well as any sufficiently advanced technology). • Fun fact: this is how video streaming services survive and avoid bankruptcy! And a little bit more about DR since it’s awesome!
  • 7. I don’t need this • If you have more than one instance of your application, you need load balancing, otherwise you won’t be able to distribute requests among those instances. • If you have more than one application or more than one version deployed at the same time in production, you need request routing. Typically, only one version is deployed to production at the same time, and one testing version is waiting in line. • Imagine what you can do if you could deploy 142 different versions of your application to production: ‐ A/B testing: route 50% to version A, route 50% to version B. ‐ Intelligent routing: if requester’s cookie is set to «NSA», route to version «HoneyPot». ‐ Experiments: randomly route 1% of users to experimental version «EvenMorePointlessAds» and stick them to that version. And why would I load balance and route anything at all?
  • 8. I don’t need this • In modern world, instances come and go as they may, sometimes multiple times per second. This rate will only go up in the future. • Instances migrate across physical nodes. Instances might be stopped when no load or low load is in effect. New instances might appear to accommodate the growing request rate and be torn down later. • Neither nginx (for free) nor haproxy allows for online configuration. Many hacks exist which essentially regenerate the configuration file on the routing box and then cycle the reverse proxy, but it’s neither sustainable nor mainstream. • Neither hipache (written in node.js) nor vulcand (under development) allows for non-RR balancing methods. • The fastest userland proxy to date – HAProxy – is ~40% slower than IPVS in DR mode according to many publicly available benchmark results. Also, my nginx (haproxy, third option) setup works fine, get off the stage please!
  • 9. I don’t need this • Lucky you! • But running stuff in the cloud is not an option in many circumstances: ‐ Performance and/or reliability critical applications. ‐ Exotic hardware or software requirements. CG or CUDA farms, BSD environments and so on. ‐ Clouds are a commodity in US and Europe, but in some countries with a thriving IT culture AWS is more of a fairy tale (e.g. Vietnam). ‐ Security considerations and vendor lock-in. • Cloud-based load balancers are surprisingly dumb, for example AWS ELB doesn’t really allow you to configure anything except real servers and some basic stuff like SSL and Health Checks. • After a certain point, AWS/GCE becomes quite expensive, unless you’re Netflix. And I run my stuff in the cloud, it takes care of everything – my work is perpetual siesta.
  • 10. What is IPVS, again • Make sure you have a properly configured kernel (usually you do): ‐ cat /boot/config-$(uname -r) | grep IP_VS • Install IPVS CLI to verify that it’s healthy and up: ‐ sudo apt-get install ipvsadm ‐ sudo ipvsadm -l • Command-line tool allows you to do everything out there with IPVS, but it’s the 21st century and CLIs are only good to show off your keyboard-fu ;) • That’s why I’ve coded a dumb but loyal REST API daemon that talks to kernel and configures IPVS for you. In Go, because, you know. • You can use it to add and remove virtual services and backends in runtime and get metrics about existing virtual services. It’s totally open source and free but might not work (as everything open source and free). And how do I use it now since it sounds amazing!
  • 11. GORB • It’s here: https://github.com/kobolog/gorb • Based on native Go netlink library – https://github.com/tehnerd/gnl2go. The library itself talks directly to the Kernel and already supports the majority of IPVS operations. This library is a port of Facebook’s gnl2py – https:// github.com/facebook/gnlpy • It exposes a very straightforward JSON-based REST API: ‐ PUT or DELETE /service/<vsID> – create or remove a virtual service. ‐ PUT or DELETE /service/<vsID>/<rsID> – add or remove a backend for a virtual service ‐ GET /service/<vsID> – get virtual service metrics (incl. health checks). ‐ GET /service/<vsID>/<rsID> – get backend configuration. ‐ PATCH /service/<vsID>/<rsID> – update backend weight and other parameters without restarting anything. Go Routing and Balancing.
  • 12. GORB • Every time you spin up a new container, it can be magically automatically registered with GORB, so that your application’s clients could reach it via the balancer-provided endpoint. • GORB will automatically do TCP checks (if it’s a TCP service) on your container and inhibit all traffic coming to it if, for some reason, it disappeared without gracefully de-registering first: network outage, oom-killer or whatnot. • HTTP checks are also available, if required – e.g. your app can have a /health endpoint which will respond 200 only if all downstream dependencies are okay as well. • Essentially, the only thing you need to do is to start a tiny little daemon on Docker Engine boxes that will listen for events on Events API and send commands to GORB servers. • More checks can be added in the future, e.g. Consul, ZooKeeper or etcd! And why is it cool for Docker Containers.
  • 13. GORB kobolog@bulldozer:~$ docker-machine create -d virtualbox gorb kobolog@bulldozer:~$ docker-machine ssh gorb sudo modprobe ip_vs kobolog@bulldozer:~$ eval $(docker-machine env gorb) kobolog@bulldozer:~$ docker build -t gorb src/gorb kobolog@bulldozer:~$ docker run -d —net=host --privileged gorb -f -i eth1 kobolog@bulldozer:~$ docker build -t gorb-link src/gorb/gorb-docker-link kobolog@bulldozer:~$ docker run -d --net=host -v /var/run/docker.sock:/var/run/ docker.sock gorb-link -r $(docker-machine ip default):4672 -i eth1 kobolog@bulldozer:~$ docker run -d -p 80 nginx (repeat 4 times) kobolog@bulldozer:~$ curl -i http://$(docker-machine ip gorb):80 And how do I use it? Live demo or GTFO!
  • 14. A few words about BGP • Once you have a few router boxes to avoid having a SPOF, you might wonder how clients would find out which one to use. One option could be DNS RR, where you put them all behind one domain name and let DNS resolvers to do their job. • But there’s a better way – BGP host routes, also known as anycast routing: ‐ It’s a protocol used by network routers to agree on network topologies. ‐ The idea behind anycast is that each router box announces the same IP address on the network via BGP host route advertisements. ‐ When a client hits this IP address, it’s automatically routed to one of the GORB boxes based on configured routing metrics. ‐ You don’t need any special hardware – it’s already built into your routers. ‐ You can do this using one of the BGP packages, such as Bird or Quagga. Notable ecosystem projects in this area: Project Calico. ‐ Also can be done with IPv6 anycast, but I’ve never seen it implemented. Black belt in networking is not complete without a few words about BGP
  • 15. GORB • No, you cannot blame me, but I’m here to help and answer questions! • IPVS is production ready since I was in college. • GORB is not production ready and is not tested in production, but since it’s just a configuration daemon, it won’t actually affect your traffic, I hope. • Here are some nice numbers to think about: ‐ 1 GBPS line rate uses 1% CPU in DR mode. ‐ Can utilize 40G/100G interface. ‐ Costs you €0. ‐ Typical enterprise hardware load-balancer – €25,000. • Requires generic hardware, software, tools – also easy to learn. Ask your Ops whether they want to read a 1000-page vendor manual or a 1000-word manpage. • Also, no SNMP involved. Amazing and unbelievable! Is it stable? Is it production-ready? Can I blame you if it doesn’t work?
  • 16. This guy on the stage • You shouldn’t. In fact, don’t trust anybody on this stage – get your hands dirty and verify everything yourself. Although last month I turned 30, so I’m trustworthy! • I’ve been building distributed systems and networking for more than 7 years in companies with worldwide networks and millions of users, multiple datacenters and other impressive things. • These distributed systems and networks are still operational and serve billions of user requests each day. • I’m the author of IPVS-based routing and load balancing framework for Yandex’s very own open- source platform called Cocaine: https://github.com/cocaine. • As of now, I’m a Senior Infrastructure Software Engineer in Uber in NYC, continuing my self- proclaimed crusade to make magic infrastructure look less magical and more useful for humans. Who the hell are you and why should I believe a Russian?
  • 17. Gracias, Cataluña • This guy: ‐ Twitter (it’s boring and mostly in Russian): @kobolog ‐ Questions when I’m not around: me@kobology.ru ‐ My name is Andrey Sibiryov. • IPVS: http://www.linuxvirtualserver.org/software/ipvs.html • GORB sources: https://github.com/kobolog/gorb • Bird BGP: http://bird.network.cz • Stage version of this slide deck: http://bit.ly/1S1A3cT • Also, it’s right about time to ask your questions! Some links, more links, some HR and questions!