2. Base network model for the VM infrastructure
/32 subnet, VRF, BGP and a single NIC (introduced in Tokyo, 2016)
[Diagram: a compute node runs nova-compute, neutron-linuxbridge-agent and neutron-dhcp-agent; the VM (IP 10.10.100.2/32, gateway 10.10.100.1) hangs off a linux bridge inside a virtual-switch block (vlan.0/vlan.60) next to the process block. The node peers BGP from 192.1.1.201 to the ToR's virtual router at 192.1.1.202, whose service route table learns "10.10.100.2/32 via 192.1.1.201".]
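A minimal sketch of the node-side plumbing this diagram implies, using the addresses shown; the port name is hypothetical, and the namespace split and BGP daemon configuration are omitted:

```python
import subprocess

def run(cmd):
    """Run an iproute2 command, failing loudly on error."""
    subprocess.run(cmd, check=True)

# Instead of sitting in a shared L2 subnet, each VM gets a /32 host route
# on its compute node; "vnet0" is a hypothetical tap/bridge port name.
run(["ip", "route", "add", "10.10.100.2/32", "dev", "vnet0"])
# A BGP daemon on the node (peering 192.1.1.201 <-> 192.1.1.202) then
# advertises the /32, so the ToR installs "10.10.100.2/32 via 192.1.1.201".
```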
3. What did we solve?
[Diagram: three availability zones (AZ1–AZ3), each with two compute nodes running the same stack (nova-compute, neutron-linuxbridge-agent, neutron-dhcp-agent, a linux bridge in a switch namespace beside the global namespace) behind tor1/tor2/tor3, interconnected through routers rt1–rt6. The VM's /32 (10.10.100.2/32) is re-advertised at every tier (via tor1, via rt1, via rt3, via tor2), so every router knows a live path to the VM and a failed hop is simply routed around.]
4. What did we solve? (cont.)
Simple IP planning
– Only IP ranges matter (no more VLAN, IP subnet, or router planning)
Resource balance
– No chance of IP pool imbalance
Fault resilience
– If one router goes down, the failure is propagated to the other routers by the dynamic routing protocol
Distributed
– Routing-path decisions are fully distributed; there is no single point of failure
– Scale-out by nature
5. The numbers: 2016.8 → 2017.11
– Projects: 1,563 → 2,xxx
– Pull requests since 2014.9: 632 → 913
– VMs created/deleted per day: about 88 → about 100
– VMs: 8,703 → 20,xxx
– Active cores: 1xx,xxx
6. The result is stunning
7. What is a container?
• A container is a combination of multiple namespaces
• A standardized resource
• The brick, or Lego block, of processes
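To make "multiple namespaces" concrete: the namespaces a process lives in are visible under /proc. A minimal sketch (Linux only):

```python
import os

# Each entry under /proc/<pid>/ns is one namespace (net, pid, mnt, uts, ipc, ...).
# A container is essentially a process placed into its own set of these.
for ns in sorted(os.listdir("/proc/self/ns")):
    print(ns, os.readlink(f"/proc/self/ns/{ns}"))
```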
8. Typical container orchestrator’s network
• Nothing fundamentally different, in terms of networking
– But if you don't prepare, and just take developers' requirements as they come, it gets different and difficult!
– Now you need to understand what a container is
9. Scalable container network: Kakao’s case
• You have to deal with the following when you try to use an overlay:
– You have to re-think performance
– You have to think about fault resiliency and migration issues
– You still have to consider how to send packets out of the overlay to the rest of the network
10. Scalable container network: Kakao’s case, cont.
• Use NodePort and a load balancer
– It's very easy
– But it had a scalability issue: NodePort has a limited port range
– You can only expose a five-digit number of containers
– And load balancers are expensive
11. Scalable container network: Kakao’s case, cont.
• Use a routable container bridge subnet and a BGP route injector (a sketch of the injector follows the diagram)
– Predefine a subnet for each node's container bridge
– You have to provision before the resource pool is depleted
[Diagram: a Route Injector speaks BGP to the cluster Router; Container Node1, Node2 and Node3 own subnet1, subnet2 and subnet3 respectively.]
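A minimal sketch of such a route injector, assuming ExaBGP as the BGP speaker (ExaBGP runs a helper process and turns lines on its stdout into BGP announcements); the subnet is hypothetical:

```python
#!/usr/bin/env python3
"""Announce this node's container bridge subnet via ExaBGP's process API."""
import sys
import time

NODE_BRIDGE_SUBNET = "10.244.1.0/24"  # hypothetical per-node subnet

# ExaBGP parses lines like this from our stdout and advertises them as routes.
sys.stdout.write(f"announce route {NODE_BRIDGE_SUBNET} next-hop self\n")
sys.stdout.flush()

while True:          # stay alive so the announcement is not withdrawn
    time.sleep(60)
```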
12. Application of the /32 network: /32 route + DNAT
→ 1:1 NAT (a.k.a. Floating IP)
[Diagram: Compute node1 hosts VM 10.10.100.2/32 behind a linux bridge, with a switch namespace beside the global namespace. Node routing table: default GW 192.168.1.1 on eth1; host route 10.10.100.2/32 to 10.10.100.1; connected 192.168.100.2. iptables DNAT: destination 192.168.100.2 is forwarded to 10.10.100.2. The compute-node router (192.1.1.201 ↔ ToR 192.1.1.202) learns: 10.10.100.2/32 via 192.1.1.201; 10.10.100.3/32 via 192.168.1.202; 192.168.100.2/32 via 192.168.1.201.]
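A minimal sketch of the two moves the diagram combines, with the addresses shown (iproute2/iptables via subprocess; the namespaces and the BGP side are omitted):

```python
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

# Steer the VM's /32 toward the internal gateway on this node.
run(["ip", "route", "add", "10.10.100.2/32", "via", "10.10.100.1"])

# 1:1 NAT: anything destined to the floating IP is rewritten to the fixed IP.
run(["iptables", "-t", "nat", "-A", "PREROUTING",
     "-d", "192.168.100.2", "-j", "DNAT", "--to-destination", "10.10.100.2"])
```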
13. Application of the /32 network: ECMP + DNAT
→ Scalable load balancer
[Diagram: Compute node1 hosts LB 10.10.100.2/32, node2 hosts LB 10.10.100.3/32. Each node keeps a host route for its LB's /32 to 10.10.100.1 and an iptables DNAT of destination 192.168.100.2 to its local LB (10.10.100.2 on node1, 10.10.100.3 on node2). Behind TOR1/TOR2, the Aggregation layer ECMPs VIP 192.168.100.2 across both nodes.]
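In the deck the equal-cost paths are learned dynamically (BGP multipath); as a static illustration of what the aggregation layer ends up with:

```python
import subprocess

# Two next-hops for the same VIP: flows are hashed across both LB nodes.
subprocess.run(["ip", "route", "add", "192.168.100.2/32",
                "nexthop", "via", "192.1.1.201",
                "nexthop", "via", "192.1.1.202"], check=True)
```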
14. Application of the /32 network: multiple routing entries (a.k.a. Fixed IPs) + container bridge network
→ Scalable container network
[Diagram: Compute node1 hosts a VM whose internal linux bridge connects containers with fixed IPs 10.10.100.3~33. Host route: dest 10.10.100.3~33/32 to 10.10.100.1; the compute-node router table holds 10.10.100.3~33/32 via 192.168.1.201 and advertises it over BGP (192.1.1.201 ↔ 192.1.1.202).]
Routable IPs straight to containers:
• Legacy IP-based monitoring keeps working
• No overlay (no complexity)
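The 10.10.100.3~33 notation is just a range of /32 host routes, one per container fixed IP. A minimal sketch of installing them on the node:

```python
import subprocess

# One /32 host route per container fixed IP, all via the node gateway.
for last_octet in range(3, 34):  # 10.10.100.3 .. 10.10.100.33
    subprocess.run(["ip", "route", "add", f"10.10.100.{last_octet}/32",
                    "via", "10.10.100.1"], check=True)
```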
15. Auto Scale-out Load Balancer Design Requirements
1. Create/delete a software load balancer with an API
2. The API should be compatible with the in-house cloud (KRANE, OpenStack based)
3. The load balancer cluster's logs and metrics can be gathered by the in-house measuring cloud
4. The load balancer cluster should share the same entry IP
5. The load balancer cluster members should check each other's state automatically
16. Auto Scale-out Load Balancer Model I (1)
1. It starts from the Neutron floating IP
2. It is combined with the BGP advertise option (connected), a virtual IP, and iptables
[Diagram: Model I on Compute node1. The VM (10.10.100.2/32) sits behind a linux bridge and veth pair, gateway 10.10.100.1; neutron-linuxbridge-agent, neutron-dhcp-agent and neutron-l3-agent run on the node. Attaching the floating IP adds the new IP 192.168.100.2 as a connected destination next to the host route "10.10.100.2/32 to 10.10.100.1", and iptables DNATs destination 192.168.100.2 to 10.10.100.2. The upstream router (192.1.1.201 ↔ 192.1.1.202) learns: 10.10.100.2/32 via 192.1.1.201; 10.10.100.3/32 via 192.168.1.202; 192.168.100.2/32 via 192.168.1.201.]
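A minimal sketch of the node-side effect, assuming the node's BGP daemon redistributes connected routes, so merely adding the floating IP to the uplink makes it routable:

```python
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

# Adding the floating IP to the uplink makes it a "connected" route, which
# the node's BGP daemon (redistribute connected) advertises upstream.
run(["ip", "addr", "add", "192.168.100.2/32", "dev", "eth1"])

# Traffic arriving for the floating IP is DNATed to the VM's fixed IP.
run(["iptables", "-t", "nat", "-A", "PREROUTING",
     "-d", "192.168.100.2", "-j", "DNAT", "--to-destination", "10.10.100.2"])
```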
17. Auto Scale-out Load Balancer Model I (2)
The basic scenario is "install haproxy in a VM, attach a floating IP, group it, and double it"
[Diagram: haproxy VM → attach floating IP → grouping → double]
18. Auto Scale-out Load Balancer Model I (3)
The basic scenario is "install haproxy in a VM, attach a floating IP, group it, and double it"
[Diagram: the doubled setup. Compute node1 and node2 each host an haproxy VM (10.10.100.2/32 and 10.10.100.3/32) with the same floating IP 192.168.100.2 attached; each node holds the host route for its own VM's /32 to 10.10.100.1, advertises 192.168.100.2 as connected, and DNATs destination 192.168.100.2 to its local VM. The upstream router's table now holds the ECMP entry: 192.168.100.2/32 via 192.168.1.201, 192.168.1.202.]
19. CBL alpha
The first thing to mention is that
"A model is one thing; the implementation is another"
20. CBL alpha tries
The first target was "Kubernetes"
1. It's a Golang-based API server
2. It uses etcd as the database
3. It follows Model I (floating-IP-based ECMP)
4. It supports scale-in/out APIs
5. SNMP support for haproxy status (the metrics can be pulled by the central event system)
6. It works
21. CBL alpha
The first target was "Kubernetes"
1. Uses the Kubernetes event stream as the triggering source (see the sketch after the diagram)
2. For access from outside, Kubernetes has to put a load balancer in front of the containers
3. In the alpha stage, k8s (the legacy-container candidate) was dropped (OMG)
[Diagram: CBL (API server) backed by etcd, talking to Kubernetes through a CBL cloud-provider; the LB runs in a VM, PODs run on Kubernetes nodes, and everything sits on OpenStack VMs.]
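For item 1, a minimal sketch of consuming that event stream with the official Kubernetes Python client; the CBL reconcile call is hypothetical:

```python
from kubernetes import client, config, watch

config.load_kube_config()  # or load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

# Stream Service events; each ADDED/DELETED LoadBalancer-type Service would
# trigger CBL to create or tear down an haproxy VM group.
for event in watch.Watch().stream(v1.list_service_for_all_namespaces):
    svc = event["object"]
    if svc.spec.type == "LoadBalancer":
        print(event["type"], svc.metadata.namespace, svc.metadata.name)
        # cbl.reconcile(svc)  # hypothetical CBL API call
```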
22. CBL alpha
The next target was integrating with the OpenStack load balancer API
1. OpenStack has its own resource-management cycle for load balancers
2. CBL now supports pool management, VIP management, and member management
3. By the way, where is the floating IP?
– I used the VIP's address as the floating IP resource and attached it to the VM
– But from the OpenStack perspective, a floating IP can also be attached to the VIP (what?)
4. What if we just forget about OpenStack?
– OpenStack is the company's legacy cloud
– I don't want to recreate a new cloud
23. CBL alpha
Beyond the floating IP issues, there are other considerations
1. There is no LB scale-out API in OpenStack
– I implemented it via scale-in/out triggered when '8.8.8.8' backend members are added or deleted
– But you cannot add more than two backends with the same IP address in OpenStack
– So scale-in/out has to be done in a truly automatic way
2. SSL offloading should be added
3. LB cluster membership management should be added to CBL
4. I can't do everything, from the server API down to the client SDK and modules
24. CBL beta
Giving up floating-IP-based ECMP for multi-NIC-based ECMP
1. The good thing about the floating IP approach is that you don't need to care about anything inside the VM. But you can't do this with OpenStack.
2. Changed to a multi-NIC-based design
– There are changes to the base model
– There are changes to the haproxy VM's network model
25. Auto Scale-out Load Balancer Model II
• The multi-NIC VM model doesn't look like much of a change, but it radically changes everything
[Diagram: Model II on Compute node1. The LB VM now has two NICs: eth0 with 10.10.100.2 and eth1 carrying the VIP 192.168.100.2 (dhclient is deactivated and a vm agent configures it). The host routes send both 10.10.100.2/32 and 192.168.100.2 to 10.10.100.1; the connected list and the iptables DNAT table are now empty. BGP (192.1.1.201 ↔ 192.1.1.202) advertises both routes upstream.]
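A minimal sketch of what the vm agent would do inside the LB VM (addresses per the diagram): dhclient stays off on eth1, and the shared VIP is configured statically:

```python
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

# eth1 carries the shared VIP; DHCP is deactivated, so configure it statically.
run(["ip", "addr", "add", "192.168.100.2/32", "dev", "eth1"])
run(["ip", "link", "set", "eth1", "up"])
```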
26. CBL beta – using a VM orchestrator
• CBL alpha is based on in-house code, maintained by one and only one person:
– SDKs for every cloud driver
– the API and its engine/worker
– the UI
• So: use the in-house VM orchestrator instead
• The in-house cloud now supports OpenStack Heat
27. CBL beta – based on a VM orchestrator (OpenStack Heat)
• Heat provides Templates (HOT & AWS CFN)
• Heat provides Resources (as resource plug-ins): instances, volumes, networking, ...
• Heat provides Stacks
• 1 stack == several resources
• Heat provides Auto-scaling
• You don't need to write code for auto-scaling
• Heat provides Software-Deployment
• You don't need to write code for deployment (how else would you configure the cloud application?)
• Heat provides Events
• You don't need to write code to track the progress of the life cycle
29. CBL beta – based on a VM orchestrator (OpenStack Heat) II
[Diagram: Templates go in through the Heat-API, which dispatches work over AMQP to Heat-Engine-1/2/3; the engines share a DB and own the stacks (Stack-1, Stack-2).]
30. CBL beta – based on a VM orchestrator (OpenStack Heat) II
heat_template_version: 2015-04-15
description: >
parameters:
resources:
  my_instance:
    ….
  my_scalingroup:
    ….
outputs:
  ecmpip:
Heat-API output:
• LB IP
• scale-in URL
• scale-out URL
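A minimal sketch of reading those stack outputs with python-heatclient; the credentials and stack name are hypothetical:

```python
from keystoneauth1 import loading, session
from heatclient import client as heat_client

# Hypothetical credentials; any keystoneauth-supported auth would do.
loader = loading.get_plugin_loader("password")
auth = loader.load_from_options(auth_url="http://keystone.example:5000/v3",
                                username="cbl", password="secret",
                                project_name="cbl", user_domain_name="Default",
                                project_domain_name="Default")
heat = heat_client.Client("1", session=session.Session(auth=auth))

stack = heat.stacks.get("cbl-lb-stack")  # hypothetical stack name
for out in stack.outputs:
    # Outputs carry the LB's ECMP IP and the scale-in/out webhook URLs.
    print(out["output_key"], "=", out["output_value"])
```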
31. CBL beta – based on a VM orchestrator (OpenStack Heat) III
• With a VM orchestrator, you get the triggering endpoints for free. All that matters is how you trigger them!
• Kakao has an in-house metric information system called KEMI
[Diagram: the Model II compute node from slide 25: the LB VM with eth0 (10.10.100.2) and eth1 carrying VIP 192.168.100.2, dhclient deactivated and configured by the vm agent; host routes for 10.10.100.2/32 and 192.168.100.2 point to 10.10.100.1; the DNAT table is empty.]
32. CBL beta – based on a VM orchestrator (OpenStack Heat) & KEMI
• In KEMI, we can set up the alarm condition and notification for each VM and service group
[Diagram: the same HOT template and Heat-API outputs as slide 30 (LB IP, scale-in URL, scale-out URL), now with KEMI wired in for alarm triggering.]
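A minimal sketch of the resulting control loop: a KEMI notification handler signals Heat's pre-signed scale-out webhook taken from the stack outputs (the URL is hypothetical; Heat's pre-signed webhooks are triggered with a plain POST):

```python
import requests

# Pre-signed Heat auto-scaling webhook from the stack outputs (hypothetical URL).
SCALE_OUT_URL = "https://heat.example:8000/v1/signal/scale_out"

def on_kemi_alarm(alarm: dict) -> None:
    """Called by the KEMI notification hook when the alarm condition fires."""
    if alarm.get("state") == "alarm":
        requests.post(SCALE_OUT_URL, timeout=10)  # one POST == one scale-out step
```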
33. CBL beta – based on a VM orchestrator (OpenStack Heat) & KEMI
• From a performance point of view, nothing changed from CBL alpha to beta
• But in development cost there are dramatic savings and progress
• The CBL engine code went from 4,404 lines to 446 lines (about 1/10)
• After the model change, it took three days to rewrite the engine
• If you have a good cloud, it's very easy to build a new network function