Damien Garros, Network Reliability Engineer @ Roblox
Network Device Properties
As Code
NANOG 75, San Francisco
Twitter @damgarros
Github @dgarros
Agenda
1. Introduction / Roblox
2. Network device properties definition
3. How are we managing device properties today?
4. How to manage Network device properties as code
5. Questions
1. Introduction / Roblox
What is Roblox?
● Educational platform for young software developers
● Gaming and Social platform
● Core audience for players is kids aged 9-12
● 2 Million Active developers
● 80+ Million monthly active users
● AS 22697
Rebuilt everything in 2018
[Figure: timeline 2017 / 2018 / 2019. Labels: 1 Legacy DC; 0 Legacy DC, 2 New DCs, 12 POPs; Manual Provisioning; Source of Truth; SW and RTR Automated; LB Automated]
● Couldn’t have done it without a proper Source of Truth
● The Source Of Truth is the Network Property Store
Start with the Source of Truth
[Figure: Network Source of Truth]
● Integrate everything with it
● Your Source of Truth is only as good as the quality of the data it contains
Vendors don’t tell you to deploy a Source of Truth because it introduces dependencies and requirements. But it’s the most important part.
Automation Stack @ Roblox
[Diagram labels: Network Source of Truth; Ansible / Jinja; Observium; Influxdb / Grafana; Custom Collector (REST / Netconf); Netbox Extensions; Netbox; Git; Custom Alert Manager]
2. Network device properties
Network device properties
● Name
● IP addresses
● Cabling information / Peer properties
● VLANs
● BGP peering
● Device-specific info (ASN, etc.)
Each device has a unique set of properties
● Unique set of properties per device
● 1 template per role
[Figure: properties P1 to P6 and templates T1, T2]
Your properties reflect your network design
● Network Design
● Naming Convention
● Cabling Convention
● Datacenter Layout
● Vendor Specific Information
[Figure: properties P1 to P6]
People are failing to automate their network because they simplify the problem and assume that everything is homogeneous
Example
[Figure: console server, SFO and NYC]
Be prepared to manage MANY versions of your properties
● Network Design
● Naming Convention
● Cabling Convention
● Datacenter Layout
● Vendor Specific Information
[Figure: version labels ranging from v1 to v2.4 (v1, v1.1, v1.2, v2, v2.1, v2.2, v2.4)]
For every rule, there is an exception. So you always follow the rule, except when there is an exception, in which case you follow a new rule based on that exception.
Properties @ Roblox
In 12 months we had to manage
● 1 000 Network Devices
● 26 000 IP addresses
● 4 500 Prefixes
● 42 different design revisions just for the network
● Up to 9 versions for a given network device role
We also added
3. How are organizations managing network device properties today?
What are the ways to generate these properties?
● By Hand
● By Script / Code
Pros / Cons with the Script / Code approach
Cons
● Hard to support multiple versions of properties
● Need to “write code” to adapt the design
● Hard to maintain
Pros
● Can generate a large number of properties quickly
● Very flexible
What are the ways to store these properties?
● Source of Truth
● Database
● Git
● Network devices configuration
● All of the above
4. How to manage network device properties as Code?
Infrastructure as Code principles
● Idempotent > Always the same results
● Version Control Friendly > Input as text file, peer review
● Safe & Predictable > Plan everything beforehand, and know what changes will be made before you run it.
High level workflow
[Diagram: the Network Builder reads the design (v1) and generates properties (P1 to P7) in the Source of Truth]
High level workflow - Plan & Apply
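[Diagram, two panels. PLAN: the Network Builder reads the design (v1) and the Source of Truth and generates a diff (P3, P4). APPLY: the Network Builder reads the design (v1) and writes to the Source of Truth.]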
How to capture your design for a rack switch?
Name:         "rsw", the id of the cluster, and the name of the site, separated by dashes
Loopback:     any IP from the management network of the site
Uplinks:      1x100G interface connected to each aggregation device;
              one /31 allocated per interface from the /22 block reserved for point-to-point links
Console port: any port on the nearest console server
Server ports: a /24 network allocated from the /16 block reserved for servers
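As a rough illustration, the naming and addressing rules above can be written down with Python's standard `ipaddress` module. The function name `rack_switch_design`, its parameters, and the server block `10.64.0.0/16` are assumptions for this sketch; the name follows the `rsw1-1-sfo` pattern used in the examples of this deck. The sketch always takes the first subnets of each block and does not track what is already in use.

```python
import ipaddress
from itertools import islice


def rack_switch_design(rack_id, pod_id, site, p2p_block, server_block, uplinks=2):
    """Derive a few rack-switch properties from the written design rules."""
    # One /31 per uplink, carved out of the point-to-point block.
    p2p_links = ipaddress.ip_network(p2p_block).subnets(new_prefix=31)
    # One /24 for the server ports, carved out of the server block.
    server_nets = ipaddress.ip_network(server_block).subnets(new_prefix=24)
    return {
        "name": f"rsw{rack_id}-{pod_id}-{site}",
        "uplinks": [str(net) for net in islice(p2p_links, uplinks)],
        "server_ports": str(next(server_nets)),
    }


# Hypothetical inputs: the /22 comes from the example pool, the /16 is made up.
design = rack_switch_design(1, 1, "sfo", "10.128.0.0/22", "10.64.0.0/16")
print(design["name"])          # rsw1-1-sfo
print(design["uplinks"])       # ['10.128.0.0/31', '10.128.0.2/31']
print(design["server_ports"])  # 10.64.0.0/24
```

A /22 holds 512 /31 links, so deciding which ones are already taken is the real problem, and that is the job of the resource manager.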
Network Builder - Building blocks
[Diagram: the Network Builder creates properties and manages the diff, working with the Source of Truth and the Resource Manager]
Resource Manager
Manage all your resources the way DHCP manages IP addresses
1. Resources can be reserved in advance
2. Each allocated resource is associated with an ID
3. The same ID always gets the same response
Resource Manager - Example
Site SFO:
  asn: [ 65100, 65200 ]
  prefixes:
    loopback: 10.10.10.0/24
    point-to-point: 10.128.0.0/22
● Create pools of resources, identifiable by name or role
● Query resources by defining
  ○ WHAT type of resource
  ○ From WHICH pool
  ○ WHO is requesting
Query to the Resource Manager:
  WHAT : Loopback (/32)
  WHICH : Loopback in SFO
  WHO : device1
Response: 10.10.10.1/32
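A minimal sketch of such a resource manager, assuming an in-memory store, is shown below. The `ResourceManager` class, its `create_pool` and `get` methods, and the pool names are hypothetical. The ASN pool is assumed to be the inclusive range 65100 to 65200, and the type of resource (WHAT) is implied by the pool in this simplified version.

```python
import ipaddress


class ResourceManager:
    """Hands out resources from named pools, keyed by the requester's ID."""

    def __init__(self):
        self._free = {}    # pool name -> iterator over unused resources
        self._leases = {}  # (pool name, requester ID) -> allocated resource

    def create_pool(self, name, resources):
        self._free[name] = iter(resources)

    def get(self, pool, who):
        """Return the resource leased to `who`, allocating one on first use."""
        key = (pool, who)
        if key not in self._leases:
            try:
                self._leases[key] = next(self._free[pool])
            except StopIteration:
                raise RuntimeError(f"pool {pool!r} is exhausted") from None
        return self._leases[key]


rm = ResourceManager()
rm.create_pool("sfo/asn", range(65100, 65201))
loopbacks = ipaddress.ip_network("10.10.10.0/24")
rm.create_pool("sfo/loopback", (f"{ip}/32" for ip in loopbacks.hosts()))

print(rm.get("sfo/loopback", "device1"))  # 10.10.10.1/32
print(rm.get("sfo/loopback", "device2"))  # 10.10.10.2/32
print(rm.get("sfo/loopback", "device1"))  # 10.10.10.1/32 again, same ID
print(rm.get("sfo/asn", "device1"))       # 65100
```

Calling `get` for a device that does not exist yet acts as a reservation, because the lease is remembered under that ID.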
Represent a property in a compact way
<LO4::sfo/loopback>
● Variable Type (WHAT): ASN, LO4, NET_IP, INT, VLAN
● Pool Name / Path (WHICH): can be different per Variable Type
● WHO is determined based on when this query is invoked
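One possible way to split the compact notation into its WHAT and WHICH parts is a small regular expression. The `VARIABLE` pattern, the `Variable` tuple, and `parse_variable` are names invented for this sketch, and the exact grammar of the notation is an assumption based on the examples in this deck.

```python
import re
from typing import NamedTuple, Optional

# "<TYPE::pool/path>": the type before "::", the pool path after it.
VARIABLE = re.compile(r"^<(?P<what>[A-Z0-9_]+)::(?P<which>[^<>]+)>$")


class Variable(NamedTuple):
    what: str   # variable type, e.g. LO4 or ASN
    which: str  # pool name / path, e.g. sfo/loopback


def parse_variable(text: str) -> Optional[Variable]:
    """Return the parsed variable, or None if `text` is a plain value."""
    match = VARIABLE.match(text)
    if match is None:
        return None
    return Variable(match["what"], match["which"])


print(parse_variable("<LO4::sfo/loopback>"))
print(parse_variable("<NET_IP4::sfo/point-to-point/31>"))
print(parse_variable("10.10.10.1/32"))  # None: already a concrete value
```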
Define your device properties with variables
name: "rsw1-1-sfo"
elevation: 30
type: qfx5100
role: rack-switch
ASN: "<ASN::sfo/private>"
network:
  lo0.0:
    ips:
      - addr: "<LO4::sfo/internal-loopbacks>"
      - addr: "<LO6::external-loopbacks>"
  p2p:
    et-0/0/48:
      peer: "<DEV_INT::psw1-sfo/rack-switch>"
      ips:
        - addr: "<NET_IP4::sfo/point-to-point/31>"
    et-0/0/49:
      peer: "<DEV_INT::psw2-sfo/rack-switch>"
      ips:
        - addr: "<NET_IP4::sfo/point-to-point/31>"
Define your device properties with variables (after resolution)
name: "rsw1-1-sfo"
elevation: 30
type: qfx5100
role: rack-switch
ASN: 65100
network:
  lo0.0:
    ips:
      - addr: 10.10.10.1/32
      - addr: 2020:1234:beef::756/128
  p2p:
    et-0/0/48:
      peer: psw1-sfo::et-0/0/1
      ips:
        - addr: 10.128.195.124/31
    et-0/0/49:
      peer: psw2-sfo::et-0/0/1
      ips:
        - addr: 10.128.195.126/31
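Going from the variable form to the resolved form amounts to walking the nested properties and replacing every variable with a concrete value. The sketch below assumes the properties are already loaded into Python dictionaries and lists. The `resolve` function is hypothetical, and `stub_lookup` stands in for the resource manager with canned answers. The WHO is built here from the device name plus the position of the variable in the tree, which is one possible reading of how it is determined.

```python
import re

VARIABLE = re.compile(r"^<([A-Z0-9_]+)::([^<>]+)>$")


def resolve(node, lookup, who):
    """Return a copy of `node` with every variable replaced via `lookup`."""
    if isinstance(node, dict):
        return {k: resolve(v, lookup, f"{who}/{k}") for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, lookup, f"{who}/{i}") for i, v in enumerate(node)]
    if isinstance(node, str):
        match = VARIABLE.match(node)
        if match:
            return lookup(match.group(1), match.group(2), who)
    return node  # plain values are kept as-is


# Stand-in for the resource manager: canned answers, and a log of requesters.
CANNED = {
    ("ASN", "sfo/private"): 65100,
    ("LO4", "sfo/internal-loopbacks"): "10.10.10.1/32",
}
seen = []


def stub_lookup(what, which, who):
    seen.append(who)
    return CANNED[(what, which)]


device = {
    "name": "rsw1-1-sfo",
    "ASN": "<ASN::sfo/private>",
    "network": {"lo0.0": {"ips": [{"addr": "<LO4::sfo/internal-loopbacks>"}]}},
}
resolved = resolve(device, stub_lookup, who=device["name"])
print(resolved["ASN"])                                  # 65100
print(resolved["network"]["lo0.0"]["ips"][0]["addr"])  # 10.10.10.1/32
print(seen)
```

Because the requester ID is derived from the device and the path, resolving the same file again asks for the same IDs and therefore gets the same answers.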
Create a template of the design
rack_switch_v1:
  name: "rsw{{id}}-1-sfo"
  elevation: 30
  type: qfx5100
  role: rack-switch
  ASN: "<ASN::sfo/private>"
  network:
    lo0.0:
      ips:
        - addr: "<LO4::sfo/internal-loopbacks>"
        - addr: "<LO6::external-loopbacks>"
    p2p:
      et-0/0/48:
        peer: "<DEV_INT::psw1-sfo/rack-switch>"
        ips:
          - addr: "<NET_IP4::sfo/point-to-point/31>"
Reuse templates across sites and racks
Input:
site: sfo
racks:
  101:
    racks: rack_switch_v1
    id: 1
  102:
    racks: rack_switch_v1
    id: 2
  103:
    racks: rack_switch_v1
    id: 3
Result:
site: sfo
racks:
  101:
    name: rsw1-1-sfo
    ASN: 62100
    [ .. ]
  102:
    name: rsw2-1-sfo
    ASN: 62101
    [ .. ]
  103:
    name: rsw3-1-sfo
    ASN: 62102
    [ .. ]
Design is often dependent on the location
● Which console server should I connect to?
● Which cluster / pod is this rack part of?
● Which out-of-band device should I connect to?
● What is the out-of-band network for this rack?
● ...
Context Resolution
[Diagram labels: input file (text / YAML); import; extract site and rack info; calculate / resolve rack-specific info; YAML / Jinja; convert back to text]
site:
  name: sfo
  elevation: 30
  pod_id: 1
rack:
  console: co1-sfo
  mgmt_nwk: 10.0.0.0/16
Use Jinja to add contextual information
name: "rsw{{ id }}-{{rack.pod_id}}-{{site.name}}"
elevation: 30
rack_face: front
type: qfx5100
nb_role: rack-switch
ASN: "<ASN::{{site.name}}/private>"
network:
  lo0.0:
    ips:
      - addr: "<LO4::{{site.name}}/internal-loopbacks>"
      - addr: "<LO6::external-loopbacks>"
  p2p:
    et-0/0/48:
      peer: "<DEV_INT::psw1-{{site.name}}/rack-switch>"
      ips:
        - addr: "<NET_IP4::{{site.name}}/point-to-point/31>"
    et-0/0/49:
      peer: "<DEV_INT::psw2-{{site.name}}/rack-switch>"
      ips:
        - addr: "<NET_IP4::{{site.name}}/point-to-point/31>"
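To show what the contextual rendering step does, here is a small stand-in written with the standard library only. It is not Jinja: it supports nothing beyond `{{ dotted.name }}` substitution. The `render` function and the shape of the `context` dictionary are assumptions for this sketch. With the real library, the equivalent call would be `jinja2.Template(text).render(**context)`.

```python
import re

# Matches "{{ name }}" or "{{ outer.inner }}", with optional spaces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each placeholder with the value found by a dotted lookup."""
    def substitute(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError on unknown names
        return str(value)
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical context, as it might look after context resolution.
context = {"id": 3, "site": {"name": "sfo"}, "rack": {"pod_id": 1}}

print(render("rsw{{ id }}-{{rack.pod_id}}-{{site.name}}", context))
# rsw3-1-sfo
print(render("<LO4::{{site.name}}/internal-loopbacks>", context))
# <LO4::sfo/internal-loopbacks>
```

The output of this step still contains resource variables such as `<LO4::sfo/internal-loopbacks>`; those are resolved afterwards by the resource manager.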
Network Builder - 3 main components
1. Context Resolution: pull information specific to each site and rack
   ● Pod and cluster info
   ● Console server
   ● PDU
   ● OOB devices, IPs…
2. Variables Resolution: resolve / generate properties using the resource manager
3. Apply / Create: understand what already exists and what needs to be created, then apply the diff
[Diagram labels: PLAN, APPLY]
Next steps
● Get feedback on this approach
● Open Source the resource manager
● Open Source the network builder
Thank You
