Adam Mills, Principal Network Engineer
Damien Garros, Network Reliability Engineer
Ansiblefest, Austin October 3rd 2018
Network Automation Journey at Roblox
from manual to highly automated network in 6 months
1. How did you get started with Ansible?
2. How long have you been using it?
3. What's your favorite thing to do when you Ansible?
Adam Mills Damien Garros
1. What is Roblox ?
2. Automation Project Architecture
3. Managing Device Configurations
4. Managing Changes from Design to Implementation
5. Culture & Organization
Questions ?
Agenda
What is Roblox ?
1
● Educational platform for young software developers
● Gaming and Social platform
● Core audience is children ages 9-12
● 70 million monthly active users
What is Roblox?
Comparison of Top Online Media Properties, Total Monthly Hours (in Millions)
Source: comScore Custom Analysis, Total Digital (does not include Mobile data), December 2017
[Chart: per-property bar values are not recoverable from the extracted text]
The Challenge in Front of Us
[Diagram: network footprint in Dec 2017 vs. Dec 2018, showing datacenters DC1, DC2, DC3 and multiple POPs]
Automation Project
Architecture
2
How we got started ..
● A GitHub account
● Network engineers with laptops
● Some VMs
You don’t need much to get started
● Doesn’t require a lot of resources
● Doesn’t require a team of Python developers
● Doesn’t require a CI/CD pipeline
Netbox as a DCIM / IPAM Solution
● We decided to use Netbox for
○ IPAM Solution
○ Cabling information
○ Device Inventory management system
■ Network and Server
https://github.com/digitalocean/netbox
Track Device status in Netbox
Planned for future deployment
Physically Racked and Powered, not configured yet
Configured but not in production / maintenance mode
Production
Device configuration changes based on its own status
and status of peers
High level Workflow
Devices list / Role
IP addresses
Connections
Jinja
Existing devices
New devices
Network Builder (Roblox developed)
Import existing devices
1. Create inventory file manually
2. Create a playbook to create devices in Netbox using the API
3. Create a playbook to get the interface list from devices using Napalm and create them in Netbox
4. Create a playbook to get IPs from devices using Napalm and create them in Netbox
5. Create a playbook to get LLDP info and create links in Netbox
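The LLDP step above can be sketched in plain Python. This is only an illustration, not the playbook used at Roblox: it assumes the input has the shape returned by NAPALM's `get_lldp_neighbors()` getter (local interface mapped to a list of `hostname` / `port` entries), and the function name `lldp_to_links` and the sample device names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Shape of NAPALM get_lldp_neighbors(): {local_interface: [{"hostname": ..., "port": ...}]}
LldpNeighbors = Dict[str, List[Dict[str, str]]]
Endpoint = Tuple[str, str]  # (device name, interface name)


def lldp_to_links(lldp_by_device: Dict[str, LldpNeighbors]) -> List[Tuple[Endpoint, Endpoint]]:
    """Turn per-device LLDP neighbors into a de-duplicated, sorted list of links."""
    links: Set[Tuple[Endpoint, Endpoint]] = set()
    for device, neighbors in lldp_by_device.items():
        for local_int, entries in neighbors.items():
            for entry in entries:
                a = (device, local_int)
                b = (entry["hostname"], entry["port"])
                # Sort the two ends so that A->B and B->A collapse into one link.
                links.add(tuple(sorted((a, b))))
    return sorted(links)


# Hypothetical sample data: both ends report the same cable.
sample = {
    "rs1": {"et-0/0/1": [{"hostname": "cs1", "port": "et-0/0/63"}]},
    "cs1": {"et-0/0/63": [{"hostname": "rs1", "port": "et-0/0/1"}]},
}
links = lldp_to_links(sample)
print(links)
```

In practice LLDP may report fully qualified hostnames or port descriptions rather than interface names, so some normalization would likely be needed before creating the links in Netbox.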
Ansible integration with Netbox
Dynamic Inventory
Custom Module rblx_dev_variables
Local cache
Pull information from Netbox
Dynamic Inventory
● Runs before each playbook
● Pulls the device list and basic device attributes
● Creates groups dynamically based on role, custom fields, location, etc.
● Needs to be fast
Custom Module
● Executes many queries and merges all the information into a single data structure (device model)
● 1 execution per device
● Creates a local cache in host_vars
● Somewhat slow: takes a couple of minutes to run
Custom module to generate device model
● Pull interfaces / IP / links / circuits information from Netbox
● Create a single data structure with all information
● Pre-calculate all peer IPs for point-to-point links (/31 and /127)
● Generate interface description based on internal rules
● Save all information in a local cache under host_vars
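A minimal sketch of the "single data structure" idea, assuming the Netbox queries have already been flattened into simple records. The record fields (`name`, `interface`, `address`) and both function names are hypothetical; only the cache file name `netbox-auto-generated.yaml` under `host_vars` comes from the deck.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List


def build_device_model(name: str,
                       interfaces: List[Dict[str, Any]],
                       ip_addresses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge separate query results into one per-device structure."""
    model: Dict[str, Any] = {"name": name, "interfaces": {}}
    for intf in interfaces:
        model["interfaces"][intf["name"]] = {"ips": []}
    for ip in ip_addresses:
        # Ignore addresses bound to an interface that was not returned.
        entry = model["interfaces"].get(ip["interface"])
        if entry is not None:
            entry["ips"].append(ip["address"])
    return model


def cache_path(root: str, name: str) -> PurePosixPath:
    """Location of the per-device cache file inside an environment directory."""
    return PurePosixPath(root) / "host_vars" / name / "netbox-auto-generated.yaml"


model = build_device_model(
    "br1-sjc1",
    interfaces=[{"name": "et-0/0/1"}, {"name": "et-0/0/7"}],
    ip_addresses=[{"interface": "et-0/0/1", "address": "10.10.194.31/31"}],
)
print(model)
print(cache_path("/script/_datacenter", "br1-sjc1"))
```

Writing the model to that path once per device is what lets later playbooks read plain host_vars instead of querying Netbox again.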
Precalculated peer IP addresses for point-to-point links
p2p_peers:
  - ip_family: 4
    link_is_active: true
    local_int: et-0/0/1.0
    local_ip: 10.10.194.31/31
    local_status: Active
    peer_int: et-0/0/63
    peer_ip: 10.10.194.30/31
    peer_name: cs1-c1-chi1
  - ip_family: 4
    link_is_active: true
    local_int: et-0/0/7.0
    local_ip: 10.10.194.39/31
    local_status: Active
    peer_int: et-0/0/63
    peer_ip: 10.10.194.38/31
    peer_name: cs2-c1-chi1
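On a /31 or /127 link there are exactly two addresses, so the peer IP follows directly from the local one. The sketch below shows one way to derive it with Python's standard `ipaddress` module; the function name `p2p_peer` is hypothetical and this is not necessarily how the Roblox module computes it.

```python
import ipaddress


def p2p_peer(local_ip: str) -> str:
    """Return the other address of a /31 (IPv4) or /127 (IPv6) point-to-point link."""
    iface = ipaddress.ip_interface(local_ip)
    net = iface.network
    if net.prefixlen != net.max_prefixlen - 1:
        raise ValueError(f"{local_ip} is not a /31 or /127 point-to-point address")
    first = net.network_address
    second = first + 1
    # The peer is whichever of the two addresses is not ours.
    peer = second if iface.ip == first else first
    return f"{peer}/{net.prefixlen}"


print(p2p_peer("10.10.194.31/31"))
print(p2p_peer("2001:db8::1/127"))
```

The first call reproduces the `local_ip` / `peer_ip` pair from the first entry of the data structure above.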
Local Variables directory
├── group_vars
│ ├── all
│ │ ├── netbox-auto-generated.yaml
│ │ └── peeringdb.yaml
│ ├── rack-switch.yaml
│ └── sjc1
│ ├── sjc1-ix.yaml
│ └── sjc1.yaml
├── host_vars
│ ├── br1-sjc1
│ │ ├── br1-sjc1.yaml
│ │ └── netbox-auto-generated.yaml
│ ├── br2-sjc1
│ │ ├── br2-sjc1.yaml
│ │ └── netbox-auto-generated.yaml
Packaged Ansible inside a Docker Container
● Docker container to run Ansible.
● Predictable Environment
● Easy to ship new:
○ modules / plugins
○ external library etc ..
● Easy to control
Container
● Python
● Ansible
● Required libraries
Host
● Configuration templates
● Playbooks
● All YAML files
Volume mapping: /script
Use Docker to create multiple environments
Datacenter
Dynamic inventory config
Specific Playbooks
Specific Local Variables
Backbone
Dynamic inventory config
Specific Playbooks
Specific Local Variables
Load Balancer
Dynamic inventory config
Specific Playbooks
Specific Local Variables
Servers
Dynamic inventory config
Specific Playbooks
Specific Local Variables
Dynamic Inventory Script
Shared Roles and Modules
Shared Playbooks
netbox:
    group_by:
        default: [ device_role, rack, site ]
        custom: [ design_rev, service_group ]
    filters:
        dc:
            - site: [ dc1, dc2 ]
        border-router:
            - role: border-router
    hosts_vars:
        ip:
            ansible_ssh_host: primary_ip
        general:
            platform: platform
            role: device_role
            site: site
        device_type:
            device_type: slug
        status:
            status: label
Dynamic Inventory Configuration
Based on the AAbouZaid/netbox-as-ansible-inventory project
Dynamic inventory script behavior is defined in a config file:
● group_by defines the Ansible groups we need
● filters limits the list of devices pulled from Netbox
● hosts_vars defines which device host_vars to populate
● This is a very important piece of the puzzle
● So strategic that we decided to fork the initial project and maintain our own version
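To illustrate how such a config can drive the inventory, here is a small sketch that applies a group list and a host variable mapping to already-fetched device records and emits the JSON structure Ansible expects from a dynamic inventory script (groups with `hosts`, plus `_meta.hostvars`). The flat device records and the names `GROUP_BY`, `HOST_VARS` and `build_inventory` are assumptions for the example; the real script talks to the Netbox API and handles nested fields.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Simplified stand-ins for the group_by and hosts_vars sections of the config.
GROUP_BY = ["device_role", "site"]
HOST_VARS = {"ansible_ssh_host": "primary_ip", "platform": "platform"}


def build_inventory(devices: List[Dict[str, Any]],
                    group_by: List[str],
                    host_vars: Dict[str, str]) -> Dict[str, Any]:
    """Group devices by attribute value and collect per-host variables."""
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: {"hosts": []})
    meta: Dict[str, Dict[str, Any]] = {}
    for dev in devices:
        for attr in group_by:
            value = dev.get(attr)
            if value:
                groups[value]["hosts"].append(dev["name"])
        # Map Ansible variable names to the device attribute that feeds them.
        meta[dev["name"]] = {var: dev[src] for var, src in host_vars.items() if src in dev}
    inventory: Dict[str, Any] = dict(groups)
    inventory["_meta"] = {"hostvars": meta}
    return inventory


# Hypothetical device records.
devices = [
    {"name": "br1-sjc1", "device_role": "border-router", "site": "sjc1",
     "primary_ip": "192.0.2.1", "platform": "junos"},
    {"name": "br2-sjc1", "device_role": "border-router", "site": "sjc1",
     "primary_ip": "192.0.2.2", "platform": "junos"},
]
inventory = build_inventory(devices, GROUP_BY, HOST_VARS)
print(json.dumps(inventory, indent=2))
```

Returning `_meta.hostvars` in the same response avoids one script call per host, which matters because the inventory runs before every playbook.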
├── _datacenter
├── _pop
├── _load_balancer
├── shared
│ ├── ansible.cfg
│ ├── filters
│ ├── inventory
│ ├── library
│ ├── playbooks
│ ├── plugins
│ ├── roles
│ └── variables.yaml
├── Dockerfile
├── Makefile
├── README.md
└── requirements.txt
Root Directory
Datacenter Environment Directory
├── configs
├── group_vars
├── host_vars
├── netbox.yml
├── pb.check.cabling.yaml
├── pb.check.p2p.yaml
├── pb.config.commit.yaml
├── pb.config.diff.yaml
├── pb.config.generate.yaml
├── pb.variable.yaml
└── shared -> /script/shared/playbooks/
Makefile
DOCKER_IMG = roblox/neteng
DOCKER_TAG = 0.0.12

build:
	docker build -t $(DOCKER_IMG):$(DOCKER_TAG) .

datacenter:
	docker run -it -v $(shell pwd):/script/ \
		-e NETBOX_CONFIG_FILE=/script/_datacenter/netbox.yml \
		-e ROOT_DIR=/script \
		-e LDAP_USER=$(shell whoami) \
		-w /script/_datacenter $(DOCKER_IMG):$(DOCKER_TAG) bash
Managing
Device Configurations
3
Different approaches to automation
Config deployment    Load Override     Merge-ish
Change diff          Supported/Easy    --check
Add new elements     Easy              Easy
Remove elements      Easy              Hard
Build the configs with reusable templates
● B: reusable base
○ Banners
○ Logging strings
○ Communities
● T1, T2: template per role
● P1 to P6: unique set of properties per device
Leverage power of hostvars and local cache
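The layering of base, role and per-device properties can be sketched as a simple variable merge followed by a template render. The deck uses Jinja templates; to keep this example runnable with the standard library only, it uses Python's `string.Template` as a stand-in. All variable names, values and the vendor-neutral config lines are made up for illustration.

```python
from string import Template
from typing import Dict

# B: values shared by every device (hypothetical).
BASE = {"banner": "Authorized access only", "syslog_host": "192.0.2.10"}
# T: values that depend on the device role (hypothetical).
ROLE = {"rack-switch": {"mtu": "9192"}}

# Stand-in for a per-role Jinja template.
TEMPLATE = Template("hostname $hostname\nbanner $banner\nmtu $mtu\n")


def device_vars(role: str, device_props: Dict[str, str]) -> Dict[str, str]:
    """Later layers override earlier ones: base < role < device properties."""
    merged = dict(BASE)
    merged.update(ROLE.get(role, {}))
    merged.update(device_props)
    return merged


def render(role: str, device_props: Dict[str, str]) -> str:
    return TEMPLATE.substitute(device_vars(role, device_props))


# P: properties unique to one device; here the device overrides the role MTU.
config = render("rack-switch", {"hostname": "rs1", "mtu": "9000"})
print(config)
```

The same precedence is what Ansible provides when the layers live in group_vars and host_vars.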
Building in Flight & Dealing with Legacy
● Had to start without the full tool kit
● Handle different stages of the life cycle
○ New
○ Retrofit
○ Maintain
● We needed a way to test before commit
Playbooks used
├── pb.config.generate.yaml
├── pb.config.diff.yaml
├── pb.config.commit.yaml
● Junos Diff
○ Iterate on one device at a time
○ Bring “legacy” and “brownfield” devices under Ansible.
○ Once the templates match reality “commit and-quit” with
confidence
Test:
Validate template changes with Diff
Intended Result:
Diff file empty
When the results are True:
Automation matches reality
TDD with Junos using “Diff”
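The "empty diff means the template matches reality" test can be illustrated locally with Python's `difflib`. In the workflow described here the diff is produced against the device itself; this sketch only compares two text files' contents, and the function names and sample config lines are hypothetical.

```python
import difflib


def config_diff(running: str, generated: str) -> str:
    """Unified diff between the running config and the generated one."""
    return "".join(difflib.unified_diff(
        running.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="running",
        tofile="generated",
    ))


def matches_reality(running: str, generated: str) -> bool:
    """The test passes only when the diff is empty."""
    return config_diff(running, generated) == ""


running = "set system host-name rs1\n"
generated = "set system host-name rs1\nset system ntp server 192.0.2.1\n"

print(matches_reality(running, running))
print(config_diff(running, generated))
```

Iterating on the template until the diff is empty is what makes it safe to bring a brownfield device under automation without changing its behavior.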
Managing Expectations
VIP/Pool/Node Management
● No clustering, GSLB
● Deploy similar pools on different load
balancers
● Ansible:
○ Node → 8 (add)
○ Pool → 2 (add)
○ Node to Pool → 16 (add)
○ VIP → 2 (add)
○ 28 actions * 4 F5s = 112
● Extremely slow
● Adding 30 nodes = 94 * 4 = 376
● Pools of 300+ servers :(
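The action counts above follow from simple arithmetic, sketched below. The function name is hypothetical, and the formula assumes every node is a member of every pool, which is consistent with 8 nodes and 2 pools giving 16 node-to-pool actions.

```python
def ansible_actions(nodes: int, pools: int, vips: int, load_balancers: int) -> int:
    """Number of per-object actions when each object is handled by its own task."""
    node_to_pool = nodes * pools  # assumes each node joins every pool
    per_load_balancer = nodes + pools + node_to_pool + vips
    return per_load_balancer * load_balancers


print(ansible_actions(nodes=8, pools=2, vips=2, load_balancers=4))
print(ansible_actions(nodes=30, pools=2, vips=2, load_balancers=4))
```

The count grows with nodes times pools, which is why pools of 300+ servers become impractical with one action per object.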
● Wait, 2 are missing
● How to handle removals
● Do we keep track of removed hosts?
● How can we create a diff?
● Custom Module: Processor
● Every run
○ Monitors
○ Pools
○ Virtual Server/Virtual Addresses
● Processor Actions
○ Add Nodes
○ Add Nodes to Pools
○ Remove Nodes
○ Remove Pools
○ Remove Virtual Servers/Virtual Addresses
Normal Playbook Activities After the Processor
State / Sources of Truth
bigip_facts
node/
pool/
Processor Actions
node/ ['remove']=2, ['add']=30
pool/ ['remove']=4, ['add']=60
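The core of such a processor is a set difference between what the source of truth wants and what the device currently reports. The sketch below assumes both sides have been reduced to lists of object names per kind; the function name `plan_actions` and the sample names are hypothetical, and it does not call any F5 module.

```python
from typing import Dict, Iterable, List


def plan_actions(desired: Dict[str, Iterable[str]],
                 current: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Compute what to add and what to remove for each kind of object."""
    actions: Dict[str, Dict[str, List[str]]] = {}
    for kind in sorted(set(desired) | set(current)):
        want = set(desired.get(kind, ()))
        have = set(current.get(kind, ()))
        actions[kind] = {
            "add": sorted(want - have),      # in the source of truth, not on the device
            "remove": sorted(have - want),   # on the device, no longer in the source of truth
        }
    return actions


# Hypothetical state: "current" stands in for what bigip_facts would report.
desired = {"node": ["web1", "web2", "web3"], "pool": ["web-pool"]}
current = {"node": ["web1", "old1"], "pool": ["web-pool", "old-pool"]}
actions = plan_actions(desired, current)
print(actions)
```

Because removals fall out of the comparison, there is no need to keep a separate record of hosts that were deleted.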
● Diff for other vendors
○ Arista added this feature
● F5
○ Full config not an option in Ansible
○ Custom modules for bootstrap
○ Manage VIPs, nodes, pools
Afterthoughts...
Managing Changes from
Design to Implementation
4
On Paper vs Implementation
Quest for the “Golden Config”
Examples of Rack Switch Variation
web
web
application
virtualization
database
virtualization
game
virtualization
provisioning
Build the configs with reusable parts
● B: reusable base
○ Banners
○ Logging strings
○ Communities
● T1, T2: template per role
● P1 to P6: unique set of properties per device
P4 P6
Device Revisions
● Allows for iterations on design principles
Rack Revisions
● Systems Engineers own
● Ownership is pushed out
● Avoids asynchronous communication
● Keeps both teams honest
● Ensures that all things are codified
Helping both server teams and networking teams
If it is captured in code, it’s not a one off.
Network Design
Naming Convention
Cabling Convention
Datacenter Layout
Vendor Specific Information
Device Revision
Rack Revision
[Diagram: successive revision tags, e.g. v1 / v2 / v2.1, then v1.1 / v2.2 / v2.1, then v1.2 / v2.4 / v2.1]
Culture & Organization
5
How to win
● Strong support within the organization
○ Automation is the only long term solution
● Move quickly and iterate
● It’s okay if it’s not perfect the first time
○ It WON’T be right the first time
● Persistence
○ Insist on the solution
○ But, listen and adapt
The winning team of NE + NRE
Network Engineer (NE)
● Responsible for defining the network architecture
● Consume automation tools
● Own config templates
● Comfortable with Git
Network Reliability Engineer (NRE)
● Responsible for defining the automation suite architecture
● Package / develop / maintain the tools
● Comfortable with network devices and architecture
Thank you!
Reference
Netbox > https://github.com/digitalocean/netbox
Netbox Builder (meetup)
Slides https://www.slideshare.net/dgarros/banog-meetup-august-30th-network-device-property-as-code
Video https://youtu.be/sUJt26MXVl4
Ansible Dynamic Inventory https://github.com/AAbouZaid/netbox-as-ansible-inventory

  • 2. 1. How did you get started with Ansible? 2. How long have you been using it? 3. What's your favorite thing to do when you Ansible? Adam Mills Damien Garros
  • 3. 1. What is Roblox ? 2. Automation Project Architecture 3. Managing Device Configurations 4. Managing Changes from Design to Implementation 5. Culture & Organization Questions ? Agenda
  • 5. ● Educational platform for young software developers ● Gaming and Social platform ● Core audience is children ages 9-12 ● 70 million monthly active users What is Roblox?
  • 6. [Chart: Comparison of Top Online Media Properties, Total Monthly Hours (in Millions)] Source: comScore Custom Analysis, Total Digital (does not include Mobile data), December 2017
  • 7. The Challenge in Front of Us [Diagram: datacenters (DC1, DC2, DC3) and POPs, Dec 2017 vs. Dec 2018]
  • 9. How we got started: a GitHub account, network engineers with laptops, some VMs
  • 10. You don’t need much to get started ● Doesn’t require a lot of resources ● Doesn’t require a team of Python developers ● Doesn’t require a CI/CD pipeline
  • 11. Netbox as a DCIM / IPAM Solution ● We decided to use Netbox for ○ IPAM Solution ○ Cabling information ○ Device Inventory management system ■ Network and Server https://github.com/digitalocean/netbox
  • 12. Track Device status in Netbox ● Planned for future deployment ● Physically racked and powered, not configured yet ● Configured but not in production / maintenance mode ● Production. Device configuration changes based on its own status and the status of its peers
  • 13. High level Workflow Devices list / Role IP addresses Connections Jinja Existing Devices New Devices Network Builder (Roblox developed)
  • 14. Import existing devices 1. Create inventory file manually 2. Create a playbook to create devices in Netbox using the API 3. Create a playbook to get the interface list from devices using NAPALM and create the interfaces in Netbox 4. Create a playbook to get IPs from devices using NAPALM and create them in Netbox 5. Create a playbook to get LLDP info and create links in Netbox
  • 15. Ansible integration with Netbox Dynamic Inventory Custom Module rblx_dev_variables local cache
  • 16. Pull information from Netbox ● Dynamic Inventory ○ Runs before each playbook ○ Pulls the device list and basic device attributes ○ Creates groups dynamically based on role, custom fields, location, etc. ○ Needs to be fast ● Custom Module ○ Executes a lot of queries and merges all the information into a single data structure (device model) ○ One execution per device ○ Creates a local cache in host_vars ○ Somewhat slow: takes a couple of minutes to run
  • 17. Custom module to generate device model ● Pulls interfaces / IP / links / circuits information from Netbox ● Creates a single data structure with all information ● Pre-calculates all peer IPs for point-to-point links (/31 and /127) ● Generates interface descriptions based on internal rules ● Saves all information in a local cache under host_vars
  • 18. Precalculated Peers IP address for point to point links
        p2p_peers:
          - ip_family: 4
            link_is_active: true
            local_int: et-0/0/1.0
            local_ip: 10.10.194.31/31
            local_status: Active
            peer_int: et-0/0/63
            peer_ip: 10.10.194.30/31
            peer_name: cs1-c1-chi1
          - ip_family: 4
            link_is_active: true
            local_int: et-0/0/7.0
            local_ip: 10.10.194.39/31
            local_status: Active
            peer_int: et-0/0/63
            peer_ip: 10.10.194.38/31
            peer_name: cs2-c1-chi1
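The peer-address pre-calculation for /31 and /127 links can be illustrated with Python's standard `ipaddress` module. This is only a minimal sketch, not the Roblox custom module: the function name `p2p_peer_ip` is hypothetical, and the input is assumed to be an interface address in the same `address/prefix` form as `local_ip` above.

```python
import ipaddress


def p2p_peer_ip(local_ip):
    """Return the other address on a /31 (IPv4) or /127 (IPv6) link.

    Hypothetical helper for illustration; not taken from the deck.
    """
    iface = ipaddress.ip_interface(local_ip)
    net = iface.network
    expected = 31 if iface.version == 4 else 127
    if net.prefixlen != expected:
        raise ValueError(f"{local_ip} is not a /{expected} point-to-point address")
    # A /31 or /127 contains exactly two addresses: the peer is the one
    # that is not ours.
    first = net.network_address
    second = first + 1
    peer = second if iface.ip == first else first
    return f"{peer}/{net.prefixlen}"


print(p2p_peer_ip("10.10.194.31/31"))
```

Fed the two `local_ip` values from the slide, the helper returns the matching `peer_ip` values, which is the property a pre-calculation step needs before templates consume the data.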
  • 19. Local Variables directory
        ├── group_vars
        │   ├── all
        │   │   ├── netbox-auto-generated.yaml
        │   │   └── peeringdb.yaml
        │   ├── rack-switch.yaml
        │   └── sjc1
        │       ├── sjc1-ix.yaml
        │       └── sjc1.yaml
        ├── host_vars
        │   ├── br1-sjc1
        │   │   ├── br1-sjc1.yaml
        │   │   └── netbox-auto-generated.yaml
        │   ├── br2-sjc1
        │   │   ├── br2-sjc1.yaml
        │   │   └── netbox-auto-generated.yaml
  • 20. Packaged Ansible inside a Docker Container ● Docker container to run Ansible. ● Predictable Environment ● Easy to ship new: ○ modules / plugins ○ external library etc .. ● Easy to control
  • 21. Packaged Ansible inside a Docker Container ● Container: Python, Ansible, required libraries ● Host: configuration templates, playbooks, all YAML files ● Volume mapping: /script
  • 22. Use Docker to create multiple environments ● Datacenter, Backbone, Load Balancer, Servers: each has its own dynamic inventory config, specific playbooks, and specific local variables ● Shared across environments: dynamic inventory script, shared roles and modules, shared playbooks
  • 23. Dynamic Inventory Configuration ● Based on the AAbouZaid/netbox-as-ansible-inventory project ● Dynamic inventory script behavior is defined in a config file ○ group_by defines the Ansible groups we need ○ filters limit the device list that gets pulled from Netbox ○ host_vars defines the device host_vars to populate ● This is a very important piece of the puzzle ● So strategic that we decided to fork the initial project and maintain our own version ● Config shown on the slide: netbox: group_by: default: [ device_role, rack, site ] custom: [ design_rev, service_group ] filters: dc: - site: [ dc1, dc2 ] border-router: - role: border-router hosts_vars: ip: ansible_ssh_host: primary_ip general: platform: platform role: device_role site: site device_type: device_type: slug status: status: label
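To make the `group_by` idea concrete, here is a minimal sketch of how a dynamic inventory script could turn a device list into Ansible groups. It is not the code of the forked project: `build_inventory` and the sample devices are hypothetical, and the devices are assumed to be already-fetched plain dictionaries. The returned structure follows the JSON layout Ansible expects from an inventory script (`hosts` per group plus `_meta.hostvars`).

```python
from collections import defaultdict


def build_inventory(devices, group_by):
    """Group devices by the given attributes (hypothetical illustration)."""
    groups = defaultdict(list)
    hostvars = {}
    for dev in devices:
        name = dev["name"]
        # Mirrors the slide's mapping of ansible_ssh_host to primary_ip.
        hostvars[name] = {"ansible_ssh_host": dev["primary_ip"]}
        for key in group_by:
            value = dev.get(key)
            if value:
                groups[value].append(name)
    inventory = {group: {"hosts": sorted(hosts)} for group, hosts in groups.items()}
    inventory["_meta"] = {"hostvars": hostvars}
    return inventory


# Sample data: device names, IPs, and attribute values are made up.
devices = [
    {"name": "br1-sjc1", "primary_ip": "192.0.2.1", "device_role": "border-router", "site": "sjc1"},
    {"name": "br2-sjc1", "primary_ip": "192.0.2.2", "device_role": "border-router", "site": "sjc1"},
]
inventory = build_inventory(devices, ["device_role", "site"])
print(sorted(inventory))
```

Because the script runs before every playbook, a single pass over the device list like this keeps it fast; the heavier per-device queries are left to the custom module and its local cache.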
  • 24. Root Directory
        ├── _datacenter
        ├── _pop
        ├── _load_balancer
        ├── shared
        │   ├── ansible.cfg
        │   ├── filters
        │   ├── inventory
        │   ├── library
        │   ├── playbooks
        │   ├── plugins
        │   ├── roles
        │   └── variables.yaml
        ├── Dockerfile
        ├── Makefile
        ├── README.md
        └── requirements.txt
  • 25. Datacenter Environment Directory
        ├── configs
        ├── group_vars
        ├── host_vars
        ├── netbox.yml
        ├── pb.check.cabling.yaml
        ├── pb.check.p2p.yaml
        ├── pb.config.commit.yaml
        ├── pb.config.diff.yaml
        ├── pb.config.generate.yaml
        ├── pb.variable.yaml
        └── shared -> /script/shared/playbooks/
  • 26. Makefile
        DOCKER_IMG = roblox/neteng
        DOCKER_TAG = 0.0.12

        build:
            docker build -t $(DOCKER_IMG):$(DOCKER_TAG) .

        datacenter:
            docker run -it \
                -v $(shell pwd):/script/ \
                -e NETBOX_CONFIG_FILE=/script/_datacenter/netbox.yml \
                -e ROOT_DIR=/script \
                -e LDAP_USER=$(shell whoami) \
                -w /script/_datacenter \
                $(DOCKER_IMG):$(DOCKER_TAG) bash
  • 28. Different approach to automation
        Config deployment    Load Override     Merge-ish
        Change Diff          Supported/Easy    --check
        Add new elements     Easy              Easy
        Remove elements      Easy              Hard
  • 29. Build the configs with reusable templates ● B: reusable base (banners, logging strings, communities) ● T1, T2: template per role ● P1 to P6: unique set of properties per device
  • 30. Leverage power of hostvars and local cache
  • 31. Building in Flight & Dealing with Legacy ● Had to start without the full tool kit ● Handle different stages of the life cycle ○ New ○ Retrofit ○ Maintain ● We needed a way to test before commit
  • 32. Playbooks used ├── pb.config.generate.yaml ├── pb.config.diff.yaml ├── pb.config.commit.yaml ● Junos Diff ○ Iterate on one device at a time ○ Bring “legacy” and “brownfield” devices under Ansible. ○ Once the templates match reality “commit and-quit” with confidence
  • 33. TDD with Junos using “Diff” ● Test: validate template changes with Diff ● Intended result: Diff file empty ● When the results are True: automation matches reality
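The "empty diff means the template matches reality" test can be sketched offline with Python's standard `difflib`. The deck relies on the Junos diff produced on the device; the version below is only a local stand-in for illustration, and the function name `config_diff` and the sample configuration lines are assumptions.

```python
import difflib


def config_diff(running, generated):
    """Return unified-diff lines between two config texts (empty if identical)."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            generated.splitlines(),
            fromfile="running",
            tofile="generated",
            lineterm="",
        )
    )


# Made-up configuration snippets for the example.
running = "set system host-name br1-sjc1\nset system services ssh"
generated = "set system host-name br1-sjc1\nset system services ssh"

diff = config_diff(running, generated)
print("automation matches reality" if not diff else "\n".join(diff))
```

Iterating on the templates until this kind of check comes back empty for one device at a time is the test-driven loop the slide describes.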
  • 35. VIP/Pool/Node Management ● No clustering, GSLB ● Deploy similar pools on different load balancers ● Ansible: ○ Node → 8 (add) ○ Pool → 2 (add) ○ Node to Pool → 16 (add) ○ VIP → 2 (add) ○ 28 actions * 4 F5s = 112 ● Extremely slow ● Adding 30 nodes = 94 actions * 4 F5s = 376 ● Pools of 300+ servers :( ● Wait, 2 are missing ● How do we handle removals? ● Do we keep track of removed hosts? ● How can we create a diff? ● Custom Module: Processor
  • 36. ● Every run ○ Monitors ○ Pools ○ Virtual Server/Virtual Addresses ● Processor Actions ○ Add Nodes ○ Add Nodes to Pools ○ Remove Nodes ○ Remove Pools ○ Remove Virtual Servers/Virtual Addresses Normal Playbook Activities After the Processor
  • 37. State, Sources of Truth (bigip_facts, node/, pool/) → Processor → Actions: node/ [‘remove’]=2, [‘add’]=30; pool/ [‘remove’]=4, [‘add’]=60
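The Processor's job of comparing current state against the source of truth and emitting only the needed add and remove actions can be sketched with set arithmetic. This is an assumption about the general approach, not the Roblox module itself; `plan_actions` and the node names are hypothetical.

```python
def plan_actions(current, desired):
    """Compute which items to add and remove to move from current to desired."""
    current_set, desired_set = set(current), set(desired)
    return {
        "add": sorted(desired_set - current_set),
        "remove": sorted(current_set - desired_set),
    }


# Hypothetical node names: current state as reported by the load balancer,
# desired state as defined in the source of truth.
current_nodes = ["web1", "web2", "web3"]
desired_nodes = ["web2", "web3", "web4", "web5"]

actions = plan_actions(current_nodes, desired_nodes)
print(actions)
```

Working from a computed diff means the playbook only touches what changed, and removals fall out of the comparison without having to keep a separate record of deleted hosts.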
  • 38. Afterthoughts... ● Diff for other vendors ○ Arista added this feature ● F5 ○ Full config not an option in Ansible ○ Custom modules for bootstrap ○ Manage VIPs, nodes, pools
  • 39. Managing Changes from Design to Implementation 4
  • 40. On Paper vs Implementation
  • 41. Quest for the “Golden Config”
  • 42. Examples of Rack Switch Variation web web application virtualization database virtualization game virtualization provisioning
  • 43. Build the configs with reusable parts ● B: reusable base (banners, logging strings, communities) ● T1, T2: template per role ● P1 to P6: unique set of properties per device
  • 44. Device Revisions ● Allows for iterations on design principles
  • 45. Rack Revisions ● Systems Engineers own
  • 46. Helping both server teams and networking teams ● Ownership is pushed out ● Avoids asynchronous communication ● Keeps both teams honest ● Ensures that all things are codified
  • 47. If it is captured in code, it’s not a one off. Network Design Naming Convention Cabling Convention Datacenter Layout Vendor Specific Information Device Revision Rack Revision v1 v2 v2.1 v1.1 v2.2 v2.1 v1.2 v2.4 v2.1 v1 v2 v2.1 v1.1 v2.2 v2.1 v1.2 v2.4 v2.1
  • 49. How to win ● Strong support within the organization ○ Automation is the only long term solution ● Move quickly and iterate ● It’s okay if it’s not perfect the first time ○ It WON’T be right the first time ● Persistence ○ Insist on the solution ○ But, listen and adapt
  • 50. The winning team of NE + NRE ● Network Engineer (NE) ○ Responsible for defining the network architecture ○ Consumes automation tools ○ Owns config templates ○ Comfortable with Git ● Network Reliability Engineer (NRE) ○ Responsible for defining the automation suite architecture ○ Packages / develops / maintains the tools ○ Comfortable with network devices and architecture
  • 53. Reference Netbox > https://github.com/digitalocean/netbox Netbox Builder (meetup) Slides https://www.slideshare.net/dgarros/banog-meetup-august-30th-network-device-property-as-code Video https://youtu.be/sUJt26MXVl4 Ansible Dynamic Inventory https://github.com/AAbouZaid/netbox-as-ansible-inventory