Building a Private Cloud to Efficiently
Handle 40 Billion Requests / Day
October 28th, 2015
Pierre Gohon | Sr. Site Reliability Engineer | pierre.gohon@tubemogul.com
Pierre Grandin | Sr. Site Reliability Engineer | pierre.grandin@tubemogul.com
Who are we?
TubeMogul (Nasdaq : TUBE)
● Enterprise software company for digital branding
● Over 27 Billion Ads served in 2014
● Over 40 Billion Ad Auctions per day in Q3 2015
● Bids processed in less than 50 ms
● Bids served in less than 80 ms (inc. network round trip)
● 5 PB of monthly video traffic served
● 1.6 EB of data stored
Who are we?
Operations Engineering
● Ensure the smooth day-to-day operation of the platform infrastructure
● Provide a cost-effective and cutting-edge infrastructure
● Provide support to dev teams
● Team composed of SREs, SEs and DBAs (US and UA)
● Managing over 2,500 servers (virtual and physical)
Our Infrastructure
Public Cloud On Premises
Multiple locations with a mix of Public Cloud and On Premises
● 6 AWS Regions (us-east*2, us-west*2, europe, apac)
● Physical servers in Michigan / Arizona (Web/Databases)
● DNS served by third parties (UltraDNS + Dynect)
● External monitoring using Catchpoint
● CDNs to deliver content
● External security audits
We’re not adding complexity!
Before OpenStack: we’re already very “Hybrid”…
Why?
● Own your infrastructure stack
● Physical proximity matters (reduced/controlled latency)
● Better infrastructure planning
● Technological transparency
● … $$ !
Project timeline
Where do we stand?
● DIY ?
○ Small OPS team
■ 12 members in two timezones
■ Only 3 dedicated to OpenStack
○ New challenges
■ Internal training
■ Little external support (really ?) vs AWS
■ Manage data centers (Servers, Network, …)
OpenStack challenges - Operational aspect
● Are applications AWS dependent ?
○ Internal ops tools
○ Developer’s applications
○ AWS S3, DynamoDB, SNS, SQS, SES, SWF
● Convert developers to the project : we need their support
● OpenStack release cycle (when should we upgrade to the latest version?)
● Which OpenStack components are really needed?
● How far do we go (S3 replacement? Network control? Hardware control?)
OpenStack challenges - Application migration aspect
● Managing our own ASN / IPs (v4/v6)
● Choose “best for needs” transit providers (tier 1)
● Better control routes to/from our endpoints
● Allow dedicated AWS connections / others
● Allow direct peerings to ad networks
● Want to be accountable for networking issues
● Cost control
How? Networking - External connectivity
● Applications are already designed for redundancy/cloud
● Circumvent virtualized networking limitations
● Fine-tune baremetal nodes for HAProxy
● Equipment is “cloud ready” for the future (Nexus 5K as top-of-rack switch)
○ automatic switch configuration
○ Cisco software evolutions?
● 1G for admin, X*10G for public ?
● Leverage multicast ?
How? Networking - Hybrid physical / virtualized
How? Networking - Hybrid physical / virtualized
Diagram: network node, compute node and load balancer, attached to the public network and to a private network using VLANs
How? Networking - RTT
● Latency from our DC to AWS is 6ms average in US-WEST
rtb-bidder01(rtb):~$ mtr -r -c 50 gw01.us-west-1a.public
HOST: rtb-bidder01 Loss% Snt Last Avg Best Wrst StDev
1.|-- 10.0.4.1 0.0% 50 0.2 0.2 0.1 0.3 0.0
2.|-- XXX.XXX.XXX.XXX 0.0% 50 0.2 0.3 0.2 2.6 0.3
3.|-- ae-43.r02.snjsca04.us.bb. 0.0% 50 1.4 1.5 1.2 2.3 0.2
4.|-- ae-4.r06.plalca01.us.bb.g 0.0% 50 2.0 2.1 1.8 3.4 0.3
5.|-- ae-1.amazon.plalca01.us.b 0.0% 50 39.2 3.5 1.5 39.2 5.6
6.|-- 205.251.229.40 0.0% 50 3.5 2.8 2.2 4.9 0.6
7.|-- 205.251.230.120 0.0% 50 2.1 2.3 2.0 8.5 0.9
8.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
9.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
10.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
11.|-- 216.182.237.133 0.0% 50 4.0 6.0 2.7 20.2 5.2
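The 6 ms average can be read off the last hop that answered in the report above. A minimal Python sketch of that reading, assuming the `mtr -r` column layout shown here (Loss%, Snt, Last, Avg, Best, Wrst, StDev); the function name `last_hop_avg_ms` and the shortened `SAMPLE` report are illustrative, not part of the original deck:

```python
def last_hop_avg_ms(report: str) -> float:
    """Return the Avg column (ms) of the last hop that answered probes."""
    avg = None
    for line in report.splitlines():
        fields = line.split()
        # Hop lines look like: "11.|-- host 0.0% 50 4.0 6.0 2.7 20.2 5.2"
        if len(fields) < 9 or "|--" not in fields[0]:
            continue
        loss = float(fields[-7].rstrip("%"))
        if loss < 100.0:  # skip hops that never replied ("???")
            avg = float(fields[-4])
    if avg is None:
        raise ValueError("no responding hop found in report")
    return avg


# Shortened copy of the report above (header plus three hops).
SAMPLE = """\
HOST: rtb-bidder01 Loss% Snt Last Avg Best Wrst StDev
7.|-- 205.251.230.120 0.0% 50 2.1 2.3 2.0 8.5 0.9
8.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
11.|-- 216.182.237.133 0.0% 50 4.0 6.0 2.7 20.2 5.2
"""
print(last_hop_avg_ms(SAMPLE))  # 6.0
```

Hops 8 to 10 show 100% loss because those routers do not answer probes, so they are skipped rather than counted as zero latency.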
● If you are not building a multi-thousand-hypervisor cloud, you don’t need it to be complex
● Simplifies day-to-day operations
● Homemade Puppet catalog
○ because it means fewer lines of code
○ because of the learning curve
○ because we need to tweak settings (ulimit?)
● No need for Horizon
● No need for shared storage
How? Keep it simple
● Affinity / anti-affinity rules
○ Enforce resiliency using anti-affinity rules
○ Improve performance using affinity rules
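As an illustration of what an anti-affinity rule guarantees, the sketch below checks a placement after the fact: no two members of the same cluster should share a hypervisor. This is not the scheduler-side mechanism (OpenStack enforces such rules at boot time through server groups); the instance records, the hypervisor names `hv01`/`hv02` and the function name are all made up for the example.

```python
from collections import defaultdict


def anti_affinity_violations(instances, placement):
    """Return {(cluster, hypervisor): [hostnames]} for every hypervisor
    that hosts more than one member of the same cluster."""
    by_slot = defaultdict(list)
    for meta in instances:
        hypervisor = placement[meta["hostname"]]
        by_slot[(meta["cluster"], hypervisor)].append(meta["hostname"])
    return {slot: hosts for slot, hosts in by_slot.items() if len(hosts) > 1}


# Hypothetical instances belonging to one cluster.
instances = [
    {"cluster": "rtb-hbase", "hostname": "rtb-hbase-region01"},
    {"cluster": "rtb-hbase", "hostname": "rtb-hbase-region02"},
]

# Hypothetical placement: both region servers landed on the same hypervisor.
placement = {"rtb-hbase-region01": "hv01", "rtb-hbase-region02": "hv01"}
print(anti_affinity_violations(instances, placement))
```

Moving `rtb-hbase-region02` to `hv02` in `placement` makes the function return an empty dict, which is the state an anti-affinity rule is meant to enforce.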
How? Leverage your knowledge of your infrastructure
{"profile": "OpenStack", "cluster": "rtb-hbase", "hostname": "rtb-hbase-region01", "nagios_host": "mgmt01"}
How?
Treat your infrastructure as any other engineering project
Infrastructure As Code
● Follow standard development lifecycle
● Repeatable and consistent server provisioning
Continuous Delivery
● Iterate quickly
● Automated code review to improve code quality
Reliability
Improve Production Stability
Enforce Better Security Practices
How? Continuous Delivery
● We already have a lot of automation:
● ~10,000 Puppet deployments last year
● Over 8,500 production deployments via Jenkins last year
● On the infrastructure:
○ masterless mode for the deployment
○ master mode once the node is up and running
● On the VMs:
○ Puppet run is triggered by cloud-init, directly at boot
○ from boot to production ready: <5 minutes
Puppet
see also : http://www.slideshare.net/NicolasBrousse/puppet-camp-paris-2015
Infrastructure As Code - Code Review
Gerrit, an industry standard: OpenStack, Eclipse, Google, Chromium, Wikimedia, LibreOffice, Spotify, GlusterFS, etc.
Fine Grained Permissions Rules
Plugged into LDAP
Code Review per commit
Stream Events
Integrated with Jenkins, Jira and Hipchat
Managing about 600 Git repositories
Infrastructure As Code - Gerrit Integration
Infrastructure As Code - Gerrit in Action
Automatic verify : -1 if the commit doesn’t pass Jenkins code validation
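The voting rule can be sketched in a few lines of Python. The slide only states the -1 case; voting +1 on success, and treating every non-`SUCCESS` Jenkins result (such as `UNSTABLE` or `ABORTED`) as a failure, are assumptions made for this example, as is the function name `verified_vote`.

```python
# Jenkins build results that count as a pass; everything else fails the check.
PASSING_RESULTS = {"SUCCESS"}


def verified_vote(build_result: str) -> int:
    """Map a Jenkins build result to a Gerrit Verified label value."""
    return 1 if build_result in PASSING_RESULTS else -1


for result in ("SUCCESS", "UNSTABLE", "FAILURE", "ABORTED"):
    print(f"{result:<9} -> Verified {verified_vote(result):+d}")
```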
Infrastructure As Code - The Workflow
Lab / QA
Prod cluster
Infrastructure As Code - Continuous Delivery with Jenkins
Infrastructure As Code - Team Awareness
Infrastructure As Code - Safe upgrade paths
Easy as 1-2-3:
1. Test your upgrades using Jenkins
2. Deploy the upgrade by pressing a single button*
3. Enjoy the rest of your day
* https://github.com/pgrandin/lcam
fig.1 : N. Brousse, Sr. Director of Operations Engineering, switching our production workload to OpenStack
Get ready for production :
Monitor everything
Monitor as much as you can ?
● Existing monitoring (Nagios, Graphite) still in use
● Specific checks for OpenStack
○ check component API : performance / availability / operability
○ check resources : ports, failed instances
● Monitoring capacity metrics for all hardware
● SNMP traps for network equipment
● Monitoring is just an extension of our existing monitoring in AWS
Monitoring auto-discovery
● New OpenStack node is automatically monitored
○ automatically / upon request
○ Nagios detects new hosts (API query)
○ Nagios applies component-related checks by role
○ graphing is also automatically updated
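A rough sketch of the auto-discovery idea in Python: take the list of servers returned by an API query and render one Nagios host object per server, using the server's role as its hostgroup so that role-based checks apply. The inventory below is made up, the `generic-host` template name is an assumption, and the deck does not show how the actual integration was written.

```python
def nagios_host_block(name, address, role, template="generic-host"):
    """Render one Nagios host object; the role becomes its hostgroup."""
    directives = [
        ("use", template),
        ("host_name", name),
        ("address", address),
        ("hostgroups", role),
    ]
    body = "\n".join(f"    {key:<12}{value}" for key, value in directives)
    return "define host {\n" + body + "\n}\n"


def render_hosts(servers):
    """Render host objects for every server, sorted by name for stable diffs."""
    ordered = sorted(servers, key=lambda s: s["name"])
    return "\n".join(
        nagios_host_block(s["name"], s["address"], s["role"]) for s in ordered
    )


# Made-up inventory, standing in for the result of an API query.
servers = [
    {"name": "rtb-hbase-region01", "address": "10.0.4.21", "role": "rtb-hbase"},
    {"name": "rtb-bidder01", "address": "10.0.4.11", "role": "rtb-bidder"},
]
print(render_hosts(servers))
```

Service checks attached to a hostgroup then cover every new host with that role without further edits.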
Centralized monitoring
Monitoring is graphing
A look in the rearview mirror
Benefits - Transparency / visibility
Discover new odd/unexpected traffic/activity patterns
Benefits - Tailored Instances
Before: m3.xlarge + 2GB RAM? m3.2xlarge!
After:
# nova flavor-create rtb.collector rtb.collector 17408 8 2
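The 17408 in the flavor definition is the memory size in MB: an m3.xlarge (15 GiB) plus the extra 2 GB. The short Python calculation below works this out and compares it with the m3.2xlarge (30 GiB) that the same need would force on AWS. The instance memory sizes are AWS's published m3 specifications, not figures from the slides.

```python
MIB_PER_GIB = 1024

# AWS m3 memory sizes in GiB (published instance specifications).
M3_XLARGE_GIB = 15
M3_2XLARGE_GIB = 30

needed_mib = (M3_XLARGE_GIB + 2) * MIB_PER_GIB   # m3.xlarge plus 2 GB
forced_mib = M3_2XLARGE_GIB * MIB_PER_GIB        # next size up on AWS

print(needed_mib)               # 17408, the RAM argument of the flavor
print(forced_mib - needed_mib)  # 13312 MB allocated but not needed
```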
Benefits - Operational Transparency
AWS
OpenStack
# cerveza -m noc -- --zone tm-sjc-1a --start demo01
# cerveza -m noc -- --zone us-east-1a --start demo01
Benefits - Efficiency
Before After
Benefits - Efficiency
1+ million rx packets/s on only 2 HAProxy load balancers, full SSL
What does not fit?
Downscaling does not really make sense for us
CPUs are online and paid for; we should use them
Upscaling has its limits: AWS refreshes its instance types every year…
Sometimes a small added feature can have a huge load impact.
It makes sense to keep the elastic workloads (machine learning, ...) in AWS
● We can be “double hybrids” (AWS + OpenStack + HAProxy bare metal)
● A dev environment is needed for OpenStack (new versions / breaking things)
● Storage is still a big issue due to our volume (1.6 EB)
● Some stuff may stay “forever” on AWS?
● More dev/ops communication
● OpenStack is flexible
● No need for HA everywhere
● Spikes can be offloaded to AWS (cloud bursting)
What we’ve learnt
Still a lot left to do
Technical aspect
Need to migrate other AWS Regions
Gain more experience
Version upgrades
Continue to adapt our tooling
Add more alarms for capacity issues
Different Regions, different issues ?
Human aspect
Dev team still thinks in the AWS world (and sometimes OPS too…)
- Ad serving in production since 2015-05
- Bidding traffic in production since 2015-09
- 100% uptime since pre-production (2015-03)
Cost of operation for our current production workload:
- Reduced by a factor of two, including OpEx cost!
Aftermath
Questions?
Pierre Gohon
Pierre Grandin
@pierregohon
@p_grandin

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle 40 billion requests per day

  • 1. Building a Private Cloud to Efficiently Handle 40 Billion Requests / Day October 28th, 2015 Pierre Gohon | Sr. Site Reliability Engineer | pierre.gohon@tubemogul.com Pierre Grandin | Sr. Site Reliability Engineer | pierre.grandin@tubemogul.com
  • 2. Who are we? TubeMogul (Nasdaq : TUBE) ● Enterprise software company for digital branding ● Over 27 Billion Ads served in 2014 ● Over 40 Billion Ad Auctions per day in Q3 2015 ● Bids processed in less than 50 ms ● Bids served in less than 80 ms (inc. network round trip) ● 5 PB of monthly video traffic served ● 1.6 EB of data stored
  • 3. Who are we? Operations Engineering ● Ensure the smooth day to day operation of the platform infrastructure ● Provide a cost effective and cutting edge infrastructure ● Provide support to dev teams ● Team composed of SREs, SEs and DBAs (US and UA) ● Managing over 2,500 servers (virtual and physical)
  • 4. Our Infrastructure Public Cloud On Premises Multiple locations with a mix of Public Cloud and On Premises
  • 5. Before OpenStack: we’re already very “Hybrid”… ● 6 AWS Regions (us-east*2, us-west*2, europe, apac) ● Physical servers in Michigan / Arizona (Web/Databases) ● DNS served by third parties (UltraDNS + Dynect) ● External monitoring using Catchpoint ● CDNs to deliver content ● External security audits We’re not adding complexity!
  • 6. Why? ● Own your infrastructure stack ● Physical proximity matters (reduced/controlled latency) ● Better infrastructure planning ● Technological transparency ● … $$ !
  • 8. Where do we stand?
  • 9. OpenStack challenges - Operational aspect ● DIY? ○ Small OPS team ■ 12 members in two timezones ■ Only 3 dedicated to OpenStack ○ New challenges ■ Internal training ■ Little external support (really?) vs AWS ■ Manage data centers (Servers, Network, …)
  • 10. OpenStack challenges - Application migration aspect ● Are applications AWS dependent? ○ Internal ops tools ○ Developers’ applications ○ AWS S3, DynamoDB, SNS, SQS, SES, SWF ● Convert developers to the project: we need their support ● OpenStack release cycle (when should we upgrade to the latest version?) ● Which OpenStack components are really needed? ● How far do we go (S3 replacement? Network control? Hardware control?)
  • 11. How? Networking - External connectivity ● Managing our own ASN / IPs (v4/v6) ● Choose “best for needs” transit providers (tier 1) ● Better control of routes to/from our endpoints ● Allow dedicated connections to AWS and others ● Allow direct peerings to ad networks ● Want to be accountable for networking issues ● Cost control
  • 12. How? Networking - Hybrid physical / virtualized ● Applications are already designed for redundancy/cloud ● Circumvent virtualized networking limitations ● Fine-tune baremetal nodes for HAProxy ● Equipment is “cloud ready” for the future (Nexus 5K as top-of-rack switch) ○ automatic switch configuration ○ Cisco software evolutions? ● 1G for admin, X*10G for public? ● Leverage multicast?
  • 13. How? Networking - Hybrid physical / virtualized [Diagram: network node, compute node and load balancer, attached to a public network and to a private network using VLANs]
  • 14. How? Networking - RTT ● Latency from our DC to AWS is 6ms average in US-WEST
      rtb-bidder01(rtb):~$ mtr -r -c 50 gw01.us-west-1a.public
      HOST: rtb-bidder01                 Loss%   Snt   Last    Avg   Best   Wrst  StDev
        1.|-- 10.0.4.1                    0.0%    50    0.2    0.2    0.1    0.3    0.0
        2.|-- XXX.XXX.XXX.XXX             0.0%    50    0.2    0.3    0.2    2.6    0.3
        3.|-- ae-43.r02.snjsca04.us.bb.   0.0%    50    1.4    1.5    1.2    2.3    0.2
        4.|-- ae-4.r06.plalca01.us.bb.g   0.0%    50    2.0    2.1    1.8    3.4    0.3
        5.|-- ae-1.amazon.plalca01.us.b   0.0%    50   39.2    3.5    1.5   39.2    5.6
        6.|-- 205.251.229.40              0.0%    50    3.5    2.8    2.2    4.9    0.6
        7.|-- 205.251.230.120             0.0%    50    2.1    2.3    2.0    8.5    0.9
        8.|-- ???                        100.0    50    0.0    0.0    0.0    0.0    0.0
        9.|-- ???                        100.0    50    0.0    0.0    0.0    0.0    0.0
       10.|-- ???                        100.0    50    0.0    0.0    0.0    0.0    0.0
       11.|-- 216.182.237.133             0.0%    50    4.0    6.0    2.7   20.2    5.2
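The 6 ms figure on slide 14 is the average of the last responding hop in the mtr report. As a minimal sketch of how that number can be read out programmatically, the Python below parses the report text copied from the slide. The helper names (`parse_hops`, `end_to_end_avg_ms`) and the regular expression are illustrative assumptions, not part of the authors' tooling.

```python
import re

# mtr report copied from slide 14 (hop 2 is masked in the original).
MTR_REPORT = """\
HOST: rtb-bidder01 Loss% Snt Last Avg Best Wrst StDev
1.|-- 10.0.4.1 0.0% 50 0.2 0.2 0.1 0.3 0.0
2.|-- XXX.XXX.XXX.XXX 0.0% 50 0.2 0.3 0.2 2.6 0.3
3.|-- ae-43.r02.snjsca04.us.bb. 0.0% 50 1.4 1.5 1.2 2.3 0.2
4.|-- ae-4.r06.plalca01.us.bb.g 0.0% 50 2.0 2.1 1.8 3.4 0.3
5.|-- ae-1.amazon.plalca01.us.b 0.0% 50 39.2 3.5 1.5 39.2 5.6
6.|-- 205.251.229.40 0.0% 50 3.5 2.8 2.2 4.9 0.6
7.|-- 205.251.230.120 0.0% 50 2.1 2.3 2.0 8.5 0.9
8.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
9.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
10.|-- ??? 100.0 50 0.0 0.0 0.0 0.0 0.0
11.|-- 216.182.237.133 0.0% 50 4.0 6.0 2.7 20.2 5.2
"""

# One hop row: index, host, loss, sent, last, avg, best, worst, stdev.
HOP_LINE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_hops(report):
    """Return one dict per hop row; the HOST header line is skipped."""
    hops = []
    for line in report.splitlines():
        match = HOP_LINE.match(line)
        if match is None:
            continue
        hops.append({
            "hop": int(match.group(1)),
            "host": match.group(2),
            "loss_pct": float(match.group(3)),
            "avg_ms": float(match.group(6)),
        })
    return hops


def end_to_end_avg_ms(hops):
    """Average RTT of the last hop that answered at least one probe."""
    answered = [h for h in hops if h["loss_pct"] < 100.0]
    if not answered:
        raise ValueError("no hop answered any probe")
    return answered[-1]["avg_ms"]


hops = parse_hops(MTR_REPORT)
print(end_to_end_avg_ms(hops))
```

Hops 8 to 10 show 100% loss because those routers do not answer probes; they are ignored, since the final hop still responds.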
  • 15. How? Keep it simple ● If you are not building a multi-thousand-hypervisor cloud, you don’t need it to be complex ● Simplifies day-to-day operations ● Home-made Puppet catalog ○ because fewer lines of code ○ because of the learning curve ○ because we need to tweak settings (ulimit?) ● No need for Horizon ● No need for shared storage
  • 16. How? Leverage your knowledge of your infrastructure ● Affinity / anti-affinity rules ○ Enforce resiliency using anti-affinity rules ○ Improve performance using affinity rules {"profile": "OpenStack", "cluster": "rtb-hbase", "hostname": "rtb-hbase-region01", "nagios_host": "mgmt01"}
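To make the affinity / anti-affinity idea on slide 16 concrete, here is a minimal Python sketch of the placement decision. It is not Nova's scheduler code and not the authors' implementation: the `pick_host` function, the hypervisor names and the `placements` structure are hypothetical, and only the `cluster` and `hostname` values come from the slide's metadata example.

```python
def pick_host(hypervisors, placements, cluster, anti_affinity=True):
    """Choose a hypervisor for a new member of `cluster`.

    hypervisors: hypervisor names in order of preference.
    placements:  dict of instance hostname -> (cluster, hypervisor).
    With anti_affinity=True, hypervisors that already run a member of the
    cluster are excluded (resiliency). With anti_affinity=False, such
    hypervisors are preferred (locality), falling back to any hypervisor.
    """
    used = {hv for (member_cluster, hv) in placements.values()
            if member_cluster == cluster}
    if anti_affinity:
        candidates = [hv for hv in hypervisors if hv not in used]
    else:
        candidates = [hv for hv in hypervisors if hv in used] or list(hypervisors)
    if not candidates:
        raise RuntimeError(f"no hypervisor satisfies the rule for {cluster!r}")
    return candidates[0]


# Hypothetical inventory: one rtb-hbase region server already placed.
hypervisors = ["compute01", "compute02", "compute03"]
placements = {"rtb-hbase-region01": ("rtb-hbase", "compute01")}

spread = pick_host(hypervisors, placements, "rtb-hbase", anti_affinity=True)
packed = pick_host(hypervisors, placements, "rtb-hbase", anti_affinity=False)
print(spread, packed)
```

With anti-affinity the second region server avoids `compute01`, so losing one hypervisor cannot take down the whole cluster; with affinity it lands next to the first one.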
  • 17. How? Treat your infrastructure as any other engineering project
  • 18. How? Continuous Delivery ● Infrastructure As Code ○ Follow standard development lifecycle ○ Repeatable and consistent server provisioning ● Continuous Delivery ○ Iterate quickly ○ Automated code review to improve code quality ● Reliability ○ Improve production stability ○ Enforce better security practices
  • 19. Puppet ● We already have a lot of automation: ○ ~10,000 Puppet deployments last year ○ Over 8,500 production deployments via Jenkins last year ● On the infrastructure: ○ masterless mode for the deployment ○ master mode once the node is up and running ● On the VMs: ○ Puppet run is triggered by cloud-init, directly at boot ○ from boot to production ready: <5 minutes See also: http://www.slideshare.net/NicolasBrousse/puppet-camp-paris-2015
  • 20. Infrastructure As Code - Code Review
  • 21. Infrastructure As Code - Gerrit Integration ● Gerrit, an industry standard: OpenStack, Eclipse, Google, Chromium, Wikimedia, LibreOffice, Spotify, GlusterFS, etc. ● Fine-grained permission rules ● Plugged into LDAP ● Code review per commit ● Stream events ● Integrated with Jenkins, Jira and HipChat ● Managing about 600 Git repositories
  • 22. Infrastructure As Code - Gerrit in Action Automatic verify : -1 if the commit doesn’t pass Jenkins code validation
  • 23. Infrastructure As Code - The Workflow Lab / QA Prod cluster
  • 24. Infrastructure As Code - Continuous Delivery with Jenkins
  • 25. Infrastructure As Code - Team Awareness
  • 26. Infrastructure As Code - Safe upgrade paths Easy as 1-2-3: 1. Test your upgrades using Jenkins 2. Deploy the upgrade by pressing a single button* 3. Enjoy the rest of your day * https://github.com/pgrandin/lcam fig.1 : N. Brousse, Sr. Director of Operation Engineering, switching our production workload to OpenStack
  • 27. Get ready for production : Monitor everything
  • 28. Monitor as much as you can ? ● Existing monitoring (Nagios, Graphite) still in use ● Specific checks for OpenStack ○ check component API : performance / availability / operability ○ check resources : ports, failed instances ● Monitoring capacity metrics for all hardware ● SNMP traps for network equipment ● Monitoring is just an extension of our existing monitoring in AWS
  • 29. Monitoring auto-discovery ● New OpenStack node is automatically monitored ○ automatically / upon request ○ nagios detects new hosts (API query) ○ nagios applies component related check by role ○ graphing is also automatically updated
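The auto-discovery flow on slide 29 (discover hosts, then apply checks by role) can be sketched as a small generator of Nagios object definitions. This is a hedged illustration only: the role names, check command names and the `render_nagios_objects` helper are hypothetical, and the `generic-host` / `generic-service` templates are assumed to exist in the Nagios configuration. In the authors' setup the node list comes from an API query, which is replaced here by a plain list.

```python
# Hypothetical mapping from node role to the check commands applied to it.
CHECKS_BY_ROLE = {
    "compute": ["check_nova_compute", "check_failed_instances"],
    "network": ["check_neutron_agents", "check_ports"],
}


def render_nagios_objects(nodes):
    """Render Nagios host and service definitions for discovered nodes.

    nodes: iterable of dicts with "hostname", "address" and "role" keys.
    Roles without an entry in CHECKS_BY_ROLE get a host definition only.
    """
    blocks = []
    for node in sorted(nodes, key=lambda n: n["hostname"]):
        blocks.append(
            "define host {\n"
            "    use        generic-host\n"
            f"    host_name  {node['hostname']}\n"
            f"    address    {node['address']}\n"
            "}\n"
        )
        for check in CHECKS_BY_ROLE.get(node["role"], []):
            blocks.append(
                "define service {\n"
                "    use                  generic-service\n"
                f"    host_name            {node['hostname']}\n"
                f"    service_description  {check}\n"
                f"    check_command        {check}\n"
                "}\n"
            )
    return "\n".join(blocks)


# Stand-in for the result of an inventory API query (addresses are made up).
discovered = [
    {"hostname": "compute01", "address": "10.0.4.11", "role": "compute"},
    {"hostname": "network01", "address": "10.0.4.21", "role": "network"},
]
config = render_nagios_objects(discovered)
print(config)
```

Regenerating the file on every discovery run keeps the monitoring configuration a pure function of the inventory, which is what makes a new node "automatically monitored".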
  • 32. A look in the rearview mirror
  • 33. Benefits - Transparency / visibility Discover new odd/unexpected traffic/activity patterns
  • 34. Benefits - Tailored Instances ● Before (AWS): need an m3.xlarge + 2GB RAM? Use an m3.2xlarge! ● After (OpenStack): # nova flavor-create rtb.collector rtb.collector 17408 8 2
  • 35. Benefits - Operational Transparency ● OpenStack: # cerveza -m noc -- --zone tm-sjc-1a --start demo01 ● AWS: # cerveza -m noc -- --zone us-east-1a --start demo01
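The arithmetic behind slide 34 can be worked through in a few lines of Python. The flavor values come from the slide (`nova flavor-create` takes name, ID, RAM in MB, disk in GB and vCPUs, in that order); the m3 instance sizes are AWS's published figures for that family as best recalled and should be treated as an assumption, as should the `smallest_fitting` helper.

```python
MIB_PER_GIB = 1024

# Assumed AWS m3 sizes (memory in GiB), limited to the two types on the slide.
AWS_M3 = {
    "m3.xlarge": {"vcpus": 4, "ram_gib": 15},
    "m3.2xlarge": {"vcpus": 8, "ram_gib": 30},
}

# Custom flavor from the slide:
#   nova flavor-create rtb.collector rtb.collector 17408 8 2
FLAVOR = {"name": "rtb.collector", "ram_mib": 17408, "disk_gb": 8, "vcpus": 2}


def smallest_fitting(catalog, ram_gib):
    """Return the name of the smallest catalog entry with enough memory."""
    fitting = [(spec["ram_gib"], name)
               for name, spec in catalog.items()
               if spec["ram_gib"] >= ram_gib]
    if not fitting:
        raise ValueError("no instance type is large enough")
    return min(fitting)[1]


needed_gib = FLAVOR["ram_mib"] / MIB_PER_GIB          # 17408 / 1024
aws_choice = smallest_fitting(AWS_M3, needed_gib)      # forced up a size
unused_gib = AWS_M3[aws_choice]["ram_gib"] - needed_gib

print(needed_gib, aws_choice, unused_gib)
```

Under these assumptions the workload needs 17 GiB, which is the 15 GiB of an m3.xlarge plus 2 GiB, so on a fixed catalog it lands on the next size up and leaves a large share of that instance's memory unused. A custom flavor sizes the instance to the workload instead.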
  • 37. Benefits - Efficiency 1+ million rx packets/s on only 2 Haproxy Load Balancers, full SSL
  • 38. What does not fit? ● Downscaling does not really make sense for us: CPUs are online and paid for, we should use them ● Upscaling has its limits: AWS is refreshing instance types every year… ● Sometimes a small added feature can have a huge load impact ● It makes sense to keep the elastic workloads (machine learning, ...) in AWS
  • 39. What we’ve learnt ● We can be “double hybrids” (AWS + OpenStack + HAProxy bare metal) ● A dev environment is needed for OpenStack (new versions / break things) ● Storage is still a big issue due to our volume (1.6 EB) ● Some stuff may stay “forever” on AWS? ● More dev/ops communication ● OpenStack is flexible ● No need for HA everywhere ● Spikes can be offloaded to AWS (cloud bursting)
  • 40. Still a lot left to do ● Technical aspect ○ Need to migrate other AWS Regions ○ Gain more experience ○ Version upgrades ○ Continue to adapt our tooling ○ Add more alarms for capacity issues ○ Different Regions, different issues? ● Human aspect ○ Dev team still thinks in the AWS world (and sometimes OPS too…)
  • 41. Aftermath ● Ad serving in production since 2015-05 ● Bidding traffic in production since 2015-09 ● 100% uptime since pre-production (2015-03) ● Cost of operation for our current production workload: reduced by a factor of two, including OpEx cost!