SRECon16: Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It?

Nicolas Brousse
Nicolas BrousseCloud Technology Leader | Director, Operations Engineering at Adobe
Moving Large Workloads from a
Public Cloud to an OpenStack
Private Cloud: Is It Really Worth It?
April 7th, 2016
Nicolas Brousse | Sr. Director Of Operations Engineering | nicolas@tubemogul.com
Who are we?
An enterprise software company for digital branding
● Filtered over 12.6 Trillion Ad Auctions in 2015
● Served over 3 Billion Ad Impressions on linear TV via our PTV
solution
● Process bids in less than 50 ms
● Serve bids in less than 80 ms (includes network round-trip)
● Serve 5 PB of monthly video traffic
Who are we?
A team of Operations Engineers
● Comprised of SREs, SEs and DBAs
● Ensure the smooth day-to-day operation of the platform
infrastructure
● Provide a cost-effective and cutting edge infrastructure
● Manage over 2,500 servers (virtual and physical)
Mixed Infrastructure
Public Cloud On Premises 2016 Private Cloud Deployment
High Level Technical Overview
Bidding Layer
Ad
Serving
- High Volumes
- Low Latency
- Small Packets
- Large Data Sets
- Low Latency
- Fast Processing
- Large Caches
Low Latency User
Database for User
Targeting and Frequency
Capping
● Java (a lot!)
● MySQL (Percona, MariaDB)
● Memcached, Couchbase
● Aerospike, Vertica, Druid
● Kafka
● Storm
● Zookeeper, Exhibitor
● Hadoop, HBase, Hive
● Terracotta
● ElasticSearch, Logstash, Kibana
● Varnish
● PHP, Python, Ruby, Go...
● Apache httpd
● Nagios, Sensu
● Ganglia, Graphite, Grafana
Technology Hoarders
● Puppet
● HAproxy
● OpenStack
● Git and Gerrit
● Gor
● ActiveMQ, RabbitMQ
● OpenLDAP
● Redis
● Blackbox
● Jenkins, Sonar
● RunDeck
● Tomcat, Jetty, Netty
● Qubole
● Snowflake
● AWS DynamoDB, EC2, S3, SWF...
● Our high volume and low latency traffic makes our proximity to
some partners matter.
● Huge datasets used for decisioning require high performance
infrastructure, which costs a lot. Even with reserved capacity.
● Instances’ packet per second limitations lead us to large
public footprint and poor backend performances, especially for
load balancers.
● Network disruptions with no root causes.
Public Cloud: Technical Challenges
Go in-house in 4 locations and 3 continents, in less than 6
months, using a ready-to-go cloud solution and two part-time
engineers. Then, go celebrate in Vegas.
EASY!
Our Strategy
Go in-house in 4 locations and 3 continents, in less than 6
months, using a ready-to-go cloud solution and two part-time
engineers. Then, go celebrate in Vegas.
EASY!
Our Strategy (revised)
3 years
Go in-house in 4 locations and 3 continents, in less than 6
months, using a ready-to-go cloud solution and two part-time
engineers. Then, go celebrate in Vegas.
EASY!
Our Strategy (revised)
3 years
OpenStack and Bare Metal
Go in-house in 4 locations and 3 continents, in less than 6
months, using a ready to go cloud solution and two part-time
engineers. Then, go celebrate in Vegas.
EASY!
Our Strategy (revised)
3 years
OpenStack and Bare Metal 3 dedicated
Go in-house in 4 locations and 3 continents, in less than 6
months, using a ready to go cloud solution and two part-time
engineers. Then, go celebrate in Vegas.
EASY!
Our Strategy (revised)
3 years
OpenStack and Bare Metal
keep it up and running
3 dedicated
YEAH!
● Which infrastructure is being moved?
● How do you compare apples to apples?
● Do you plan to overcommit?
● Do you cost properly for engineering resources, software
maintenances, and various support?
● Does your design make trade-offs on High Availability?
● Do you plan for a test environment and R&D?
● Are you using your public cloud at its best?
● Are you building a public cloud or optimizing your environment?
● Do you plan for growth and how does it impact your cost models?
● Which locations are you deploying to? What is the impact on
bandwidth and data center costs?
TCO analysis: what to consider?
● Be Fair: Challenge your Public Cloud partners
and share your TCO with them.
● Keep It Simple: Limit the scope of your TCO to
a clear and well known subset.
● Make Clear Assumptions: Have a defined list
of feature sets on what you are building.
Three simple TCO rules
We built a test environment with
Eucalyptus to move our integration
environment on it.
How did we start?
We built a test lab environment with a
vendor using CloudStack to move our
integration environment on it.
How did we start (again)?
We built a test lab environment and first
data center location ourselves using
OpenStack on Gentoo and shared the
lab for our software engineers’ integration
environment.
How did we start (really)?
● Do not share your lab: Your lab is meant to fail and be
destroyed. Don’t assume people will be OK to work with
something unreliable.
● Don’t mess up your block storage strategy. No last minute
changes.
● Starting a first data center location may require a lot of
paperwork time, executive approval, and hardware mistakes.
Plan ahead.
● OpenStack is complex. Don’t make it more complex.
First Failures and Lessons Learned
We (re)built our lab and prod environment
with a vendor using OpenStack on
Ubuntu to move our QA environment into
one region.
How did we reboot?
We (re)built our lab and prod environment
ourselves by using OpenStack on
Ubuntu to move our production
environment into one region.
How did we reboot (again)?
In Production...
● First traffic switch went smoothly and allowed us to decrease our
footprint by 40% and our load balancer footprint by 95%.
● Progressive traffic migration is not easy. Consider the impact of
multiple environments to maintain and all application dependencies.
● Load Balancers, and core data services run on bare metal and
leverage VLANs.
● Fully automated bare metal and OpenStack provisioning.
● We are deploying three new on-premise locations in Q2 2016.
● Limit scope to our high volume and low latency infrastructure.
● OpenStack requires a long learning curve and design phase.
Account for it, in terms of cost, skill-set, and time.
● We are not building a Public Cloud. Be very clear on your
feature set and business case.
● You don’t know your application as well as you think. Be ready to
adapt quickly and don’t overlook the impact of network traffic
that switches from private to public.
● Really, you don’t know your application as well as you think. Be
ready to deal with ip conntrack table full.
Lessons Learned
● Moving in-house led to an estimated 30% cost savings and
reduced our server footprint.
● The improved visibility on our network traffic and our full
application stack greatly helped for troubleshooting and
performance improvements.
● Have a strong technical need for it. Cost shouldn’t be the
only driving factor.
So, is it worth it?
Nicolas Brousse @orieg
1 of 24

Recommended

Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ... by
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...
Openstack Summit Tokyo 2015 - Building a private cloud to efficiently handle ...Pierre GRANDIN
1.9K views43 slides
SuiteWorld16: Mega Volume - How TubeMogul Leverages NetSuite by
SuiteWorld16: Mega Volume - How TubeMogul Leverages NetSuiteSuiteWorld16: Mega Volume - How TubeMogul Leverages NetSuite
SuiteWorld16: Mega Volume - How TubeMogul Leverages NetSuiteNicolas Brousse
659 views22 slides
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month by
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
12.8K views31 slides
GCPLA Meetup Workshop - Migration from a Legacy Infrastructure to the Cloud by
GCPLA Meetup Workshop - Migration from a Legacy Infrastructure to the CloudGCPLA Meetup Workshop - Migration from a Legacy Infrastructure to the Cloud
GCPLA Meetup Workshop - Migration from a Legacy Infrastructure to the CloudSamuel Chow
536 views16 slides
Netflix Open Source Meetup Season 4 Episode 2 by
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
19.4K views77 slides
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons by
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemonsaspyker
1.2K views66 slides

More Related Content

What's hot

Scaling Monitoring At Databricks From Prometheus to M3 by
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
294 views36 slides
Netflix Open Source Meetup Season 4 Episode 1 by
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1aspyker
23.4K views41 slides
Netflix Open Source Meetup Season 3 Episode 2 by
Netflix Open Source Meetup Season 3 Episode 2Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2aspyker
15.4K views51 slides
NetflixOSS Meetup S6E1 - Titus & Containers by
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containersaspyker
2.4K views80 slides
Netflix oss season 2 episode 1 - meetup Lightning talks by
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talksRuslan Meshenberg
107.5K views108 slides
Season 7 Episode 1 - Tools for Data Scientists by
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientistsaspyker
445 views64 slides

What's hot(20)

Scaling Monitoring At Databricks From Prometheus to M3 by LibbySchulze
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
LibbySchulze294 views
Netflix Open Source Meetup Season 4 Episode 1 by aspyker
Netflix Open Source Meetup Season 4 Episode 1Netflix Open Source Meetup Season 4 Episode 1
Netflix Open Source Meetup Season 4 Episode 1
aspyker23.4K views
Netflix Open Source Meetup Season 3 Episode 2 by aspyker
Netflix Open Source Meetup Season 3 Episode 2Netflix Open Source Meetup Season 3 Episode 2
Netflix Open Source Meetup Season 3 Episode 2
aspyker15.4K views
NetflixOSS Meetup S6E1 - Titus & Containers by aspyker
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containers
aspyker2.4K views
Netflix oss season 2 episode 1 - meetup Lightning talks by Ruslan Meshenberg
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
Ruslan Meshenberg107.5K views
Season 7 Episode 1 - Tools for Data Scientists by aspyker
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientists
aspyker445 views
Five years of operating a large scale globally replicated Pulsar installation... by StreamNative
Five years of operating a large scale globally replicated Pulsar installation...Five years of operating a large scale globally replicated Pulsar installation...
Five years of operating a large scale globally replicated Pulsar installation...
StreamNative1.3K views
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta by aspyker
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
aspyker1.6K views
Why observability matters - now and in the future (w/guest Grafana) by Weaveworks
Why observability matters - now and in the future (w/guest Grafana)Why observability matters - now and in the future (w/guest Grafana)
Why observability matters - now and in the future (w/guest Grafana)
Weaveworks1.7K views
An approach for migrating enterprise apps into open stack by Arthur Berezin
An approach for migrating enterprise apps into open stackAn approach for migrating enterprise apps into open stack
An approach for migrating enterprise apps into open stack
Arthur Berezin960 views
Order from chaos: automating monitoring configuration by Sensu Inc.
Order from chaos: automating monitoring configurationOrder from chaos: automating monitoring configuration
Order from chaos: automating monitoring configuration
Sensu Inc.230 views
Pull, Don't Push! Sensu Summit 2018 Talk by Julian Dunn
Pull, Don't Push! Sensu Summit 2018 TalkPull, Don't Push! Sensu Summit 2018 Talk
Pull, Don't Push! Sensu Summit 2018 Talk
Julian Dunn138 views
Netflix and Containers: Not A Stranger Thing by aspyker
Netflix and Containers:  Not A Stranger ThingNetflix and Containers:  Not A Stranger Thing
Netflix and Containers: Not A Stranger Thing
aspyker2.3K views
Dev309 from asgard to zuul - netflix oss-final by Ruslan Meshenberg
Dev309  from asgard to zuul - netflix oss-finalDev309  from asgard to zuul - netflix oss-final
Dev309 from asgard to zuul - netflix oss-final
Spacecrafts Made Simple: How Loft Orbital Delivers Unparalleled Speed-to-Spac... by InfluxData
Spacecrafts Made Simple: How Loft Orbital Delivers Unparalleled Speed-to-Spac...Spacecrafts Made Simple: How Loft Orbital Delivers Unparalleled Speed-to-Spac...
Spacecrafts Made Simple: How Loft Orbital Delivers Unparalleled Speed-to-Spac...
InfluxData1.1K views
20140708 - Jeremy Edberg: How Netflix Delivers Software by DevOps Chicago
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
DevOps Chicago17.6K views
Success With OpenStack in Production - Frank Weyns - Openstack Day Israel 2016 by Cloud Native Day Tel Aviv
Success With OpenStack in Production - Frank Weyns - Openstack Day Israel 2016Success With OpenStack in Production - Frank Weyns - Openstack Day Israel 2016
Success With OpenStack in Production - Frank Weyns - Openstack Day Israel 2016
Triangle Devops Meetup 10/2015 by aspyker
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
aspyker1.1K views
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G... by GetInData
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData1K views

Similar to SRECon16: Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It?

Rally--OpenStack Benchmarking at Scale by
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleMirantis
8.9K views32 slides
Running OpenStack in Production by
Running OpenStack in ProductionRunning OpenStack in Production
Running OpenStack in ProductionTesora
358 views33 slides
HPC on OpenStack by
HPC on OpenStackHPC on OpenStack
HPC on OpenStackErich Birngruber
191 views29 slides
OpenStack Ottawa Q2 MeetUp - May 31st 2017 by
OpenStack Ottawa Q2 MeetUp - May 31st 2017OpenStack Ottawa Q2 MeetUp - May 31st 2017
OpenStack Ottawa Q2 MeetUp - May 31st 2017Stacy Véronneau
289 views41 slides
Transforming to OpenStack: a sample roadmap to DevOps by
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsNicolas (Nick) Barcet
5.6K views31 slides
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds by
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the CloudsDOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the CloudsGene Kim
1.3K views57 slides

Similar to SRECon16: Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It?(20)

Rally--OpenStack Benchmarking at Scale by Mirantis
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at Scale
Mirantis8.9K views
Running OpenStack in Production by Tesora
Running OpenStack in ProductionRunning OpenStack in Production
Running OpenStack in Production
Tesora358 views
OpenStack Ottawa Q2 MeetUp - May 31st 2017 by Stacy Véronneau
OpenStack Ottawa Q2 MeetUp - May 31st 2017OpenStack Ottawa Q2 MeetUp - May 31st 2017
OpenStack Ottawa Q2 MeetUp - May 31st 2017
Stacy Véronneau289 views
Transforming to OpenStack: a sample roadmap to DevOps by Nicolas (Nick) Barcet
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOps
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds by Gene Kim
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the CloudsDOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
Gene Kim1.3K views
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda... by DevOps Enterprise Summmit
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
[OpenStack Day in Korea] Keynote#2 - Bringing OpenStack to the Enterprise Dat... by Sungjin Kang
[OpenStack Day in Korea] Keynote#2 - Bringing OpenStack to the Enterprise Dat...[OpenStack Day in Korea] Keynote#2 - Bringing OpenStack to the Enterprise Dat...
[OpenStack Day in Korea] Keynote#2 - Bringing OpenStack to the Enterprise Dat...
Sungjin Kang1.4K views
TOWARDS Hybrid OpenStack Clouds in the Real World by Andrew Hickey
TOWARDS Hybrid OpenStack Clouds in the Real WorldTOWARDS Hybrid OpenStack Clouds in the Real World
TOWARDS Hybrid OpenStack Clouds in the Real World
Andrew Hickey2.6K views
Data Science in the Cloud @StitchFix by C4Media
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media952 views
How to Migrate Applications Off a Mainframe by VMware Tanzu
How to Migrate Applications Off a MainframeHow to Migrate Applications Off a Mainframe
How to Migrate Applications Off a Mainframe
VMware Tanzu6.2K views
openstackreferencearchitecturewhitepaper by Richard Haigh
openstackreferencearchitecturewhitepaperopenstackreferencearchitecturewhitepaper
openstackreferencearchitecturewhitepaper
Richard Haigh406 views
From prototype to production - The journey of re-designing SmartUp.io by Máté Lang
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.io
Máté Lang118 views
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ... by OpenNebula Project
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebula Project2.1K views
Are enterprises ready for the OpenStack transformation by Nicolas (Nick) Barcet
Are enterprises ready for the OpenStack transformationAre enterprises ready for the OpenStack transformation
Are enterprises ready for the OpenStack transformation
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT by OpenStack
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
OpenStack2K views
OpenStack Toronto Q2 MeetUp - June 1st 2017 by Stacy Véronneau
OpenStack Toronto Q2 MeetUp - June 1st 2017OpenStack Toronto Q2 MeetUp - June 1st 2017
OpenStack Toronto Q2 MeetUp - June 1st 2017
Stacy Véronneau219 views
Cloud comparison - AWS vs Azure vs Google by Patrick Pierson
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
Patrick Pierson1.8K views
[CON6985]Expanding DBaaS Beyond Data Centers Hybrid Cloud Onboarding via Orac... by Bharat Paliwal
[CON6985]Expanding DBaaS Beyond Data Centers Hybrid Cloud Onboarding via Orac...[CON6985]Expanding DBaaS Beyond Data Centers Hybrid Cloud Onboarding via Orac...
[CON6985]Expanding DBaaS Beyond Data Centers Hybrid Cloud Onboarding via Orac...
Bharat Paliwal292 views
Denver Cloud Foundry Meetup - February 2016 by Josh Ghiloni
Denver Cloud Foundry Meetup - February 2016Denver Cloud Foundry Meetup - February 2016
Denver Cloud Foundry Meetup - February 2016
Josh Ghiloni535 views

More from Nicolas Brousse

<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente... by
<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente...<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente...
<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente...Nicolas Brousse
1.4K views16 slides
Improving Adobe Experience Cloud Services Dependability with Machine Learning by
Improving Adobe Experience Cloud Services Dependability with Machine LearningImproving Adobe Experience Cloud Services Dependability with Machine Learning
Improving Adobe Experience Cloud Services Dependability with Machine LearningNicolas Brousse
1.5K views18 slides
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o... by
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o...IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o...
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o...Nicolas Brousse
430 views21 slides
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ... by
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...Nicolas Brousse
347 views23 slides
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack by
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStackAdobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStackNicolas Brousse
1.6K views24 slides
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme... by
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...Nicolas Brousse
2.4K views32 slides

More from Nicolas Brousse(11)

<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente... by Nicolas Brousse
<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente...<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente...
<Programming> 2019 - ICW'19: The Issue of Monorepo and Polyrepo In Large Ente...
Nicolas Brousse1.4K views
Improving Adobe Experience Cloud Services Dependability with Machine Learning by Nicolas Brousse
Improving Adobe Experience Cloud Services Dependability with Machine LearningImproving Adobe Experience Cloud Services Dependability with Machine Learning
Improving Adobe Experience Cloud Services Dependability with Machine Learning
Nicolas Brousse1.5K views
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o... by Nicolas Brousse
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o...IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o...
IEEE ISSRE 2018 - Use of Self-Healing Techniques to Improve the Reliability o...
Nicolas Brousse430 views
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ... by Nicolas Brousse
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
PuppetConf 2017 | Adobe Advertising Cloud: A Lean Puppet Workflow to Support ...
Nicolas Brousse347 views
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack by Nicolas Brousse
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStackAdobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack
Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack
Nicolas Brousse1.6K views
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme... by Nicolas Brousse
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployme...
Nicolas Brousse2.4K views
Improving Operations Efficiency with Puppet by Nicolas Brousse
Improving Operations Efficiency with PuppetImproving Operations Efficiency with Puppet
Improving Operations Efficiency with Puppet
Nicolas Brousse3.9K views
Scaling Bleeding Edge Technology in a Fast-paced Environment by Nicolas Brousse
Scaling Bleeding Edge Technology in a Fast-paced EnvironmentScaling Bleeding Edge Technology in a Fast-paced Environment
Scaling Bleeding Edge Technology in a Fast-paced Environment
Nicolas Brousse799 views
Scaling on EC2 in a fast-paced environment (LISA'11 - Full Paper) by Nicolas Brousse
Scaling on EC2 in a fast-paced environment (LISA'11 - Full Paper)Scaling on EC2 in a fast-paced environment (LISA'11 - Full Paper)
Scaling on EC2 in a fast-paced environment (LISA'11 - Full Paper)
Nicolas Brousse1.6K views
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2... by Nicolas Brousse
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
Bringing Business Awareness to Your Operation Team (Nagios World Conference 2...
Nicolas Brousse846 views
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con... by Nicolas Brousse
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
Optimizing your Monitoring and Trending tools for the Cloud (Nagios World Con...
Nicolas Brousse997 views

Recently uploaded

MongoDB.pdf by
MongoDB.pdfMongoDB.pdf
MongoDB.pdfArthyR3
45 views6 slides
Proposal Presentation.pptx by
Proposal Presentation.pptxProposal Presentation.pptx
Proposal Presentation.pptxkeytonallamon
52 views36 slides
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc... by
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...csegroupvn
5 views210 slides
DESIGN OF SPRINGS-UNIT4.pptx by
DESIGN OF SPRINGS-UNIT4.pptxDESIGN OF SPRINGS-UNIT4.pptx
DESIGN OF SPRINGS-UNIT4.pptxgopinathcreddy
19 views47 slides
SPICE PARK DEC2023 (6,625 SPICE Models) by
SPICE PARK DEC2023 (6,625 SPICE Models) SPICE PARK DEC2023 (6,625 SPICE Models)
SPICE PARK DEC2023 (6,625 SPICE Models) Tsuyoshi Horigome
33 views218 slides
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) by
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Tsuyoshi Horigome
38 views16 slides

Recently uploaded(20)

MongoDB.pdf by ArthyR3
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
ArthyR345 views
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc... by csegroupvn
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
csegroupvn5 views
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) by Tsuyoshi Horigome
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
fakenews_DBDA_Mar23.pptx by deepmitra8
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptx
deepmitra816 views
BCIC - Manufacturing Conclave - Technology-Driven Manufacturing for Growth by Innomantra
BCIC - Manufacturing Conclave -  Technology-Driven Manufacturing for GrowthBCIC - Manufacturing Conclave -  Technology-Driven Manufacturing for Growth
BCIC - Manufacturing Conclave - Technology-Driven Manufacturing for Growth
Innomantra 6 views
Effect of deep chemical mixing columns on properties of surrounding soft clay... by AltinKaradagli
Effect of deep chemical mixing columns on properties of surrounding soft clay...Effect of deep chemical mixing columns on properties of surrounding soft clay...
Effect of deep chemical mixing columns on properties of surrounding soft clay...
AltinKaradagli10 views
SUMIT SQL PROJECT SUPERSTORE 1.pptx by Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 18 views
MSA Website Slideshow (16).pdf by msaucla
MSA Website Slideshow (16).pdfMSA Website Slideshow (16).pdf
MSA Website Slideshow (16).pdf
msaucla92 views
Ansari: Practical experiences with an LLM-based Islamic Assistant by M Waleed Kadous
Ansari: Practical experiences with an LLM-based Islamic AssistantAnsari: Practical experiences with an LLM-based Islamic Assistant
Ansari: Practical experiences with an LLM-based Islamic Assistant
M Waleed Kadous5 views
REACTJS.pdf by ArthyR3
REACTJS.pdfREACTJS.pdf
REACTJS.pdf
ArthyR334 views
_MAKRIADI-FOTEINI_diploma thesis.pptx by fotinimakriadi
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptx
fotinimakriadi8 views
Generative AI Models & Their Applications by SN
Generative AI Models & Their ApplicationsGenerative AI Models & Their Applications
Generative AI Models & Their Applications
SN10 views

SRECon16: Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It?

  • 1. Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It? April 7th, 2016 Nicolas Brousse | Sr. Director Of Operations Engineering | nicolas@tubemogul.com
  • 2. Who are we? An enterprise software company for digital branding ● Filtered over 12.6 Trillion Ad Auctions in 2015 ● Served over 3 Billion Ad Impressions on linear TV via our PTV solution ● Process bids in less than 50 ms ● Serve bids in less than 80 ms (includes network round-trip) ● Serve 5 PB of monthly video traffic
  • 3. Who are we? A team of Operations Engineers ● Comprised of SREs, SEs and DBAs ● Ensure the smooth day-to-day operation of the platform infrastructure ● Provide a cost-effective and cutting edge infrastructure ● Manage over 2,500 servers (virtual and physical)
  • 4. Mixed Infrastructure Public Cloud On Premises 2016 Private Cloud Deployment
  • 5. High Level Technical Overview Bidding Layer Ad Serving - High Volumes - Low Latency - Small Packets - Large Data Sets - Low Latency - Fast Processing - Large Caches Low Latency User Database for User Targeting and Frequency Capping
  • 6. ● Java (a lot!) ● MySQL (Percona, MariaDB) ● Memcached, Couchbase ● Aerospike, Vertica, Druid ● Kafka ● Storm ● Zookeeper, Exhibitor ● Hadoop, HBase, Hive ● Terracotta ● ElasticSearch, Logstash, Kibana ● Varnish ● PHP, Python, Ruby, Go... ● Apache httpd ● Nagios, Sensu ● Ganglia, Graphite, Grafana Technology Hoarders ● Puppet ● HAproxy ● OpenStack ● Git and Gerrit ● Gor ● ActiveMQ, RabbitMQ ● OpenLDAP ● Redis ● Blackbox ● Jenkins, Sonar ● RunDeck ● Tomcat, Jetty, Netty ● Qubole ● Snowflake ● AWS DynamoDB, EC2, S3, SWF...
  • 7. ● Our high volume and low latency traffic makes our proximity to some partners matter. ● Huge datasets used for decisioning require high performance infrastructure, which costs a lot. Even with reserved capacity. ● Instances’ packet per second limitations lead us to large public footprint and poor backend performances, especially for load balancers. ● Network disruptions with no root causes. Public Cloud: Technical Challenges
  • 8. Go in-house in 4 locations and 3 continents, in less than 6 months, using a ready-to-go cloud solution and two part-time engineers. Then, go celebrate in Vegas. EASY! Our Strategy
  • 9. Go in-house in 4 locations and 3 continents, in less than 6 months, using a ready-to-go cloud solution and two part-time engineers. Then, go celebrate in Vegas. EASY! Our Strategy (revised) 3 years
  • 10. Go in-house in 4 locations and 3 continents, in less than 6 months, using a ready-to-go cloud solution and two part-time engineers. Then, go celebrate in Vegas. EASY! Our Strategy (revised) 3 years OpenStack and Bare Metal
  • 11. Go in-house in 4 locations and 3 continents, in less than 6 months, using a ready to go cloud solution and two part-time engineers. Then, go celebrate in Vegas. EASY! Our Strategy (revised) 3 years OpenStack and Bare Metal 3 dedicated
  • 12. Go in-house in 4 locations and 3 continents, in less than 6 months, using a ready to go cloud solution and two part-time engineers. Then, go celebrate in Vegas. EASY! Our Strategy (revised) 3 years OpenStack and Bare Metal keep it up and running 3 dedicated YEAH!
  • 13. ● Which infrastructure is being moved? ● How do you compare apples to apples? ● Do you plan to overcommit? ● Do you cost properly for engineering resources, software maintenances, and various support? ● Does your design make trade-offs on High Availability? ● Do you plan for a test environment and R&D? ● Are you using your public cloud at its best? ● Are you building a public cloud or optimizing your environment? ● Do you plan for growth and how does it impact your cost models? ● Which locations are you deploying to? What is the impact on bandwidth and data center costs? TCO analysis: what to consider?
  • 14. ● Be Fair: Challenge your Public Cloud partners and share your TCO with them. ● Keep It Simple: Limit the scope of your TCO to a clear and well known subset. ● Make Clear Assumptions: Have a defined list of feature sets on what you are building. Three simple TCO rules
  • 15. We built a test environment with Eucalyptus to move our integration environment on it. How did we start?
  • 16. We built a test lab environment with a vendor using CloudStack to move our integration environment on it. How did we start (again)?
  • 17. We built a test lab environment and first data center location ourselves using OpenStack on Gentoo and shared the lab for our software engineers’ integration environment. How did we start (really)?
  • 18. ● Do not share your lab: Your lab is meant to fail and be destroyed. Don’t assume people will be OK to work with something unreliable. ● Don’t mess up your block storage strategy. No last minute changes. ● Starting a first data center location may require a lot of paperwork time, executive approval, and hardware mistakes. Plan ahead. ● OpenStack is complex. Don’t make it more complex. First Failures and Lessons Learned
  • 19. We (re)built our lab and prod environment with a vendor using OpenStack on Ubuntu to move our QA environment into one region. How did we reboot?
  • 20. We (re)built our lab and prod environment ourselves by using OpenStack on Ubuntu to move our production environment into one region. How did we reboot (again)?
  • 21. In Production... ● First traffic switch went smoothly and allowed us to decrease our footprint by 40% and our load balancer footprint by 95%. ● Progressive traffic migration is not easy. Consider the impact of multiple environments to maintain and all application dependencies. ● Load Balancers, and core data services run on bare metal and leverage VLANs. ● Fully automated bare metal and OpenStack provisioning. ● We are deploying three new on-premise locations in Q2 2016. ● Limit scope to our high volume and low latency infrastructure.
  • 22. ● OpenStack requires a long learning curve and design phase. Account for it, in terms of cost, skill-set, and time. ● We are not building a Public Cloud. Be very clear on your feature set and business case. ● You don’t know your application as well as you think. Be ready to adapt quickly and don’t overlook the impact of network traffic that switches from private to public. ● Really, you don’t know your application as well as you think. Be ready to deal with ip conntrack table full. Lessons Learned
  • 23. ● Moving in-house led to an estimated 30% cost savings and reduced our server footprint. ● The improved visibility on our network traffic and our full application stack greatly helped for troubleshooting and performance improvements. ● Have a strong technical need for it. Cost shouldn’t be the only driving factor. So, is it worth it?