| Confidential
DevOps at Tradeshift
A short history of how things evolved over the last few years (2015-2019)
Jesper Terkelsen, VP Global Platform Operations
jnt@tradeshift.com
What is Tradeshift?
[Map: office locations and supplier density]
Delivering an end-to-end supply-chain buying experience in the cloud, everywhere in the world
Covering five continents and more than 190 countries.
Tradeshift is the global business commerce company
Tradeshift by the numbers
• A real network of 1,500,000 companies connected on the platform
• $500B in transaction volume
• 500+ enterprise P2P customers, all migrating to the next-gen cloud platform
• 30M SKUs on the Tradeshift Marketplace and procurement solution
• 1,000 employees
• 42 nationalities
• Global presence: US, Europe, APAC
What is Tradeshift?
From a DevOps perspective
What is DevOps in Tradeshift?
• Delivering global, always-on SaaS to enterprise customers who are used to on-premises or managed hosting.
• With a rapidly expanding feature footprint
• In a rapidly growing organization
• To a rapidly growing user base
• With a rapidly growing usage
• While enabling 3rd parties to build on top of the platform
• … without growing cost at the same rate
What happened in 2016?
Going all in on “infrastructure as code”
2015 and earlier
• Organizational growth was mostly in engineering, not in operations.
• Lots of focus on product features; scale was less of a concern
• About 70 engineers
• Small DevOps team
• 3 FTE in Copenhagen
• 1 FTE in San Francisco
• Partially automated infrastructure
[Diagram: DevOps teams in San Francisco and Copenhagen]
Challenges in late 2015
Engineering growth
• Lots of new engineering teams wanted
  • a new microservice in production today.
  • a new version of a component out today.
• Demo environments were a scarce resource (the answer was: we will work on this release for 4 months)
Other
• We needed to obtain a lot of new certifications to be able to operate in the market.
• Enable engineering to operate independently from operations
Challenges in late 2015
• We expected to grow engineering headcount by 200%
  • to around 300 people
• We were about to expand into China.
Forecast was amazing
• We expected to grow the number of instances by 2-3 orders of magnitude
  • Clustering for availability: 2-3x
  • Introducing new services: 3-5x
  • Hosting in more data centers: 4-5x
  • Scaling up storage: 5-10x
• We did not plan to grow operations headcount by the same amount
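The "2-3 orders of magnitude" figure only follows if the individual factors compound. A quick check of that arithmetic, assuming the four factors are independent and multiply (the slide does not say so explicitly, and treating storage scale-up as an instance multiplier is itself an assumption):

```python
import math

# Forecast growth factors as (low, high) ranges, taken from the slide.
# Assumption: the factors are independent and multiply together.
FACTORS = {
    "clustering for availability": (2, 3),
    "new services": (3, 5),
    "more data centers": (4, 5),
    "storage scale-up": (5, 10),
}

def combined_growth(factors):
    """Return the (low, high) combined growth multiplier."""
    low = math.prod(lo for lo, _ in factors.values())
    high = math.prod(hi for _, hi in factors.values())
    return low, high

low, high = combined_growth(FACTORS)
print(f"{low}x to {high}x")  # 120x to 750x
print(f"{math.log10(low):.1f} to {math.log10(high):.1f} orders of magnitude")  # 2.1 to 2.9
```

Both ends land between 10^2 and 10^3, which matches the forecast on the slide.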
We wanted to
• Migrate all services (about 15 at the time) to Docker.
• Introduce service discovery and clustering.
• Introduce zero downtime deployments.
• Migrate all hosts to VPCs.
• Rewrite all of the automation code.
• Introduce tests for all infrastructure automation code.
• Upgrade OS versions.
• Improve monitoring.
• All while the system was running, since it's a cloud service.
First let's agree on some values
We strove toward the following values:
● Never disrupt service.
● Speed matters.
● Self-service and horizontal ownership over “throw it over the wall”.
● Homogeneous operations over heterogeneous / many unique solutions.
● Code as documentation.
● Testability over one-off scripts.
● Everything is peer-reviewed.
● Clear ownership.
This made our design choices easier to argue about
1. Foundation
We started by building the foundation: base infrastructure for automation.
• Own internal certificate authority (CA)
• End-to-end encryption
• LDAP for authentication
• Puppet servers
• Private code registries
• Terraform - infrastructure templating
• And a current Ubuntu base role
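An internal CA plus end-to-end encryption usually means every service presents a certificate signed by that CA and verifies its peer's certificate against the same CA (mutual TLS). A minimal sketch with Python's standard `ssl` module; the certificate file names in the usage line are hypothetical placeholders, and this is not necessarily how Tradeshift wired it up:

```python
import ssl
from typing import Optional

def _load(ctx: ssl.SSLContext, ca_file: Optional[str],
          cert_file: Optional[str], key_file: Optional[str]) -> ssl.SSLContext:
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        # Trust only the internal CA when verifying the peer.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # Our own certificate and key, issued by the internal CA.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

def server_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Server side of mutual TLS: clients must present a valid certificate."""
    # A bare SSLContext does not load the system CA bundle, so only
    # ca_file decides which client certificates are accepted.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    return _load(ctx, ca_file, cert_file, key_file)

def client_context(ca_file=None, cert_file=None, key_file=None) -> ssl.SSLContext:
    """Client side: verifies the server (chain and hostname), presents its own cert."""
    # PROTOCOL_TLS_CLIENT enables check_hostname and CERT_REQUIRED by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    return _load(ctx, ca_file, cert_file, key_file)
```

In use, each side would be built with real files, e.g. `server_context("internal-ca.pem", "service.pem", "service-key.pem")` (hypothetical names), and passed to `wrap_socket` or an HTTP server.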
2. Design a global network
• Build an internal CIDR IP allocation scheme covering all possible future data centers.
• Support for
  • Site-to-site network VPNs
  • Public encrypted channels based on TLS with mutual authentication
• Use both private and public subnets
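An allocation scheme "for all possible future data centers" can be as simple as reserving one fixed-size block per site out of a single private supernet, then splitting each block into public and private subnets. A sketch using Python's `ipaddress` module; the `10.0.0.0/8` supernet, the /16 and /20 sizes and the site names are all assumptions for illustration:

```python
import ipaddress
from typing import Dict, List

def allocate(supernet: str, sites: List[str],
             site_prefix: int = 16, subnet_prefix: int = 20) -> Dict[str, dict]:
    """Give each site its own non-overlapping block, then split the block
    into public (first half) and private (second half) subnets."""
    net = ipaddress.ip_network(supernet)
    capacity = 2 ** (site_prefix - net.prefixlen)
    if len(sites) > capacity:
        raise ValueError(f"supernet only has room for {capacity} sites")
    plan = {}
    # Sites are assigned in list order, so appending a new site later
    # never moves an existing allocation.
    for site, block in zip(sites, net.subnets(new_prefix=site_prefix)):
        subnets = list(block.subnets(new_prefix=subnet_prefix))
        half = len(subnets) // 2
        plan[site] = {"block": block,
                      "public": subnets[:half],
                      "private": subnets[half:]}
    return plan

plan = allocate("10.0.0.0/8", ["us-east", "eu-west", "cn-north"])
print(plan["eu-west"]["block"])       # 10.1.0.0/16
print(plan["eu-west"]["private"][0])  # 10.1.128.0/20
```

Because no two sites overlap, site-to-site VPNs can route between any pair of data centers without renumbering.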
3. Migrate databases
• SQL and NoSQL databases
• Use AWS ClassicLink for network connectivity
• For SQL databases, we first had to upgrade the version.
  • Then we set up live streaming replication to a new slave and promoted that slave to master.
  • 10 PostgreSQL databases, the largest about 1 TB
• For NoSQL databases, we migrated one node at a time
  • 300 TB Elasticsearch cluster
  • 1.5 PB Riak cluster
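Before promoting the new slave, it should have replayed everything the old master wrote. PostgreSQL prints WAL positions (LSNs) as two hexadecimal halves of a 64-bit number, e.g. `16/B374D848`; on PostgreSQL 10+ they come from `pg_current_wal_lsn()` on the master and `pg_last_wal_replay_lsn()` on the standby. A small sketch of the comparison, with query results passed in as strings; the cut-over procedure around it is an assumption:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def replication_lag_bytes(master_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (0 means caught up)."""
    return max(0, parse_lsn(master_lsn) - parse_lsn(replica_replay_lsn))

def safe_to_promote(master_lsn: str, replica_replay_lsn: str) -> bool:
    # Assumes writes to the old master have already been stopped;
    # otherwise its LSN keeps moving and this is only a snapshot.
    return replication_lag_bytes(master_lsn, replica_replay_lsn) == 0
```

Once this returns `True`, the standby can be promoted with `pg_ctl promote` (or `SELECT pg_promote()` on PostgreSQL 12+) and clients pointed at it.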
4. Migrate services
• We added service discovery and routing
• Better leader election
• Put everything in Docker
• Services were migrated.
  • Clusterable services were migrated without downtime.
  • Non-clustered services were migrated during weekend maintenance windows.
• Load balancers were migrated
• We had to announce IP changes to customers, because AWS does not use the same subnets for EC2-Classic and VPC
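Migrating clusterable services during uptime relies on service discovery: register the new VPC instances, let them pass health checks, and only then remove the Classic ones, so routing always has a healthy target. A toy in-memory sketch; `Registry`, `migrate` and the instance names are hypothetical stand-ins for whatever discovery system was actually used:

```python
import itertools
from typing import Dict, List

class Registry:
    """Toy service registry: service -> {instance address: healthy?}."""

    def __init__(self) -> None:
        self._instances: Dict[str, Dict[str, bool]] = {}
        self._counter = itertools.count()

    def register(self, service: str, addr: str) -> None:
        # New instances receive no traffic until a health check passes.
        self._instances.setdefault(service, {})[addr] = False

    def mark_healthy(self, service: str, addr: str) -> None:
        self._instances[service][addr] = True

    def deregister(self, service: str, addr: str) -> None:
        del self._instances[service][addr]

    def healthy(self, service: str) -> List[str]:
        return sorted(a for a, ok in self._instances.get(service, {}).items() if ok)

    def route(self, service: str) -> str:
        """Round-robin over healthy instances only."""
        targets = self.healthy(service)
        if not targets:
            raise LookupError(f"no healthy instance of {service}")
        return targets[next(self._counter) % len(targets)]

def migrate(reg: Registry, service: str, old: List[str], new: List[str]) -> List[str]:
    """Add and health-check new instances before removing old ones, so
    route() never runs out of targets. Returns one routed address per step."""
    routed = []
    for addr in new:
        reg.register(service, addr)
        reg.mark_healthy(service, addr)  # stand-in for a real health check
        routed.append(reg.route(service))
    for addr in old:
        reg.deregister(service, addr)
        routed.append(reg.route(service))
    return routed

reg = Registry()
for addr in ["classic-1", "classic-2"]:
    reg.register("invoices", addr)
    reg.mark_healthy("invoices", addr)
routes = migrate(reg, "invoices", ["classic-1", "classic-2"], ["vpc-1", "vpc-2"])
print(reg.healthy("invoices"))  # ['vpc-1', 'vpc-2']
```

Non-clustered services cannot run two copies side by side like this, which is why they needed a maintenance window.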
5. Replace build pipeline
• Use same infrastructure automation as for production
• Tradeshift's QA methodology relies heavily on automated tests, and every team owns its own tests: no “throwing over the wall”
• Runs our 700+ UI end-to-end flows
• Introduce consumer driven tests
• Push more tests upstream
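In consumer-driven contract testing, each consuming team records the parts of a provider's response it actually depends on, and the provider's pipeline checks every recorded contract before release. Tools such as Pact do this for real; below is only a dependency-free sketch of the idea, with a made-up contract format and response:

```python
from typing import Any, Dict, List

# Hypothetical contract format: field name -> expected Python type,
# with nested dicts describing nested objects.
Contract = Dict[str, Any]

def violations(contract: Contract, response: Dict[str, Any], path: str = "") -> List[str]:
    """Return every way the provider response breaks the consumer's contract."""
    problems = []
    for key, expected in contract.items():
        where = f"{path}.{key}" if path else key
        if key not in response:
            problems.append(f"missing field {where}")
        elif isinstance(expected, dict):
            if not isinstance(response[key], dict):
                problems.append(f"{where} should be an object")
            else:
                problems.extend(violations(expected, response[key], where))
        elif not isinstance(response[key], expected):
            problems.append(f"{where} should be {expected.__name__}")
    return problems

# The consumer only pins the fields it reads; extra fields are allowed.
invoice_contract = {"id": str, "total": float, "supplier": {"name": str}}

provider_response = {"id": "inv-1", "total": 99.5, "currency": "EUR",
                     "supplier": {"name": "ACME", "country": "DK"}}
print(violations(invoice_contract, provider_response))  # []
```

Running such checks in the provider's build is what lets tests move upstream: a breaking change fails before it reaches the slower UI end-to-end flows.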
6. Replace demo sandboxes
• Make sandboxes and demo stacks - on demand.
• Tooling for sizing, scope, clustering, public/private
• No more fighting over who can use an environment
• Only runs as long as teams need them
• Lifetimes range from hours to months
• Automate data creation
• Useful for demo storyboards
• Useful for performance tests.
• Useful for automated tests
• Promotes data generation
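Automated data creation is most useful when it is reproducible, so a demo storyboard or a performance test sees the same data every time a stack is rebuilt. A sketch using a seeded generator; the company and invoice shapes are invented for illustration:

```python
import random
from typing import Dict, List

def generate_dataset(seed: int, companies: int = 3,
                     invoices_per_company: int = 2) -> List[Dict]:
    """Generate the same fake companies and invoices for the same seed."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    data = []
    for c in range(companies):
        company = {"id": f"company-{c}",
                   "name": f"Company {rng.randint(1000, 9999)}",
                   "invoices": []}
        for i in range(invoices_per_company):
            company["invoices"].append({
                "id": f"company-{c}-inv-{i}",
                # Integer cents avoid floating-point rounding in totals.
                "amount_cents": rng.randint(100, 1_000_000),
            })
        data.append(company)
    return data
```

A performance test would call it with a large `companies` value, while a demo stack would pin one seed per storyboard.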
7. Optimize cost a bit
• Roughly 20 environments are created/destroyed daily
• Spot instances are used for temporary stacks as well as for build slaves and data-processing slaves.
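With around 20 stacks created and destroyed every day, expiry has to be automatic. A minimal sketch of a reaper that picks out temporary stacks past their time-to-live; the `Stack` record, its `ttl_hours` and `keep` fields are hypothetical, and the actual teardown (for example a `terraform destroy`) is left out:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

@dataclass
class Stack:
    name: str
    created: datetime
    ttl_hours: int        # requested lifetime, from hours up to months
    keep: bool = False    # explicit opt-out, e.g. a long-running demo

def expired(stacks: List[Stack], now: datetime) -> List[str]:
    """Names of stacks whose lifetime has run out and that are not pinned."""
    return [s.name for s in stacks
            if not s.keep and now >= s.created + timedelta(hours=s.ttl_hours)]

now = datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)
stacks = [
    Stack("perf-test", now - timedelta(hours=5), ttl_hours=4),
    Stack("demo-q2", now - timedelta(days=40), ttl_hours=24 * 90),
    Stack("old-sandbox", now - timedelta(days=3), ttl_hours=24, keep=True),
]
print(expired(stacks, now))  # ['perf-test']
```

Making the lifetime an explicit field at creation time keeps cost visible to the team that asked for the stack.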
8. Container adoption
• “Puppet and Terraform are fine, but I need my new Docker image deployed in hours, not days, since I only spent a few hours scaffolding it up and coding the microservice” - ML team member
• In 2018 we rolled out Kubernetes in all test environments and in production, where it currently runs 30% of our services as flexible containers
• This can be challenging for our values
  • Homogeneous systems? Managed services mixed with K8s
  • PCI compliance?
• Very good for infrastructure as code with Helm
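Helm charts are templates that render Kubernetes manifests, which is what makes containers fit the infrastructure-as-code model. To show the kind of output such a template produces, here is a plain-Python sketch that builds an `apps/v1` Deployment for a hypothetical service; it is not Helm itself, and the image name and port are placeholders. `maxUnavailable: 0` keeps full capacity during a rolling update:

```python
import json

def deployment(name: str, image: str, replicas: int = 2, port: int = 8080) -> dict:
    """Render a minimal Kubernetes apps/v1 Deployment as a dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Start one extra pod before taking an old one away.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }]},
            },
        },
    }

print(json.dumps(deployment("ml-scoring", "registry.example.com/ml-scoring:1.0.0"), indent=2))
```

`kubectl apply -f -` accepts JSON on standard input, so the output can be reviewed in a pull request and applied like any other infrastructure code.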
Why automate?
4x growth in 8 months.
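Expressed as a rate, 4x growth in 8 months is two doublings: a doubling roughly every 4 months, or about 19% compound growth per month. The arithmetic:

```python
import math

growth, months = 4.0, 8
# Compound monthly rate r such that (1 + r) ** months == growth.
monthly_rate = growth ** (1 / months) - 1
# Months needed for a 2x increase at that same rate.
doubling_months = months * math.log(2) / math.log(growth)

print(f"{monthly_rate:.1%} per month")                # 18.9% per month
print(f"doubles every {doubling_months:.0f} months")  # doubles every 4 months
```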
Why do we automate?
Infrastructure as code
• DevOps work should be like development work
• “Test driven operations”
• Treat infrastructure code as regular code
• Write tests for the Puppet code and configuration.
• Do code reviews within the team.
Benefits
● Recover faster from incidents.
● Fewer people can manage more servers (5 people manage 5,000+ servers).
● Less human error
● More transparency into what is on the servers.
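Tradeshift's Puppet tests are written in Ruby, but the idea of "test driven operations" is language-neutral: configuration data gets invariants that are checked on every pull request. A Python sketch over hypothetical Hiera-style service data; the data layout and the two rules (every service has an owning team, no two services share a port on one host) are assumptions chosen to mirror the "clear ownership" value:

```python
from collections import defaultdict
from typing import Dict, List

def check_services(services: Dict[str, dict]) -> List[str]:
    """Validate service definitions; return human-readable errors (empty = OK)."""
    errors = []
    port_owner = defaultdict(dict)  # host -> {port: service using it}
    for name, svc in sorted(services.items()):
        if not svc.get("owner"):
            errors.append(f"{name}: no owning team")
        port = svc.get("port")
        if port is None:
            continue
        for host in svc.get("hosts", []):
            other = port_owner[host].get(port)
            if other is not None:
                errors.append(f"{name}: port {port} on {host} already used by {other}")
            else:
                port_owner[host][port] = name
    return errors

services = {
    "invoices": {"owner": "team-ap", "port": 8080, "hosts": ["app-1", "app-2"]},
    "search": {"owner": "team-search", "port": 8080, "hosts": ["app-2"]},
    "reports": {"port": 9000, "hosts": ["app-3"]},
}
for error in check_services(services):
    print(error)
```

A CI job that fails on a non-empty result turns these rules into something reviewers no longer have to check by eye.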
Ability to scale fast
● We grew the number of versioned services from ~15 to ~120
● We increased our rate of deployment to prod
a. From once a week to 15 times a day
b. This distributes risk and shrinks potential blast radius
● We now have 6,000 unit tests for our Puppet code, written in Ruby, giving about 60% code coverage.
a. This allows us to change the Puppet code faster and with much more confidence.
● The number of virtual machines is now 5,000+ across all environments and varies a lot from day to day.
● Containers have shortened the commit-to-production time (for new services) even further
Self-service engineering
• All engineering teams in Tradeshift can write automation code and test it in our AWS test account.
• Introducing a new service in production only requires a code review from operations
• Releasing new versions in production is fully automated
• The productivity teams maintain the tools
• Engineers have access to logs and error-collection tools, which lets teams display metrics near their desks at all times
Automated Security
“We don't really need an army of people patching servers, or following
human compliance processes, if we can automate the whole thing”
We can then focus human time on improving actual security
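Automating patch compliance can be as simple as continuously comparing what is installed against a minimum patched version and reporting the gaps, instead of people checking hosts by hand. A sketch over a hypothetical package inventory; note that real Debian/Ubuntu version strings need `dpkg --compare-versions` semantics, which this simplified dotted-number comparison does not implement:

```python
from typing import Dict, List, Tuple

def version_key(version: str) -> Tuple[int, ...]:
    """Simplified ordering for purely numeric dotted versions like '4.4.20'."""
    return tuple(int(part) for part in version.split("."))

def out_of_compliance(inventory: Dict[str, Dict[str, str]],
                      minimum: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (host, package, installed_version) for every package below its minimum."""
    findings = []
    for host, packages in sorted(inventory.items()):
        for pkg, required in sorted(minimum.items()):
            installed = packages.get(pkg)
            if installed is not None and version_key(installed) < version_key(required):
                findings.append((host, pkg, installed))
    return findings

minimum = {"openssl": "1.1.1", "bash": "4.4.20"}
inventory = {
    "web-1": {"openssl": "1.1.0", "bash": "4.4.20"},
    "web-2": {"openssl": "1.1.1", "bash": "4.4.18"},
}
print(out_of_compliance(inventory, minimum))
```

The findings can then feed the same automation that rolls out patches, leaving humans to decide on exceptions.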
AWS China rollout
We provisioned
• 1 full production environment
• 5,125 new lines of Terraform template code
• 1,097 new lines of Hiera configuration
• In 120 pull requests
Using
• A team of 5 people
• In 7 working days
• 40k lines of existing Puppet code
• A similar amount of existing Terraform code
The power of automation
2018 Dealing with growth
Aka MORE!!!
More Teams
More Hosts / Static containers
More dynamic containers
More Data
Tradeshift 2019
Current tech scale
• ~5,000 VMs on AWS (daily average, across all environments)
• ~15 releases/day
• ~120 services, 45 running in K8s (1-4 added per month)
• 2-3M daily business transactions
• 58 developer teams
• Hosting in US, Europe, China
Operations/Productivity Teams 2019
[Org chart: Developer Productivity, Site Reliability Engineering and Platform Infrastructure; teams: SRE, SRE - Århus, Toolchain, Stacks, Compute, Compute China, Containers, Storage, Roots Data, Data, Backend, App Frameworks, Dev Support, TBH]
The future
• Even larger engineering organization
  • We currently have 350 people in engineering, in 58 developer teams
  • We expect to grow engineering by 100-150% in 2019, and more in 2020
• Even more automation
  • Immutable infrastructure
  • In product testing: canary deploys, red-green deploys, improved A/B and feature testing.
• Even more frequent deployments
  • We do roughly 10-15 deploys a day today; we want this to grow 10x
• More security certifications
• 20x more microservices
We look to global-scale consumer SaaS companies (Google, Uber, LinkedIn, Facebook, Twitter, etc.) for inspiration, for processes as well as technology.
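Canary deploys send a small share of traffic to a new version and compare it with the current one before rolling out further. A minimal decision rule, assuming request and error counts are available for both groups; the thresholds are illustrative rather than recommendations, and production systems usually add proper statistical tests:

```python
from dataclasses import dataclass

@dataclass
class Stats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

def canary_verdict(baseline: Stats, canary: Stats, min_requests: int = 500,
                   max_ratio: float = 1.5, max_abs_increase: float = 0.001) -> str:
    """Return 'wait', 'promote' or 'rollback' for a canary release."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    increase = canary.error_rate - baseline.error_rate
    # Tolerate small absolute noise; roll back only on a clear relative regression.
    if increase > max_abs_increase and canary.error_rate > max_ratio * baseline.error_rate:
        return "rollback"
    return "promote"

print(canary_verdict(Stats(100_000, 100), Stats(1_000, 1)))   # promote
print(canary_verdict(Stats(100_000, 100), Stats(1_000, 10)))  # rollback
```

At 10x today's deploy rate, a check like this has to run unattended, which is why it is expressed as code rather than a manual sign-off.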
Questions

 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

DevOps at Tradeshift - AWS Community Day Nordics

  • 2. DevOps at Tradeshift: a short history of how things evolved over the last few years (2015-2019). Jesper Terkelsen, VP Global Platform Operations, jnt@tradeshift.com
  • 4. [Map: office locations and supplier density] Delivering an end-to-end supply-chain buying experience in the cloud, everywhere in the world, covering five continents and more than 190 countries. Tradeshift is the global business commerce company.
  • 5. Tradeshift by the numbers • A real network of 1,500,000 companies connected on the platform • $500B in transaction volume • 500+ enterprise P2P customers, all migrating to the next-gen cloud platform • 30M SKUs on the Tradeshift Marketplace and procurement solution • 1,000 employees • 42 nationalities • Global presence: US, Europe, APAC
  • 6. What is Tradeshift? From a DevOps perspective
  • 7. What is DevOps at Tradeshift? • Delivering global, always-on SaaS to enterprise customers who are used to on-prem or managed hosting • With a rapidly expanding feature footprint • In a rapidly growing organization • To a rapidly growing user base • With rapidly growing usage • While enabling third parties to build on top of the platform • ... and without growing cost at the same rate.
  • 8. What happened in 2016? All in on “infrastructure as code”.
  • 9. 2015 and earlier • Organization growth was mostly in engineering, not in operations. • Lots of focus on product features; scale was less of a concern. • About 70 engineers • Small DevOps team: • 3 FTE in Copenhagen • 1 FTE in San Francisco • Partially automated infrastructure
  • 10. Challenges in late 2015. Engineering growth: • Lots of new engineering teams want • a new microservice in production today • a new version of a component out today • Demo environments were a scarce resource (the answer was “we will work on this release for 4 months”). Other: • We needed many new certifications to be able to operate in the market. • Enable engineering to operate independently of operations.
  • 11. Challenges in late 2015 • We expected to grow engineering headcount by 200%, to around 300 people. • We were about to expand into China. The forecast was amazing: • We expected the number of instances to grow by 2-3 orders of magnitude: • Clustering for availability: 2-3x • Introducing new services: 3-5x • Hosting in more data centers: 4-5x • Scaling up storage: 5-10x • We did not plan to grow operations headcount by the same amount.
  • 12. We wanted to • Migrate all services (about 15 at the time) to Docker. • Introduce service discovery and clustering. • Introduce zero-downtime deployments. • Migrate all hosts to VPCs. • Rewrite all of the automation code. • Introduce tests for all infrastructure automation code. • Upgrade OS versions. • Improve monitoring. • Do all of this while the system was running, since it is a cloud service.
  • 13. First, let's agree on some values. We strived towards the following values: ● Never disrupt service. ● Speed matters. ● Self-service and horizontal ownership over “throw it over the wall”. ● Homogeneous operations over heterogeneous or many unique solutions. ● Code as documentation. ● Testability over one-off scripts. ● Everything is peer-reviewed. ● Clear ownership. This made our design choices easier to argue about.
  • 14. 1. Foundation. We started by building the foundation: base infrastructure for automation. • Our own internal certificate authority (CA) • End-to-end encryption • LDAP for authentication • Puppet servers • Private code registries • Terraform for infrastructure templating • A current Ubuntu base role
  • 15. 2. Design a global network • Build an internal CIDR IP allocation scheme covering all possible future data centers. • Support for: • Site-to-site network VPNs • Public encrypted channels based on TLS with mutual authentication • Use both private and public subnets
  • 16. 3. Migrate databases • SQL and NoSQL databases • Use AWS ClassicLink for network connectivity • For SQL databases we first had to upgrade the version, • then set up live streaming replication to a new slave, and then promoted it to master. • 10 PostgreSQL databases, the largest about 1 TB • For NoSQL databases we migrated one node at a time: • 300 TB Elasticsearch cluster • 1.5 PB Riak cluster
  • 17. 4. Migrate services • We added service discovery and routing • Better leader election • Put everything in Docker • Services were migrated: • Clusterable services were migrated while running. • Non-clustered services were migrated during a weekend maintenance window. • Load balancers were migrated. • We had to announce IP changes to customers, because AWS does not use the same subnets for EC2-Classic and VPC.
  • 18. 5. Replace the build pipeline • Use the same infrastructure automation as in production • QA at Tradeshift relies heavily on automated tests, and every team owns its own tests: no “throwing over the wall” • Runs our 700+ end-to-end UI flows • Introduce consumer-driven tests • Push more tests upstream
  • 19. 6. Replace demo sandboxes • Make sandboxes and demo stacks on demand. • Tooling for sizing, scope, clustering, public/private • No more fighting over who can use an environment • Stacks run only as long as teams need them, from hours to months • Automate data creation • Useful for demo storyboards • Useful for performance tests • Useful for automated tests • Promotes data generation
  • 20. 7. Optimize cost a bit • Roughly 20 environments are created and destroyed daily. • Spot instances are used for temporary stacks, build slaves and data-processing slaves.
  • 21. 8. Container adoption • “Puppet and Terraform is fine, but I need my new Docker image deployed in hours, not days, since I only used a few hours scaffolding it up + coding the microservice” - ML team member • In 2018 we rolled out Kubernetes in all test environments and in production, and we currently run 30% of our services as flexible containers. • This can challenge our values: • Homogeneous systems? Managed services mixed with Kubernetes • PCI compliance? • Very good for infrastructure as code with Helm
  • 22. Why automate? 4x growth in 8 months.
  • 23. Why do we automate? Infrastructure as code: • DevOps work should be like development work • “Test-driven operations” • Treat infrastructure code as regular code • Write tests for the Puppet code and configuration. • Do code reviews within the team. Benefits: ● Recover faster from incidents. ● Fewer people can manage more servers (5 people, 5000+ servers). ● Less human error. ● More transparency into what is on the servers.
  • 24. Ability to scale fast ● We grew the number of versioned services from ~15 to ~120. ● We increased our rate of deployment to production: a. from once a week to 15 times a day; b. this distributes risk and shrinks the potential blast radius. ● We now have 6,000 unit tests for Puppet, written in Ruby, giving about 60% code coverage. a. This lets us change the Puppet code faster and with much more confidence. ● The number of virtual machines is now 5,000+ across all environments and varies a lot from day to day. ● Containers shortened the commit-to-production delay (for new services) even further.
  • 25. Self-service engineering • All engineering teams at Tradeshift can write automation code and test it in our AWS test account. • Introducing a new service in production only requires a code review from operations. • Releasing new versions in production is fully automated. • The productivity teams maintain the tools. • Engineers have access to log and error-collection tools, which lets teams always display metrics near their desks.
  • 26. Automated security “We don't really need an army of people patching servers, or following human compliance processes, if we can automate the whole thing.” We can then focus human time on improving actual security.
  • 27. AWS China rollout. We provisioned: • 1 full production environment • 5,125 new lines of Terraform template code • 1,097 new lines of Hiera configuration • in 120 pull requests. Using: • a team of 5 people • in 7 working days • 40k lines of existing Puppet code • a similar amount of existing Terraform code
  • 28. The power of automation
  • 29. 2018: Dealing with growth, aka MORE!!!
  • 31. More Hosts / Static containers
  • 36. Current tech scale • ~5000 VMs on AWS (daily average, across all environments) • ~15 releases/day • ~120 services, 45 running in Kubernetes (1-4 added per month) • 2-3M daily business transactions • 58 developer teams • Hosting in the US, Europe and China
  • 37. Operations/Productivity Teams 2019 [Org chart. Groups: Developer Productivity; Site Reliability Engineering; Platform Infrastructure. Team labels: SRE, Toolchain, Stacks, Compute, Containers, Storage, Roots, Data, Data Backend, App Frameworks, Dev Support, TBH, SRE - Århus, Compute China]
  • 38. The future • An even larger engineering organization • We are currently 350 people in engineering, in 58 developer teams. • We expect to grow engineering by 100-150% in 2019, and more in 2020. • Even more automation • Immutable infrastructure • In-product testing: canary deploys, red-green deploys, improved A/B and feature testing. • Even more frequent deployments • We do roughly 10-15 deploys a day today; we want this to grow 10x. • More security certifications • 20x more microservices. We are looking to global-scale consumer SaaS (Google, Uber, LinkedIn, Facebook, Twitter, etc.) for inspiration, for processes as well as technology.
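Slide 11 forecasts instance growth of 2-3 orders of magnitude and lists four growth factors. Assuming the factors compound multiplicatively (the slide does not say how they combine, and counting storage growth as instance growth is also an assumption), the low and high ends work out as:

$$
\text{low: } 2 \times 3 \times 4 \times 5 = 120 \approx 10^{2.08}
\qquad
\text{high: } 3 \times 5 \times 5 \times 10 = 750 \approx 10^{2.88}
$$

which is consistent with the 2-3 orders of magnitude stated on the slide.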
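Slide 15 calls for an internal CIDR allocation scheme that already covers data centers that do not exist yet. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical layout (one private /8, one /16 per site, each /16 split into /20 subnets with the lower half public and the upper half private); the site names and prefix sizes are illustrative, not Tradeshift's actual plan.

```python
import ipaddress

# Hypothetical plan: one private /8, one /16 per site (data center or region),
# and each /16 split into /20 subnets, half public and half private.
SUPERNET = ipaddress.ip_network("10.0.0.0/8")
SITES = ["us-east", "eu-west", "cn-north"]  # hypothetical site names


def allocate_sites(supernet, sites, site_prefix=16):
    """Assign each site a fixed, non-overlapping block in list order.

    Appending a new site never moves an existing one, so the scheme can
    reserve room for data centers that do not exist yet.
    """
    blocks = list(supernet.subnets(new_prefix=site_prefix))
    if len(sites) > len(blocks):
        raise ValueError("not enough address space for all sites")
    return dict(zip(sites, blocks))


def split_site(block, subnet_prefix=20):
    """Split a site block into public (lower half) and private (upper half) subnets."""
    subnets = list(block.subnets(new_prefix=subnet_prefix))
    half = len(subnets) // 2
    return {"public": subnets[:half], "private": subnets[half:]}


plan = allocate_sites(SUPERNET, SITES)
for site, block in plan.items():
    layout = split_site(block)
    print(site, block, "public:", layout["public"][0], "private:", layout["private"][0])
```

The design choice worth noting is the fixed, append-only ordering: because blocks never overlap and never move, site-to-site VPN routes configured for one data center stay valid when another is added.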
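Slide 16 describes moving PostgreSQL by streaming replication to a new slave and then promoting it. The cut-over check can be sketched as a pure function, assuming writes to the old master have already been stopped and that the two LSNs were read from the master and the new slave (on PostgreSQL 10+ via `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()`; on 9.x these are `pg_current_xlog_location()` and `pg_last_xlog_replay_location()`). The Python function names are hypothetical and no database driver is used.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a 64-bit WAL byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(master_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay to catch up with the master."""
    return parse_lsn(master_lsn) - parse_lsn(replica_replay_lsn)


def safe_to_promote(master_lsn: str, replica_replay_lsn: str, writes_frozen: bool) -> bool:
    """Promote only when no new writes can arrive and the replica has replayed everything."""
    return writes_frozen and replication_lag_bytes(master_lsn, replica_replay_lsn) == 0
```

Freezing writes first matters: without it, a zero lag reading can be stale by the time promotion happens.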
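Slide 17 mentions adding service discovery and routing without naming the tools. As a toy illustration only (production systems typically use something like Consul, etcd or ZooKeeper, and the slides do not say which Tradeshift chose), the hypothetical registry below keeps the instances that have sent a heartbeat within a TTL and routes round-robin across them. The injectable clock exists only to make it testable.

```python
import itertools
import time
from collections import defaultdict


class ServiceRegistry:
    """Toy service registry: instances heartbeat, and stale ones drop out of routing."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._last_seen = defaultdict(dict)            # service -> {address: last heartbeat}
        self._counters = defaultdict(itertools.count)  # service -> round-robin counter

    def heartbeat(self, service, address):
        """Record that an instance is alive right now."""
        self._last_seen[service][address] = self.clock()

    def healthy(self, service):
        """Addresses that have heartbeated within the TTL, in a stable sorted order."""
        now = self.clock()
        return sorted(a for a, t in self._last_seen[service].items() if now - t <= self.ttl)

    def route(self, service):
        """Pick the next healthy instance round-robin, or fail if none is alive."""
        instances = self.healthy(service)
        if not instances:
            raise LookupError(f"no healthy instance of {service}")
        return instances[next(self._counters[service]) % len(instances)]
```

Because instances that stop heartbeating simply age out, a clustered service can be moved one node at a time: new nodes register, old nodes stop, and callers never need a configuration change.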
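Slide 18 introduces consumer-driven tests: each consuming team publishes the fields it actually relies on, and the provider's build fails if a response stops satisfying any consumer. Tools such as Pact implement this properly; the hand-rolled check below, with a hypothetical invoice contract, only shows the core rule.

```python
from typing import Any, Dict, List

# Hypothetical contract published by a consumer team: the fields (and types)
# it actually reads from the provider's invoice response.
INVOICE_UI_CONTRACT: Dict[str, type] = {"id": str, "total": float, "currency": str}


def verify_contract(contract: Dict[str, type], response: Dict[str, Any]) -> List[str]:
    """Return contract violations; an empty list means the provider satisfies the consumer."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Extra fields are deliberately allowed: providers may add data without breaking consumers.
    return problems
```

Running this in the provider's pipeline, against every published consumer contract, is what lets tests move upstream: a breaking change fails before it reaches the shared end-to-end UI flows.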
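Slide 19 lists automated data creation for demo storyboards, performance tests and automated tests. One common way to make generated data serve all three is to seed it, so a demo or a failing test can be reproduced exactly. A minimal sketch with Python's `random.Random`; the data model (companies trading invoices) and the value ranges are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

COUNTRIES = ["DK", "US", "GB", "DE", "CN"]  # hypothetical sample of country codes


@dataclass(frozen=True)
class Company:
    name: str
    country: str


@dataclass(frozen=True)
class Invoice:
    seller: str
    buyer: str
    amount_cents: int


def generate_dataset(seed: int, companies: int, invoices: int) -> Tuple[List[Company], List[Invoice]]:
    """Build a reproducible dataset: the same seed always yields the same data."""
    if companies < 2:
        raise ValueError("need at least two companies to trade")
    rng = random.Random(seed)  # private generator, independent of the global random state
    cos = [Company(f"Company {i:04d}", rng.choice(COUNTRIES)) for i in range(companies)]
    invs = []
    for _ in range(invoices):
        seller, buyer = rng.sample(cos, 2)  # two distinct companies
        invs.append(Invoice(seller.name, buyer.name, rng.randint(1_000, 5_000_000)))
    return cos, invs
```

A small seed-and-size pair is easy to store with a demo storyboard, while the same function with large counts can feed a performance test.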
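Slides 19 and 20 describe stacks that live only as long as teams need them (hours to months), with roughly 20 created and destroyed daily. A hypothetical expiry check that a scheduled cleanup job could run is sketched below; the stack names are made up, and the actual teardown (for example running `terraform destroy` for the stack) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Stack:
    name: str
    created: datetime
    ttl: timedelta  # lifetime the owning team asked for: hours for a test run, months for a demo


def expired_stacks(stacks: List[Stack], now: datetime) -> List[str]:
    """Names of stacks whose requested lifetime has run out, in input order."""
    return [s.name for s in stacks if now >= s.created + s.ttl]


# Example: a cleanup job would call this on a schedule with the current UTC time.
stacks = [
    Stack("perf-run", datetime(2019, 1, 1, tzinfo=timezone.utc), timedelta(hours=6)),
    Stack("sales-demo", datetime(2019, 1, 1, tzinfo=timezone.utc), timedelta(days=60)),
]
print(expired_stacks(stacks, datetime.now(timezone.utc)))
```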
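Slide 23's “test-driven operations” means treating infrastructure code like application code. Tradeshift's own tests are Ruby unit tests for Puppet (slide 24); the sketch below only illustrates the same idea in Python with the standard `unittest` module, using a hypothetical `render_sshd_config` function as a stand-in for a Puppet template.

```python
import unittest


def render_sshd_config(settings: dict) -> str:
    """Render a minimal sshd_config from Hiera-like key/value role data (hypothetical)."""
    defaults = {"PermitRootLogin": "no", "PasswordAuthentication": "no", "Port": "22"}
    merged = {**defaults, **settings}  # role data overrides the secure defaults
    return "".join(f"{key} {value}\n" for key, value in sorted(merged.items()))


class SshdConfigTest(unittest.TestCase):
    def test_secure_defaults(self):
        cfg = render_sshd_config({})
        self.assertIn("PasswordAuthentication no\n", cfg)
        self.assertIn("PermitRootLogin no\n", cfg)

    def test_override_port(self):
        cfg = render_sshd_config({"Port": "2222"})
        self.assertIn("Port 2222\n", cfg)
        self.assertNotIn("Port 22\n", cfg)
```

Saved as a module, the tests run with `python -m unittest`, and can gate a pull request exactly as application tests do.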
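Slide 26 argues for automating patching and compliance instead of relying on human processes. A minimal sketch of an automated check, assuming a hypothetical inventory of installed package versions per host and a policy of minimum patched versions. The simple dotted-number comparison is an assumption for illustration; real Debian/Ubuntu package versions need dpkg's ordering rules (for example `dpkg --compare-versions`).

```python
from typing import Dict, List, Tuple

# Hypothetical policy: minimum patched version for each package.
POLICY = {"openssl": "1.1.1", "openssh": "7.6"}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.1.10' into (1, 1, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def non_compliant(inventory: Dict[str, Dict[str, str]],
                  policy: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (host, package, installed_version) for every package below the policy minimum."""
    findings = []
    for host, packages in sorted(inventory.items()):
        for package, minimum in sorted(policy.items()):
            installed = packages.get(package)
            if installed is not None and version_key(installed) < version_key(minimum):
                findings.append((host, package, installed))
    return findings
```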
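Slide 38 lists canary deploys as a goal. A canary gate can be sketched as comparing the canary's error rate with the baseline's over the same window; the thresholds below (1.5x the baseline, a 0.1% floor, at least 500 requests) are hypothetical values, not Tradeshift's.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts observed over one evaluation window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Window, canary: Window,
                   max_ratio: float = 1.5, floor: float = 0.001,
                   min_requests: int = 500) -> str:
    """Return 'wait', 'rollback' or 'promote' for a canary release."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge yet
    # The floor stops a single error from failing a canary when the baseline had none.
    allowed = max(baseline.error_rate * max_ratio, floor)
    return "rollback" if canary.error_rate > allowed else "promote"
```

Comparing against a live baseline rather than a fixed threshold keeps the gate meaningful when overall error rates drift, which matters at 10-15 deploys a day.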