As Squad Architect Platform, I supported the platform team in migrating a complete eCommerce environment to Google Cloud Platform. By sketching out the various migration steps, technical concepts and tooling, I will explain why we did the migration exactly this way.
Migrate eCommerce Platform to GCP in 6 Hours
1. Paul Puschmann | 16.06.2021
How we finally migrated an eCommerce-Platform to GCP
2. Who’s speaking?
Paul Puschmann, Squad Architect Platform at REWE Digital GmbH
Since July 2014 at REWE Digital GmbH
Taking care of the eCom infrastructure
3. eCom: What is this?
• eCom / eCommerce-environment provides the customer-facing applications for REWE
• www.rewe.de
• shop.rewe.de
• Mobile-Apps & API
7. New approach: "Lift & Shift"?
Use exactly the same tooling as in the current eCom-setup:
• Terraform everything at GCP
• Use simple GCE-instances
• Configure GCE-instances with Ansible
• "just" migrate datacenters
8. Prerequisites
What’s already in our stack supporting the migration?
• external Cortex & Grafana setup (scalable Prometheus)
• Consul
• Postgres-setup with PgBouncer
• Nomad
• Kafka
• Centralised Container Deployment-Scripts
12. Consul
Service Discovery
Docker-Host-A / 172.30.124.50 Docker-Host-B / 172.30.124.51
nslookup
app-b.service.consul.myinternal-domain1.net. 5 IN A 172.30.124.51
app-b.service.consul.myinternal-domain1.net. 5 IN A 172.30.124.50
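A minimal sketch of what the lookup above could look like from application code, assuming Python and the operating system's resolver. The helper names `consul_service_name` and `resolve_service` are hypothetical and not part of the original setup; the resolver is injectable so the logic can be tried without a Consul agent.

```python
import socket
from typing import Callable, List, Tuple

# Same return shape as socket.gethostbyname_ex: (hostname, aliases, addresses)
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def consul_service_name(service: str, domain: str) -> str:
    """Build the Consul DNS name <service>.service.consul.<domain>."""
    return f"{service}.service.consul.{domain}"


def resolve_service(service: str, domain: str,
                    resolver: Resolver = socket.gethostbyname_ex) -> List[str]:
    """Return every A-record address the resolver reports for the service."""
    _, _, addresses = resolver(consul_service_name(service, domain))
    return sorted(addresses)
```

In an environment where the name resolves, `resolve_service("app-b", "myinternal-domain1.net")` would return the addresses of all hosts currently running app-b.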
17. Consul
Prepared queries
Prepared-Query:
{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "Failover": {
      "NearestN": 2
    }
  }
}
<servicename>.service.consul.<domain> is limited to the local datacenter
Solution:
<servicename>.query.consul.<domain>
Not an option:
<servicename>.service.datacenter2.consul.<domain>
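A rough sketch of how the catch-all template above might be registered, assuming Python and Consul's HTTP API endpoint for creating prepared queries (`POST /v1/query`). The function names and the agent address `http://127.0.0.1:8500` are assumptions for illustration; an ACL token may also be required in a real cluster.

```python
import json
import urllib.request


def catch_all_query_template(nearest_n: int = 2) -> dict:
    """The prepared-query template from the slide: it matches every name and
    fails over to the nearest_n closest datacenters."""
    return {
        "Name": "",
        "Template": {"Type": "name_prefix_match"},
        "Service": {
            "Service": "${name.full}",
            "Failover": {"NearestN": nearest_n},
        },
    }


def build_register_request(consul_url: str, template: dict) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that creates the query."""
    return urllib.request.Request(
        url=f"{consul_url.rstrip('/')}/v1/query",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of `build_register_request("http://127.0.0.1:8500", catch_all_query_template())` to `urllib.request.urlopen` would send it to the assumed local agent. With an empty "Name", the template applies to any `<servicename>.query.consul.<domain>` lookup, so no per-service query is needed.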
18. Consul
Prepared queries
How to move from .service.consul. to .query.consul.?
Using short names: change the search domain in /etc/resolv.conf
Using FQDNs: change the FQDN ;-)
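Both variants can be sketched as plain string rewrites. This is an illustration in Python under simplifying assumptions, not the tooling used in the migration: the function names are hypothetical, and tagged or datacenter-qualified names are deliberately left untouched.

```python
def to_query_name(fqdn: str) -> str:
    """Rewrite <service>.service.consul.<domain> to <service>.query.consul.<domain>.

    Names that do not contain ".service.consul." are returned unchanged.
    """
    head, sep, tail = fqdn.partition(".service.consul.")
    if not sep:
        return fqdn
    return f"{head}.query.consul.{tail}"


def rewrite_search_domain(resolv_conf: str) -> str:
    """Switch search domains in resolv.conf text from service.consul. to query.consul."""
    lines = []
    for line in resolv_conf.splitlines():
        if line.startswith("search "):
            domains = [
                d.replace("service.consul.", "query.consul.", 1)
                if d.startswith("service.consul.") else d
                for d in line.split()[1:]
            ]
            line = "search " + " ".join(domains)
        lines.append(line)
    return "\n".join(lines)
```

With the search domain changed, a short name such as `app-b` resolves through the prepared query without touching the application configuration.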
28. Migrate PRE & prepare PROD
• Set up "hardware" for PRE & PROD at the same time
• Do it like PROD:
• orchestrated & condensed
• no downtime for PRE
30. The PROD migration
1. Stop external traffic
2. Primary-failover of Postgres & create new replicas
3. Migrate services between datacenters in Nomad
4. Migrate Solr, Redis, Elasticsearch
5. Reconfigure external DNS
6. Reallocate Kafka-Topics in two batches (pareto-split)
1. All smaller topics first
2. The few big topics last
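The two-batch split in step 6 could be computed along these lines. This is a sketch only: the 80% threshold, the function names and the byte-size input are assumptions, since the slides say no more than "pareto-split". The JSON helper emits the topic-list format that Kafka's `kafka-reassign-partitions` tool accepts via `--topics-to-move-json-file`.

```python
import json
from typing import Dict, List, Tuple


def pareto_split(topic_sizes: Dict[str, int],
                 big_share: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split topics into (small_batch, big_batch).

    The largest topics go into big_batch until they cover at least
    big_share of all bytes; everything else lands in small_batch.
    """
    total = sum(topic_sizes.values())
    # Largest first; ties broken by name for a stable result
    ordered = sorted(topic_sizes, key=lambda t: (-topic_sizes[t], t))
    big: List[str] = []
    covered = 0
    for topic in ordered:
        if total == 0 or covered >= big_share * total:
            break
        big.append(topic)
        covered += topic_sizes[topic]
    small = sorted(t for t in topic_sizes if t not in big)
    return small, big


def topics_to_move_json(topics: List[str]) -> str:
    """Render a batch as a topics-to-move JSON document."""
    return json.dumps({"version": 1, "topics": [{"topic": t} for t in topics]})
```

Moving the many small topics first lets most of them reach the new brokers quickly, while the few large topics can keep reallocating after traffic is switched back on.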
31. The PROD migration
Finish-line
1. Testing
2. Start external traffic
3. DONE, after 6 hours and 25 minutes
… 50 hours and 50 minutes after "going live again", all Kafka data had been reallocated.
32. The PROD migration
Summary of PROD
• 119 new GCE-instances
• 200 micro-services migrated
• 138 databases migrated
• 5 terabytes of production data moved (replicas not counted)
33. "Lift & Shift"
Summary
The Platform-Team migrated a complete PROD eCommerce-platform
in six hours from a VMware-environment to GCP
without any other external configuration changes.
The downtime during the migration was accepted only to maximise
the data consistency of the eCommerce-platform.
Everything is in code. Nice!
34. Hashicorp rocks!
We had configured everything using Consul service discovery,
and this was a huge benefit: it was the most essential thing in this migration.
35. How we finally migrated an eCommerce-Platform to GCP
Source: https://vine.co/v/5blZLuKaZrQ