Watch the presentation: https://youtu.be/E3LeAmH6Ems
At the beginning of 2019, Chris Nuland and team were tasked with migrating a large Mesosphere DC/OS cluster with hundreds of running containers to Kubernetes for a large Fortune 100 healthcare company. One of the key challenges was the need to finish the migration within a 7-month timeframe so that DC/OS could be sunset before the cluster’s end of life. In conjunction with this migration, a couple hundred applications also had to be containerized and deployed into the newly built cluster. These tasks were completed in the desired time frame using a variety of migration and onboarding techniques, including a few migration tools, like Pathfinder, that would eventually become part of the Konveyor suite of applications.
This presentation will go over many of the challenges of that migration, how certain tooling aided in the process, and how the process would look different now given the migration tooling available in the Konveyor suite of applications.
Presenter: Christopher Nuland, Architect at Red Hat
Migrating a Large Fortune 100 Healthcare Company to Kubernetes in 7 months
1. Migrating a Large Fortune 100 Healthcare Company to Kubernetes in 7 months
Christopher Nuland - Architect
2. Accelerate your journey to Kubernetes with the Konveyor Community
A community of people passionate about helping others modernize and migrate their applications to the hybrid cloud by building tools and best practices on how to break down monoliths, adopt containers, and embrace Kubernetes.
www.konveyor.io
5. About HealthCare Co.
● Fortune 500
● A large conglomerate of over 20 different insurance companies
● Over 70,000 employees
● $1 billion yearly IT budget throughout all subsidiaries
● Major client of IBM
● Large overseas IT operations team based out of India
6. Operations
● Mostly uncontainerized apps when starting in March of 2019
● Bulk of operations running on VMs
● Had begun migration to Mesos DC/OS and Docker Swarm back in 2017
○ 13 application teams running on DC/OS
○ ~150 containers running in production environment
○ ~300 containers running in non-production environments
● Using Netflix’s service discovery stack (Zuul and Eureka)
● Mostly managed by overseas operations team
● Started piloting k8s with OpenShift 3.11
7. k8s Migration
● Needed to be off DC/OS within 7 months because of a hard contract deadline
● Setting up a hybrid cloud with AWS and on-prem VMware
● Utilizing OpenShift 4 (OCP) and Elastic Kubernetes Service (EKS)
● Migration included:
○ Migrating images off of DCR to Quay (hosted in OCP)
○ Sunsetting Bamboo and migrating to Jenkins (hosted in OCP)
○ Management strategy for overseas team to run OCP after the sunsetting of DC/OS
○ Migrating uncontainerized applications
○ Retiring OCP 3.11
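Moving images from one registry to another also means repointing every image reference in the deployed resources. The sketch below shows one way that rewrite could look for a Deployment-style manifest held as a Python dict. The registry hostnames, the repository path, and the function names are hypothetical and do not come from the presentation; the slides do not say how the team actually updated references.

```python
import copy


def rewrite_image(image: str, old_registry: str, new_registry: str) -> str:
    """Repoint an image reference, keeping the repository path and tag/digest."""
    prefix = old_registry.rstrip("/") + "/"
    if image.startswith(prefix):
        return new_registry.rstrip("/") + "/" + image[len(prefix):]
    return image  # not hosted on the old registry, so leave it untouched


def rewrite_manifest_images(manifest: dict, old_registry: str, new_registry: str) -> dict:
    """Return a copy of a Deployment-style manifest with container images repointed."""
    updated = copy.deepcopy(manifest)
    pod_spec = updated.get("spec", {}).get("template", {}).get("spec", {})
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            container["image"] = rewrite_image(
                container["image"], old_registry, new_registry
            )
    return updated


# Hypothetical hostnames, used only for illustration.
OLD = "dcr.example.internal"
NEW = "quay.apps.example.internal"

deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "billing", "image": "dcr.example.internal/team-a/billing:1.4.2"},
        {"name": "sidecar", "image": "docker.io/library/busybox:1.36"},
    ]}}},
}

migrated = rewrite_manifest_images(deployment, OLD, NEW)
for c in migrated["spec"]["template"]["spec"]["containers"]:
    print(c["name"], "->", c["image"])
```

Working on a deep copy keeps the original manifest available for comparison, which can help when reviewing a large batch of rewritten resources.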
8. Challenges
● Getting buy-in from application teams to take ownership of the full DevOps approach
● Training application teams on containerization and core k8s concepts
○ All containerization was previously managed by the operations team
● Migrating hundreds of services within 7 months
● Moving away from the Netflix service discovery stack
● Removing root privileges on all containers
● Obstacles around TLS termination and HAProxy configuration
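To make the root-privilege challenge concrete, here is a minimal sketch of a static check that flags containers in a pod spec that either request UID 0 or never declare that they run as non-root. It reads only the standard Kubernetes securityContext fields runAsUser and runAsNonRoot; the function name and the sample pod spec are hypothetical, and the check cannot see a USER directive baked into the image itself.

```python
from __future__ import annotations


def containers_running_as_root(pod_spec: dict) -> list[str]:
    """Return names of containers that may run as root according to the manifest.

    Container-level securityContext values override pod-level ones, which
    mirrors how Kubernetes resolves these two fields.
    """
    pod_ctx = pod_spec.get("securityContext") or {}
    offenders = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        explicitly_root = run_as_user == 0
        undeclared = run_as_user is None and run_as_non_root is not True
        if explicitly_root or undeclared:
            offenders.append(container["name"])
    return offenders


# Hypothetical pod spec, used only for illustration.
pod_spec = {
    "containers": [
        {"name": "legacy-app", "securityContext": {"runAsUser": 0}},
        {"name": "api", "securityContext": {"runAsUser": 1001}},
        {"name": "worker"},
    ]
}

print(containers_running_as_root(pod_spec))
```

A check like this could run against exported manifests before an application is onboarded, so that teams see the list of containers needing changes early.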
10. DC/OS Approach
● Migrate images to the Quay repository
● Use oc new-app <image-url> to create initial assets
● Export k8s resources to a Git repository
○ Building blocks for the GitOps approach that was implemented the following year
● Remove Netflix service discovery references in applications
● Remove requirement for root or a specific UID in containers
● Work with networking team on requirements around TLS and F5 routing for production release
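Resources exported from a live cluster carry server-generated fields that do not belong in Git. The sketch below shows one way to strip them before committing, assuming the resource has already been loaded as a dict (for example from the JSON output of oc get). The function name and sample resource are hypothetical; the presentation does not describe the cleanup the team actually performed.

```python
import copy
import json

# Metadata fields the API server populates; they differ per cluster and
# would only add noise to a Git history.
SERVER_SIDE_METADATA = (
    "uid",
    "resourceVersion",
    "generation",
    "creationTimestamp",
    "selfLink",
    "managedFields",
)
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def clean_for_git(resource: dict) -> dict:
    """Return a copy of an exported resource without cluster-generated state."""
    cleaned = copy.deepcopy(resource)
    cleaned.pop("status", None)  # runtime state, not desired state
    metadata = cleaned.get("metadata", {})
    for field in SERVER_SIDE_METADATA:
        metadata.pop(field, None)
    annotations = metadata.get("annotations", {})
    annotations.pop(LAST_APPLIED, None)
    if "annotations" in metadata and not annotations:
        del metadata["annotations"]  # drop the key if nothing is left in it
    return cleaned


# Hypothetical exported Deployment, used only for illustration.
exported = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "billing",
        "namespace": "team-a",
        "uid": "abc-123",
        "resourceVersion": "42",
        "creationTimestamp": "2019-03-01T00:00:00Z",
        "annotations": {LAST_APPLIED: "{}"},
    },
    "spec": {"replicas": 2},
    "status": {"readyReplicas": 2},
}

print(json.dumps(clean_for_git(exported), indent=2))
```

Keeping only the desired state in the repository makes later diffs meaningful, which matters once a GitOps tool starts reconciling the cluster against that repository.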
11. OpenShift 3.11 → 4.3
● Migration analysis using Control Plane Migration Assistant (CPMA)
● Automating the lift-and-shift process with the Migration Toolkit for Containers (MTC) Operator
● Move to a GitOps pattern using Argo CD
● Leveraging existing patterns made this the easiest of the 3 migration types
● Tools allowed the migration team to enforce strict methodologies going into OpenShift 4
13. Containerization
● Majority of applications were Java on WebSphere
● Asked to define cloud readiness for each application
● Need for quick analysis of applications
● Used Pathfinder tool for data collection
● Migrated middleware from WebSphere to JBoss Web Server (JWS, Tomcat)
● Containerized applications for use within OpenShift and EKS
14. Accomplishments
● Migrated and retired DC/OS within 8 months (required a 1-month extension with DC/OS because of a new DR standard)
● DCR retired concurrently with DC/OS
● Bamboo in the process of being retired and the majority of non-legacy applications moved to Jenkins
● Majority of applications running on Red Hat JWS
● More application teams taking ownership of the full DevOps process
15. Shortcomings
● Overseas operations team not prepared to take over ownership of OCP and EKS clusters
● Some application teams unwilling to take ownership of the DevOps and GitOps process and wanting to continue the “throw over the fence” approach
● Challenges enabling teams after initial OCP 3.11 → 4.3 migration
● Scaling logging to meet demand and log forwarding to Splunk
● Manual approach for migrating from existing platforms like DC/OS and Docker Swarm
16. What Would Be Different in 2021
● A more mature MTC operator (now MIG) would have allowed for faster and more stable migrations from OCP 3.11 → 4.3
● Projects like Move2Kube would have allowed for the standardization of migrating into platforms like OCP and EKS from existing platforms
● Utilizing new features in Pathfinder (now Tackle) would have caught more early warning signs in our cloud enablement projects