Imagine migrating 1,200 hosts and 16,000 VMs from Xen to KVM, all while maintaining existing networking and integration functionality and providing a seamless transition for the infrastructure users. This is exactly what Ticketmaster took on with the support of CloudOps as they adopted Apache CloudStack as their virtualization orchestrator. Learn about the challenges faced and the creative solutions developed to enable the successful transition.
Syed Ahmed and Jean-Francois Nadeau presented this talk at the CloudStack Collab Conf. of ApacheCon North America 2018 in Montreal, Quebec.
At Ticketmaster, Jean-Francois Nadeau is a senior systems engineer with a focus on open source solutions. With a large ecosystem of teams consuming their infrastructure, they focus on technologies which are flexible while also being operationally efficient. At CloudOps, Syed Ahmed is a software developer focusing on integrations and hard-to-solve problems. With extensive knowledge throughout the hardware and software stack, he is able to add a unique perspective to solving integration and orchestration challenges.
3. APACHECON North
America
Ticketmaster Intro
● 21 ticketing systems and
over 250 internal products
● 1400+ people in Product &
Tech
● 15,000+ network
endpoints across the world
(Venues, Arenas, Kiosks, etc.)
● Every era of software…
starting in 1970
Tech
Museum
The ticket vending machine
● The majority of our internal products run in
our datacenters and are virtualized
● With many ticketing systems and
operational groups, it was required to
break our infra down into “tenants” to be
more manageable and secure.
● The isolation between tenants is enforced
at the physical network
The virtual Infrastructure at
Ticketmaster (pre migration)
● 18K+ VMs
● 1K+ hypervisors
● 100+ XenServer pools
● Homegrown self service portal
abstracting the complexities of the tenant
and network model
● End users only think about application
clusters, not infrastructure
● VMs are independent of application code.
All code resides in a shared filesystem
Challenges with Existing
Infrastructure
● XenServer free worked just fine for years,
but its licensing change forced us to
reconsider our options.
● The homegrown portal was built before
the *stack era. It was originally a UI-only
interface, and APIs were what users wanted most.
What Alternatives Did we Have?
● Pay $$ for XenServer and commit to Xen
for several years. Not an improvement for
the user.
● Revisit OpenStack.
Problems With the Alternatives
● Our first OpenStack test drive was not a
success (back in the Havana release)
○ Control plane complexity
○ We still had the Portal in front of it
● Green field is not an option. We need to
re-deploy VMs with the same network
identity.
Why we Chose CloudStack
● Easy control plane setup and HA
● Integrating existing networks without the
need to reserve IP ranges
● Extending the API looked simple enough
to allow us to mimic our Portal logic in
CloudStack
● Opportunity to adopt KVM
Integrating CloudStack into the
existing Setup
● Delegate IP/DNS to the existing IPAM
● No VR (virtual router)
● Existing AZs become zones
● Tenants (Product groups) are projects
● Networks scoped into projects
● Allow end users to self-migrate to
CloudStack
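The mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Ticketmaster's code: the `LegacyVM` and `CloudStackPlacement` types, their field names, and the `to_cloudstack` helper are all hypothetical. It shows an existing AZ becoming a zone, a tenant becoming a project, the network staying scoped to that project, and the IP address carried over unchanged because IPAM remains the source of truth and no virtual router is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyVM:
    """A VM as described by the existing portal (hypothetical fields)."""
    name: str
    availability_zone: str
    tenant: str
    network: str
    ip_address: str  # assigned by the existing IPAM


@dataclass(frozen=True)
class CloudStackPlacement:
    """Where the same VM would live in CloudStack (hypothetical fields)."""
    zone: str
    project: str
    network: str
    ip_address: str
    uses_virtual_router: bool


def to_cloudstack(vm: LegacyVM) -> CloudStackPlacement:
    """Translate the legacy model into CloudStack concepts."""
    return CloudStackPlacement(
        zone=vm.availability_zone,   # existing AZs become zones
        project=vm.tenant,           # tenants (product groups) become projects
        network=vm.network,          # the network is scoped into that project
        ip_address=vm.ip_address,    # IP/DNS stays delegated to the existing IPAM
        uses_virtual_router=False,   # no VR
    )


if __name__ == "__main__":
    vm = LegacyVM("web01", "az1", "ticketing", "frontend", "10.0.0.15")
    print(to_cloudstack(vm))
```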
Integrating CloudStack into the
existing Setup
● Custom API for creating new VMs
● Custom API for migrating VMs to
CloudStack
● Custom UI plugin for different workflow
to create VMs
● CLI tools for running migrations
● LDAP setup to reuse existing users
● Project/Domain setup
Migration Process to CloudStack
● Adding new services which integrate with
the existing IPAM and Asset Inventory
● Creating APIs for running migration from
CloudStack.
● Shut the VM down in Xen, Create a new
VM in KVM, Update IPAM and Inventory
● Verify that the migration is successful
● Destroy the old VM
● Revert the process if the migration is not
successful
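The per-VM flow above (shut down, recreate, update records, verify, then destroy or revert) can be sketched as a list of reversible steps. This is a minimal sketch under stated assumptions, not the custom CloudStack API from the talk: `run_migration`, `demo`, and the step names are hypothetical, and the real Xen, KVM, IPAM and inventory calls are replaced by log entries. The old VM is destroyed only after verification passes; on any failure the completed steps are undone in reverse order.

```python
from typing import Callable, List, Tuple

# A step is (name, do, undo). All names here are illustrative only.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_migration(
    steps: List[Step],
    verify: Callable[[], bool],
    destroy_old: Callable[[], None],
) -> bool:
    """Run steps in order; undo completed steps in reverse on failure."""
    done: List[Step] = []
    try:
        for step in steps:
            _name, do, _undo = step
            do()
            done.append(step)  # only fully completed steps are recorded
        if not verify():
            raise RuntimeError("verification failed")
    except Exception:
        for _name, _do, undo in reversed(done):
            undo()
        return False
    # Point of no return: the old VM goes away only after a verified migration.
    destroy_old()
    return True


def demo(healthy: bool) -> List[str]:
    """Simulate one VM migration, recording actions instead of calling APIs."""
    log: List[str] = []
    steps: List[Step] = [
        ("shutdown_xen", lambda: log.append("stop xen vm"),
         lambda: log.append("start xen vm")),
        ("create_kvm", lambda: log.append("create kvm vm"),
         lambda: log.append("delete kvm vm")),
        ("update_records", lambda: log.append("point ipam at kvm"),
         lambda: log.append("point ipam at xen")),
    ]
    ok = run_migration(
        steps,
        verify=lambda: healthy,
        destroy_old=lambda: log.append("destroy xen vm"),
    )
    log.append("migrated" if ok else "reverted")
    return log


if __name__ == "__main__":
    print(demo(healthy=True))
    print(demo(healthy=False))
```

Keeping the destroy step outside the reversible list is a deliberate choice in this sketch: everything before it can be rolled back, so the old Xen VM remains available as the fallback until the new one has been verified.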
Journey So Far
● All non-prod VMs migrated
● About 6000 VMs currently running in
CloudStack across 5 zones and 2 regions
● CloudStack+KVM being used for
production VMs as well
● A few months away from completing the full
migration
Lessons Learnt
● KVM live migration woes and tunings
● Controllers hosting backend cloud DB vs
split brain conditions
● Manage the entire CloudStack (CS) infra with Ansible
● CloudStack’s RBAC enforces a tree
structure which makes it inflexible
● EXT3 unstable when hypervisor crashes
● XAPI sometimes fails to shut down a VM