At a high level, the goal of Multi-Arch infrastructure is that workloads can run on the best hardware for their price/performance needs, without developers being concerned with the underlying architecture. That doesn’t mean it’s easy! Multi-Arch touches Infra As Code, CI/CD, packaging, binaries, images, Kubernetes upgrades, testing, scheduling, rollout, reproducible builds, performance testing and more. This talk looks at how early adopters handled the challenges so you are prepared for the road ahead.
1. Cheryl Hung, @oicheryl
Sr Director, Infra Ecosystem, Arm
Multi-Arch Infrastructure from the
Ground up
KubeCon CloudNativeCon EU 2023
Amsterdam, 19 Apr 2023
4. Arm Infra Ecosystem: Cloud, 5G, Telco, Networking
@oicheryl
1. Developer Outreach 2. SW/HW support 3. Standards
5. Objectives
1. Why is Multi-Architecture infrastructure tricky?
2. How do I do Multi-Arch with Kubernetes?
3. Case studies: FusionAuth, Honeycomb, Arm
10. Fruit computers
[Timeline graphic: Arm architecture versions from Armv1 through Armv6, Armv7, Armv8.4-A and Armv8.6]
Arm’s RISC architecture targets power efficiency and
performance and can be licensed for different use cases
11. Goals of Multi-Arch infrastructure
Workloads should run on the best hardware for their price/performance needs
Without developers being concerned with the underlying architecture
12. But Multi-Arch touches everything…
● Infra As Code
● CI/CD, reproducible builds
● Packaging, binaries, images, registries
● Testing, scheduling, rollout, performance testing
● Kubernetes upgrades
● …
17. 1. Inform
Inventory your software stack
● OS
● Container images
● Libraries, frameworks and runtimes
● Tools used to build, deploy and test
● Tools used to monitor and manage
and check each for Arm support (AArch64 in GCC, arm64 in the Linux kernel)
Identify hotspots
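One quick way to run this inventory check for container images is to inspect each image's manifest list and confirm an arm64 variant is published. A minimal sketch (the image name is only an example):

```shell
# Sketch: check whether an image publishes an arm64 variant.
# Requires Docker with manifest support; "nginx:latest" is a placeholder image.
docker manifest inspect nginx:latest | grep -B1 -A1 '"architecture"'
# A multi-arch image lists "architecture": "arm64" alongside "amd64".
```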
18. 2. Optimize
Provision test Arm64 environment
Upgrade container images and test
Performance testing
Update CI/CD for reproducible Arm64 builds
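Updating CI/CD for Arm64 builds often comes down to emitting one multi-arch image per release. A minimal sketch using Docker Buildx (the repository and tag are placeholders, not from the talk):

```shell
# Build and push a single image manifest containing amd64 and arm64 variants.
# "myrepo/myapp" is a placeholder; QEMU emulation covers the non-native arch.
docker buildx create --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myrepo/myapp:latest \
  --push .
```

Kubernetes nodes then pull the variant matching their own architecture automatically.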
19. 3. Operate
Build K8s cluster
● Mixed control plane and worker nodes
● Cluster creation
● DaemonSets
Canary or blue-green deployment
● Node affinity, taints, tolerations
● Different limits and requests per architecture
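The scheduling controls above can be sketched in a single Deployment: a hypothetical arm64 canary pinned via the standard `kubernetes.io/arch` node label, tolerating an example taint on Arm nodes, with its own resource requests (all names and values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-arm64-canary        # hypothetical canary deployment
spec:
  replicas: 1
  selector:
    matchLabels: {app: myapp, arch: arm64}
  template:
    metadata:
      labels: {app: myapp, arch: arm64}
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64  # well-known node label set by the kubelet
      tolerations:
      - key: arch                  # example taint applied to Arm nodes
        value: arm64
        effect: NoSchedule
      containers:
      - name: myapp
        image: myrepo/myapp:latest # multi-arch image; node pulls its variant
        resources:                 # requests/limits can differ per architecture
          requests: {cpu: "500m", memory: "512Mi"}
```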
21. “48 of the top 50 Amazon EC2 customers use AWS Graviton processors for their workloads”
- Danilo Poccia, Chief Evangelist (EMEA), AWS, Aug 2022
23. FusionAuth technical timeline
1. Finding a JVM that supported Arm, especially Arm-based Macs (Java 17 was the first to do so). Added Java 17/Arm support to the code base Dec 2021 - Feb 2022
2. Updating and testing install scripts to use the correct JVM
3. Updating Docker builds to target the Arm architecture with jlink and multi-arch builds
4. Checking for Arm support in public cloud regions when spinning up SaaS
5. Updating the application to expose the underlying architecture

Logins are especially CPU-intensive due to password hashing, so the team load tested 50k logins. Arm handled 26-49% more logins per second and cost 8-10% less than Intel on EC2.
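Step 5, exposing the underlying architecture, can be as simple as normalizing what the OS reports. A sketch (the mapping and output format are illustrative, not FusionAuth's actual code):

```shell
# Normalize the machine name from uname to the Docker/Go-style arch label.
arch="$(uname -m)"
case "$arch" in
  x86_64)        norm="amd64" ;;
  aarch64|arm64) norm="arm64" ;;
  *)             norm="$arch" ;;
esac
echo "running on $norm"
```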
24. “Because we run on Java, our lift was pretty small. We just had to find a JVM built for ARM, and then work out any remaining kinks”
- Daniel DeGroff, FusionAuth CTO
“I just switched a FusionAuth instance to arm64 and the transition was so smooth I couldn't even tell whether it's actually running the arm64 version”
- Hendy Irawan, BandungPermaculture.com CIO and FusionAuth user
25. Honeycomb: full stack observability enabling engineers to deeply understand and debug production software
● March 2020: First experiments with Graviton2
● May 2021: Ingest workers in production
● Nov 2021: Virtually all workloads and envs on Arm; 92% of vCPUs on Arm
● April 2022: Turned off last x86 EC2 instances; 99% Arm on Lambda
26. 1. Chose to migrate ingest workers first as they are stateless, performance critical and scale out horizontally. Written in Golang, so compiling for Arm was easy.
2. Deployed in the dogfood environment and observed positive results.
3. Initially no Kubernetes or container orchestration; everything was in Terraform and Chef, so they could switch to Arm Amazon Machine Images (AMIs) by enumerating all the dependencies to update.
4. Next up were their own workloads (easiest to recompile and highest compute spend), then Kafka. Last were ad-hoc one-off services and those difficult to migrate.
honeycomb.io/blog/present-future-arm-aws-graviton-honeycomb
27. “Graviton has enabled Honeycomb to scale up our product without increased operational toil, spend less on compute, and have a smaller environmental footprint.”
- Ian Smith, Engineering Manager, Honeycomb.io
“I personally approached it as an idle experiment with a few spare afternoons, and was surprised by how compelling the results were. Saving 40% on the EC2 instance bill for this service […] is well worth the investment”
- Liz Fong-Jones, Field CTO, Honeycomb.io
29. Dogfooding on Arm
In 2019, Arm moved EDA tools from on-prem x86 to Graviton
✅ 60% better performance
✅ 50% reduced cost
✅ >1 MW power saved/day
33. Thanks!
Slides at oicheryl.com
Takeaways
● Why multi-arch?
● How?
● Talk to me!
○ Technical assistance from Arm
○ Credits for CI/CD
○ Success stories