Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Patterns and Pains of Migrating Legacy Applications to Kubernetes

145 views

Published on

Open Source Summit 2018, Vancouver (Canada): Talk by Josef Adersberger (@adersberger, CTO at QAware), Michael Frank (Software Architect at QAware) and Robert Bichler (IT Project Manager at Allianz Germany)

Abstract:
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud-native apps. But what to do if you’ve no shiny new cloud-native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a German blue chip company onto a Kubernetes cluster within one year.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Patterns and Pains of Migrating Legacy Applications to Kubernetes

  1. 1. Patterns and Pains of Migrating Legacy Applications to Kubernetes Josef Adersberger & Michael Frank, QAware Robert Bichler, Allianz Germany @adersberger @qaware
  2. 2. Michael Frank, Lead Developer, QAware Robert Bichler, Project Manager, Allianz Germany Josef Adersberger, Architect, QAware
  3. 3. CIO Let’s bring all our web applications onto a cloud native Platform
  4. 4. COSTS AVAILABILITY PRODUCTIVITY Digitalization => Agile => Cloud Native Platforms
  5. 5. Priorities: (1) Time (1,5 years) (2) Ops cost savings (3) Migration costs
  6. 6. 6 WE WERE BRAVE
  7. 7. WE FELT PAIN
  8. 8. WE DISCOVERED PATTERNS
  9. 9. 9 ❏ All 152 legacy applications migrated and in production within 17 months ❏ All security-hardened and modernized to containerized 12-factor-apps ❏ Benefits leveraged: strong business case, higher availability, more agile teams WE WERE SUCCESSFUL
  10. 10. The Architect’s Point of View
  11. 11. Patterns for success
  12. 12. 12 Visibility
  13. 13. The Cloudalyzer Tableau analysisMIGRATION DATABASEQAVALIDATOR SONARQUBE EAM TOOL QUESTIONNAIRES JIRA XLS STATIC ANALYSIS IBM MIGRATION TOOL … MIGRATION TASKS BASIC TOUR-DE-MIGRATION SYSTEM PROPERTIES OWASP Scanner jQAssistant
  14. 14. Questionnaire: Typical questions • Technology stack (e.g. OS, appserver, jvm) • Required resources (memory, CPU cores) • Writes to storage (local/remote storage, write mode, volume) • Special requirements (native libs, special hardware) • Inbound and outbound protocols (protocol stack, TLS, multicast, dynamic ports) • Ability to execute (regression/load tests, business owner, dev knowhow, release cycle, end of life) • Client authentication (e.g. SSO, login, certificates)
  15. 15. 15 Emergent design of cloud native software landscapes
  16. 16. Architecting hundreds of applications • Application Blueprint: Describing target architecture and some rules & principles • Migration Cookbook: Guidance on how to migrate the applications based on the application blueprint. Single source of truth & know-how externalization • Tour-de-Migration: Visiting all applications and collect open issues • GoLive Readiness Checklist: Criteria to be checked before GoLive APPLICATION BLUEPRINT MIGRATION COOKBOOK TOUR-DE-MIGRATION GOLIVE READINESS CHECKLIST Q1/17 Q2/17 Q3/17 Q4/17 Q1/18 Q2/18 APPLICATION MIGRATION CLOUD PLATFORM SETUP
  17. 17. APPLICATION HTTPD WEB LAYER J2EE 1.4 APPSERVER JVM 1.6 DB MQ HOST BATCH FS CLIENTS TLS 1.0+ TCP-Binary, WS, REST, C:D, LDAP Corba, SMTP, FTP, NAS, … RACF ESB ONPREM DATA CENTER ONPREM DATA CENTER DB MQ HOST BATCH FS RACF ESB KUBERNETES / OPENSHIFT DOCKER JVM 8 INNER APPLICATIONS AWS WEB LAYER AWS CLIENTS TLS 1.2 all TLS 1.2 JEE 7 APPSERVER SECURITY GATEWAY OUTER APPLICATIONS all 2-way TLS 1.2 & OIDC identity token Only data In transit The Blueprint
  18. 18. MONOLITH INNER APPLICATIONS OUTER APPLICATIONS BACKEND CLIENTS SECURITY GATEWAY BACKEND CLIENTS 1+2 3 1) how to enhance cloud nativeness? 2) how to cut the monolith? 3) how to obtain an identity token? BEFORE AFTER
  19. 19. MONOLITH INNER APPLICATIONS OUTER APPLICATIONS BACKEND CLIENTS SECURITY GATEWAY BACKEND CLIENTS 1+2 3 1) how to enhance cloud nativeness? 2) how to cut the monolith? 3) how to obtain an identity token? BEFORE AFTER
  20. 20. A sweet spot for legacy apps Cloud Friendly Apps … and enhance the application according the 12 factors Put the monolith into a container: do not cut, do not enhance with features in parallel
  21. 21. Sidecars to the rescue
  22. 22. Container patterns applied • Log extraction • Task scheduling Sidecar: Enhance container behaviour Ambassador: Proxy communication Adapter: Provide standardized interface • Configuration (ConfigMaps & Secrets to files) • mTLS tunnel • Circuit Breaking • Request monitoring Pod Application Container Pattern Container Other Container “Design patterns for container-based distributed systems”. Brendan Burns, David Oppenheimer. 2016
  23. 23. MONOLITH INNER APPLICATIONS OUTER APPLICATIONS BACKEND CLIENTS SECURITY GATEWAY BACKEND CLIENTS 1+2 3 1) how to enhance cloud nativeness? 2) how to cut the monolith? 3) how to obtain an identity token? BEFORE AFTER
  24. 24. Anti-pain rule: Don’t cut the monolith
  25. 25. Anti-pain rule: Don’t cut the monolith MONOLITH SOME MAGIC SAUCE BACKEND CLIENTS SECURITY GATEWAY BACKEND CLIENTS BEFORE AFTER MONOLITH
  26. 26. MONOLITH INNER APPLICATIONS OUTER APPLICATIONS BACKEND CLIENTS SECURITY GATEWAY BACKEND CLIENTS 1+2 3 1) how to enhance cloud nativeness? 2) how to cut the monolith? 3) how to obtain an identity token? BEFORE AFTER
  27. 27. Security service to the rescue MONOLITH MONOLITH SECURITY SERVICE BACKEND CLIENTS SECURITY GATEWAY BACKEND CLIENTS BEFORE AFTER TOKEN PROVIDER IAM SYSTEMS Adapting multiple authentication mechanisms to a uniform OIDC token.
  28. 28. Kubernetes constraints Initially we thought we’ll run into k8s restrictions on our infrastructure like: ‣ No support for multicast ‣ No RWX PVC available We did. But all required refactorings were moderate effort and lead to a better architecture.
  29. 29. Pain
  30. 30. The Lead Developer’s Point of View
  31. 31. The almighty legacy framework • “worry-free package framework” from the early 2000s with about 500kLOC, 0% test coverage and multiple forks • Strategies: • the hard way: consolidate forks and migrate manually and increase coverage • decorate with ambassadors, sidekicks and adapters • do not migrate parts and replace that API within the applications APPLICATION ALMIGHTY LEGACY FRAMEWORK J2EE 1.4 APPSERVER JVM 1.6 • from J2EE 1.4 to JEE 7 and Java 6 to 8 • add identity token check and relay • modify session handling (synchronization) • modify logging (to STDOUT) • modify configuration (overwrite from ConfigMap) • enforce TLS 1.2 • place circuit breakers • predefined liveness and readiness probes
  32. 32. TIME- OUTS
  33. 33. Timeouts: The pain • Kinds • Timeouts often too high. This ... – causes bad user experience – hurts the stability of your entire cloud • Unable to distinguish errors from legitimate waits • Diminishes self healing capabilities • Promotes cascading failures Con Pool Server Socket getConnection connect read connection TTL/keepAlive
  34. 34. Timeouts: The pain • Kinds • Timeouts often too high. This ... – causes bad user experience – hurts the stability of your entire cloud • Unable to distinguish errors from legitimate waits • Diminishes self healing capabilities • Promotes cascading failures Con Pool Server Socket getConnection connect read connection TTL/keepAlive
  35. 35. Timeouts: Recommendations • Keep timeouts within the following ranges – 1-3s for getConnection & connect – 3-60s for socket/read - aim as low as possible – 1-3min for TTL/KeepAlive of pooled connections • Allow for dynamic DNS changes and dynamic scaling of backend services • Tradeoff between reaction time and performance • Cascade timeouts – outer layer highest – inner layer lowest 60s 57s 54s 51s
  36. 36. LATENCY
  37. 37. Latency • Pain: Dramatic increase in latency You can't scale away latency! – Every layer and new infrastructure component adds processing time – Everything TLS1.2 secured adds processing time – Physical distance: Cloud -> OnPrem • Heaviest impact on n+1 patterns in applications – Adjust batch/fetch size – Parallel fetch – Ultima ratio: on prem (lightweight) service layer close to DB • General – Performance experts in support team – Caching – Use diagnosability tools...
  38. 38. Latency • Pain: Dramatic increase in latency You can't scale away latency! – Every layer and new infrastructure component adds processing time – Everything TLS1.2 secured adds processing time – Physical distance: Cloud -> OnPrem • Heaviest impact on n+1 patterns in applications – Adjust batch/fetch size – Parallel fetch – Ultima ratio: on prem (lightweight) service layer close to DB • General – Performance experts in support team – Caching – Use diagnosability tools...
  39. 39. DIAGNO- SABILITY
  40. 40. Diagnosability 1. Early on - diagnose cloud platform issues upfront 2. Holistic - monitor and correlate everything (infrastructure & apps, multiple levels, metrics & logs & traces) 3. Mandatory - everyone has to use it 4. Automatically - auto-instrumentation not involving devs
  41. 41. Metrics Events / LogsTraces • High effort to instrument for valuable insights • Scalability unclear for hundreds of applications • Applications have no time to run their own Prometheus instance • Scalability unclear for hundreds of applications (Jaeger & ZipKin) • Applications have no time to run their own instance • Scalability unclear (a lot of events lost) • Applications have no time to run their own EFK instance • Non-standardized log format requires custom log rewrite adapter but no fluentd DaemonSet Application Diagnosability?
  42. 42. Metrics Events / LogsTraces … use APM tools like Dynatrace and Instana Want to move fast? Buy first, reduce cost later Application Diagnosability
  43. 43. SESSION STATE
  44. 44. Session state 1. Session Stickiness: not within the cloud! 2. Session Persistence • Existing DB: perf impact to high ☹ • Redis: no TLS out of the box and infrastructure required ☹ 3. Session Synchronization • App-Server: no dynamic peer lookup within k8s ☹ • Hazelcast: TLS only in paid enterprise edition ☹ • ...
  45. 45. Session synchronization with Ignite • Apache Ignite as in-memory data grid – Embedded within application or standalone (in sidecar) – Cumbersome but working k8s peer lookup • Look out for ... – Java serialization – Legacy frameworks with custom session handling – Prevent generating sessions for e.g. health check requests – Applications putting large things into the “session” and misuse session as cache
  46. 46. #@!!#@$
  47. 47. Other technical pain points Pain Pattern Legacy crypto without TLS 1.2 and SNI support (e.g. Java 1.6) ● Find matching cipher suites ● Add a security proxy Legacy apps violating HTTP standards Refactor Access source URLs in redirect loops (e.g. IDP login) Use x-forwarded header and provide according filter No automated test suites ● Automated high-level tests ● Test generation (e.g. evosuite)?
  48. 48. The Project Manager’s Point of View
  49. 49. Patterns for success
  50. 50. Management support ❏ Strong management support ❏ Clear scope ❏ Courage to drive the change to cloud native development
  51. 51. Project Marketing & Motivation Identification & Celebration
  52. 52. Co-Location space One LEAP-Area ❏ Support- & ❏ Industrialization team ❏ In case of required support: Migration team
  53. 53. Industrialization
  54. 54. ARCHITECTURE TEAM DOZENS OF MIGRATION PROJECTS RUNNING IN PARALLEL (organized in release trains) ‣ Training sessions ‣ Support sessions ‣ Co-Location & remote ‣ Guidance / best practice sharing (cookbook, sample application) ‣ Unified development environment (via GitHub) ‣ Standard base images ‣ Pre-migrated frameworks ‣ Solutions: Security service, ambassadors INDUSTRIALIZATION TEAM ‣ Application blueprint ‣ Migration database SUPPORT TEAM ‣ Feedback
  55. 55. Transparency & information radiators App-Support Activities & Milestones Quality GoLive Planning Operational

×