Service discovery and configuration management at TransferWise

Shared infrastructure is one of the things standing in the way of effective and scalable engineering. Ed Hargin, Lead of DevOps, and Kyrylo Novotarskyi, Software Engineer for Currencies and Banking, describe the journey TransferWise took from shared to independent infrastructure management. They cover the move from legacy manual to automatic instance provisioning, the evolution of service registration and discovery adoption, and the introduction of centralized distributed configuration storage, including secret management techniques, testing, troubleshooting and disaster recovery scenarios.

Published in: Software

  1. 1. SERVICE DISCOVERY AND CONFIGURATION MANAGEMENT AT TRANSFERWISE
  2. 2. FIRST INTRODUCTIONS
  3. 3. INTROS
  4. 4. FAMOUS FOR
  5. 5. ESTONIAN FOUNDERS
  6. 6. EVANGELICAL CUSTOMERS
  7. 7. $1 BILLION IN TRANSFERS PER MONTH
  8. 8. SAVING OUR CUSTOMERS $50M PER MONTH
  9. 9. ENGINEERING CULTURE
  10. 10. WEAK CODE OWNERSHIP
  11. 11. EVERY PART OF THE CODE IS OWNED
  12. 12. ANY TEAM CAN CHANGE ANY PART OF THE CODE
  13. 13. ENGINEERING FOR CUSTOMER IMPACT
  14. 14. ENGINEERS GET CLOSE TO CUSTOMERS AND MAKE DECISIONS BASED ON DATA
  15. 15. SOME OF OUR TEAMS' SERVICES NEED TO BE HIGHLY RESPONSIVE, SYNCHRONOUS AND SCALE REALLY WELL
  16. 16. OUR PLATFORM TEAM IS 8 PEOPLE STRONG WITH A HUGE BACKLOG
  17. 17. HOW DID WE HANDLE IT BEFORE?
  18. 18. AN ENGINEER ASKS FOR A PRODUCTION SERVICE
  19. 19. AND WAITS…
  20. 20. WAITS SOME MORE…
  21. 21. ASKS AGAIN…
  22. 22. WAITS SOME MORE…
  23. 23. GETS A SERVICE
  24. 24. NEEDS TO CHANGE PRODUCTION CONFIG
  25. 25. WAITS…
  26. 26. THIS NEEDS TO STOP
  27. 27. TOO MUCH CHATTER, TOO LONG WAITS, STRESS FOR EVERYONE
  28. 28. CUSTOMERS SUFFER
  29. 29. WE NEED: AUTOMATIC ALLOCATION, PROVISIONING, LOGGING AND MONITORING, WITHOUT ENGINEERS
  30. 30. SERVICE DISCOVERY CAN HELP
  31. 31. RESEARCH AND TEX
  32. 32. WHICH DISCOVERY APPROACH TO CHOOSE?
  33. 33. CLIENT SIDE DISCOVERY
  34. 34. SELF REGISTRATION
  35. 35. NETFLIX OSS IS OUR WAY TO GO
  36. 36. PROOF OF CONCEPT
  37. 37. MAKE OUR MONOLITH CONSUME 1 SERVICE IN PRODUCTION
  38. 38. UNDERSTANDING THE SCOPE
  39. 39. Problem: getting Spring Cloud to work with anything other than Spring Boot can be tricky. There is no good adoption mechanism
  40. 40. Solution: Grails and Spring are close. Let’s read the source, find out what hides behind Spring Boot’s Netflix-specific annotations, and mimic the approach it was designed for
  41. 41. Step 1: Let’s create a shared bean in the discovery space, used by the parts of our monolith, that consumes the config and starts communication with Eureka
  42. 42. Step 2: Let’s start a Ribbon LoadBalancer in that bean, connect it to the Eureka client and let it start listening for apps
  43. 43. Step 3: Most of our communication goes through RestTemplate. How can we make RestTemplate awesome and @LoadBalanced? Interceptors! Let’s build one
  44. 44. Step 4: Let’s add the interceptor to the RestTemplates that need it. Step 5: What if discovery completely fails? Fallbacks!
  45. 45. IS THIS A PERFECT SOLUTION? OF COURSE NOT!
  46. 46. NEED TO MAINTAIN, TIED TO RESTTEMPLATES, NEEDS ENGINEERS TO THINK ABOUT IT
  47. 47. NOT GOOD ENOUGH
  48. 48. BUILD CLIENTS AND MAKE EVERY CASE WORK AS A BLACK BOX
  49. 49. IF YOUR SERVICE WANTS TO BE DISCOVERED - MAKE YOUR CLIENT PROVIDE THE TOOLSET
  50. 50. PROBLEMS
  51. 51. SOME SERVICES HAVE A EUREKA CLIENT OF THEIR OWN. EUREKA IS NOT USED TO IT (CONFIGURATIONS ARE STORED IN A CONTEXT BEAN), RESULTING IN OVERRIDES UNLESS CONTROLLED
  52. 52. TESTING IS PROBLEMATIC AS IT REQUIRES A FULL COPY OF PRODUCTION. TESTING THE PIECES IS COMPLETELY ON THE ADOPTING ENGINEER
  53. 53. CHALLENGES
  54. 54. VARIATION IN HTTP COMMS AND IMPLEMENTATIONS
  55. 55. FEIGN AS A CLIENT BUILDER; DIRECT FALLBACKS; HYSTRIX CIRCUIT BREAKING; SPLITTING BALANCING AND DISCOVERY BETWEEN SERVICES
  56. 56. SOLUTION COMPATIBILITY SPRING BOOT EUREKA RIBBON SUPPORTING TOOLSET AROUND 75% ADOPTION PROBLEMS
  57. 57. 3 MONTHS 30 SERVICES IN PRODUCTION 20 MORE IN THE MAKING
  58. 58. E
  59. 59. GREAT NOW WE HAVE LOTS OF SERVICE CONFIG TO TRACK
  60. 60. LET’S MAKE THINGS AS BORING AS POSSIBLE
  61. 61. LET’S PUT EVERYTHING IN (PUPPET | CHEF | SALT | ANSIBLE)
  62. 62. JOB DONE
  63. 63. EXCEPT…
  64. 64. SECRET MANAGEMENT IS TRICKY
  65. 65. WE CAN MAKE IT WORK. BUT IS THERE A LESS TEDIOUS WAY?
  66. 66. SPRING CLOUD CONFIG SERVER, NETFLIX ARCHAIUS, HASHICORP VAULT
  67. 67. NICE DOCUMENTATION: https://cloud.spring.io/spring-cloud-config/spring-cloud-config.html
  68. 68. NAME/VALUE PAIRS; /ENCRYPT & /DECRYPT ENDPOINTS; EASY TO EMBED IN SPRING BOOT APPLICATIONS
  69. 69. SIMPLE REST API, SO PLAYS WELL WITH NON-SPRING TOO!
  70. 70. SUPPORTS TEMPLATE FILES… BUT WE’VE NOT USED THEM
  71. 71. SECRETS DON’T NEED TO BE STORED IN PLAIN TEXT: /ENCRYPT /DECRYPT
  72. 72. VAULT BACKEND: REQUIRES A TOKEN FROM CLIENT; FAILFAST VS RETRY
  73. 73. GETTING IT INTO PRODUCTION: “VOLUNTEERED” A FEW SERVICE OWNERS
  74. 74. EVERYTHING WORKED SMOOTHLY. VERY SUSPICIOUS.
  75. 75. BUGS FOUND!
  76. 76. GITHUB DOWN: CONFIG SERVER GOES MENTAL. CONFIG SERVER CACHES LOCALLY, BUT ALSO WILL ALWAYS CHECK FOR NEW CONFIG IN THE REPO
  77. 77. VERY INFREQUENTLY… IT JUST DIES AND WE DON’T KNOW WHY (YET) (NEVER HAD >1 NODE DIE AT ONCE, SO NOT AWFUL IMPACT, BUT CONCERNING)
  78. 78. RESULT: SERVICES ARE GETTING INTO PRODUCTION FASTER
  79. 79. Let us change money transfer forever. Together. https://transferwise.com/jobs/
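
Slides 41 to 44 walk through client-side discovery for the monolith: a shared component holds the known instances, a load balancer picks one, an interceptor rewrites outgoing requests, and a fallback covers the case where discovery fails. The following is a minimal plain-Java sketch of that idea, not the code from the talk. It deliberately avoids Spring, Eureka and Ribbon so that it stands alone; `RegistrySnapshot`, `DiscoveringUriResolver`, the service name `rates-service` and all host names are hypothetical, and round-robin is an assumed balancing rule.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a discovery client: service name -> live "host:port" instances. */
class RegistrySnapshot {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();

    /** Replaces the known instances of a service, as a registry refresh would. */
    void update(String service, List<String> hostPorts) {
        instances.put(service, List.copyOf(hostPorts));
    }

    List<String> instancesOf(String service) {
        return instances.getOrDefault(service, List.of());
    }
}

/** Plays the role of load balancer plus request interceptor (assumed round-robin rule). */
class DiscoveringUriResolver {
    private final RegistrySnapshot registry;
    private final Map<String, String> fallbacks; // service name -> fixed "host:port"
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    DiscoveringUriResolver(RegistrySnapshot registry, Map<String, String> fallbacks) {
        this.registry = registry;
        this.fallbacks = Map.copyOf(fallbacks);
    }

    /** Rewrites http://service-name/path to http://host:port/path before the request is sent. */
    URI resolve(URI logical) {
        String service = logical.getHost();
        List<String> live = registry.instancesOf(service);
        String target;
        if (!live.isEmpty()) {
            // Discovery works: rotate through the live instances.
            int n = counters.computeIfAbsent(service, s -> new AtomicInteger()).getAndIncrement();
            target = live.get(Math.floorMod(n, live.size()));
        } else {
            // Discovery returned nothing: use the statically configured address, if any.
            target = fallbacks.get(service);
            if (target == null) {
                throw new IllegalStateException("No instances and no fallback for " + service);
            }
        }
        String path = logical.getRawPath() == null ? "" : logical.getRawPath();
        String query = logical.getRawQuery() == null ? "" : "?" + logical.getRawQuery();
        return URI.create(logical.getScheme() + "://" + target + path + query);
    }
}
```

In a Spring application the `resolve` step would typically live inside a `ClientHttpRequestInterceptor` attached to the relevant `RestTemplate`, which is presumably what steps 3 and 4 refer to. Keeping the fallback map separate from the registry means a registry outage degrades to the old fixed address rather than to an error.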

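Slide 72 contrasts "FAILFAST VS RETRY" for clients that load configuration at startup. Reading that as the usual trade-off (stop immediately if the config server is unreachable, or keep trying for a while before giving up), the difference can be sketched in plain Java as below. This is an illustration under that assumption, not the Spring Cloud Config client; `StartupConfigLoader`, its `Mode` enum and the doubling backoff are hypothetical choices.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical startup config loader illustrating fail-fast versus retry. */
class StartupConfigLoader {
    enum Mode { FAIL_FAST, RETRY }

    /**
     * Calls fetch once in FAIL_FAST mode, or up to maxAttempts times in RETRY mode,
     * doubling the wait between attempts. Throws if every attempt fails.
     */
    static Map<String, String> load(Supplier<Map<String, String>> fetch, Mode mode,
                                    int maxAttempts, long initialBackoffMillis) {
        int attempts = mode == Mode.FAIL_FAST ? 1 : Math.max(1, maxAttempts);
        long backoff = initialBackoffMillis;
        RuntimeException last = null;
        for (int i = 1; i <= attempts; i++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                last = e;
                if (i < attempts) {
                    sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        throw new IllegalStateException("Config unavailable after " + attempts + " attempt(s)", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and abort startup rather than spin.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

Fail-fast surfaces a misconfigured or missing token immediately, while retry tolerates a config server that is briefly unavailable during a deploy; which one suits a service depends on how it is restarted and monitored.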