[MicroCPH 2019] Airbnb's Great Migration: Building Services at Scale

MicroCPH 2019 conference in Copenhagen (https://microcph.dk/) abstract:

So you’ve decided to migrate from a monolith to microservices. What next? A redesign to service-oriented architecture (SOA) is a long, arduous journey that benefits from an incremental, iterative approach. Yet such a migration often has to happen while the team keeps shipping new features, accelerating developer velocity, and growing, all without introducing performance regressions.

This talk will focus on how Airbnb is building, operating, and scaling its expanding network of services. Though our re-architecture to SOA is still ongoing, we are already seeing various benefits including improved performance, developer productivity, build and deploy times, and site reliability.

Key takeaways:

* Understand design principles for building scalable, performant services
* Plan for dependencies: how to sequence decomposition into services and an API gateway
* Learn best practices for standardization, reliability, and performance when migrating architecture
* Identify ways to shift product culture to empower migration work
* Recognize tradeoffs with operating microservices


Published in: Engineering

  1. 1. Airbnb’s Great Migration: Building Services at Scale JESSICA TAI / MAY 15-16, 2019 / MICROCPH - COPENHAGEN
  2. 2. @jessicamtai 2014
  3. 3. 2015+
  4. 4. Hi, I’m Jessica. I pair program with my corgi.
  5. 5. Why migrate? Service design tenets Incremental comparison Best practices Results Decomposition
  6. 6. Why migrate? Best practices Results Incremental comparison Service design tenets Decomposition
  7. 7. Why migrate? Best practices Results Incremental comparison Service design tenets Decomposition
  8. 8. Why migrate? Best practices Results Incremental comparison Service design tenets Decomposition
  9. 9. Why migrate? Best practices Results Incremental comparison Service design tenets Decomposition
  10. 10. Why migrate? Best practices Results Incremental comparison Service design tenets Decomposition
  11. 11. Monorail, our Ruby on Rails monolith
  12. 12. Easy start with monoliths EARLY AIRBNB Client traffic Shared database Monorail Data access query Business logic Presentation view
  13. 13. @jessicamtai
  14. 14. @jessicamtai
  15. 15. WHY DECIDE TO MIGRATE?
  16. 16. @jessicamtai
  17. 17. More incidents Slower deploy trains
  18. 18. More incidents Slower deploy trains
  19. 19. Our solution: Service-oriented architecture (SOA) NETWORK OF LOOSELY-COUPLED SERVICES Client API Gateway Service 1 Service 2 Service 3 Data store Data store
  20. 20. Checkout page in SOA Homes service Reservation service Review service Availability service Messaging service Business travel service Cancellation service Homes demand service Pricing service User service Checkout page service
  21. 21. SERVICE DESIGN TENETS
  22. 22. Services own reads & writes to their data
  23. 23. Services address a specific concern
  24. 24. Avoid duplicate functionality https://www.flickr.com/photos/popilop/331357312
  25. 25. Data mutations propagate via standard events
  26. 26. Build for production
  27. 27. DECOMPOSE BY REQUEST LIFE CYCLE
  28. 28. Request life cycle Client traffic Shared database Monorail Data access query Business logic Presentation view V1: MONORAIL
  29. 29. Request life cycle Monorail API traffic Routing & view Business logic, model, data via services Client traffic V2: MONORAIL & SERVICES
  30. 30. MIDDLE TIER Shared business logic Service types STRICT FLOW OF DEPENDENCIES PRESENTATION Synthesize DERIVED DATA Shared context, multiple sources DATA Entity read and writes
  31. 31. MIDDLE TIER Shared business logic PRESENTATION Synthesize DERIVED DATA Shared context, multiple sources DATA Entity read and writes Service types STRICT FLOW OF DEPENDENCIES
  32. 32. MIDDLE TIER Shared business logic PRESENTATION Synthesize DERIVED DATA Shared context, multiple sources DATA Entity read and writes Service types STRICT FLOW OF DEPENDENCIES
  33. 33. MIDDLE TIER Shared business logic DERIVED DATA Shared context, multiple sources DATA Entity read and writes PRESENTATION Synthesize Service types STRICT FLOW OF DEPENDENCIES
  34. 34. PRESENTATION Synthesize MIDDLE TIER Shared business logic DERIVED DATA Shared context, multiple sources DATA Entity read and writes Service types STRICT FLOW OF DEPENDENCIES
  35. 35. Homes HOSTED BY KITTY · APTOS, CALIFORNIA Mushroom Dome
  36. 36. Homes data service Shared database Monorail Data access query Business logic 1. Migrate core data models Presentation view Homes database
  37. 37. Homes data service Shared database Monorail Data access query Business logic 2. Migrate core business logic Presentation view Homes database Pricing derived data service Pricing trends data store
  38. 38. Homes data service Shared database Monorail Data access query Business logic 3. Migrate core product views Presentation view Homes database Pricing derived data service Pricing trends data store Checkout presentation service
  39. 39. Homes data service Shared database Monorail Data access query Business logic 4. Migrate core product writes Presentation view Homes database Homes validation middle-tier Pricing derived data service Pricing trends data store Checkout presentation service
  40. 40. Request life cycle Monorail API traffic Routing & view Client traffic V2: MONORAIL & SERVICES Business logic, model, data via services
  41. 41. Request life cycle API gateway Middleware Session data service Authentication data service Oauth data service Risk derived data service ... Request context Presentation, logic, data V3: SOA & API GATEWAY Routing
  42. 42. Request life cycle API gateway Middleware Session data service Authentication data service Oauth data service Risk derived data service ... Request context Web rendering service HTML view V3: SOA & API GATEWAY Routing Presentation, logic, data
  43. 43. Monolith world Services world The Future™
  44. 44. Migration world Monolith world Services world
  45. 45. COMPARE FOR DIFFERENCES
  46. 46. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Read path A Read path B Monorail Service Database
  47. 47. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Monorail Service Database Response A Response B Consumer + offline comparison framework
  48. 48. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Monorail Service Database Response A Response B Consumer + offline comparison framework
  49. 49. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Production traffic Monorail Service Database Response A Response B Consumer + offline comparison framework
  50. 50. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Production traffic Monorail Service Database Response A Response B Consumer + offline comparison framework
  51. 51. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch 100% Monorail Service Database Response A Response B Consumer + offline comparison framework
  52. 52. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Monorail Service Database Response A Response B Consumer + offline comparison framework
  53. 53. Dual read comparison Wait Ramp & wait Compare Gate Admin web UI configuration 1% traffic Gradual increments Gather traffic patterns All traffic through service only Switch Monorail Service Database The only read path
  54. 54. Write comparison DUAL WRITE TO SEPARATE DATABASES Presentation service Write validation middle tier service Write path A Write path B Data service (production) Data service (shadow) Consumer + offline comparison framework Monorail Payload A Payload B
  55. 55. Write comparison DUAL WRITE TO SEPARATE DATABASES Presentation service Write validation middle tier service The only write path Data service (shadow) Monorail
  56. 56. API gateway comparison API gateway Presentation service Product logic Monorail Product logic Original request (no middleware applied) Shadow request with context Add request context Middleware Shadow request copy
  57. 57. API gateway comparison API gateway Presentation service Product logic Request with context Middleware
  58. 58. Incremental migration
  59. 59. • Production traffic with partially complete service ○ e.g. batch API /loadUsers ○ Fetch users only by user id • Unblock clients Migrate by endpoint
  60. 60. Migrate by attribute Service Monorail Database Read migrated attributes Read not-yet-migrated attributes Database Presentation service Production traffic
  61. 61. SOA BEST PRACTICES
  62. 62. Frameworks Auto-generate code Testing & deploying Replay production traffic Observability Standard templates Standardize service building SCALE WITH CONSISTENCY
  63. 63. Service Service & client setup Business logic
  64. 64. Service Service & client setup Business logic Endpoint logic Server transport
  65. 65. Service Service & client setup Business logic Endpoint logic Server transport Java client Ruby client Client transport Client transport
  66. 66. Service Service & client setup Business logic Server metrics Server diagnostics Startup / teardown Endpoint logic Metrics Data validation Server transport Server resilience Java client Ruby client Metrics Client transport Data validation Error handling Resilience Metrics Client transport Data validation Error handling Resilience Type checking
  67. 67. Service Endpoint logic Service & client setup Business logic Server metrics Server diagnostics Startup / teardown Dashboard Dashboard Alert Alert Alert Runbook documentation Metrics Data validation Server transport Server resilience Java client Ruby client Metrics Client transport Data validation Error handling Resilience Metrics Client transport Data validation Error handling Resilience Type checking
  68. 68. Service Endpoint logic Service & client setup Business logic Server metrics Server diagnostics Startup / teardown Dashboard Dashboard Alert Alert Alert Runbook documentation Metrics Data validation Server transport Server resilience Java client Ruby client Metrics Client transport Data validation Error handling Resilience Metrics Client transport Data validation Error handling Resilience Type checking
  69. 69. Frameworks using Thrift Interface Description Language (IDL)
  70. 70. IDL Service Endpoint logic Business logic Server metrics Server diagnostics Startup / teardown Dashboard Dashboard Alert Alert Alert Runbook documentation Metrics Data validation Server transport Server resilience Java client Ruby client Metrics Client transport Data validation Error handling Resilience Metrics Client transport Data validation Error handling Resilience Type checking
  71. 71. Thrift IDL API FRAMEWORK /** Batch request for demo data */ struct LoadSomeDataRequest { 1: optional set<i64> ids /** Some extra context baz */ 2: optional bool fooBar (personal) }
  72. 72. Thrift IDL API FRAMEWORK /** Batch request for demo data */ struct LoadSomeDataRequest { 1: optional set<i64> ids /** Some extra context baz */ 2: optional bool fooBar (personal) } /** id to data response */ struct LoadSomeDataResponse { 1: optional map<i64, SomeData> data }
  73. 73. Thrift IDL API FRAMEWORK /** /loadSomeData batch endpoint */ LoadSomeDataResponse loadSomeData (1: LoadSomeDataRequest request) throws (1: SomeException exception1) (
  74. 74. Thrift IDL API FRAMEWORK /** /loadSomeData batch endpoint */ LoadSomeDataResponse loadSomeData (1: LoadSomeDataRequest request) throws (1: SomeException exception1) ( accept_replay = "true", rate_limit = "true",
  75. 75. Thrift IDL API FRAMEWORK /** /loadSomeData batch endpoint */ LoadSomeDataResponse loadSomeData (1: LoadSomeDataRequest request) throws (1: SomeException exception1) ( accept_replay = "true", rate_limit = "true", slo_error_budget_percent = "0.1", slo_success_rate = "99", slo_success_rate_interval_minutes = "5", )
  76. 76. Thrift annotations
  77. 77. Block comments from .thrift file
  78. 78. TRY TRY AGAIN SUCCESS FAIL FAST
  79. 79. Separate async worker thread pools Graceful degradation
  80. 80. Testing & deploying TIMELINE Production Canary Diffy Staging Local dev
  81. 81. Testing & deploying TIMELINE Production Canary Diffy Staging Local dev Dev environment supports shared dev services
  82. 82. Testing & deploying TIMELINE Replayed production traffic with other staging services Production Canary Diffy Staging Local dev
  83. 83. Testing & deploying TIMELINE Production Canary Diffy Staging Local dev Compare responses from staging against production
  84. 84. Regression Testing DIFFY Staging (new code) Primary (old code) Secondary (old code) Raw response differences Non-deterministic noise Filtered response differences Diffy Replayed traffic github.com/twitter/diffy
  85. 85. Testing & deploying TIMELINE Production Canary Diffy Staging Local dev Deploy to single instance of production
  86. 86. Testing & deploying TIMELINE Production Canary Diffy Staging Local dev Confidently deploy to prod
  87. 87. @jessicamtai
  88. 88. @jessicamtai
  89. 89. EMPOWERING THE MIGRATION
  90. 90. 2016: One small infra team Airbnb’s SOA progress
  91. 91. Product culture: ship things quickly
  92. 92. Org challenges: service building in parallel
  93. 93. Product Frontend Monorail Infrastructure Backend Monorail + services Volunteer sysops on-call
  94. 94. Product Frontend Monorail Infrastructure Backend Monorail + services Volunteer sysops on-call
  95. 95. On-call rotation per team SERVICE OWNERSHIP Checkout service Pricing service User service Product Team Infrastructure Team
  96. 96. PROGRESS SO FAR?
  97. 97. 2016: One small infra team 2019: Whole engineering org Airbnb’s SOA progress
  98. 98. • Faster build & deploy times ○ Hours (Monorail) to minutes (service) ○ Fewer reverts • Quicker bug fixes • Increased developer productivity & happiness Promising initial results SUCCESS
  99. 99. • Lower latency from parallelization ○ Ruby monorail single-threaded ○ Java services multi-threaded • Search results page 3x faster • Homes description page 10x faster! Latency results SUCCESS
  100. 100. Monorail freeze
  101. 101. Engineers: 500+ (2016) Deploys in Monorail: 67% (2016)
  102. 102. Engineers: 500+ (2016), 1500+ (2019) Deploys in Monorail: 67% (2016), 3% (2019)
  103. 103. Production traffic via API Gateway 45%
  104. 104. IDL services in production 0 450+
  105. 105. @jessicamtai
  106. 106. Our current SOA /findHomeListById /fetchReservations /loadReviewsForUser /getAvailability /createMessage /setBusinessTravel /getCancellations /calculateHomeDemand /fetchDates /getUser Checkout presentation service
  107. 107. Home Pricing Availability Host & guest users Reservation Checkout presentation service
  108. 108. SOA has its challenges CAUTION
  109. 109. Distributed services CAUTION
  110. 110. Multiple, overlapping migrations CAUTION
  111. 111. Complex service orchestration CAUTION
  112. 112. SOA migration TAKEAWAYS • Prepare for a long commitment • Decompose incrementally • Invest in frameworks, tools, infra teams to scale • Shift development culture
  113. 113. Look both ways during your Great Migration
  114. 114. @JESSICAMTAI MICROCPH 2019
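
The dual read comparison on slides 46 to 53 (gate, compare on 1% of traffic, ramp gradually, then switch so the service is the only read path) could be sketched roughly as below. This is a minimal illustration and not Airbnb's implementation: the class `DualReadComparator`, its fields, and the sample rows are all hypothetical, and the comparison runs inline here, whereas the slides describe sending both responses to a consumer and an offline comparison framework.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Response = Dict[str, object]


@dataclass
class DualReadComparator:
    """Hypothetical dual-read wrapper; every name here is illustrative."""

    read_monorail: Callable[[int], Response]  # read path A (source of truth)
    read_service: Callable[[int], Response]   # read path B (new service)
    compare_percent: float = 1.0              # share of reads also sent to path B
    service_only: bool = False                # the final "switch" step
    rng: random.Random = field(default_factory=random.Random)
    mismatches: List[Tuple[int, Response, Response]] = field(default_factory=list)

    def read(self, entity_id: int) -> Response:
        # After the switch, the service is the only read path.
        if self.service_only:
            return self.read_service(entity_id)

        response_a = self.read_monorail(entity_id)
        # Sample a fraction of traffic for comparison; ramp by raising compare_percent.
        if self.rng.random() < self.compare_percent / 100.0:
            response_b = self.read_service(entity_id)
            if response_a != response_b:
                self.mismatches.append((entity_id, response_a, response_b))
        # Callers keep receiving the Monorail response until the switch.
        return response_a


# Made-up rows standing in for the two read paths.
monorail_rows = {1: {"name": "Mushroom Dome"}, 2: {"name": "Treehouse"}}
service_rows = {1: {"name": "Mushroom Dome"}, 2: {"name": "treehouse"}}

comparator = DualReadComparator(
    read_monorail=lambda entity_id: monorail_rows[entity_id],
    read_service=lambda entity_id: service_rows[entity_id],
    compare_percent=100.0,
)
```

Reading id 2 through `comparator.read` records one mismatch while still returning the Monorail row. Setting `service_only = True` models the last step, once the mismatch list stays empty at full traffic.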

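Slide 60 ("Migrate by attribute") shows the presentation service reading already-migrated attributes from the new service and not-yet-migrated attributes from Monorail. One way to picture that routing is the sketch below. The function `read_by_attribute`, the `MIGRATED` set, the fake backends, and the sample values are assumptions made for illustration; the talk does not show how the split is actually configured.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Attributes = Dict[str, object]
Reader = Callable[[int, Set[str]], Attributes]


def read_by_attribute(
    entity_id: int,
    requested: Iterable[str],
    migrated: Set[str],
    read_service: Reader,
    read_monorail: Reader,
) -> Attributes:
    """Route each requested attribute to whichever backend currently owns it."""
    wanted = set(requested)
    from_service = wanted & migrated   # already served by the new service
    from_monorail = wanted - migrated  # still served by the monolith

    result: Attributes = {}
    if from_service:
        result.update(read_service(entity_id, from_service))
    if from_monorail:
        result.update(read_monorail(entity_id, from_monorail))
    return result


# Hypothetical migration state and backends with made-up data.
MIGRATED: Set[str] = {"name", "city"}
calls: List[Tuple[str, List[str]]] = []


def fake_service(entity_id: int, attrs: Set[str]) -> Attributes:
    calls.append(("service", sorted(attrs)))
    row = {"name": "Mushroom Dome", "city": "Aptos"}
    return {attr: row[attr] for attr in attrs}


def fake_monorail(entity_id: int, attrs: Set[str]) -> Attributes:
    calls.append(("monorail", sorted(attrs)))
    row = {"nightly_price": 100}
    return {attr: row[attr] for attr in attrs}
```

As more attributes move into `MIGRATED`, fewer requests touch Monorail. A request that asks only for migrated attributes never calls it.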
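
The regression testing slide (84) labels three instances (staging with new code, primary and secondary with old code) and three outputs: raw response differences, non-deterministic noise, and filtered response differences. The idea can be illustrated with a field-level sketch like the one below. It is a simplification under stated assumptions: responses are flat dictionaries, the helper names and sample values are invented, and this is not Diffy's API. Diffy itself aggregates over many replayed requests rather than comparing a single response.

```python
from typing import Dict, Set

Response = Dict[str, object]
_MISSING = object()  # sentinel so an absent field differs from a present one


def differing_fields(left: Response, right: Response) -> Set[str]:
    """Return the field names whose values differ between two responses."""
    keys = left.keys() | right.keys()
    return {k for k in keys if left.get(k, _MISSING) != right.get(k, _MISSING)}


def filtered_differences(
    staging: Response, primary: Response, secondary: Response
) -> Set[str]:
    """Raw staging-vs-primary differences minus fields that are just noise."""
    raw = differing_fields(staging, primary)      # raw response differences
    noise = differing_fields(primary, secondary)  # old code disagreeing with itself
    return raw - noise                            # filtered response differences


# Made-up responses to one replayed request.
primary = {"price": 100, "currency": "USD", "request_id": "a1"}
secondary = {"price": 100, "currency": "USD", "request_id": "b2"}
staging = {"price": 105, "currency": "USD", "request_id": "c3"}

regressions = filtered_differences(staging, primary, secondary)
```

Here `request_id` differs between the two old-code instances, so it is treated as noise, and only `price` is left as a candidate regression in the new code.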