
The Wix Microservice Stack

(A talk given at Wix R&D in Dnipro, Ukraine, in March 2017. Video available at https://www.youtube.com/watch?v=eIX33mQdkAI&feature=youtu.be)

While microservices are conceptually simple, it's a deep rabbit hole to go down. Deceptively simple questions can have far-reaching implications: Which communication protocol should I choose? Is event-driven the way to go? What monitoring tools should I put in place?

In this talk we'll cover some of the fundamental questions, outline the solutions adopted or developed by Wix, and share, in hindsight, what worked well for us, what didn't, and our thoughts on future directions for our stack.

  1. The Wix Microservice Stack
      Tomer Gabel, Wix
      March 2017 @ Dnipro, UA
  2. Agenda
      1. Topology
      2. Networking
      3. Structure
      4. Operations
      5. Beer
  3. Our conceptual system (diagram): Store Service, Checkout Service, Cart Service
  4. 1. TOPOLOGY
      Image: Penrose Steps by Alex Eylar (CC BY-NC-SA 2.0)
  5. Our conceptual system (diagram): Store Service, Checkout Service and Cart Service on Host A, Host B and Host C
  6. Topology
      • Topology (diagram): service → host mapping, server inventory, service catalogue
      • Formally, “scheduling”
  7. Service Scheduling
      • A hard problem!
      • Multiple dimensions:
        – Resource utilization (disk space, I/O, RAM, network, power…)
        – Resource availability
        – Failover (physical server, rack, row…)
        – Custom constraints (zoning, e.g. PCI compliance)
  9. Service Scheduling
      • The middle ground:
        – Naïve automatic scheduler
        – Human-configured overrides for zoning, optimization
      • Easy but limited scale
        – A few hundred servers
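
To make the “middle ground” concrete, here is a minimal Java sketch of a naïve automatic scheduler with human-configured overrides. It assumes a uniform replica count and a least-loaded-host heuristic; the class name, host names and heuristic are illustrative assumptions, not the scheduler Wix actually used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical "middle ground" scheduler: least-loaded placement plus human-configured overrides. */
final class NaiveScheduler {
    private NaiveScheduler() {}

    /**
     * @param services  services to place
     * @param hosts     available hosts (the server inventory)
     * @param replicas  instances per service (assumed uniform for simplicity)
     * @param overrides manual pins, e.g. for PCI zoning: service -> hosts it must run on
     * @return the service -> hosts mapping, i.e. the topology
     */
    static Map<String, List<String>> schedule(List<String> services, List<String> hosts,
                                              int replicas, Map<String, List<String>> overrides) {
        Map<String, Integer> load = new HashMap<>();
        for (String host : hosts) load.put(host, 0);
        Map<String, List<String>> topology = new LinkedHashMap<>();

        // Apply overrides first so their load is counted when auto-placing everything else.
        for (String service : services) {
            List<String> pinned = overrides.get(service);
            if (pinned == null) continue;
            topology.put(service, List.copyOf(pinned));
            for (String host : pinned) load.merge(host, 1, Integer::sum);
        }

        // Naive automatic placement: the `replicas` least-loaded hosts, ties broken by host name.
        Comparator<String> byLoad = Comparator.comparing((String host) -> load.getOrDefault(host, 0))
                .thenComparing(Comparator.naturalOrder());
        for (String service : services) {
            if (topology.containsKey(service)) continue;
            List<String> candidates = new ArrayList<>(hosts);
            candidates.sort(byLoad);
            List<String> chosen = List.copyOf(candidates.subList(0, Math.min(replicas, candidates.size())));
            for (String host : chosen) load.merge(host, 1, Integer::sum);
            topology.put(service, chosen);
        }
        return topology;
    }
}
```

Pinning a service through `overrides` bypasses automatic placement entirely, which keeps the logic simple but leaves every constraint other than load to a human.
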
  10. Our conceptual system (diagram): Store Service, Checkout Service, Cart Service; a caller asks “http://err:42/uh … derp?”
  11. Service Discovery (2×2 matrix: Static vs. Dynamic, Logical vs. Physical)
      “That way madness lies”
  12. Service Discovery (2×2 matrix: Static vs. Dynamic, Logical vs. Physical)
  14. In practice
      • Static topology
        – Managed with Frying Pan
        – Exported to Chef
        – Deployed via configuration files
      • Live registry in Zookeeper
        – Deployment only
        – … for now
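
A statically deployed topology ultimately boils down to a lookup from logical service names to physical endpoints. The sketch below assumes a hypothetical properties-file format (`service.<name>=host:port,...`); the actual files exported from Frying Pan via Chef are not shown in the talk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Properties;

/**
 * Hypothetical resolver for a static topology shipped as a configuration file.
 * Assumed format (not Wix's actual one): service.<logical-name>=host:port,host:port
 */
final class StaticTopology {
    private static final String PREFIX = "service.";
    private final Map<String, List<String>> endpoints = new HashMap<>();

    private StaticTopology(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) continue;   // ignore unrelated settings
            List<String> hosts = new ArrayList<>();
            for (String hostPort : props.getProperty(key).split(",")) {
                if (!hostPort.isBlank()) hosts.add(hostPort.trim());
            }
            endpoints.put(key.substring(PREFIX.length()), List.copyOf(hosts));
        }
    }

    static StaticTopology parse(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StaticTopology(props);
    }

    /** Logical service name -> physical endpoints; fails fast on unknown services. */
    List<String> resolve(String service) {
        List<String> hosts = endpoints.get(service);
        if (hosts == null) throw new NoSuchElementException("Unknown service: " + service);
        return hosts;
    }
}
```

Because the file only changes on deployment, resolution is a plain in-memory lookup; a live registry such as Zookeeper is what would make it dynamic.
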
  15. 2. NETWORKING
      Image: Neurons by Birth Into Being (CC BY-NC-SA 2.0)
  16. Back to diagrams: Store Service, Checkout Service, Cart Service
  17. Back to diagrams: Store Service, Checkout Service, Cart Service, linked by a “Protocol”
  18. Protocol
      • RPC-style
        – Sync or async
        – Point-to-point
      • Message passing
        – Async only
        – Requires broker
      • Shared concerns: topology, serialization, operations
  19. Protocol
      • Wix RPC
        – RPC-style
        – Custom JSON
        – HTTP
      • Pros/cons
        – Rock-solid
        – Sync/blocking
        – Legacy
      Image: psycho chicken by Bernhard Latzko (CC BY-ND 2.0)
  20. Protocol
      • Greyhound
        – Message-passing
        – Custom JSON
        – Kafka
      • Pros/cons
        – Async + replayable
        – Still experimental
      Image: Robin Fledgeling by edgeplot (CC BY-NC-SA 2.0)
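
“Replayable” comes from Kafka's log model: messages are appended to a retained log and each consumer group tracks its own offset, so it can rewind and reprocess. The in-memory class below only illustrates that idea; it is not Greyhound's API or the Kafka client, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory sketch of a Kafka-style log: producers append, and each consumer group
 * tracks its own offset so it can rewind and reprocess. Not Greyhound's or Kafka's API.
 */
final class ReplayableLog<T> {
    private final List<T> entries = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();   // group -> next offset to read

    /** Appends a message and returns its offset; the producer does not wait for consumers. */
    synchronized int append(T message) {
        entries.add(message);
        return entries.size() - 1;
    }

    /** Returns up to maxMessages unread messages for the group and advances its offset. */
    synchronized List<T> poll(String group, int maxMessages) {
        int from = offsets.getOrDefault(group, 0);
        int to = Math.min(entries.size(), from + maxMessages);
        offsets.put(group, to);
        return new ArrayList<>(entries.subList(from, to));
    }

    /** Moves a group's offset; messages are retained, so earlier ones can be read again. */
    synchronized void seek(String group, int offset) {
        if (offset < 0 || offset > entries.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        offsets.put(group, offset);
    }
}
```

Because offsets belong to consumers rather than the broker, a new consumer group (say, analytics) can read the full history without affecting existing ones.
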
  21. Load balancing
      • Centralized
        – Simple
        – Limited flexibility
        – Limited scale
        – Thin implementation → highly portable
        – Suitable for static topologies
      • Distributed
        – Highly scalable
        – Flexible
        – Fully dynamic
        – Fat implementation → difficult to port
      • Quasi-distributed
        – e.g. Synapse
        – Best of both worlds?
  22. Load balancing
      • Centralized
        – Simple
        – Limited flexibility
        – Limited scale
        – Thin implementation → highly portable
        – Suitable for static topologies
      • Distributed
        – Highly scalable
        – Flexible
        – Fully dynamic
        – Fat implementation → difficult to port
      • Quasi-distributed
        – e.g. Synapse
        – Best of both worlds?
      Frying Pan → Chef → Nginx
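
In the centralized setup, topology flows from Frying Pan through Chef into Nginx. As a rough sketch of the last step, the Java method below renders Nginx `upstream` blocks from a service→host mapping; the service names, ports and layout are assumptions, and the slide does not show how Wix generates its real configuration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Renders Nginx upstream blocks from a (hypothetical) service -> host:port mapping. */
final class NginxUpstreams {
    private NginxUpstreams() {}

    static String render(Map<String, List<String>> topology) {
        StringBuilder out = new StringBuilder();
        // Sorting by service name keeps output stable, so the file only changes when topology does.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(topology).entrySet()) {
            out.append("upstream ").append(e.getKey()).append(" {\n");
            for (String hostPort : e.getValue()) {
                out.append("    server ").append(hostPort).append(";\n");
            }
            out.append("}\n");
        }
        return out.toString();
    }
}
```

A location can then route to a service with `proxy_pass http://checkout-service;`, and Nginx spreads requests across the listed servers (round-robin by default). All the balancing logic lives in Nginx, which is what keeps the client side thin and portable.
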
  23. To our shame
      • There’s always an IDL.
      • Informal
        – Text documentation
        – Code samples
      • Formal
        – Swagger, Apiary etc.
        – ProtoBuf, Thrift, Avro
        – WSDL, god forbid!
      • … or
        – Ad-hoc:

          public interface SiteMembersService {
              SiteMemberDto getMemberById(
                  Guid<SiteMember> memberId, UserGuid userId);
              SiteMemberDto getMemberOrOwnerById(
                  Guid<SiteMember> memberId, Guid<SMCollection> collectionId);
              SiteMemberDto getMemberDtoByEmailAndCollectionId(
                  String email, Guid<SMCollection> collectionId);
              List<SiteMemberDto> listMembersByCollectionId(
                  Guid<SMCollection> collectionId);
          }

  25. In Detail
      • Java interfaces?
        + Ridiculously simple
        + Lend themselves well to RPC
        – Coupled to the JVM
      • JSON serialization
        + Jackson-based
        + Custom, extensible mapping
        – Reflection-based
      • Server stack (JVM)
        – Jetty
        – Spring + Spring MVC
        – Custom handler
      • RPC client stack (JVM)
        – Spring
        – Proxy classes generated at runtime
        – AsyncHttpClient
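
The “proxy classes generated at runtime” item can be illustrated with the JDK's `java.lang.reflect.Proxy`, which implements a Java interface on the fly and routes every call through an `InvocationHandler`. In this sketch the `Transport` interface stands in for the JSON-over-HTTP layer (Jackson and AsyncHttpClient in the stack above); `Transport`, `StoreService` and the request shape are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

/** Hypothetical transport; a real one would send JSON over HTTP and deserialize the reply. */
interface Transport {
    Object call(String service, String method, List<Object> args, Class<?> returnType);
}

/** Example service interface (hypothetical), in the style of the ad-hoc Java IDL. */
interface StoreService {
    String getStoreName(String storeId);
}

final class RpcClients {
    private RpcClients() {}

    /** Generates a proxy class at runtime that implements iface by forwarding calls to the transport. */
    static <T> T create(Class<T> iface, Transport transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            // equals/hashCode/toString are answered locally rather than sent over the wire.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "equals": return proxy == args[0];
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return "RpcProxy[" + iface.getSimpleName() + "]";
                }
            }
            List<Object> argList = (args == null) ? List.of() : Arrays.asList(args);
            return transport.call(iface.getSimpleName(), method.getName(), argList, method.getReturnType());
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

Callers just see a `StoreService`; the same property makes the approach JVM-bound, since every other language needs its own proxy, serializer and transport.
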
  26. In Detail
      • Java interfaces?
        + Ridiculously simple
        + Lend themselves well to RPC
        – Coupled to the JVM
      • JSON serialization
        + Jackson-based
        + Custom, extensible mapping
        – Reflection-based
      • Alternative stack
        – Based on Node.js
        – Generated RPC clients
        – Manually converted entity schema :-(
  28. Cascade Failures
      • What is a cascade failure?
      • Mitigations
        – Bulkheading
        – Circuit breakers
        – Load shedding
      • We don’t do any of that (mostly)
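
Although the talk notes that Wix mostly does not apply these mitigations, a circuit breaker is compact enough to sketch. This version opens after a number of consecutive failures, fails fast while open, and lets a trial call through after a cooldown. The thresholds and injectable clock are illustrative choices, and a production breaker would also limit concurrent trial calls in the half-open state.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker: CLOSED -> OPEN after N consecutive failures -> HALF_OPEN after a cooldown. */
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long cooldownMillis;
    private final LongSupplier clock;   // injectable so behaviour can be tested deterministically
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long cooldownMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
        this.clock = clock;
    }

    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= cooldownMillis) {
            state = State.HALF_OPEN;   // allow a trial call through
        }
        return state;
    }

    /** Runs the call unless the circuit is open, in which case it fails fast without calling. */
    <T> T call(Supplier<T> remoteCall) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("Circuit open: failing fast");
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call re-opens immediately; otherwise open once the threshold is reached.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

Failing fast is the point: callers stop queueing on a struggling dependency, so its slowness does not exhaust their own threads and cascade upstream.
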
  29. Does it go?
      • Short answer: yes.
      • Battle-tested
        – Evolving since 2010.
        – >200 services in production.
      • Known quantity
        – Easy to operate
        – Performs well enough
        – Known workarounds
  30. Not all is well, though
      • Polyglot development
        – Custom client stack
        – Expensive to port!
  31. Not all is well, though
      • Polyglot development
        – Custom client stack
        – Expensive to port!
      • Implicit state
        – Transparently handled by the framework
        – Thread-local storage
        – Hard to go async!
      (Diagram: Client, Proxy, Service A and Service B, with session info, transaction ID and A/B experiment context passed implicitly between them.)
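
The “implicit state … hard to go async” point can be shown with a plain `ThreadLocal`: a value set on the request thread is invisible to a task that runs on a pool thread unless it is captured and re-installed explicitly. `RequestContext` and its fields are hypothetical stand-ins for the session info, transaction ID and A/B experiment the framework propagates.

```java
import java.util.function.Supplier;

/** Hypothetical per-request context of the kind the framework carries implicitly. */
record RequestContext(String sessionId, String transactionId, String experiment) {}

final class Contexts {
    private static final ThreadLocal<RequestContext> CURRENT = new ThreadLocal<>();

    private Contexts() {}

    static RequestContext current() { return CURRENT.get(); }
    static void set(RequestContext ctx) { CURRENT.set(ctx); }
    static void clear() { CURRENT.remove(); }

    /** Captures the caller's context now and re-installs it on whichever thread later runs the task. */
    static <T> Supplier<T> propagate(Supplier<T> task) {
        RequestContext captured = CURRENT.get();
        return () -> {
            RequestContext previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                return task.get();
            } finally {
                // Restore, so a pooled thread doesn't leak this request's context into the next task.
                if (previous == null) CURRENT.remove(); else CURRENT.set(previous);
            }
        };
    }
}
```

Every async hop (executors, futures, callbacks) needs a wrapper like `propagate`; missing one silently drops the context, which is what makes retrofitting async onto thread-local state costly.
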
  32. 3. STRUCTURE
  33. Codebase modeling
      • A product comprises multiple services
      • Services have dependencies
        – Creating a DAG
        – Tends to cluster around domains
      • Org structure reflects the clustering (Conway’s law)
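
Because service dependencies form a DAG, a topological sort yields an order in which artifacts can be built (or released) with every dependency ahead of its dependents, and it exposes accidental cycles. The sketch uses Kahn's algorithm; it is a standard technique, not necessarily what Wix's custom build tool does, and the service names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Kahn's algorithm: orders nodes so every dependency comes before its dependents. */
final class BuildOrder {
    private BuildOrder() {}

    /** @param deps node -> the nodes it depends on */
    static List<String> of(Map<String, Set<String>> deps) {
        Map<String, Integer> inDegree = new TreeMap<>();   // sorted for deterministic output
        Map<String, List<String>> dependents = new HashMap<>();
        for (var e : deps.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                inDegree.putIfAbsent(dep, 0);
                inDegree.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, degree) -> { if (degree == 0) ready.add(node); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String next : dependents.getOrDefault(node, List.of())) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        // Nodes left unprocessed are part of (or depend on) a cycle.
        if (order.size() != inDegree.size()) throw new IllegalStateException("Dependency cycle detected");
        return order;
    }
}
```
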
  34. Codebase modeling
      Repository-per-domain
      • Small repositories
      • Artifacts built independently
      • Binary dependencies
      • Requires specialized tools to manage:
        – Versions
        – Build dependencies
      Monorepo
      • Repository contains everything
      • Code is built atomically
      • Source dependencies
      • Requires a specialized build tool
  35. At Wix
      • One repo per domain
      • Dependencies:
        – Declared in POMs
        – Version management via custom plugin
        – Builds managed by custom tool*
      • Custom dashboard, “Wix Lifecycle”
      * Lifecycle – Dependency Management Algorithm
  36. Version management
        [INFO] QuickRelease /home/builduser/agent01/work/d9922a1c87aee4bb bf1bc8bcfb2eccebc4268651c5f19faa689be6e4
        [08:10:55][INFO] Adding tag RC;.;1.20.0
        [08:10:56][INFO] Tag RC;.;1.20.0 added successfully
        [08:10:56][INFO] Working on onboarding-server-web
        [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar deployable copied
        [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar sources copied
        [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar copied
        [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar tests copied
        [08:10:56][INFO] onboarding-server-web pom deployed
        [08:10:57][INFO] Deploying artifacts to release artifacts repository
        [08:10:57][INFO] Deploying onboarding-server-web to RELEASE
        [08:10:57][INFO] pushing new pom
        [08:10:59]2016-02-22 08:10:39 [INFO ] /usr/bin/git push --tag origin master exitValue = 0
      • All artifacts share a common parent
        – Master list of versions
      • Manually triggered release builds
        – Custom release plugin
        – Increments version
        – Updates master
        – Pushes changes to git
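
The release plugin's “increments version” step could look roughly like the sketch below. It assumes MAJOR.MINOR.PATCH versions with a minor bump per release, and models the common parent's master list as a simple map; both are assumptions, since the log above tags `1.20.0` while copying `1.19.0-SNAPSHOT` artifacts but the plugin's actual rules are not described.

```java
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a release step over a shared "master list" of artifact versions. */
final class Versions {
    private static final String SNAPSHOT = "-SNAPSHOT";

    private Versions() {}

    /** "1.19.0-SNAPSHOT" or "1.19.0" -> "1.20.0" (assumes MAJOR.MINOR.PATCH with a minor bump). */
    static String nextMinor(String version) {
        String base = version.endsWith(SNAPSHOT)
                ? version.substring(0, version.length() - SNAPSHOT.length())
                : version;
        String[] parts = base.split("\\.");
        if (parts.length != 3) throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH: " + version);
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major + "." + (minor + 1) + ".0";
    }

    /** Bumps one artifact's entry in the master list and returns the new version. */
    static String release(Map<String, String> masterList, String artifact) {
        String current = masterList.get(artifact);
        if (current == null) throw new NoSuchElementException("Not in master list: " + artifact);
        String next = nextMinor(current);
        masterList.put(artifact, next);   // a real plugin would then commit the parent POM and push to git
        return next;
    }
}
```
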
  37. 4. OPERATIONS
  38. Back to diagrams: Store Service, Checkout Service, Cart Service. “How ya doin’?”
  39. Health
      • Host monitoring
        – Sensu alerts
        – Usual host metrics
        – Health-check endpoint in framework
      • End-to-end
        – Pingdom
      • Business
        – Custom BI toolchain
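
A framework-level health-check endpoint usually aggregates several named checks into one status that a monitor such as Sensu can poll. This is a minimal sketch of that aggregation; the check names, the plain-text body and the 200/503 convention are assumptions, and the HTTP wiring is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical framework-level health check: aggregates named checks into one status. */
final class HealthRegistry {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    void register(String name, BooleanSupplier check) { checks.put(name, check); }

    /** Runs every check; a check that throws counts as unhealthy. */
    Map<String, Boolean> runAll() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        checks.forEach((name, check) -> {
            boolean ok;
            try { ok = check.getAsBoolean(); } catch (RuntimeException e) { ok = false; }
            results.put(name, ok);
        });
        return results;
    }

    /** HTTP status an endpoint would return (200 healthy, 503 otherwise) plus a plain-text body. */
    Map.Entry<Integer, String> render() {
        Map<String, Boolean> results = runAll();
        boolean healthy = !results.containsValue(false);
        StringBuilder body = new StringBuilder(healthy ? "OK\n" : "UNHEALTHY\n");
        results.forEach((name, ok) -> body.append(name).append(": ").append(ok ? "ok" : "FAIL").append('\n'));
        return Map.entry(healthy ? 200 : 503, body.toString());
    }
}
```
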
  40. Instrumentation
      • Metrics
        – DropWizard Metrics
        – Graphite and Anodot
        – Built-in metrics (RPC, resource pools…)
        – APIs for custom metrics
      • Alerts
        – Anodot, NewRelic
        – Via PagerDuty
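
Graphite ingests metrics over a simple plaintext protocol: one `<metric path> <value> <timestamp>` line per data point, with the timestamp in Unix epoch seconds. DropWizard Metrics ships its own Graphite reporter; the hand-rolled formatter below only illustrates the wire format, and the metric names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Formats metrics in Graphite's plaintext protocol: "<path> <value> <epoch-seconds>\n". */
final class GraphiteLines {
    private GraphiteLines() {}

    static String format(String prefix, Map<String, Number> metrics, long epochSeconds) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Number> e : new TreeMap<>(metrics).entrySet()) {
            // Paths are dot-separated; a space would split the line into the wrong fields.
            String path = (prefix + "." + e.getKey()).replace(' ', '_');
            out.append(path).append(' ').append(e.getValue()).append(' ').append(epochSeconds).append('\n');
        }
        return out.toString();
    }
}
```

Lines like these can be written to Carbon's plaintext listener, which listens on TCP port 2003 by default.
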
  41. Debugging
      • Logs
        – Good old Logback
        – No centralized aggregation
        – Not particularly useful
      • Feature toggle overrides
      • Distributed tracing
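
Feature toggle overrides let a developer force a toggle on or off for their own requests, for example to reproduce a bug, without changing it for everyone else. The sketch assumes overrides arrive as a string such as `newCheckout=true` (say, from a debug header or cookie); the format, toggle names and default-off rule are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toggle resolution: per-request overrides win over the globally configured value. */
final class FeatureToggles {
    private final Map<String, Boolean> global;

    FeatureToggles(Map<String, Boolean> global) { this.global = Map.copyOf(global); }

    /** @param overrides per-request overrides, e.g. parsed from a debug header or cookie */
    boolean isEnabled(String toggle, Map<String, Boolean> overrides) {
        Boolean forced = overrides.get(toggle);
        if (forced != null) return forced;
        return global.getOrDefault(toggle, false);   // unknown toggles default to off
    }

    /** Parses "newCheckout=true,legacyCart=false" into an override map; malformed entries are ignored. */
    static Map<String, Boolean> parseOverrides(String spec) {
        Map<String, Boolean> result = new HashMap<>();
        if (spec == null || spec.isBlank()) return result;
        for (String pair : spec.split(",")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length != 2) continue;
            String value = kv[1].trim();
            if (value.equals("true") || value.equals("false")) {
                result.put(kv[0].trim(), Boolean.parseBoolean(value));
            }
        }
        return result;
    }
}
```
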
  42. WE’RE DONE HERE!
      … AND YES, WE’RE HIRING :-)
      Thank you for listening
      tomer@tomergabel.com
      @tomerg
      http://il.linkedin.com/in/tomergabel
      Wix Engineering blog: http://engineering.wix.com
