The Wix Microservice Stack

Tomer Gabel, Consulting Engineer at Substrate Software Services
Tomer Gabel, Wix
March 2017 @ Dnipro, UA
Agenda
1. Topology
2. Networking
3. Structure
4. Operations
5. Beer
Our conceptual system
Store Service
Checkout Service
Cart Service
1. TOPOLOGY
Image: Penrose Steps by Alex Eylar (CC BY-NC-SA 2.0)
Our conceptual system
Store Service
Checkout Service
Cart Service
Host A
Host B
Host C
Topology
Service → host mapping
Server inventory
Service catalogue
Formally, “scheduling”
Service Scheduling
• A hard problem!
• Multiple dimensions:
– Resource utilization (disk space, I/O, RAM, network, power…)
– Resource availability
– Failover (physical server, rack, row…)
– Custom constraints (zoning, e.g. PCI compliance)
Service Scheduling
• The middle ground:
– Naïve automatic scheduler
– Human-configured overrides for zoning, optimization
• Easy but limited scale
– A few hundred servers
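The "middle ground" above can be sketched roughly as follows. This is a minimal illustration, not the scheduler used at Wix: the class name `NaiveScheduler`, the choice of free RAM as the only resource dimension, and all host and service names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: greedy automatic placement plus human-configured overrides. */
final class NaiveScheduler {
    /**
     * Assigns each service to a host.
     * ramNeededMb: service name -> RAM it needs (illustrative single dimension).
     * hostFreeMb:  host name -> free RAM.
     * overrides:   service name -> host pinned by an operator (e.g. for zoning).
     */
    static Map<String, String> schedule(Map<String, Integer> ramNeededMb,
                                        Map<String, Integer> hostFreeMb,
                                        Map<String, String> overrides) {
        Map<String, Integer> free = new LinkedHashMap<>(hostFreeMb);
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> service : ramNeededMb.entrySet()) {
            String name = service.getKey();
            int needed = service.getValue();
            // A human-configured override wins over the automatic choice.
            String host = overrides.get(name);
            if (host == null) {
                // Naive rule: pick the host with the most free RAM.
                for (Map.Entry<String, Integer> candidate : free.entrySet()) {
                    if (host == null || candidate.getValue() > free.get(host)) {
                        host = candidate.getKey();
                    }
                }
            }
            if (host == null || !free.containsKey(host) || free.get(host) < needed) {
                throw new IllegalStateException("No capacity for " + name);
            }
            free.put(host, free.get(host) - needed);
            placement.put(name, host);
        }
        return placement;
    }
}
```

A single greedy pass like this is easy to reason about, which fits the "easy but limited scale" trade-off: it ignores failover domains and every resource other than RAM.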
Our conceptual system
Store Service
Checkout Service
Cart Service
http://err:42/uh
… derp?
Service Discovery
Static Dynamic
Logical
Physical
That way madness lies
In practice
• Static topology
– Managed with Frying Pan
– Exported to Chef
– Deployed via configuration files
• Live registry in ZooKeeper
– Deployment only
– … for now
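As a rough illustration of static discovery from a deployed configuration file, the sketch below resolves a logical service name to a physical location. The `name=url` file format, the class name `StaticTopology`, and the host names are assumptions for the example; the talk does not show the actual file layout.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

/** Hypothetical sketch: logical-to-physical lookup backed by a static config file. */
final class StaticTopology {
    private final Properties entries = new Properties();

    /** configText is the file content, one "service-name=url" pair per line (assumed format). */
    StaticTopology(String configText) {
        try {
            entries.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the configured location, or fails loudly for an unknown service. */
    URI resolve(String serviceName) {
        String location = entries.getProperty(serviceName);
        if (location == null) {
            throw new IllegalArgumentException("Unknown service: " + serviceName);
        }
        return URI.create(location);
    }
}
```

Because the mapping only changes when a new file is deployed, callers can cache the result indefinitely; the cost is that topology changes require a configuration rollout.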
2. NETWORKING
Image: Neurons by Birth Into Being (CC BY-NC-SA 2.0)
Back to diagrams
Store Service
Checkout Service
Cart Service
Protocol
• RPC-style
– Sync or async
– Point-to-point
• Message passing
– Async only
– Requires broker
Shared concerns: topology, serialization, operations
Protocol
• Wix RPC
– RPC-style
– Custom JSON
– HTTP
• Pros/cons
– Rock-solid
– Sync/blocking
– Legacy
Image: psycho chicken by Bernhard Latzko (CC BY-ND 2.0)
Protocol
• Greyhound
– Message-passing
– Custom JSON
– Kafka
• Pros/cons
– Async + replayable
– Still experimental
Image: Robin Fledgeling by edgeplot (CC BY-NC-SA 2.0)
Load balancing
• Centralized
– Simple
– Limited flexibility
– Limited scale
– Thin implementation → highly portable
– Suitable for static topologies
• Distributed
– Highly scalable
– Flexible
– Fully dynamic
– Fat implementation → difficult to port
• Quasi-distributed
– e.g. Synapse
– Best of both worlds?
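To make the "fat implementation" point concrete, here is a minimal sketch of the distributed option, where each client picks an endpoint itself. It is for contrast only and does not describe the Wix setup; the class name `RoundRobinBalancer` and the endpoint strings are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: client-side round-robin over a known endpoint list. */
final class RoundRobinBalancer {
    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("At least one endpoint is required");
        }
        this.endpoints = List.copyOf(endpoints);
    }

    /** Returns the next endpoint in rotation; safe to call from several threads. */
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(index);
    }
}
```

Even this small amount of logic has to be reimplemented in every client language, and a realistic version would also need health checks and live endpoint updates, which is where the porting cost comes from.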
Load balancing
Frying Pan → Chef → Nginx
To our shame
• There’s always IDL.
• Informal
– Text documentation
– Code samples
• Formal
– Swagger, Apiary etc.
– ProtoBuf, Thrift, Avro
– WSDL, god forbid!
• … or
– Ad-hoc
public interface SiteMembersService {
    SiteMemberDto getMemberById(
        Guid<SiteMember> memberId,
        UserGuid userId);

    SiteMemberDto getMemberOrOwnerById(
        Guid<SiteMember> memberId,
        Guid<SMCollection> collectionId);

    SiteMemberDto getMemberDtoByEmailAndCollectionId(
        String email,
        Guid<SMCollection> collectionId);

    List<SiteMemberDto> listMembersByCollectionId(
        Guid<SMCollection> collectionId);
}
In Detail
• Java interfaces?
+ Ridiculously simple
+ Lend well to RPC
– Coupled to JVM
• JSON serialization
+ Jackson-based
+ Custom, extensible mapping
– Reflection-based
• Server stack (JVM)
– Jetty
– Spring + Spring MVC
– Custom handler
• RPC client stack (JVM)
– Spring
– Proxy classes generated at runtime
– AsyncHttpClient
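The "proxy classes generated at runtime" idea can be sketched with the JDK's `java.lang.reflect.Proxy`. This is an illustration under stated assumptions, not the Wix RPC client: `EchoService`, `RpcClientFactory`, the request string format, and the `transport` function are all hypothetical, and a real stack would serialize arguments to JSON and send them over HTTP.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical service contract, standing in for a plain Java interface used as the IDL. */
interface EchoService {
    String echo(String message);
}

/** Hypothetical sketch: builds an RPC client for any interface at runtime. */
final class RpcClientFactory {
    /**
     * transport is an assumption: it receives a textual request description
     * and returns the already-decoded response.
     */
    static <T> T create(Class<T> contract, Function<String, Object> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Every interface call lands here. Object methods such as toString
            // are routed here too; a fuller version would special-case them.
            String request = contract.getSimpleName() + "." + method.getName()
                    + Arrays.toString(args == null ? new Object[0] : args);
            return transport.apply(request);
        };
        Object instance = Proxy.newProxyInstance(
                contract.getClassLoader(), new Class<?>[] {contract}, handler);
        return contract.cast(instance);
    }
}
```

Usage: `RpcClientFactory.create(EchoService.class, request -> "sent:" + request).echo("hi")` routes the call through the handler. The caller only ever sees the interface, which is why this style couples clients to the JVM.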
In Detail
• Alternative stack
– Based on Node.js
– Generated RPC clients
– Manually-converted entity schema :-(
Cascade Failures
• What is a cascade failure?
• Mitigations
– Bulkheading
– Circuit breakers
– Load shedding
• We don’t do any of that (mostly)
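For readers unfamiliar with the mitigations listed above, a circuit breaker can be sketched as below. This is a generic textbook-style illustration, not something the Wix stack is said to contain; the class name, the thresholds, and the injected clock are assumptions.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: stop calling a failing dependency for a cool-down period. */
final class CircuitBreaker {
    private final int failureThreshold;
    private final long coolDownMillis;
    private final LongSupplier clock; // injected so the behaviour can be tested
    private int consecutiveFailures = 0;
    private long openedAtMillis = 0;

    CircuitBreaker(int failureThreshold, long coolDownMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
        this.clock = clock;
    }

    /** Runs remoteCall unless the breaker is open; returns fallback on failure or when open. */
    <T> T call(Supplier<T> remoteCall, T fallback) {
        if (isOpen()) {
            return fallback; // fail fast, do not add load to a struggling service
        }
        try {
            T result = remoteCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback;
        }
    }

    private synchronized boolean isOpen() {
        if (consecutiveFailures < failureThreshold) {
            return false;
        }
        if (clock.getAsLong() - openedAtMillis >= coolDownMillis) {
            // Cool-down elapsed: allow a trial call; one more failure re-opens.
            consecutiveFailures = failureThreshold - 1;
            return false;
        }
        return true;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = clock.getAsLong();
        }
    }
}
```

Failing fast is what breaks the cascade: callers stop queuing up behind a dependency that is already down, so their own threads and connection pools stay available.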
Does it go?
• Short answer: yes.
• Battle-tested
– Evolving since 2010.
– >200 services in production.
• Known quantity
– Easy to operate
– Performs well enough
– Known workarounds
Not all is well, though
• Polyglot development
– Custom client stack
– Expensive to port!
• Implicit state
– Transparently handled by the framework
– Thread local storage
– Hard to go async!
Diagram: Client, Proxy, Service A, Service B; context carried between calls: session info, transaction ID, A/B experiment
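Why thread-local state makes it "hard to go async" can be shown in a few lines. The sketch below is illustrative only: `ImplicitStateDemo` and `TRANSACTION_ID` are hypothetical names, standing in for whatever per-request context a framework keeps in thread-local storage.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical sketch: thread-local context does not follow work onto another thread. */
final class ImplicitStateDemo {
    /** Stand-in for per-request context set by the framework on the request thread. */
    static final ThreadLocal<String> TRANSACTION_ID = new ThreadLocal<>();

    /** Reads the context on a pool thread: the caller's value is not visible there. */
    static String readImplicitly(ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> String.valueOf(TRANSACTION_ID.get()), pool)
                .join();
    }

    /** Captures the value on the calling thread and passes it along explicitly. */
    static String readExplicitly(ExecutorService pool) {
        String captured = TRANSACTION_ID.get();
        return CompletableFuture
                .supplyAsync(() -> String.valueOf(captured), pool)
                .join();
    }
}
```

With `TRANSACTION_ID` set on the calling thread, `readImplicitly` sees no value on the worker thread while `readExplicitly` does. Every asynchronous hop needs this kind of explicit capture, which a framework built around implicit state does not provide for free.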
3. STRUCTURE
Codebase modeling
• A product comprises multiple services
• Services have dependencies
– Creating a DAG
– Tends to cluster around domains
• Org structure reflects the clustering (Conway)
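Because the dependencies form a DAG, a valid build order is any topological ordering of it. The sketch below shows one way to compute such an order with a depth-first traversal; it is an assumption-laden illustration (the class name `BuildOrder` and the module names are hypothetical) and says nothing about the algorithm the Wix tooling actually uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: dependencies-first ordering of a module graph. */
final class BuildOrder {
    /** dependsOn maps each module to the modules it depends on. */
    static List<String> of(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : dependsOn.keySet()) {
            visit(module, dependsOn, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            // Seeing a module again while it is still being visited means a cycle.
            throw new IllegalStateException("Dependency cycle at " + module);
        }
        for (String dependency : dependsOn.getOrDefault(module, List.of())) {
            visit(dependency, dependsOn, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module); // appended only after all of its dependencies
    }
}
```

The cycle check matters in practice: a dependency graph that stops being a DAG has no valid build order, and it is better to report that than to loop forever.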
Codebase modeling
Repository-per-domain
• Small repositories
• Artifacts built independently
• Binary dependencies
• Requires specialized tools to manage:
– Versions
– Build dependencies
Monorepo
• Repository contains everything
• Code is built atomically
• Source dependencies
• Requires a specialized build tool
At Wix
• One repo per domain
• Dependencies:
– Declared in POMs
– Version management via custom plugin
– Builds managed by custom tool*
• Custom dashboard, “Wix Lifecycle”
* Lifecycle – Dependency Management Algorithm
Version management
[INFO] QuickRelease /home/builduser/agent01/work/d9922a1c87aee4bb bf1bc8bcfb2eccebc4268651c5f19faa689be6e4
[08:10:55][INFO] Adding tag RC;.;1.20.0
[08:10:56][INFO] Tag RC;.;1.20.0 added successfully
[08:10:56][INFO] Working on onboarding-server-web
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar deployable copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar sources copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar tests copied
[08:10:56][INFO] onboarding-server-web pom deployed
[08:10:57][INFO] Deploying artifacts to release artifacts repository
[08:10:57][INFO] Deploying onboarding-server-web to RELEASE
[08:10:57][INFO] pushing new pom
[08:10:59]2016-02-22 08:10:39 [INFO ] /usr/bin/git push --tag origin master exitValue = 0
• All artifacts share a common parent
– Master list of versions
• Manually-triggered release builds
– Custom release plugin
– Increments version
– Updates master
– Pushes changes to git
4. OPERATIONS
Back to diagrams
Store Service
Checkout Service
Cart Service
How ya doin’?
Health
• Host monitoring
– Sensu alerts
– Usual host metrics
– Health-check endpoint in framework
• End-to-end
– Pingdom
• Business
– Custom BI toolchain
Instrumentation
• Metrics
– Dropwizard Metrics
– Graphite and Anodot
– Built-in metrics (RPC, resource pools…)
– APIs for custom metrics
• Alerts
– Anodot, New Relic
– Via PagerDuty
Debugging
• Logs
– Good old Logback
– No centralized aggregation
– Not particularly useful
• Feature toggle overrides
• Distributed tracing
WE’RE DONE HERE!
… AND YES, WE’RE HIRING :-)
Thank you for listening
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Wix Engineering blog:
http://engineering.wix.com