The Wix Microservice Stack
Tomer Gabel, Wix
March 2017 @ Dnipro, UA
Agenda
1. Topology
2. Networking
3. Structure
4. Operations
5. Beer
Our conceptual system
Store Service
Checkout Service
Cart Service
1. TOPOLOGY
Image: Penrose Steps by Alex Eylar (CC BY-NC-SA 2.0)
Our conceptual system
Store Service
Checkout Service
Cart Service
Host A
Host B
Host C
Topology
• Service → host mapping
• Server inventory
• Service catalogue
• Formally, “scheduling”
Service Scheduling
• A hard problem!
• Multiple dimensions:
– Resource utilization (disk space, I/O, RAM, network, power…)
– Resource availability
– Failover (physical server, rack, row…)
– Custom constraints (zoning, e.g. PCI compliance)
Service Scheduling
• The middle ground:
– Naïve automatic scheduler
– Human-configured overrides for zoning, optimization
• Easy but limited scale
– A few hundred servers
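The "naïve scheduler plus human overrides" idea can be sketched roughly as follows. This is only an illustration: the class name, the "fewest services wins" rule, and the shape of the override map are assumptions, not the scheduler Wix actually runs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NaiveScheduler and its placement rule are assumptions.
final class NaiveScheduler {
    /**
     * Assigns each service to a host. Services named in overrides (the
     * human-configured part, e.g. zoning) go where they are told; every other
     * service goes to the host that currently carries the fewest services.
     */
    static Map<String, String> schedule(List<String> services,
                                        List<String> hosts,
                                        Map<String, String> overrides) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("No hosts to schedule on");
        }
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String host : hosts) {
            load.put(host, 0);
        }

        Map<String, String> assignment = new LinkedHashMap<>();
        for (String service : services) {
            String host = overrides.get(service);
            if (host == null) {
                // Naive rule: least-loaded host; the first host wins ties.
                for (String candidate : hosts) {
                    if (host == null || load.get(candidate) < load.get(host)) {
                        host = candidate;
                    }
                }
            } else if (!load.containsKey(host)) {
                throw new IllegalArgumentException("Unknown host in override: " + host);
            }
            assignment.put(service, host);
            load.merge(host, 1, Integer::sum);
        }
        return assignment;
    }
}
```

With services store, checkout and cart on two hosts and checkout pinned to the second host, the other two services fill in around the pin. Real schedulers weigh the dimensions listed earlier (resources, failover domains), which this sketch ignores.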
Our conceptual system
Store Service
Checkout Service
Cart Service
http://err:42/uh
… derp?
Service Discovery
Static vs. Dynamic
Logical vs. Physical
That way madness lies
In practice
• Static topology
– Managed with Frying Pan
– Exported to Chef
– Deployed via configuration files
• Live registry in Zookeeper
– Deployment only
– … for now
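As a rough illustration of static discovery via configuration files, a service could resolve logical names to physical endpoints from a file deployed alongside it. The file format below (`service=host:port,host:port`) and the `StaticTopology` class are invented for this sketch; the slides do not show what Frying Pan or Chef actually emit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; the config format is an assumption, not Wix's format.
final class StaticTopology {
    private final Map<String, List<String>> hostsByService = new HashMap<>();

    /** Parses lines of the form "service=host:port,host:port". */
    static StaticTopology parse(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StaticTopology topology = new StaticTopology();
        for (String service : props.stringPropertyNames()) {
            String[] hosts = props.getProperty(service).trim().split("\\s*,\\s*");
            topology.hostsByService.put(service, Arrays.asList(hosts));
        }
        return topology;
    }

    /** Maps a logical service name to its statically configured endpoints. */
    List<String> resolve(String service) {
        List<String> hosts = hostsByService.get(service);
        if (hosts == null) {
            throw new IllegalArgumentException("Unknown service: " + service);
        }
        return hosts;
    }
}
```

The trade-off is the one the slide implies: the mapping only changes when a new file is deployed, which is simple to reason about but cannot react to hosts coming and going.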
2. NETWORKING
Image: Neurons by Birth Into Being (CC BY-NC-SA 2.0)
Back to diagrams
Store Service
Checkout Service
Cart Service
Protocol
• RPC-style
– Sync or async
– Point-to-point
• Message passing
– Async only
– Requires broker
Shared concerns: topology, serialization, operations
Protocol
• Wix RPC
– RPC-style
– Custom JSON
– HTTP
• Pros/cons
– Rock-solid
– Sync/blocking
– Legacy
Image: psycho chicken by Bernhard Latzko (CC BY-ND 2.0)
Protocol
• Greyhound
– Message-passing
– Custom JSON
– Kafka
• Pros/cons
– Async + replayable
– Still experimental
Image: Robin Fledgeling by edgeplot (CC BY-NC-SA 2.0)
Load balancing
• Centralized
– Simple
– Limited flexibility
– Limited scale
– Thin implementation → highly portable
– Suitable for static topologies
• Distributed
– Highly scalable
– Flexible
– Fully dynamic
– Fat implementation → difficult to port
• Quasi-distributed
– e.g. Synapse
– Best of both worlds?
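To make the "fat implementation" point concrete, here is a minimal sketch of what the distributed option puts inside every client: each caller holds the host list and picks a target itself. The class name and the round-robin policy are assumptions chosen for illustration; the slides do not show any balancer code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of client-side (distributed) balancing.
final class RoundRobinBalancer {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("At least one host is required");
        }
        this.hosts = List.copyOf(hosts);
    }

    /** Returns the next host in rotation; safe to call from several threads. */
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }
}
```

Even this trivial policy has to be reimplemented for every client language, and a production version would also need health tracking and live topology updates, which is where the porting cost comes from.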
Frying Pan → Chef → Nginx
To our shame
• There’s always IDL.
• Informal
– Text documentation
– Code samples
• Formal
– Swagger, Apiary etc.
– ProtoBuf, Thrift, Avro
– WSDL, god forbid!
• … or
– Ad-hoc
import java.util.List;

// The Java interface itself is the service contract.
// Guid, UserGuid, SiteMember, SMCollection and SiteMemberDto are Wix types
// that are not shown on the slide.
public interface SiteMembersService {
    SiteMemberDto getMemberById(
        Guid<SiteMember> memberId,
        UserGuid userId);

    SiteMemberDto getMemberOrOwnerById(
        Guid<SiteMember> memberId,
        Guid<SMCollection> collectionId);

    SiteMemberDto getMemberDtoByEmailAndCollectionId(
        String email,
        Guid<SMCollection> collectionId);

    List<SiteMemberDto> listMembersByCollectionId(
        Guid<SMCollection> collectionId);
}
In Detail
• Java interfaces?
+ Ridiculously simple
+ Lend well to RPC
– Coupled to JVM
• JSON serialization
+ Jackson-based
+ Custom, extensible mapping
– Reflection-based
• Server stack (JVM)
– Jetty
– Spring + Spring MVC
– Custom handler
• RPC client stack (JVM)
– Spring
– Proxy classes generated at runtime
– AsyncHttpClient
• Alternative stack
– Based on Node.js
– Generated RPC clients
– Manually-converted entity schema :-(
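"Proxy classes generated at runtime" can be illustrated with the JDK's own `java.lang.reflect.Proxy`. The sketch below is an assumption about the general technique, not Wix's client: `RpcProxies`, the `MemberDirectory` interface, the request string format, and the `transport` function (standing in for JSON serialization plus an HTTP call) are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical, cut-down service contract used only for this sketch.
interface MemberDirectory {
    String getMemberById(String memberId);
}

// Hypothetical sketch; not the Wix RPC client.
final class RpcProxies {
    /**
     * Builds an implementation of the given interface at runtime. Every call
     * is rendered as "Interface.method[args]" and handed to the transport,
     * which stands in for serialization and the network round trip.
     */
    static <T> T client(Class<T> api, Function<String, Object> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // toString/hashCode/equals are answered locally, never sent remotely.
                switch (method.getName()) {
                    case "toString": return "RpcProxy(" + api.getSimpleName() + ")";
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return proxy == args[0]; // equals
                }
            }
            Object[] safeArgs = (args == null) ? new Object[0] : args;
            String request = api.getSimpleName() + "." + method.getName()
                    + Arrays.toString(safeArgs);
            return transport.apply(request);
        };
        Object instance = Proxy.newProxyInstance(
                api.getClassLoader(), new Class<?>[] {api}, handler);
        return api.cast(instance);
    }
}
```

A caller would write something like `RpcProxies.client(MemberDirectory.class, request -> send(request))`, where `send` is whatever transport is available. The convenience is that callers see a plain Java interface; the cost, as the list above notes, is that the contract is coupled to the JVM and relies on reflection.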
Cascade Failures
• What is a cascade failure?
• Mitigations
– Bulkheading
– Circuit breakers
– Load shedding
• We don’t do any of that (mostly)
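For readers unfamiliar with the mitigations listed above, a circuit breaker can be sketched in a few lines. This is a generic, simplified illustration (the slide says Wix mostly does not do this); the class name, thresholds, and injected clock are assumptions, and holding the lock during the call serializes callers, which a real implementation would avoid.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration only.
final class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so the behaviour is testable
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Runs the action unless the circuit is open; falls back on failure. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get(); // open: fail fast, spare the downstream service
        }
        try {
            T result = action.get(); // closed, or a trial call after the cool-down
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // trip (or re-trip) the breaker
            }
            return fallback.get();
        }
    }
}
```

The point of failing fast is that a struggling downstream service stops receiving traffic it cannot serve, so the failure does not spread to its callers' thread pools.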
Does it go?
• Short answer: yes.
• Battle-tested
– Evolving since 2010.
– >200 services in production.
• Known quantity
– Easy to operate
– Performs well enough
– Known workarounds
Not all is well, though
• Polyglot development
– Custom client stack
– Expensive to port!
• Implicit state
– Transparently handled by the framework
– Thread local storage
– Hard to go async!
[Diagram: Client Proxy, Service A, Service B; state carried between them: session info, transaction ID, A/B experiment]
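Why thread-local state makes it "hard to go async" can be shown with a small experiment. The `TRANSACTION_ID` holder below is a hypothetical stand-in for framework-managed context; the sketch only demonstrates standard `ThreadLocal` behaviour, not Wix's framework.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical stand-in for request context kept in thread-local storage.
final class ImplicitStateDemo {
    static final ThreadLocal<String> TRANSACTION_ID = new ThreadLocal<>();

    /**
     * Reads the context from a worker thread. The value set by the calling
     * thread is not visible there, so this yields the string "null".
     */
    static String readOnWorker(ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> String.valueOf(TRANSACTION_ID.get()), pool)
                .join();
    }

    /** Captures the value on the calling thread and passes it along explicitly. */
    static String readOnWorkerWithCapture(ExecutorService pool) {
        String captured = TRANSACTION_ID.get();
        return CompletableFuture
                .supplyAsync(() -> String.valueOf(captured), pool)
                .join();
    }
}
```

As long as a request stays on one thread, the framework can hide this state completely. Once work hops to another thread, every hop has to capture and restore the context by hand, or the session info, transaction ID and experiment data silently disappear.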
3. STRUCTURE
Codebase modeling
• A product comprises multiple services
• Services have dependencies
– Creating a DAG
– Tends to cluster around domains
• Org structure reflects the clustering (Conway)
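Because the dependencies form a DAG, a valid build order falls out of a topological sort. The sketch below is a generic depth-first version; the `BuildOrder` class and the module names in the usage note are hypothetical and say nothing about how the Wix build tooling is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: depth-first topological sort over module dependencies.
final class BuildOrder {
    /**
     * deps maps each module to the modules it depends on.
     * Returns the modules ordered so that dependencies come first.
     */
    static List<String> of(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : deps.keySet()) {
            visit(module, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            // Seeing a module again while it is still being visited means a cycle.
            throw new IllegalStateException("Dependency cycle through " + module);
        }
        for (String dependency : deps.getOrDefault(module, List.of())) {
            visit(dependency, deps, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module); // emitted only after all of its dependencies
    }
}
```

For example, if checkout depends on cart and store, and cart depends on store, the only valid order is store, cart, checkout. A cycle makes the graph stop being a DAG, and the sketch reports it instead of looping.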
Codebase modeling
Repository-per-domain
• Small repositories
• Artifacts built independently
• Binary dependencies
• Requires specialized tools to manage:
– Versions
– Build dependencies
Monorepo
• Repository contains everything
• Code is built atomically
• Source dependencies
• Requires a specialized build tool
At Wix
• One repo per domain
• Dependencies:
– Declared in POMs
– Version management via custom plugin
– Builds managed by custom tool*
• Custom dashboard, “Wix Lifecycle”
* Lifecycle – Dependency Management Algorithm
Version management
[INFO] QuickRelease /home/builduser/agent01/work/d9922a1c87aee4bb bf1bc8bcfb2eccebc4268651c5f19faa689be6e4
[08:10:55][INFO] Adding tag RC;.;1.20.0
[08:10:56][INFO] Tag RC;.;1.20.0 added successfully
[08:10:56][INFO] Working on onboarding-server-web
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar deployable copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar sources copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar tests copied
[08:10:56][INFO] onboarding-server-web pom deployed
[08:10:57][INFO] Deploying artifacts to release artifacts repository
[08:10:57][INFO] Deploying onboarding-server-web to RELEASE
[08:10:57][INFO] pushing new pom
[08:10:59]2016-02-22 08:10:39 [INFO ] /usr/bin/git push --tag origin master exitValue = 0
• All artifacts share a common parent
– Master list of versions
• Manually-triggered release builds
– Custom release plugin
– Increments version
– Updates master
– Pushes changes to git
4. OPERATIONS
Back to diagrams
Store Service
Checkout Service
Cart Service
How ya doin’?
Health
• Host monitoring
– Sensu alerts
– Usual host metrics
– Health-check endpoint in framework
• End-to-end
– Pingdom
• Business
– Custom BI toolchain
Instrumentation
• Metrics
– DropWizard Metrics
– Graphite and Anodot
– Built-in metrics (RPC, resource pools…)
– APIs for custom metrics
• Alerts
– Anodot, NewRelic
– Via PagerDuty
Debugging
• Logs
– Good old Logback
– No centralized aggregation
– Not particularly useful
• Feature toggle overrides
• Distributed tracing
WE’RE DONE HERE!
… AND YES, WE’RE HIRING :-)
Thank you for listening
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Wix Engineering blog:
http://engineering.wix.com
