Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Scaling Wix with microservices architecture and multi-cloud platforms - Reversim Summit 2015

3,815 views

Published on

In 6 years, Wix grew from a small startup with traditional system architecture (based on a monolithic server running on Tomcat, Hibernate, and MySQL) to a company that serves 60 million users. To keep up with this tremendous growth, Wix’s architecture had to evolve from a monolithic system to microservices, using some interesting patterns like CQRS to achieve our goal of building a blazing fast highly scalable and highly available system.

Published in: Engineering
  • Be the first to comment

Scaling Wix with microservices architecture and multi-cloud platforms - Reversim Summit 2015

  1. 1. Aviran Mordo Head of Back-end Engineering @aviranm linkedin.com/in/aviran aviransplace.com Scaling Wix with Microservices Architecture
  2. 2. Wix in Numbers Over 61M users 1.5M new users/month Static storage is >2PB of data 1.5TB new files/day 3 data centers + 3 clouds (Google, Amazon,Azure) 1.5B HTTP requests/day 900 people work atWix, of which ~ 300 in R&D
  3. 3. Initial Architecture Built for fast development Stateful login (Tomcat session), Ehcache, file uploads No consideration for performance, scalability and testing Intended for short-term use Tomcat, Hibernate, custom web framework Lighttpd (file serving) MySQL DB Wix (Tomcat)
  4. 4. The Monolithic Giant One monolithic server that handled everything Dependency between features Changes in unrelated areas of the system caused deployment of the whole system Failure in unrelated areas will cause system wide downtime
  5. 5. Breaking the System Apart
  6. 6. Concerns and SLA DataValidation Security / Authentication Data consistency Lots of data Edit websites High availability High performance Lots of static files Very high traffic volume Viewport optimization Long tail (immutable) Serving Media High availability High performance High traffic volume Long tail (mutable) View sites, created by Wix editor
  7. 7. Wix Segmentation 1. Editor Segment 3. Public Segment2. Media Segment Networking
  8. 8. HTML Editor Flash Editor MSM Private Media Public Media Editor Segment Public Segment Premium Services eCommerse List DB App Builder App Store App Market Dashboard Statics/me dia Mailer TimeZone Public HTML API Public API (Flash) MSP Public Server HTML Renderer HTML SEO Renderer Flash Renderer Flash SEO Renderer Sitemap Renderer Robots.txt Renderer User Server Template Viewer ContactsHUB Activit y Site Members Provided Mailing Service Comments Snapshoter User Pref Feed Me Shout-out Hotels PETRI Site Pref Dist LoggerSlicer eCom Renderer eCom Cart eCom Checkout eCom Catalog eCom Orders Payment Facade Account Info HTML API HTML Embeder BlogMobile
  9. 9. Microservices Guidelines Each service has its own database (if one is needed) Only one service can write to a specific DB table(s) There may be additional read-only services that directly accesses the DB (for performance reasons) Services are stateless No DB transactions Cache is not a building block, but an optimization
  10. 10. Microservices Tradeoffs Each service has its own database (if one is needed) Easy to scale microservices based on SLA concerns Tradeoff – system complexity, performance Only one service can write to a specific DB table(s) De-coupling architecture – faster development Tradeoff – system complexity / performance May be additional read-only services that accesses the DB Performance gain Tradeoff coupling Services are stateless Easy to scale out (just add more servers) Tradeoff performance / consistency No DB transactions Better DB performance, easier to scale Tradeoff system complexity
  11. 11. 1. Editor Segment
  12. 12. Editor Server Immutable JSON pages (~2.5M / day) Site revisions Active – standby MySQL cross datacenters Editor Server MySQL Active Sites MySQL Archive
  13. 13. Protect The Data DB outage with fast recovery = replication Data poisoning/corruption = revisions / backup Make the data available at all times = data distribution to multiple locations / providers
  14. 14. Browser Editor Server Static Grid Notify Google Cloud Storage MySQL Active Sites MySQL Archive Notify Saving Editor Data Archive (Amazon) Archive (Google) Save Page(s) 200 OK Upload Save Page DC replication Download Page MySQL Archive MySQL Active Sites
  15. 15. Browser Editor Server Static Grid Save Page(s) Save Page Upload Notify Download Page Google Cloud Storage MySQL Archive MySQL Active Sites MySQL Archive DC replication Notify Self Healing Process Archive (Amazon) Archive (Google) MySQL Active Sites 200 OK
  16. 16. No DB Transactions Save each page (JSON) as an atomic operation Page ID is a content based hash (immutable/idempotent) Finalize transaction by sending site header (list of pages) Can generate orphaned pages, not a problem in practice
  17. 17. 2. Media Segment
  18. 18. Prospero – Wix Media Storage 2PB user media files 3M files uploaded daily 800M metadata records Dynamic media processing • Picture resize, crop and sharpen “on the fly” • Watermark • Audio format conversion
  19. 19. T Google Cloud Prospero – Wix Media Manager get image.jpg First fallback Second fallback If not in CDN Amazon x36 T x36 T x32 Austin CDN
  20. 20. 3. Public Segment
  21. 21. Public Segment Roles Routing (resolve URLs) Dispatching (to a renderer) Rendering (HTML,XML,TXT) Public Server HTML Renderer HTML SEO Renderer Flash Renderer Sitemap Renderer Robots.txt Renderer www.example.com Flash SEO Renderer
  22. 22. Public SLA Our goal: 99% response time <100ms at peak traffic
  23. 23. Publish Site Publish site header (a map of pages for a site) Publish routing table Publish site header / routes (CQRS) Editor Segment Public Segment
  24. 24. Built For Speed Minimize out-of-process hops (2 DB, 1 RPC) Lookup tables are cached in memory, updated every few minutes Denormalized data – optimize for read by primary key (MySQL) Minimize business logic
  25. 25. How a Page Gets Rendered Bootstrap HTML template that contains only data Only JavaScript imports JSON data (site-header + dynamic data) No “real” HTML view
  26. 26. Offload rendering work to the browser
  27. 27. The average Intel Core i750 can push up to 7 GFLOPS without overclocking
  28. 28. Why JSON? Easy to parse in JavaScript and Java/Scala Fairly compact text format Highly compressible (5:1 even for small payloads) Easy to fix rendering bugs (just deploy a new code)
  29. 29. Minimum Number of Public Servers Needed to Serve 60M Sites 4
  30. 30. Public SLA Be Available 99.999%
  31. 31. Serving a Site – Sunny Day Archive CDN Statics Browser http://example.wix.com Store HTML to cache HTTP Request Notify site view LB Public Renderer HTML Resources / Media HTTP Request
  32. 32. Serving a Site – DC Lost Archive CDN Statics Browser http://example.wix.com LB Public Renderer LB Public Renderer Change DNS HTTP Request
  33. 33. Serving a Site – Public Lost Archive Browser http://example.wix.com LB Public Renderer Get Cached HTML Version HTML HTTP Request LB Public Renderer Fallback to 2nd DC
  34. 34. Living in the Browser Archive CDN Statics Browser http://example.wix.com LB Public Renderer Editor Pages Fallback JSON / Media HTML HTTP Request Fallback
  35. 35. Summary Identify your critical path and concerns Build redundancy in critical path (for availability) De-normalize data (for performance) Minimize out-of-process hops (for performance) Take advantage of client’s CPU power
  36. 36. Aviran Mordo Head of Back-end Engineering Q&A @aviranm linkedin.com/in/aviran aviransplace.com http://engineering.wix.com http://goo.gl/wlq9Ih @WixEng

×