Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Scaling wix with microservices architecture devoxx London 2015


Published on

Many small startups build their systems on top of a traditional toolset like Tomcat, Hibernate, and MySQL. These systems are used because they facilitate easy development and fast progress, but many of them are monolithic and have limited scalability. So as a startup grows, the team is confronted with the problem of how to evolve the system and make it scalable.

Facing the same dilemma, grew from 0 to 60 million users in just a few years. Facing some interesting challenges, like performance and availability. Traditional performance solutions, such as caching, would not help due to a very long tail problem which causes caching to be highly inefficient. And because every minute of downtime means customers lose money, the product needed to have near 100% availability.

Solving these issues required some interesting and out-of-the-box thinking, and this talk will discuss some of these strategies: building a highly preformant, highly available and highly scalable system; and leveraging microservices architecture and multi-cloud platforms to help build a very efficient and cost-effective system.

Published in: Software
  • Be the first to comment

Scaling wix with microservices architecture devoxx London 2015

  1. 1. Aviran Mordo Head of Back-end Engineering @aviranm Scaling with Microservices Architecture and Multi-cloud platforms
  2. 2. Wix in Numbers Over 66M users Static storage is >2PB of data 3 data centers + 3 clouds (Google, Amazon,Azure) 2B HTTP requests/day 1000 people work atWix
  3. 3. Initial Architecture Built for fast development Stateful login (Tomcat session), Ehcache, file uploads No consideration for performance, scalability and testing Intended for short-term use Tomcat, Hibernate, custom web framework Lighttpd (file serving) MySQL DB Wix (Tomcat)
  4. 4. The Monolithic Giant One monolithic server that handled everything Dependency between features Changes in unrelated areas of the system caused deployment of the whole system Failure in unrelated areas will cause system wide downtime
  5. 5. Breaking the System Apart
  6. 6. Concerns and SLA DataValidation Security / Authentication Data consistency Lots of data Edit websites High availability High performance Lots of static files Very high traffic volume Viewport optimization Long tail (immutable) Serving Media High availability High performance High traffic volume Long tail (mutable) View sites, created by Wix editor
  7. 7. Wix Segmentation 1. Editor Segment 3. Public Segment2. Media Segment Networking
  8. 8. HTML Editor Flash Editor MSM Private Media Public Media Editor Segment Public Segment Premium Services eCommerse List DB App Builder App Store App Market Dashboard Statics/me dia Mailer TimeZone Public HTML API Public API (Flash) MSP Public Server HTML Renderer HTML SEO Renderer Flash Renderer Flash SEO Renderer Sitemap Renderer Robots.txt Renderer User Server Template Viewer ContactsHUB Activit y Site Members Provided Mailing Service Comments Snapshoter User Pref Feed Me Shout-out Hotels PETRI Site Pref Dist LoggerSlicer eCom Renderer eCom Cart eCom Checkout eCom Catalog eCom Orders Payment Facade Account Info HTML API HTML Embeder BlogMobile
  9. 9. Microservices Guidelines Each service has its own DB schema (if one is needed) Only one service should write to a specific DB table(s) There may be additional read-only services that directly accesses the DB (for performance reasons) Services are stateless No DB transactions Cache is not a building block, but an optimization
  10. 10. It is all about
  11. 11. Microservices Tradeoffs Each service has its own DB schema (if one is needed) Gain - Easy to scale microservices based on service level concerns Tradeoff – system complexity, performance Only one service should write to a specific DB table(s) Gain - Decoupling architecture – faster development Tradeoff – system complexity / performance May have additional read-only services that accesses the DB Gain - Performance gain Tradeoff - coupling Services are stateless Gain - Easy to scale out (just add more servers) Tradeoff - performance / consistency No DB transactions Gain - Better DB performance, easier to scale Tradeoff - system complexity
  12. 12. 1. Editor Segment
  13. 13. Editor Server Immutable JSON pages (~3M / day) Site revisions Active – standby MySQL cross datacenters Editor Server MySQL Active Sites MySQL Archive
  14. 14. Protect The Data DB outage with fast recovery = replication Data poisoning/corruption = revisions / backup Make the data available at all times = data distribution to multiple locations / providers
  15. 15. Browser Editor Server GCS MySQL Active Sites MySQL Archive Saving Editor Data WixMedia (Amazon) WixMedia (Google) Save Page(s) 200 OK Upload Save Page DC replication Notify MySQL Archive MySQL Active Sites S3 WixMedia (DC-1)
  16. 16. Browser Editor Server GCS MySQL Active Sites MySQL Archive WixMedia (Amazon) WixMedia (Google) Save Page(s) 200 OK Upload Save Page DC replication Notify MySQL Archive MySQL Active Sites S3 WixMedia (DC-1) Self Healing Process
  17. 17. No DB Transactions Save each page (JSON) as an atomic operation Page ID is a content based hash (immutable/idempotent) Finalize transaction by sending site header (list of pages) Can generate orphaned pages, not a problem in practice
  18. 18. 2. Media Segment (WixMP)
  19. 19. Wix Media Platform (WixMP) Eventual consistent distributed file system (2PB user media files) Dynamic media processing Multi datacenter aware Automatic fallback cross DC Run on commodity servers & cloud
  20. 20. T Google Cloud Prospero – Wix Media Manager get image.jpg First fallback Second fallback If not in CDN Amazon x36 T x36 T x32 Austin CDN
  21. 21. 3. Public Segment
  22. 22. Public Segment Roles Routing (resolve URLs) Dispatching (to a renderer) Rendering (HTML,XML,TXT) Public Server HTML Renderer HTML SEO Renderer Flash Renderer Sitemap Renderer Robots.txt Renderer Flash SEO Renderer
  23. 23. Public SLA Our goal: 99% response time <100ms at peak traffic
  24. 24. Publish Site Publish site header (a map of pages for a site) Publish routing table Publish site header / routes (CQRS) Editor Segment Public Segment
  25. 25. Built For Speed Minimize out-of-process hops (2 DB, 1 RPC) Lookup tables are cached in memory, updated every few minutes Denormalized data – optimize for read by primary key (MySQL) Minimize business logic
  26. 26. How a Page Gets Rendered Bootstrap HTML template that contains only data Only JavaScript imports JSON data (site-header + dynamic data) No “real” HTML view
  27. 27. Offload rendering work to the browser
  28. 28. The average Intel Core i750 can push up to 7 GFLOPS without overclocking
  29. 29. Why JSON? Easy to parse in JavaScript and Java/Scala Fairly compact text format Highly compressible (5:1 even for small payloads) Easy to fix rendering bugs and cross browsers issues (just deploy a new code)
  30. 30. Minimum Number of Public Servers Needed to Serve 66M Sites 4
  31. 31. Public SLA Be Available 99.999%
  32. 32. Serving a Site – Sunny Day Archive CDN WixMP Browser Store HTML to cache HTTP Request Notify site view LB Public Renderer HTML Resources / Media HTTP Request
  33. 33. Serving a Site – DC Lost Archive CDN WixMP Browser LB Public Renderer LB Public Renderer Change DNS HTTP Request
  34. 34. Serving a Site – Public Lost Archive Browser LB Public Renderer Get Cached HTML Version HTML HTTP Request LB Public Renderer Fallback to 2nd DC
  35. 35. Living in the Browser CDN WixMP Browser LB Public Renderer Editor Pages Fallback JSON / Media HTML HTTP Request Fallback
  36. 36. Summary Identify concerns and SLA for different parts of the system Build redundancy in critical path (for availability) De-normalize data (for performance) Minimize out-of-process hops (for performance) Take advantage of client’s CPU power
  37. 37. Q&A @aviranm @WixEng Aviran Mordo Head of Back-end Engineering