
Wix Dev-Centric Culture And Continuous Delivery

How Wix is doing continuous delivery and our Dev-Centric culture to support that

Published in: Technology

Wix Dev-Centric Culture And Continuous Delivery

  1. 1. Wix Dev-Centric Culture Aviran Mordo Head Of Back-End Engineering @ Wix @aviranm http://www.linkedin.com/in/aviran 04:30
  2. 2. 04:30
  3. 3. Wix In Numbers • Over 45,000,000 users – >1M new users/month • Static storage is >800TB of data – >1.5TB new files/day • 3 Data centers + 2 Clouds (Google, Amazon) – ~300 servers • >700M HTTP requests/day • ~600 people work at Wix – Of which ~ 200 in R&D
  4. 4. Traditional Dev Pipeline Product Dev QA Operations 04:30
  5. 5. 04:30
  6. 6. 04:30 Product Dev QA Operations
  7. 7. 04:30
  8. 8. SCRUM 04:30
  9. 9. 04:30 Lean Agile SCRUM XP SCRUM != Agile
  10. 10. Jan 2014 Deployments (production changes) per month Every 9 minutes production changes its state (during working hours)
  11. 11. Do You Have The Guts To Deploy 60 Times A Day? 04:30
  12. 12. 04:30
  13. 13. Where We Were • We were working in traditional waterfall • With fear of change – It is working, why touch it? – Uploading a release means downtime and bugs! • With low product quality • With slow development velocity • With a traditional enterprise development lifecycle – Three months of “VERSION” development and QA – Six months of crisis mode cleaning bugs and stabilizing the system
  14. 14. 04:30
  15. 15. 04:30 Taiichi Ohno
  16. 16. Lean Product development “Top 5 Most-Used Commands in Microsoft Word • Paste • Save • Copy • Undo • Bold These five commands account for around 32% of the total command use in Word. Paste itself accounts for more than 11% of all commands used, and has more than twice as much usage as the #2 entry on the list, Save. Beyond the top 10 commands, the curve flattens out considerably. The percentage difference in usage between the #100 command ("Accept Change") and the #400 command ("Reset Picture") is about the same in difference between #1 and #11 ("Change Font Size")”
  17. 17. Scaling challenges – Product • Product Minimum Viable Product (MVP) – Does MVP meet your product standards? • What about tooltips, help, first-time UX, etc.? – How to define a product that can be developed in a day? – And that can win in an A/B test … To Be Implemented
  18. 18. Get out of thought land • The law of failure – Most new “its” will fail even if they are flawlessly executed • Invest less, get less attached, and it becomes easier to admit failure – Data beats opinions: let the customer decide • Make sure you are building the right it before you build it right • Quick Feedback
  19. 19. 04:30
  20. 20. Risk • Waterfall - minimize number of deployments • CD - minimize number of changes and impact in $$ • Risk = #deployments * chance of something going wrong (~ number of changes) * impact of something going wrong in $$
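The risk formula on the slide above can be tried out with a small calculation. This is only a sketch: the function name `expected_risk` and every number below are hypothetical, chosen to show the trade-off, and are not taken from the talk.

```python
def expected_risk(deployments, chance_of_failure, impact_dollars):
    """Risk = #deployments * chance of something going wrong * impact in $$."""
    return deployments * chance_of_failure * impact_dollars


# Hypothetical numbers: one large release with many changes and a large
# blast radius, versus many small releases with few changes each.
waterfall = expected_risk(deployments=1, chance_of_failure=0.5, impact_dollars=100_000)
continuous = expected_risk(deployments=100, chance_of_failure=0.01, impact_dollars=1_000)

print(f"waterfall:  ${waterfall:,.0f}")
print(f"continuous: ${continuous:,.0f}")
```

Under these assumed inputs, the many-small-deployments case carries less total risk because both the chance of failure and the impact per deployment shrink faster than the deployment count grows.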
  21. 21. Small Development Iterations • No Waterfall • No Scrum • No Iterations • No long documents • Build something small • When it is ready, deploy it – Measure it – Then fix it – Again – And again, until Dev, Product and Customers are happy • Then start changing it – Again, as a small change
  22. 22. Product/Dev/QA/Ops boundaries are going down
  23. 23. What Is The Common Denominator? • Product manager • Project manager • QA • Operations • DBA
  24. 24. CD is culture & mindset • Trust the developers – Empower developers to change production – Developers know their systems best • Automation as a default choice – No more “is it worth automating?” – Everything should be automated • Welcome to the twilight zone – Product/Dev/QA boundaries are going down – Everyone needs to care about everything – Less formality: Corridor - IN, Meeting Room - OUT
  25. 25. Dev Centric Culture – Involve The Developer • Product definition (with product) • Development (with architect) • Testing (with QA developers) • Deployment / Rollback (with operations) • Monitoring / BI (with BI team) • DevOps – to enable deployment and rollback, fully automated
  26. 26. Continuous Delivery – Key points • Abandon the “VERSION” paradigm – move to a feature-centric methodology • Make small and frequent releases as soon as possible • Automate everything – TDD/CI/CD • Measure everything – A/B test every new feature – Monitor real KPIs (business, not CPU) • Deploy without downtime
  27. 27. Test Driven Development • No new code is pushed to Git without being fully tested – We currently have around 10,000 automated tests • Before fixing a bug first write a test to reproduce the bug • Cover legacy (untested) systems with Integration tests 04:30
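The rule "before fixing a bug, first write a test to reproduce the bug" might look like the sketch below. The function `normalize_username` and the bug it had are invented for illustration; only the order of work (failing test first, fix second) comes from the slide.

```python
import unittest


def normalize_username(name):
    """Hypothetical function under test.

    Assumed original bug: surrounding whitespace was kept, so ' Alice '
    and 'alice' were treated as different users.
    """
    # The strip() call is the fix, added only after the test below failed.
    return name.strip().lower()


class NormalizeUsernameTest(unittest.TestCase):
    def test_surrounding_whitespace_is_ignored(self):
        # Written first, to reproduce the reported bug before touching the code.
        self.assertEqual(normalize_username("  Alice "), "alice")

    def test_already_normalized_name_is_unchanged(self):
        self.assertEqual(normalize_username("bob"), "bob")
```

The reproducing test stays in the suite afterwards, so the same bug cannot return unnoticed.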
  28. 28. What people think of TDD • TDD slows down development • With TDD we write more code (product + test code). • TDD has no significant impact on quality 04:30
  29. 29. What people think of TDD • TDD slows down development • With TDD we write more code (product + test code). • TDD has no significant impact on quality 04:30
  30. 30. TDD Actual impact on development • We develop products faster • Removes fear of change • Easier to join someone else’s project • Do we still need QA? (Yes, they code automation tests) – We don’t have QA for back-end applications • Writing a feature is 10-30% slower, with 45-90% fewer bugs • 50% faster to reach production • Considerably less time to fix bugs
  31. 31. 04:30
  32. 32. Is Refactoring Rework? Absolutely NOT ! • Refactoring is the outcome of learning • Refactoring is the cornerstone of improvement • Refactoring builds the capacity to change • Refactoring doesn’t cost, it pays 04:30
  33. 33. Refactoring • Refactor from inside out – Small iterations with tests – Refactor small methods - make sure the tests don’t break – Deploy often • Re-write from the outside in – Write from scratch (one piece at a time) – Code duplication is sometimes needed (temporarily) – Protected by Feature Toggle • Before refactoring, make sure everything is covered with tests - legacy code is usually covered by integration (IT) tests
  34. 34. 04:30
  35. 35. Code branch (diagram) • FT opened? – Yes: New Code – No: Old Code
  36. 36. Usage example Simple “if” statement in your code 04:30
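The slide shows the usage as an image, so the sketch below is a guess at what such a "simple if statement" could look like. `FEATURE_TOGGLES`, `is_open`, and `render_menu` are hypothetical names, not Wix APIs.

```python
# Hypothetical in-memory toggle store; a real system would load this from
# configuration or a toggle service.
FEATURE_TOGGLES = {"new_menu": False}


def is_open(toggle_name):
    """Unknown toggles count as closed, so unfinished code stays dark."""
    return FEATURE_TOGGLES.get(toggle_name, False)


def render_menu():
    # The guard: new code runs only when the toggle is open.
    if is_open("new_menu"):
        return "new menu"
    return "old menu"
```

Flipping `FEATURE_TOGGLES["new_menu"]` to `True` switches the code path without a new deployment.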
  37. 37. Feature Toggles • Everyone develops on the Trunk • Every piece of code can get to production at anytime 04:30
  38. 38. Feature Toggle to the rescue • Unused new code can go to production – no harm done • Operational new code goes with a guard – use new or old code by feature toggle 04:30
  39. 39. 04:30
  40. 40. DB Schema Changes Without Downtime • Adding columns – Use another table linked by primary key – Use a blob field for schema flexibility • Removing fields – Stop using them. Do not make any DB schema changes
  41. 41. New DB schema with data migration • Plan a lazy migration path controlled by feature toggle 1. Write to old / Read from old 2. Write to both / Read from old 3. Write to both / Read from new, fallback to old 4. Write to new / Read from new, fallback to old 5. Eagerly migrate data in the background 6. Write to new / Read from new • Backward compatibility is a must
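The six migration steps on the slide above can be sketched as a small repository whose behavior depends on the current phase. This is an illustration under assumptions: two dictionaries stand in for the old and new schemas, and the `phase` attribute stands in for the feature toggle that controls the migration.

```python
class MigratingRepository:
    """Lazy migration sketch; phase numbers match the steps on the slide."""

    def __init__(self):
        self.old = {}   # stand-in for the old schema
        self.new = {}   # stand-in for the new schema
        self.phase = 1  # assumed to be driven by a feature toggle

    def write(self, key, value):
        if self.phase <= 3:          # steps 1-3 still write to old
            self.old[key] = value
        if self.phase >= 2:          # steps 2-6 write to new
            self.new[key] = value

    def read(self, key):
        if self.phase <= 2:          # steps 1-2 read from old only
            return self.old.get(key)
        if self.phase <= 5 and key not in self.new:
            return self.old.get(key)  # steps 3-5: fallback to old
        return self.new.get(key)      # step 6: new only

    def migrate_remaining(self):
        """Step 5: eagerly copy rows that were never rewritten."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Because each phase only widens or narrows where data is written and read, moving `phase` back by one step is always safe, which is what makes the path reversible by toggle.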
  42. 42. Feature Toggle Strategies (gradual exposure to users) • Company employees • Specific users or group of users • Percentage of traffic • By GEO • By Language • By user-agent • User Profile based • By context (site id or some kind of hash on site id)
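A few of the exposure strategies listed above (company employees, language, percentage of traffic via a hash on the site id) could be combined as in this sketch. The function names, the `@example.com` employee domain, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def bucket(site_id, buckets=100):
    """Stable bucket in [0, buckets) derived from a hash of the site id."""
    digest = hashlib.sha256(site_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def is_exposed(site_id, percentage, user_email="", language="", languages=()):
    # Company employees always see the feature (hypothetical domain).
    if user_email.endswith("@example.com"):
        return True
    # Optional language filter: an empty tuple means "any language".
    if languages and language not in languages:
        return False
    # Percentage of traffic: the same site always lands in the same bucket.
    return bucket(site_id) < percentage
```

Hashing the site id rather than drawing a random number keeps the decision stable across requests, so a site does not flip between old and new behavior.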
  43. 43. Feature Toggle Override • By specific server – Used to test system load – New database flows/migration – Refactoring that may affect performance and memory usage • By URL parameter – Enable internal testing – Product acceptance – Faking GEO • By FT cookie value – Testing – When working with an API on a single-page application
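Overriding a toggle by URL parameter or cookie could be resolved as sketched below. The `ft_` prefix, the `"true"`/`"false"` values, and the precedence (URL first, then cookie, then default) are assumptions for the example; the slides do not specify them.

```python
from urllib.parse import parse_qs, urlparse


def toggle_state(name, default, url, cookies):
    """Resolve a toggle: URL parameter, then cookie, then the default."""
    key = "ft_" + name  # hypothetical naming convention
    query = parse_qs(urlparse(url).query)
    candidates = (query.get(key, [None])[0], cookies.get(key))
    for value in candidates:
        if value in ("true", "false"):
            return value == "true"
    return default
```

For example, `toggle_state("new_menu", False, "https://example.com/?ft_new_menu=true", {})` opens the toggle for that single request, which suits internal testing and product acceptance.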
  44. 44. 04:30 Wix PETRI
  45. 45. A/B Tests 04:30
  46. 46. A/B Test • Every new feature is A/B tested • We open the new feature to a % of users – Define KPIs to check if the new feature is better or worse – If it is better, we keep it – If worse, we check why and improve – If we find flaws, the impact is just for % of our users (kind of Feature Toggle) 04:30
  47. 47. An interesting side effect on product • How many times did you have the conversation “what is better”? – Put the menu on top / on the side • Well, how about building both and A/B testing?
  48. 48. Marking users with toss value in a cookie • Anonymous user – Toss is randomly determined – Cannot guarantee a persistent experience if the user changes browser • Registered User – Toss is determined by the user ID – Guarantees toss persistency across browsers – Allows setting additional tossing criteria (for example new users only) – Only use this for sections that require the user to be authenticated
  49. 49. A/B test targeting • Do not mix anonymous and registered tests • A/B test a percentage of users with optional filters – New Users Only (Registered users only) – By language – By GEO – By Browser – user-agent – OS – Any other criteria you have on your users
  50. 50. A/B Test Features • A/B Test Override – Allows setting the value of a test for validation – Helps support staff experience what users are experiencing • Override methods – Via URL parameter – Via cookie • Start/Stop Test • Pause tests • Bots always get “A”
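The two tossing modes on the slide above can be sketched as follows. This is an assumption-laden illustration: the cookie name prefix `toss_`, the 100 buckets, and the SHA-256 hash are choices made for the example, not details from the talk.

```python
import hashlib
import random


def toss_for_registered(user_id, test_name, buckets=100):
    """Deterministic toss: the same user ID gives the same value in any browser."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def toss_for_anonymous(cookies, test_name, buckets=100):
    """Random toss, stored in a cookie so it persists within one browser only."""
    key = "toss_" + test_name  # hypothetical cookie name
    if key not in cookies:
        cookies[key] = str(random.randrange(buckets))
    return int(cookies[key])


def variant(toss, percent_b):
    """Users whose toss falls below percent_b get the new feature ("B")."""
    return "B" if toss < percent_b else "A"
```

A new browser means a fresh cookie jar, so an anonymous user may be tossed again, which is the persistence limit the slide points out.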
  51. 51. 04:30 NOT !!!
  52. 52. Gradual Deployment • Assume two components • We shut down one server and install the new version on it. It is not active yet • Run the self test • Activate the new server if it passes the self test • Continue deploying the other servers, a few at a time, checking each one with the self test (Diagram: servers running A 1.1 and B 1.1; the B servers move to B 1.2 a few at a time.)
  53. 53. Self Test / Post Deployment Test After each server deployment, run a self test before deploying the next server. • Checking server configuration and topology – Make sure the database is accessible (DB connection string) – Is the schema the one I expect? – Access required local resources (data files, other config files, templates, etc.) – Access remote resources – RPC / REST endpoints reachable and operational • The server will refuse requests unless it passes the self test • Allow a way to skip the self test (and continue deployment)
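The gradual deployment loop described above might be sketched like this. The `install`, `self_test`, and `activate` callables are hypothetical hooks supplied by the caller, and for simplicity the sketch handles one server at a time rather than "a few at a time".

```python
def gradual_deploy(servers, install, self_test, activate):
    """Deploy server by server; stop at the first failed self test.

    Returns (activated_servers, failed_server_or_None). Servers that were not
    reached keep running the old version.
    """
    activated = []
    for server in servers:
        install(server)            # new version installed, not yet active
        if not self_test(server):  # a failed self test halts the rollout
            return activated, server
        activate(server)           # only now does it take traffic
        activated.append(server)
    return activated, None
```

Stopping at the first failure keeps most of the fleet on the known-good version, so a bad release affects at most the server under test.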
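A self test of the kind listed above could be structured as a set of named checks that gate request handling. Everything here is a sketch: `run_self_test`, the `Server` class, and the status codes are hypothetical, and real checks would open a database connection or call a remote endpoint instead of the placeholder lambdas in the usage note.

```python
def run_self_test(checks):
    """Run named zero-argument checks; return the names of those that failed."""
    failures = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that raises counts as a failure
        if not ok:
            failures.append(name)
    return failures


class Server:
    def __init__(self, checks, skip_self_test=False):
        # The skip flag mirrors "allow a way to skip self test".
        self.failures = [] if skip_self_test else run_self_test(checks)
        self.accepting = not self.failures

    def handle(self, request):
        # Refuse requests unless the self test passed.
        return 200 if self.accepting else 503
```

For example, `Server({"db": lambda: True, "templates": lambda: False}).handle("GET /")` returns 503 and leaves `failures` as `["templates"]`, which tells the deployer what to fix.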
  54. 54. Tools - App-info – Self Test
  55. 55. Backward and Forward compatible • Assume two components • We release a new version of one • Now Rollback the other… (Diagram: mixed deployments of components A and B at versions 1.0, 1.1 and 1.2.)
  56. 56. Time machine event = • Deployment capabilities: “no click” deployment – Dozens of services, 130+ servers over 3 Data Centers • Backward and forward compatibility in an extreme field test – Mixed versions of services / DB with no service downtime • Empowerment – The power we give to individuals • Taking risks and embracing failure
  57. 57. CD – prepare to invest… • Dev infrastructure - Refactor, Refactor, Refactor • Testing infrastructure & know-how • Deployment infrastructure & tools • Automation, Automation, Automation • Monitoring (business and technical) – Hundreds of aspects – Using thresholds is a must – Monitor business KPIs – Internal & external – Endless tuning & learning
  58. 58. How does it work – CD Practices • Test driven development • Small Development Iterations • Backwards and Forwards compatible • Gradual Deployment & Self-Test • Feature Toggle • A/B Testing • Exception Classification • Production visibility 04:30
  59. 59. Tools - App-info - Dashboard
  60. 60. Tools - App-info – Running Experiments
  61. 61. Tools – Monitoring - New Relic
  62. 62. Tools – Frying Pan
  63. 63. Tools – Lifecycle To Rule Them All
  64. 64. Where are we today? • We have re-written our Flash editor product as an HTML5 editor – In just 4 months • Introduced Wix 3rd party applications (developers API) – In just 6 weeks • We are easily replacing significant parts of our infrastructure • And we are doing ~50 releases a day! • Production state changes every 9 minutes.
  65. 65. Aviran Mordo @aviranm http://www.linkedin.com/in/aviran http://www.aviransplace.com 04:30 Read more: The Road To Continuous Delivery: http://goo.gl/K6zEK Dev-Centric Culture: http://goo.gl/0Vo70t
