Experimenting on Humans 
Aviran Mordo 
Head of Back-end Engineering 
@aviranm 
www.linkedin.com/in/aviran 
www.aviransplace.com 
Sagy Rozman 
Back-end Guild master 
@sagyrozman 
www.linkedin.com/in/sagyrozman
Wix In Numbers 
• Over 55M users + 1M new users/month 
• Static storage is > 1.5 PB of data 
• 3 data centers + 3 clouds (Google, Amazon, Azure) 
• 1.5B HTTP requests/day 
• 900 people work at Wix, of which ~300 are in R&D
1542 
(A/B Tests in 3 months)
Agenda 
• Basic A/B testing 
• Experiment driven development 
• PETRI – Wix’s 3rd generation open source experiment 
system 
• Challenges and best practices 
• How to (code samples)
A/B Test
A 
B 
To B or NOT to B?
Home page results 
(How many registered)
Experiment Driven 
Development
This is the Wix editor
Our gallery manager 
What can we improve?
Is this better?
Don’t be a loser
Product Experiments 
Toggles & Reporting 
Infrastructure
How do you know what is running?
Why so many? 
If I “know” it is better, do I really 
need to test it?
The theory 
Sign-up → Choose Template → Edit site → Publish → Premium
Result = Fail
Intent matters
Conclusion 
• EVERY new feature is A/B tested 
• We open the new feature to a % of users 
○ Measure success 
○ If it is better, we keep it 
○ If worse, we check why and improve 
• If flawed, the impact is limited to a % of our users
Start with 50% / 50% ?
Sh*t happens (Test could fail) 
• New code can have bugs 
• Conversion can drop 
• Usage can drop 
• Unexpected cross test dependencies
Minimize affected users 
(in case of failure) 
Gradual exposure (percentage of…) 
• Language 
• GEO 
• Browser 
• User-agent 
• OS 
• Company employees 
• User roles 
• Any other criteria you have 
(extendable) 
• All users
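One way to picture gradual exposure is to derive membership from a stable hash of the user ID, so the exposed population only grows as the percentage is raised. A minimal sketch (the class and method names are hypothetical, not Petri's actual API):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: percentage-based gradual exposure.
// A user is exposed if a stable hash of their ID falls below the threshold,
// so raising the percentage only adds users and never removes exposed ones.
public class GradualExposure {

    // Maps a user ID deterministically to a bucket in [0, 100).
    static int bucket(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // True if the user falls inside the exposed percentage.
    static boolean isExposed(String userId, int percentage) {
        return bucket(userId) < percentage;
    }
}
```

Because the bucket is deterministic, expanding from 25% to 50% keeps every already-exposed user exposed.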
Not all users are equal 
• First time visitors = Never visited wix.com 
• New registered users = Untainted users
We need that 
feature 
…and failure 
is not an 
option
Defensive Testing
Adding a mobile view
First trial failed 
Performance had to be improved
Halting the test results in loss of data. 
What can we do about it?
Solution – Pause the experiment! 
• Maintain NEW experience for already exposed users 
• No additional users will be exposed to the NEW feature
PETRI’s pause implementation 
• Use cookies to persist assignment 
○ If user changes browser assignment is unknown 
• Server side persistence solves this 
○ You pay in performance & scalability
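The pause semantics above can be sketched as follows (hypothetical names; server-side persistence is simplified to an in-memory map): already-assigned users keep the NEW experience, everyone else gets the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of pausing an experiment: already-exposed users keep
// their assigned group; no new users are exposed while the experiment is paused.
public class PausableExperiment {
    // Stands in for server-side persistence (or a per-user cookie).
    private final Map<String, String> assignments = new HashMap<>();
    private boolean paused = false;

    public void pause() { paused = true; }

    public String conduct(String userId) {
        String existing = assignments.get(userId);
        if (existing != null) {
            return existing;            // keep the experience consistent
        }
        if (paused) {
            return "old";               // no additional exposures while paused
        }
        assignments.put(userId, "new"); // simplified: every new user gets "new"
        return "new";
    }
}
```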
Decision 
• Keep feature 
○ Improve code & resume experiment 
• Drop feature 
○ Keep backwards compatibility for exposed users forever? 
○ Migrate users to another equivalent feature 
○ Drop it altogether (users lose data/work)
The road to 
success
Reaching statistical significance 
• Numbers look good but sample size is small 
• We need more data! 
• Expand gradually: 
Control Group (A): 75% → 50% → 25% → 0% 
Test Group (B): 25% → 50% → 75% → 100%
Keep user experience consistent
Keeping persistent UX 
• Signed-in user (Editor) 
○ Test group assignment is determined by the user ID 
○ Guarantees a persistent toss across browsers 
• Anonymous user (Home page) 
○ Test group assignment is randomly determined 
○ Cannot guarantee a persistent experience if the user changes browsers 
• 11% of Wix users use more than one desktop browser
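A deterministic toss for signed-in users can be sketched by hashing the user ID together with the experiment key, so the same user lands in the same group on any browser (a hypothetical helper, not Petri's API):

```java
// Hypothetical sketch: deterministic test-group assignment for signed-in users.
// Hashing userId + experimentKey yields the same group on every browser;
// anonymous users would instead get a random toss persisted in a cookie.
public class DeterministicToss {

    static String assignGroup(String userId, String experimentKey) {
        // Math.floorMod avoids negative buckets from a negative hashCode.
        int bucket = Math.floorMod((userId + ":" + experimentKey).hashCode(), 2);
        return bucket == 0 ? "Group A" : "Group B";
    }
}
```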
There is MORE than one
Possible states >= 2^(# experiments) 

# of active experiments | Possible # of states 
10 | 1,024 
20 | 1,048,576 
30 | 1,073,741,824 

Wix has ~200 active experiments ≈ 1.6 × 10^60 possible states
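These counts are just 2^n for n independent two-group experiments; with BigInteger the figure for ~200 active experiments is easy to reproduce:

```java
import java.math.BigInteger;

// Lower bound on distinct combined user states for n active two-group experiments.
public class ExperimentStates {

    static BigInteger possibleStates(int activeExperiments) {
        return BigInteger.valueOf(2).pow(activeExperiments);
    }
}
```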
A/B testing introduces 
complexity
Support tools 
• Override options (URL parameters, cookies, headers…) 
• Near real time user BI tools 
• Integrated developer tools in the product
Define → Code → Experiment → Expand → Merge code → Close
Define spec 
• Spec = Experiment template (in the code) 
○ Defines test groups 
○ Mandatory limitations (filters, user types) 
○ Scope = group of related experiments (usually by product) 
• Why it is needed 
○ Type safety 
○ Prevents human errors (typos, user types) 
○ Controlled by the developer (who knows the context) 
○ Conducting experiments in batch
Spec code snippet 

public class ExampleSpecDefinition extends SpecDefinition { 
    @Override 
    protected ExperimentSpecBuilder customize(ExperimentSpecBuilder builder) { 
        return builder 
            .withOwner("OWNERS_EMAIL_ADDRESS") 
            .withScopes(aScopeDefinitionForAllUserTypes("SOME_SCOPE")) 
            .withTestGroups(asList("Group A", "Group B")); 
    } 
}
Conducting experiment 
• Experiment = "if" statement in the code 

final String result = laboratory.conductExperiment(key, fallback, new StringConverter()); 
if (result.equals("group a")) { 
    // execute group a's logic 
} else if (result.equals("group b")) { 
    // execute group b's logic 
} 
// If conducting the experiment fails, the fallback value is returned; 
// in that case you would usually execute the 'old' logic.
Upload spec 
• Upload the specs to the Petri server 
○ Enables defining an experiment instance 

{ 
  "creationDate" : "2014-01-09T13:11:26.846Z", 
  "updateDate" : "2014-01-09T13:11:26.846Z", 
  "scopes" : [ { 
    "name" : "html-editor", 
    "onlyForLoggedInUsers" : true 
  }, { 
    "name" : "html-viewer", 
    "onlyForLoggedInUsers" : false 
  } ], 
  "testGroups" : [ "old", "new" ], 
  "persistent" : true, 
  "key" : "clientExperimentFullFlow1", 
  "owner" : "" 
}
Start new experiment (limited population)
Manage experiment states
Ending successful experiment 
1. Convert A/B Test to Feature Toggle (100% ON) 
2. Merge the code 
3. Close the experiment 
4. Remove experiment instance
Experiment lifecycle 
• Define spec 
• Use Petri client to conduct experiment in 
the code (defaults to old) 
• Sync spec 
• Open experiment 
• Manage experiment state 
• End experiment
Petri is more than just an A/B test 
framework 
Feature toggle 
A/B Test 
Internal testing 
Personalization 
Continuous 
deployment 
Jira integration 
Experiments 
Dynamic 
configuration 
QA 
Automated 
testing
Other things we (will) do with Petri 
• Expose features internally to company employees 
• Enable continuous deployment with feature toggles 
• Select assignment by sites (not only by users) 
• Automatic selection of the winning group* 
• Exposing a feature to a set number of users* 
• Integration with Jira 
* Planned feature
Petri is now an open source project 
https://github.com/wix/petri
Q&A 
http://goo.gl/L7pHnd 
https://github.com/wix/petri 
Aviran Mordo 
Head of Back-end Engineering 
@aviranm 
www.linkedin.com/in/aviran 
www.aviransplace.com 
Sagy Rozman 
Back-end Guild master 
@sagyrozman 
www.linkedin.com/in/sagyrozman
Credits 
http://upload.wikimedia.org/wikipedia/commons/b/b2/Fiber_optics_testing.jpg 
http://goo.gl/nEiepT 
https://www.flickr.com/photos/ilo_oli/2421536836 
https://www.flickr.com/photos/dexxus/5791228117 
http://goo.gl/SdeJ0o 
https://www.flickr.com/photos/112923805@N05/15005456062 
https://www.flickr.com/photos/wiertz/8537791164 
https://www.flickr.com/photos/laenulfean/5943132296 
https://www.flickr.com/photos/torek/3470257377 
https://www.flickr.com/photos/i5design/5393934753 
https://www.flickr.com/photos/argonavigo/5320119828
Why Petri 
• Modeled experiment lifecycle 
• Open source (developed using TDD from day 1) 
• Running at scale in production 
• No deployment necessary 
• Both back-end and front-end experiments 
• Flexible architecture
PETRI Server 
Your app 
Laboratory 
DB 
Logs

Advanced A/B Testing at Wix - Aviran Mordo and Sagy Rozman, Wix.com
