Keynote Ton Wesseling at the Web Analytics Wednesday Copenhagen #wawcph at Se... (Online Dialogue)
Ton was asked to talk about things that get him excited as a web analyst looking at conversion rate optimization. He picked 5 things:
- The real fun part of web analytics is analyzing how user behavior is changing (analyzing experiments), not creating campaign reports...
- Win: inject your website feedback form responses into your analytics and be able to segment behavior based on goals.
- Run your experiments with an automated and free GTM / GA / EXCEL results set-up!
- To make sure business gets it: apply Bayesian statistics on experiment results, don't report on P values, confidence levels etc.
- Bandit algorithms: use www.smartnotifications.com for automated persuasive messaging on your website.
Statistical Models Explored and Explained (Optimizely)
Statistical models used by A/B testing solutions vary greatly. To interpret your test results with accuracy, you need to be well-versed in the approach your testing solution uses to calculate significance. In this presentation, Optimizely stats experts provide a hard-nosed look at a range of statistical models and the risks and tradeoffs associated with each, and explain why not all models are created equal.
Check out these slides to learn:
- How testing solutions use Frequentist and Bayesian models to compute significance
- A refresh on core statistical concepts including significance, error, and more
- How Optimizely’s Stats Engine mitigates risk while allowing experimenters to make decisions quickly
[CXL Live 16] How to Utilize Your Test Capacity? by Ton Wesseling (CXL)
The more tests you run, the more you learn. Every test will deliver extra insights, but winners will pay the direct costs. How many tests should you run, and how often? Should it be many small tests or a couple of big ones? Ton will explain this to you based on his 15+ years of A/B-testing experience.
The Data Quality Formula details the fundamental elements of data quality: Detection, Analysis, and Resolution. In order for businesses to realise success, they must understand the Rule of 3.
The Secret to Nailing Project Estimations (Atlassian)
How many times have you been asked by your manager, a product owner, or a stakeholder, "how long will writing a new feature take?" And how many times have you actually given an accurate time estimate—say within 1 week of your estimation?
Learn about how my development team at Atlassian faced these same issues and how we transformed our practices to consistently nail time estimates.
Ecommerce Conversion World, London, March 23, 2017 - Ton Wesseling keynote (Online Dialogue)
Keynote by Ton Wesseling of Online Dialogue at the Ecommerce Conversion World event in London on March 23, 2017 about "Online Experiments" and explaining the ROAR model.
[CXL Live 16] When, Why and How to Do Innovative Testing by Marie Polli (CXL)
Innovative testing is risky. If not addressed carefully, it can destroy your optimization strategy by creating loopholes that make it impossible to know what exactly in the change caused the uplift or drop in your conversion rate. At the same time, disruptive methods are needed to break out of the ordinary and take your business to the next level.
The session is a breakdown and an overview of the worst and the best of innovative testing, so that when you take a jump into the unknown you know what to expect.
Top 10 Tactics for Leveraging Behavioral Science to Enhance Member Engagement (Revel)
Revel CEO Jeff Fritz and Brad Hunt, Chief Marketing Officer, UnitedHealthcare Medicare & Retirement presented at the 9th Annual RISE Stars Master Class on December 11th in San Diego.
Healthcare consumerism is creating an opportunity for organizations to incorporate data-driven tactics from industries like retail and finance and become more effective at influencing member behavior. Based on lessons learned from successful health engagement campaigns, combined with Revel’s research on human behavior, we’ll provide pragmatic tools and tips that strengthen engagement and enhance the member experience. Areas covered will include:
- Personalization tactics that map the healthcare consumer journey based on their individual preferences.
- Actionable predictive modeling techniques to identify strong campaign strategies and messages.
- Using behavior-based data to select the right messaging channels.
- Ideas for building meaningful, long-lasting relationships with members using consumer-based loyalty practices.
Presented at Flowcon SF on Nov 1, 2013
Nothing interrupts the continuous flow of value like bad surprises that require immediate attention: major defects; service outages; support escalations; or even scrapping just-completed capabilities that don't actually meet business needs.
You already know that the sooner you can discover a problem, the sooner and more smoothly you can remedy it. Agile practices involve testing early and often. However, feedback comes in many forms, only some of which are traditionally considered testing. Continuous integration, acceptance testing with users, even cohort analysis to validate business hypotheses are all examples of feedback cycles.
This talk examines the many forms of feedback, the questions each can answer, and the risks each can mitigate. We'll take a fresh look at the churn and disruption created by having high feedback latency, when the time between taking an action and discovering its effect is too long. We'll also consider how addressing "bugs" that may not be detracting from the actual business value can distract us from addressing real risks. Along the way we'll consider fundamental principles that you can apply immediately to keep your feedback cycles healthy and happy.
Information Today, Knowledge Management Tomorrow - MVV2014 (M-Brain)
Presentation on Knowledge Management held by M-Brain's Susanna Tirkkonen (@susannatirkkone) & Christina Lentell (@lentellW) at Markkinointiviestinnänviikko 2014.
Pharma Knowledge Centre (PKC) “My Learning Life” is engaged to bridge the real-time knowledge gap between academia and industry to make students “Industry Ready” for pharmaceutical, biopharmaceutical, and clinical research organizations.
Defending against CDD: Chaos-Driven Delivery (Julia Wester)
Have you heard of TDD? Well, many teams struggle with CDD: Chaos-Driven Delivery. That is, teams struggle with how to handle the constant onslaught of overwhelming amounts of work and begin to lose hope. The good news is that if you understand operating systems, you already know a great deal about how to tame the chaos!
Process management is an integral part of an operating system. The OS makes decisions about scheduling, sharing information between jobs, handling interrupts and multi-tasking. It also has to manage the resources of a process and be concerned with process synchronization, just as we mere humans do. This presentation will show you how to apply common concepts from operating system process management to the way teams process work.
Jumping off the hamster wheel with Kanban (Julia Wester)
You're running and running and running but the scenery never changes. You never actually get to your destination. So, you run faster and faster thinking that if you just try harder then you'll get there. Do you feel like you're living your life in a hamster wheel? You're not alone. Many teams face conditions that keep them feeling exactly the same way.
This presentation walks through the challenges from my first development manager role, but is presented as a holistic story by including the challenges and changes of the business we were embedded with: specifically, unsustainable amounts of work, a constant barrage of emergencies, feeling forced into the percentage game, and high levels of specialization.
Experimental statistics is only one of the many powerful analytical techniques companies are using to supercharge their experiment ideation, segmentation, and analysis. Check out this content for a refresher on key stats issues and a discussion of how to use data for better tests and bigger wins.
Chris Stuccio - Data science - Conversion Hotel 2015 (Webanalisten.nl)
Slides of the keynote by Chris Stuccio (USA) at Conversion Hotel 2015, Texel, the Netherlands (#CH2015): "What’s this all about data science? Explain Bayesian statistics to me as a kid – what should I know?" http://conversionhotel.com
The sensationalisation of A/B Testing & Why it doesn't work for you - Measure... (Manuel Da Costa)
Slide deck for my talk about why reading blogs and case studies on CRO and A/B testing is doing more harm than good. You'll also learn how to use a framework to do optimisation the right way.
Mastering Analytics for Optimisation Success (Michele Kiss)
[This version was presented at Conversion Hotel in Texel, NL in November 2017]
Analytics and optimization can each generate great results for businesses. However, it’s at the intersection of analytics and optimization that real value can be extracted. In this session, Analytics Demystified Senior Partner Michele Kiss will share how to better integrate your testing and analytics practices, and real-life examples of success.
Delivered by Kath Pay, CEO of Holistic Email Marketing, at Figaro Digital's Email Marketing & CRM Seminar on Thursday 9 February 2017.
Aimed at marketing professionals looking to push the power of their email marketing, this session reveals how to leverage the testing ability of your emails to improve the performance of your other marketing acquisition and conversion channels. You will learn how to build a hypothesis into your emails to drive the actions that provide the answers you're looking for, how to use a push channel such as email to inform and improve pull-channel performance, and tactics to test for long-term as well as short-term results.
Validation and hypothesis-based product management by Abdallah Al-Khalidi
Prioritization and validation are key activities for successful product management. As a product manager, how will you succeed if you are not able to prioritize correctly? How do you determine what feature to release next and if it is the correct feature to build in the first place?
This talk will cover principal methods and frameworks for feature validation and prioritization and is recommended for product managers and people working in product.
Learn how to transform from a mild-mannered online organizer into a true data-driven mastermind! What to track, how to test, and methods for creating a data-driven culture at your nonprofit.
Website Optimization Without a Committee: Using Testing to Make Decisions (Earthbound Media Group)
Hospital marketers today are spending more of their advertising budgets to drive traffic to their websites. However, research shows they are converting only 2 percent to 3 percent of visitors into actual patients. Learn how to use online automated testing, instead of a committee, to make informed changes to your website that will improve performance, engage users, and boost ROI for your online marketing initiatives.
What Is Customer Effort Score and How Do You Measure CES? (Kayako)
What is customer effort score (CES)?
This metric shows how much effort the customer thinks they had to put in to have their problem resolved. It’s measured with a single survey question, “How easy was it for you to get your problem solved?”, on a scale of 1 to 5.
Why should you measure customer effort scores?
Knowing your CES allows you to see what needs to be done to improve the way your support team interacts with your customers.
It is a strong predictor of future customer loyalty – those with high effort scores are less likely to become return customers.
Learn everything you need to know about customer service metrics: https://blog.kayako.com/customer-support-metrics/
Eight Keys for Integrating ABM with Your Sales Team’s Existing Target Account... (Demandbase)
ABM can’t exist and succeed separate from your sales department. Whereas traditional marketing strategies and tactics can be executed without sales being involved, successful ABM requires tight coordination to achieve optimal results. Matt Heinz outlines eight essential elements for a successful integration of ABM with your sales team’s target account program.
Optimizing Web Forms: How one company generated 226% more leads from a comple... (MarketingExperiments)
According to the MarketingSherpa 2012 Lead Generation Benchmark Report, capturing a higher quantity of qualified leads is one of the most important goals in the minds of marketers.
So how can we capture more leads across our Web forms without significantly reducing the quality of those leads?
In our next clinic, the MECLABS research team will walk you through a scientifically validated case study in which one company radically redesigned its Web form without significantly reducing form fields. The marketers' approach generated a 226% higher rate of lead capture.
You’ll learn:
• The exact changes the company made to its form with before and after versions
• How those changes affected conversion on the page
• Examples and case studies on Web form optimization from our library
• How to apply everything you learn to your own pages through live optimization
Lead generation is a delicate balance between generating a high quantity of leads and generating high-quality leads. So how can we capture more leads across our Web forms without significantly reducing the quality of those leads?
Watch this Web clinic replay to learn about a recent experiment that revealed how some minor changes to form fields can increase response from your prospects.
Are you running lots of A/B-tests, and is your win ratio through the roof? Or do you have too few winners? In both cases you should be wary of the quality and validity of your experiments. In this session Annemarie will give you 10 concrete tips on how to check and warrant the validity of your A/B-tests. This will help you run a more successful optimization program.
Presentation for Digital Elite Day 2020
Are you already A/B-testing and achieving plenty of successes with it? But do you have the feeling that you can't trust all the numbers or outcomes? Or, on the contrary, are you finding far too few winners? In both cases you should look critically at the quality of the experiments you have run.
In this session Annemarie uses concrete examples, cases and tips to show how you can safeguard and check the validity of your A/B-tests for a more successful CRO program.
Workshop at MeasureCamp Amsterdam about building a data-driven test strategy. Where can you test? What should you test? How do you analyze the results?
CRO is supposed to be really easy. Everyone can set up an A/B-test in a WYSIWYG editor, the testing tool does all the difficult computations for you, and it will tell you if you have found a winner. It’s child’s play, right? No, you’re wrong! WYSIWYG editors are very error-prone (especially across different browsers), and in order to really analyse and interpret A/B-test results correctly you need a basic understanding of statistics.
This presentation will help you understand:
- The importance of Test Power
- How to correctly set up an A/B-test
- How to analyse test results yourself
- The difference between Frequentist and Bayesian statistics
- How to decide to implement a variation
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... (2023240532)
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk delivered at Valencia Codes Meetup, June 2024.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
28. @AM_Klaassen
Challenge 1: Hard to Understand
In a frequentist test you state a null hypothesis:
H0: variations A and B have the same conversion rate
29. @AM_Klaassen
Say, for example, you did an experiment and the p-value of that test was 0.01.
Challenge 1: Hard to Understand
http://onlinedialogue.com/abtest-visualization-excel/
30. @AM_Klaassen
Which statement about the p-value (p = 0.01) is true?
a) You have absolutely disproved the null hypothesis: that is, there is no difference between the variations
b) There is a 1% chance of observing a difference as large as you observed even if the two means are identical
c) There’s a 99% chance that B is better than A
Challenge 1: Hard to Understand
33. @AM_Klaassen
So the p-value only tells you how unlikely it is that you found this result, given that the null hypothesis is true (that there is no difference between the conversion rates).
Challenge 1: Hard to Understand
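To make that definition concrete, here is a minimal sketch of the frequentist computation: a hand-rolled two-sided two-proportion z-test. The conversion counts are hypothetical placeholders, not numbers from the deck.

```python
# Minimal sketch: two-sided two-proportion z-test p-value.
from math import sqrt, erfc

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """P(a difference at least this large | H0: both variations share one rate)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)               # shared rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return erfc(abs(z) / sqrt(2))                          # two-sided tail probability

# Hypothetical example: 2.0% vs 2.3% conversion on 50,000 visitors each.
print(two_proportion_p_value(1000, 50_000, 1150, 50_000))  # ~0.001
```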
40. @AM_Klaassen
1. No statistical terminology involved
2. Answers the question directly: ‘What is the probability that variation B is better than A?’
Advantage 1: Easy to understand
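As a sketch of how such a probability can be computed (assuming independent Beta(1, 1) priors on each conversion rate, one common choice rather than necessarily the deck's exact model):

```python
# Sketch: P(variation B is better than A) from Beta posteriors.
import numpy as np

rng = np.random.default_rng(0)

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000):
    """Monte Carlo estimate of P(rate_B > rate_A), uniform Beta(1, 1) priors."""
    post_a = rng.beta(conv_a + 1, n_a - conv_a + 1, draws)
    post_b = rng.beta(conv_b + 1, n_b - conv_b + 1, draws)
    return (post_b > post_a).mean()

# Hypothetical example: 2.0% vs 2.1% conversion on 50,000 visitors each.
print(prob_b_beats_a(1000, 50_000, 1050, 50_000))  # roughly 0.87
```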
53. @AM_Klaassen
Make a risk assessment

IMPLEMENT B        PROBABILITY   AVERAGE DROP/UPLIFT
Expected risk      10.9%         -1.85%
Expected uplift    89.1%         5.92%
Contribution
54. @AM_Klaassen
Make a risk assessment

IMPLEMENT B        PROBABILITY   AVERAGE DROP/UPLIFT   EFFECT ON REVENUE *
Expected risk      10.9%         -1.85%                -$115,220
Expected uplift    89.1%         5.92%                 $370,700
Contribution                                           $317,936

* Based on 6 months and an average order value of € 100
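The contribution row is just a probability-weighted sum of the two revenue effects. A quick check with the rounded figures from the table (the slide's own total differs slightly, presumably because it was computed from unrounded inputs):

```python
# Expected contribution = P(drop) * revenue effect of the drop
#                       + P(uplift) * revenue effect of the uplift
p_drop, p_uplift = 0.109, 0.891
rev_drop, rev_uplift = -115_220, 370_700     # 6-month revenue effects

contribution = p_drop * rev_drop + p_uplift * rev_uplift
print(f"${contribution:,.0f}")               # ~$317,735 (slide shows $317,936)
```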
59. @AM_Klaassen
Or the payback period

IMPLEMENT B              BUSINESS CASE
Average CR change        5.00%
Extra margin per week    $2,400
Cost of implementation   $15,000
60. @AM_Klaassen
Or the payback period

IMPLEMENT B              BUSINESS CASE
Average CR change        5.00%
Extra margin per week    $2,400
Cost of implementation   $15,000
Payback period           6.25 weeks
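The payback period follows directly from the two figures above it; as a one-line check:

```python
# Payback period = one-off implementation cost / extra weekly margin.
cost = 15_000           # dollars
weekly_margin = 2_400   # dollars per week, from the 5.00% CR change

print(cost / weekly_margin)  # 6.25 weeks
```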
63. @AM_Klaassen
The cut-off probability for implementation is not the same as the cut-off probability for a learning.

CHANCE      LEARNING?
< 70%       No learning
70 – 85%    Indication – need retest to confirm
85 – 95%    Strong indication – need follow-up test to confirm
> 95%       Learning

We still need the scientist!
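A tiny helper mirroring the cut-off table above; the function name and the exact boundary handling are assumptions for illustration:

```python
# Map the Bayesian probability that B beats A to the deck's learning labels.
def learning_category(p_b_better: float) -> str:
    if p_b_better < 0.70:
        return "No learning"
    if p_b_better < 0.85:
        return "Indication - need retest to confirm"
    if p_b_better < 0.95:
        return "Strong indication - need follow-up test to confirm"
    return "Learning"

print(learning_category(0.891))  # "Strong indication - need follow-up test to confirm"
```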
65. @AM_Klaassen
Comparison of both methods
• 50 A/B-tests
• 50,000 visitors per variation
• conversion rate of 2%
• average order value of $100
• minimum contribution of $150,000 in 6 months' time (equivalent to $30,000 extra margin: an ROI of 200%)
First of all, thank you so much for having me over! I already spent a few days in Jamaica and it is truly amazing!
So, a bit about me. I flew in Saturday from the Netherlands. I live in this tiny country in Europe, and I can tell you the sun doesn't shine as often and as strongly as in Jamaica. I already got a pretty nasty sunburn.
I have 8 years of web analytics experience and I’m always looking to find real insights from user data.
I may not look the part, but I’m basically a nerd.
Every A/B-test I have ever done has been analyzed using Excel.
I have built my own test evaluation tools based on my statistical knowledge from university.
You will get to see some screenshots of these Excel tools I’ve built in this presentation. And one of these tools has now been released as a webtool as well. I’ll come back to this later.
And the other thing about me: I love to travel and explore the world, especially to islands.
So this conference with fellow nerds on a tropical island is just perfect!
So what is it that I do on a daily basis?
I work at Online Dialogue: a data-driven conversion optimization agency in the Netherlands. Some of you may be familiar with my crazy bosses Ton and Bart. We have a mixed group of analysts, psychologists, UX designers, developers and project leads.
We combine data insights with psychological insights for evidence based growth.
With every client we use this framework for conversion optimization.
First we look at the data and determine the pages with the highest test power: which pages have enough visitors and conversions to be able to test on?
Then we look at the paths visitors take on the website to make a booking or place an order. So, what are the main online customer journeys? And where are the biggest leaks in this process?
These data findings are then sent to the psychologist, who combines this data with scientific research to come up with hypotheses to test.
These hypotheses are then briefed to the designer, who will come up with test variations.
These variations are then tested in several A/B-tests (since you cannot prove a hypothesis based on one experiment).
The learnings of these A/B-tests are then combined into overall learnings which can be shared with the rest of the organization.
In order to run this program, we run lots and lots of A/B-tests.
The purpose of those A/B-tests is to add direct value in the short term – you want to increase the revenue that is coming in to the website.
And in the long run to really learn from user behavior.
What is it that triggers the visitors on that particular website?
And how can we use those insights to come up with even better optimization efforts on other parts of the site?
This of course sounds terrific, but in practice we see a lot of A/B-test programs cancelled. There is a real challenge in keeping such a program alive.
If not everyone in your organisation believes in A/B-testing you will have a hard time proving its worth.
In order to have a successful A/B-test program we believe you need at least 1 winner every 2 weeks, so that every other week the site changes for the better.
Otherwise you will drain the energy out of your test team. Test team members put a lot of time and energy into finding the insights, developing test variations and analyzing them. If these efforts aren’t rewarded, their energy drops.
And another more important consequence is that you will have lower visibility in the organization. If you only deliver a winning variation once a month or less you will not be perceived as very important to the business. So you will be deprioritized.
So if the energy of the test team drops and visibility in the organization is low, your A/B-test program will die!
On average we have a success rate of 25% with our clients.
In the market this success rate is even lower. Some companies only have like 1 in 8 winners.
So in order to get to 1 implementation within 2 weeks you need at least 4 tests per week
But you don’t just run 4 tests a week. You need a lot of resources for this AND you need high traffic volumes, which you don’t always have.
So if you cannot run 4 tests a week, you need a higher implementation rate to get the 1 winner in 2 weeks
There are 2 solutions for this:
This can be achieved by a couple of things: you can get more data insights before you run the test.
So you improve your conversion study (see Peep's talk).
And you can start using consumer psychology and scientific research to combine customer journey analysis with scientific insights for better hypotheses.
And in the test phase you could:
- Test bolder changes. Bolder changes normally mean you are more likely to change visitor behaviour.
- And/or run the test on more traffic, to be able to recognize lower uplifts.
And the other solution is to look at the statistics you are using to determine winners. Because you can redefine what is perceived as a winner. Should you really not implement a non-significant test variation?
There are a couple of challenges with the traditional t-test, which is most commonly used.
First of all. It’s really hard to understand what a test result actually tells you.
When you use a t-test (which we all have been using) you state a null hypothesis. You may recall this from your statistics classes. You calculate the p-value and decide whether or not to reject the null hypothesis. So you try to reject the hypothesis that the conversion rates are the same.
So, suppose you did an experiment and the p-value of that test was 0.01
You measured an uplift of 9.58% - and the graph indicates you have a winner.
It is the second one, but you have to read it more than once just to get a grip on what it says. You will have a hard time explaining this to your team and higher up in the organization.
What you actually want the result to tell you is what the chance is that B is better than A, but that is not what the p-value tells you.
Try explaining that to your manager. The HiPPO in your organization or higher management won’t understand what the heck you are talking about. They just want to know if they should implement the variation in order to make more money.
The second challenge is that with a t-test an A/B-test can only have 2 outcomes: winner or no winner.
And the focus is on finding those real winners. You want to take as little risk as possible.
This stems from the fact that t-tests have been used in a lot of medical research as well. Of course you don’t want to bring a medicine to the market if you’re not 100% sure that it won’t make people worse or kill them. But businesses aren’t run this way. You need to take some risk to grow your business.
When you look at this test result you will conclude that the experiment was a success. The p-value is very low. So it needs to be implemented. And you can expect an uplift in conversion rate of well over 8% after implementation
But based on this test result you would conclude that there is no winner, that it shouldn’t be implemented and that the measured uplift in conversion rate wasn’t enough. So you will see this as a loser and move on to another test idea.
However, there seems to be a positive movement (the measured uplift is 5%), but it isn’t big enough to recognize as a significant winner. You probably only need a few more conversions.
So what’s the alternative then?
The most common approach to analysing A/B-tests is the t-test (which is based on frequentist statistics).
But, over the last couple of years more and more software packages (like VWO) are switching to Bayesian statistics.
So what are the advantages of using Bayesian statistics instead?
First, there is no statistical terminology involved. There’s no null hypothesis, no p-value and no false positives. You don’t have to explain that if you have a winning variation there’s still a chance the variation won’t make you money (the false positive rate). If you report a probability, you automatically see there’s also a chance it won’t be a winner.
Second, it shows you the probability that B is better than A. Probability is very easy to explain. Everyone understands this.
So this is the Excel visualization of a Bayesian A/B-test result I developed.
You see the number of users and conversions per variation, the conversion rates, the measured uplift and the chance that B is better than A.
Easy right?
In the graph you see the chances of the expected uplift after implementation. These lie in a range. The more traffic and conversions the more certain you are of the actual uplift.
But these numbers are the same as in our previous example. With a t-test you would conclude that there is no winner and you need to move on to another test idea.
So you have an 89.1% chance that B will outperform A. I think every manager will take that chance and implement the variation.
And he or she will understand this result.
So you get a happy HiPPO!
The second advantage of using a Bayesian A/B-test evaluation method is that it doesn’t have a binary outcome like the t-test does.
A test result won’t tell you winner or no winner, but gives you a probability between 0 and 100% that the variation performs better than the original.
Instead of the focus of trying to find absolute truths, you can do a risk assessment.
Does the chance of an uplift outweigh the chance of losing money when you implement the variation?
That of course depends on the cost of implementing the variation.
In this example you have an 89.1% chance the variation is better than the original. So in 89.1% of the posterior samples the difference in conversion rate between B and A is higher than 0.
But in order to earn back the costs of implementation you want to know the chance it will earn you at least x revenue.
So how big should the increase in revenue be to justify implementation?
Say, for example, that a test implementation costs $15,000 and your margin is 20% - so you need at least $75,000 in extra revenue.
So what is the chance that implementing the variation will earn you that amount within 6 months? In this example there is still an 82% chance. So you will probably implement it.
We calculated the expected revenue over 6 months' time (this is a ballpark for how long an A/B-test result affects the business; the environment changes, so some effects will last longer, some shorter).
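A sketch of that "chance of at least X extra revenue" check, again assuming Beta posteriors. The counts, six-month traffic and order value below are hypothetical placeholders wired to the talk's round numbers, so it will not reproduce the 82% exactly:

```python
# Sketch: P(extra 6-month revenue from implementing B >= threshold).
import numpy as np

rng = np.random.default_rng(0)

def prob_gain_at_least(conv_a, n_a, conv_b, n_b,
                       visitors_6m, order_value, threshold, draws=200_000):
    """Beta(1, 1) priors; revenue effect = rate difference * traffic * order value."""
    rate_a = rng.beta(conv_a + 1, n_a - conv_a + 1, draws)
    rate_b = rng.beta(conv_b + 1, n_b - conv_b + 1, draws)
    extra_revenue = (rate_b - rate_a) * visitors_6m * order_value
    return (extra_revenue >= threshold).mean()

print(prob_gain_at_least(1000, 50_000, 1050, 50_000,
                         visitors_6m=1_000_000, order_value=100,
                         threshold=75_000))
```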
So this is the webtool I talked about earlier to make these Bayesian calculations yourself. Just input your test and business case data and calculate!
The first graph will show you the main test result, plus the chance of at least x revenue.
When you know the cost of testing (the cost to implement the variation) you can also calculate the ROI of the test.
The average drop of the red bars is -1.85%.
The average uplift of the green bars is 5.92%.
So you have a 10.9% chance of a drop in conversion rate of 1.85%
And you have an 89.1% chance of an uplift in conversion rate of 5.92%.
In money terms this translates to a drop in revenue of $115,220 or an increase in revenue of $370,700.
Multiply 10.9% by the drop in revenue, add 89.1% times the uplift in revenue, and you have the contribution of this test.
With this contribution you calculate the ROI
So if you have a margin of 20%, this means this test will earn you over $63,000.
The cost of implementation is $15,000.
This then results in an ROI of 424%. Pretty positive, right?
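The arithmetic behind those three numbers, using the contribution from the risk-assessment slide:

```python
# ROI of the test: margin on the expected contribution vs implementation cost.
contribution = 317_936        # expected 6-month revenue effect, from the slide
margin, cost = 0.20, 15_000

extra_margin = contribution * margin                               # ~$63,587
print(f"${extra_margin:,.0f}", f"ROI: {extra_margin / cost:.0%}")  # ROI: 424%
```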
You could also look at the payback period. So you don’t look at the 6-month revenue, but calculate how long it would take to earn back the investment.
In this example you have measured a change in conversion rate of 5%
A 5% uplift means an extra margin of $2,400 each week.
If you take the $15,000 investment, it will take 6.25 weeks to earn it back.
You can set certain cut-off values as to when to implement. If it takes longer than 3 months to earn it back, then don’t implement.
However, there is a word of caution to this. Because, we still need the scientist!
As you might recall we test to add direct value, but we still want to learn about user behavior
We take these numbers as a ballpark. If the test has a probability lower than 70% we won’t see it as a learning. If the percentage lies between 70 and 85% we see it as an indication something is there, but we need a retest to confirm the learning.
Anything between 85 and 95% is a very strong indication. So we would do follow-up tests on other parts of the website to see if it works there too. And the same as with a t-test: when the chance is higher than 95% we see it as a real learning.
So even though you would implement the previous test, it doesn’t prove the stated hypothesis. It shows a strong indication, but to be sure the hypothesis is true you need follow-up tests to confirm this learning.
So what does this mean in terms of revenue over time if you compare the two methods?
So I looked at 50 example test results and whether each would be implemented based on a t-test and based on a Bayesian risk assessment.
So these are the numbers of the first 10 tests with the Bayesian probability and the significance indication based on a t-test
I also calculated the revenue each test would add to the business in 6 months, based on a Bayesian and a frequentist approach.
Based on 50 tests, 1 in 5 was a significant winner if you use frequentist statistics: so you would implement 10 test variations over time.
When you use a Bayesian test evaluation, the number of tests that are implemented rises to 29! That is a whopping 58%.
As you see, the expected uplift of using Bayesian statistics is way higher than using frequentist statistics. But the risk is also higher.
With frequentist statistics you have a 5% risk (the false positive rate); with this Bayesian example, 27%.
But because the uplift is way higher as well, you end up with a higher contribution for the Bayesian approach in the end.
When you put these numbers in a graph you will see this: the Bayesian approach will increase your implementation rate and earn you way more money.
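As a rough illustration of that comparison (not a reproduction of the 50 actual tests), the sketch below simulates many A/B-tests with invented true effects and counts how often each decision rule would implement the variation; the effect distribution, traffic and the 80% Bayesian cut-off are all assumptions:

```python
# Rough simulation: implementation rates under a frequentist rule
# (two-sided p < 0.05 with B ahead) vs a Bayesian rule (P(B > A) > 0.80).
import numpy as np
from math import sqrt, erfc

rng = np.random.default_rng(0)
n, base, tests = 50_000, 0.02, 1_000

freq = bayes = 0
for _ in range(tests):
    lift = rng.normal(0.02, 0.05)            # hypothetical true relative lift
    ca = rng.binomial(n, base)
    cb = rng.binomial(n, base * (1 + lift))
    # Frequentist decision: pooled two-proportion z-test.
    pa, pb = ca / n, cb / n
    pooled = (ca + cb) / (2 * n)
    z = (pb - pa) / sqrt(pooled * (1 - pooled) * 2 / n)
    if pb > pa and erfc(abs(z) / sqrt(2)) < 0.05:
        freq += 1
    # Bayesian decision: posterior probability that B beats A.
    post_a = rng.beta(ca + 1, n - ca + 1, 20_000)
    post_b = rng.beta(cb + 1, n - cb + 1, 20_000)
    if (post_b > post_a).mean() > 0.80:
        bayes += 1

print(f"frequentist: {freq / tests:.0%}, Bayesian: {bayes / tests:.0%}")
```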