CRO is supposed to be really easy. Everyone can set up an A/B-test in the WYSIWYG editors, the testing tool does all the difficult computations for you, and it will tell you if you have found a winner. It's child's play, right? Wrong! WYSIWYG editors are very error-prone (especially across different browsers), and in order to really analyse and interpret A/B-test results correctly you need a basic understanding of statistics.
This presentation will help you understand:
- The importance of test power
- How to correctly set up an A/B-test
- How to analyse test results yourself
- The difference between frequentist and Bayesian statistics
- How to decide whether to implement a variation
35. HIGH SAMPLE POLLUTION?
o Test for as few weeks as possible
o Test on 100% of the traffic
o Test with fewer variations (only A vs B)
o Test bolder changes
39. Null hypothesis: the defendant is innocent
Alternative hypothesis: the defendant is guilty
Present the evidence: collect data
Judge the evidence: "Is there reasonable doubt? Can the defendant still be innocent?"
Yes → fail to reject H0
No → reject H0
40. Null hypothesis: conversion rate A = B
Alternative hypothesis: conversion rate A < B
Present the evidence: run the A/B-test
P-value: "Could the data plausibly have happened by chance if the null hypothesis is true?"
Yes → fail to reject H0
No → reject H0
42. INTERPRET STATISTICS
H0 = variations A and B have the same conversion rate
Conclusion: there's a 1% chance of observing a 9.58% difference, given that there is no difference in conversion rate between A and B
47. IMPLEMENT OR NOT?
o Depends on how much risk the business is willing to take
o Depends on the type of test: how invasive (in terms of resources) is the test?
48. RISK ASSESSMENT
IMPLEMENT B        PROBABILITY   EFFECT ON REVENUE*
Expected risk      10.8%         − € 204,400
Expected uplift    89.2%         + € 647,150
Contribution                     € 599,333
* Based on 6 months and an average order value of € 175
49. TAKEAWAYS
1. Determine where you can test (power > 80%)
2. Don't use the WYSIWYG editor
3. Check the representativeness of your sample
4. Run tests for full weeks
5. Research how big sample pollution is
6. Make sure you can interpret the statistics correctly
First of all, I’d like to thank Jackie for inviting me to this Conversion Elite conference in Manchester.
For everyone who doesn't know me, I'll start with a short introduction. I can keep it really short, because Eline did most of the work for me.
As mentioned, I've worked in the field of web analytics for over 7 years now, at several companies.
I like the travel industry the most, which is why I recently switched jobs. Right now I'm Conversion Manager at Tix.nl, responsible for the A/B-testing program.
I love running experiments and optimizing websites based on data.
As I said, the travel industry really appeals to me. And that's not just work related.
My life motto is basically: work, save, travel, repeat. The thing that makes me happiest and most thrilled is wandering the world. Right now I'm in the work phase, but tomorrow it's travel time. I have extended my stay until this weekend, so if you have any tips for me, please let me know during drinks.
Last summer we went to Northern Norway. We visited Tromsø and the Lofoten, but also went to a lesser-known island called Senja. There were no travel guides about this area, so we googled to find out what we could do. We found a hike up one of the highest mountains of Senja, which was supposed to have an amazing view. It was supposed to be a 2-hour hike, so very doable.
The trail started off rather easy and we were optimistic and enthusiastic about reaching the top. But our enthusiasm soon declined. The trail got steeper and steeper, and there were several times I wanted to call it a day and return to the car. My limbs were sore and I was totally out of breath.
But we persevered and made it to the top. The view was to die for. And the whole journey towards the top was totally worth it.
It would have been so much easier if there had been an easy route, like a cable car. But then we wouldn't have overcome the challenges and learned to persevere, and we wouldn't have been as thrilled to have made it to the top.
This hike, for me, stands for the way we look at A/B-testing as well. We are inclined to only have the end goal in mind and take the easy route. We buy a testing tool and immediately expect amazing outcomes. We let the marketer run the test program. The testing tool can build the variation, so we don't need developers; the testing tool will tell us when the test is cooked, so we don't need data scientists; and the testing tool will tell us if it's a winner, so we don't have to do the analysis ourselves. Easy does it! But running a proper CRO program can't be done by skipping the hiking trail and taking the cable car to the top. You need to put time and effort into it. Because there are important things our testing tool doesn't tell us.
The first thing testing tools don't tell you is: what's worth testing? Where should you start? You can set up a test on a page with 50 visits per day, but such a test would run for ages and the results would still be questionable!
So how do you determine where you should A/B-test on your site and where you shouldn’t?
Well, there's a rough rule of thumb to keep in mind. Basically, if you have fewer than 1,000 conversions per month, it's not worth the trouble, because in order to find a significant effect between A and B you will need a conversion rate uplift of around 15%. And uplifts like that are not easy to accomplish.
So if a page has fewer than 1,000 conversions per month, you shouldn't spend your resources on it, but find other ways to validate your idea (with other, more qualitative types of research).
But this benchmark of 1,000 conversions per month isn't specific enough.
Before I start an optimization project, I first do a test power determination.
This is basically a fancy term for determining whether or not you can run a test on a specific page with a reasonable chance of finding a winner. This reasonable chance to detect a winner is called 'statistical power'.
I know this sounds a bit cryptic, but what you should remember is this:
you need a certain number of people in your test, and the change should make an impact on behavior, for you to be able to prove your hypothesis. The more people in your test, the higher the power; and the bigger the difference between A and B, the higher the power.
For each page template you want to test on, determine the unique weekly visitors and the unique weekly buyers. Given the 80% power critical value and the pre-determined confidence level, you can calculate the needed sample size and the corresponding effect size.
Suppose you have 32,000 weekly visitors and 800 buyers. These are nice numbers to perform an A/B-test on, right?
Well, in order to have a power of at least 80%, you can detect uplifts of 9% and higher in 4 weeks' time. If you expect that your variation will only result in a 5% uplift, then your sample size needs to be way higher to guarantee 80% power. You would then need to run the test for 13 weeks.
And that is not something you want to do. The advised maximum test duration is 4 weeks. If you run the test for a shorter period than the required sample size demands, the chance to detect a winner will be far lower than 80%.
You need to be aware of the effect you need from your test and how long the test needs to run. If you know what impact you need to make, you can design your test accordingly. If you know you need an uplift of 10%, you will probably test bolder changes.
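To make this power calculation concrete, here is a minimal sketch in Python using statsmodels. The traffic numbers are the ones from the example above; how many weeks you get depends on the alpha, power and one- vs two-sidedness you choose, so the output won't exactly match the 4-versus-13-week figures from the slides.

```python
# Sketch: how long must an A/B-test run to reach 80% power?
# Assumes a two-sided two-proportion z-test and a 50/50 traffic split.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

weekly_visitors = 32_000
weekly_buyers = 800
baseline_cr = weekly_buyers / weekly_visitors   # 2.5%

for relative_uplift in (0.05, 0.09, 0.15):
    variant_cr = baseline_cr * (1 + relative_uplift)
    effect = proportion_effectsize(variant_cr, baseline_cr)  # Cohen's h
    n_per_group = NormalIndPower().solve_power(
        effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
    )
    weeks = n_per_group / (weekly_visitors / 2)  # visitors per group per week
    print(f"{relative_uplift:.0%} uplift -> {n_per_group:,.0f} visitors "
          f"per variant, ~{weeks:.1f} weeks")
```

The pattern to notice: halving the expected uplift roughly quadruples the required sample size, which is why small expected effects quickly become untestable.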
The 2nd thing your testing tool doesn't tell you is how you should code your variations.
A couple of months ago we ran an A/B-test where we showed the destinations highlighted in our TV commercial on the homepage. The code for these 6 blocks was written by our front-end developer, but during the test the visuals and prices needed to be updated. My colleague was responsible for this and used the WYSIWYG editor.
The result after 4 weeks of testing was this monstrosity of code.
This is really dirty code and front-end developers will probably hate you when you show them this.
Because it doesn't just look ugly, it is ugly: the longer the code, the slower it will load.
And it's unreliable across browsers. Some adjustments will work perfectly fine in Chrome but will not work in Internet Explorer. And with code like this it becomes hard to QA.
And the third drawback of the WYSIWYG editor is that it's rather limited. You cannot add new functionality or completely change the layout.
For easy tests you can code the variation yourself using HTML/CSS, JavaScript and jQuery. You can quickly learn the basics on Codecademy.com. For the more complicated tests you really need development resources. There's no way around it.
The 3rd thing testing tools don’t address is the possibility of validity problems.
Validity is the extent to which a conclusion or measurement is well-founded and corresponds accurately to the real world. If you have run a valid experiment, the results can be generalized to the whole population, but there are a couple of checks you need to do.
You need to make sure that each group in the sample is representative of the total population.
This might be an issue especially if you have a smaller sample size, and when you have groups in your population with very different conversion rates. For example, new visitors normally have a far lower conversion rate than returning visitors. This means that when you have relatively more returning visitors in one of your variations, the measured uplift may well be caused by the difference in composition of the samples. So you need to check the representativeness of each sample before drawing conclusions. This is something a testing tool doesn't do for you.
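One way to run such a check is a chi-square test on the composition of each sample. Here is a minimal sketch in Python; the new/returning split and the counts are hypothetical, not from the talk.

```python
# Sketch: are the new/returning visitor mixes in A and B comparable?
from scipy.stats import chi2_contingency

#            new     returning
counts = [[12_400,   7_600],   # variation A
          [11_300,   8_700]]   # variation B

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")
# A small p suggests the new/returning split differs between A and B,
# so a conversion-rate difference may reflect sample composition
# rather than the change you are testing.
```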
All the testing tools on the market will call a winning result regardless of how long the test has run. Of course they take sample size into account, but they don't take day-of-the-week effects into account.
If you look at your data over a longer period of time and big differences appear in conversion rates per day, you should always test for full weeks, because a winning test based on weekdays only may well underperform in the weekend. Again: those results cannot be generalized over all days of the week.
And another important one: you cannot generalize results found in a period where user behavior is different from normal. If you found a winning result around Christmas, you should at least re-test it at another time of the year, or wait for the next Christmas to implement the winner.
Another threat to running a successful testing program is the possibility of sample pollution. We wrongly assume that we can build the perfect scientific experiment online.
There are several reasons why unique users might end up in both variations of an experiment. First of all, visitors can delete their cookies or browse incognito. When they return to the site, they have a 50% chance (when you run just A vs B) of seeing the other variation.
Secondly, more and more visitors use more than one browser or device in their quest for your product. Cross-device usage in particular is a big polluter of your sample.
And lastly, if the orientation phase for the product you are selling is long, then visitors will probably have seen the original page before they ended up in your test variation.
So suppose you ran an experiment and found that B has a higher conversion rate than A.
Now, you know a proportion of both samples is polluted: some visitors in A have also seen B, and vice versa.
We don't know what the conversion rate of the ABBA group is, but we assume it is probably the average of A and B.
This means that the actual conversion rate of A is even lower (because it is positively influenced by the ABBA group), and the actual conversion rate of B is probably higher, because it is negatively influenced by the ABBA group.
So you measured this uplift in your test
But the actual uplift is far higher! This means that if you have a lot of pollution in your test, you will have a hard time finding the effect.
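To make that concrete, here is a small sketch that backs the true conversion rates out of the measured ones, under the assumption from the talk that the polluted (ABBA) group converts at the average of A and B. The measured rates and pollution fractions are invented for illustration.

```python
# Sketch: pollution shrinks the measured gap between A and B.
# If a fraction f of each sample saw both variations and converts at
# the average of the true rates, the measured rates are a mixture:
#   measured_A = true_A * (1 - f/2) + true_B * (f/2)   (and vice versa)
# Solving that linear system gives the true rates back.
def true_rates(measured_a, measured_b, polluted_fraction):
    f = polluted_fraction
    total = measured_a + measured_b            # unchanged by pollution
    diff = (measured_b - measured_a) / (1 - f) # true gap is wider
    return (total - diff) / 2, (total + diff) / 2

mA, mB = 0.025, 0.027  # measured conversion rates (8% measured uplift)
for f in (0.0, 0.2, 0.4):
    tA, tB = true_rates(mA, mB, f)
    print(f"pollution {f:.0%}: true uplift {(tB - tA) / tA:.1%} "
          f"(measured {(mB - mA) / mA:.1%})")
```

With 40% pollution, an 8% measured uplift corresponds to a noticeably larger true uplift, which is exactly why heavily polluted tests have trouble reaching significance.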
Especially if your win ratio is at the lower end (say, lower than 20%), you need to research how big these issues are.
If you have found out that the pollution rate is very high, you need to adjust your testing strategy.
And the last thing I want to address is that you need a basic understanding of the statistics used by your testing tool. Every testing tool uses its own stats engine, and you need to be aware of what this means when drawing conclusions.
There are 2 types of statistics that are used in A/B-testing: frequentist and Bayesian.
Historically most testing tools used frequentist statistics, but over the last couple of years more and more tools have switched to Bayesian statistics. And this is not without reason.
Using frequentist statistics comes with a couple of challenges. I'll explain.
Frequentist testing can be compared with a court trial in the US.
The null hypothesis says that the defendant is innocent. The defendant is innocent until proven guilty!
And the alternative hypothesis says that the defendant is guilty.
We then present evidence, or in other words, collect data.
Then we judge this evidence and ask ourselves the question: could the evidence have happened by chance if the defendant is innocent? Is there reasonable doubt?
This principle is used in frequentist statistics as well. You have a null hypothesis stating that there is no difference in conversion rate, and you try to disprove this claim. You run your test and judge the results based on the p-value.
You try to answer the question: could the data plausibly have happened if the null hypothesis is true? If there is still doubt, you fail to reject the null hypothesis and conclude that there is no difference. But if there is no doubt (or very little), you reject the null hypothesis.
Snoop Dogg has an excellent line for this: if the p is low, the ho must go
I'll give an example of how this translates to an A/B-test.
So, suppose you ran an experiment and the p-value of that test was 0.01. You remember: if the p-value is very low, the H0 needs to go. But what is the exact conclusion of this test?
With the use of frequentist statistics you can only conclude how surprising the results are based on the hypothesis that A and B perform exactly the same. I don’t know about you, but this confuses the hell out of me! This is really hard to explain – not only to fellow optimizers but mainly to your boss.
And besides the confusion, I’m actually not interested in “how unlikely it is that I found these results.” I just want to know whether variation B is better than A. Frequentist statistics are counter intuitive.
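Under the hood, this frequentist evaluation is typically a two-proportion z-test. A minimal sketch in Python, with hypothetical counts (proportions_ztest is a real statsmodels function):

```python
# Sketch: frequentist evaluation of an A/B-test as a two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest

conversions = [500, 548]       # variation A, variation B (hypothetical)
visitors = [20_000, 20_000]

# alternative="smaller" mirrors slide 40: H1 is conversion rate A < B
z, p_value = proportions_ztest(conversions, visitors, alternative="smaller")
print(f"z = {z:.2f}, p = {p_value:.3f}")
# A low p only says the data would be surprising *if* A and B truly
# converted at the same rate -- it is not the probability that B is better.
```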
The other challenge with frequentist statistics is that an A/B-test can only have 2 outcomes: you either have a winner or no winner. In other words, you can either reject the null hypothesis or fail to reject it. There is no wiggle room.
If you look at this test result, you would conclude that there is no winner, that the variation shouldn't be implemented and that the measured uplift in conversion rate wasn't enough. So you see this as a loser and move on to another test idea.
However, there seems to be a positive movement (the measured uplift is 5%), but it isn't big enough to be recognized as a significant winner.
The alternative to frequentist statistics is Bayesian statistics. And as said, most test tools have switched to Bayesian (or to flavors of Bayesian).
And that's not without reason: Bayesian statistics makes more sense, since it suits the underlying business question far better.
When you use Bayesian statistics to evaluate your A/B-test, there is no difficult statistical terminology involved anymore. There's no null hypothesis, no p-value or z-value, et cetera. It just shows you the measured uplift and answers the question what the chance is that B is better than A. Easy, right?
Everyone can understand this.
Based on the same numbers of the A/B-test I showed you earlier, you have an 89.2% chance that B will actually be better than A. Probably every manager would understand this and like these odds.
When using a Bayesian A/B-test evaluation method you no longer have a binary outcome like the t-test gives you. A test result won't tell you winner / no winner, but gives a probability between 0 and 100% that the variation performs better than the original.
In this example: 89.2%.
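A common way tools compute this "chance that B beats A" is with Beta posteriors over the two conversion rates. Here is a minimal sketch in Python; the counts are hypothetical (the talk's actual data isn't shown), merely chosen to land near the 89% ballpark.

```python
# Sketch: Bayesian P(B > A) via Beta posteriors and Monte Carlo.
# With a flat Beta(1, 1) prior, the posterior for a conversion rate is
# Beta(conversions + 1, non-conversions + 1).
import numpy as np

rng = np.random.default_rng(42)
visitors_a, conversions_a = 20_000, 500   # hypothetical counts
visitors_b, conversions_b = 20_000, 539

samples_a = rng.beta(conversions_a + 1, visitors_a - conversions_a + 1, 100_000)
samples_b = rng.beta(conversions_b + 1, visitors_b - conversions_b + 1, 100_000)

print(f"P(B > A) = {(samples_b > samples_a).mean():.1%}")
```

The output reads exactly like the business question: out of all plausible pairs of true conversion rates, in what fraction is B's higher than A's?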
The question that remains is: is this enough to be implemented?
Well, that depends on a couple of things. If you implement a test variation with a probability of 51%, you're not doing much better than flipping a coin. The risk of implementing a losing variation is quite high.
Depending on the type of business, you may be more or less willing to take risks. If you're a start-up you might take more risk than a full-grown business, but still, we don't really like losing money, so what we see with our clients is that most require a probability of at least 75%.
But it also depends on the type of test. If you only changed a headline, the risk is lower than when you need to implement a new functionality on the page, which consumes far more resources. Hence, you will need a higher probability. This deliberation isn't presented in your testing tool; you need to make this decision yourself. For some tests you need 95% probability, whereas for others you're happy with 75%.
What you can do is make a risk assessment. You can calculate what the results mean in terms of revenue.
When the client decides to implement the variation, they have a 10.8% chance of a drop in revenue of around € 200,000 over 6 months' time (with an average order value of € 175).
But on the other hand, they also have an 89.2% chance that the variation is actually better and brings in nearly € 650,000.
You can show this table to your boss and ask whether he would place that bet. So it's not the testing tool that decides whether it's a winner: it's you (or your boss).
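The arithmetic behind such a bet is straightforward. Here is a minimal sketch using the probabilities and revenue effects from the slide; it takes the contribution as the simple probability-weighted sum of the two outcomes, which is an assumption on my part (the slide's own € 599,333 contribution line may be bookkept slightly differently).

```python
# Sketch: turn a Bayesian test result into an expected-value bet.
p_b_better = 0.892           # probability B beats A (from the test)
uplift_revenue = 647_150     # 6-month revenue effect if B really is better
downside_revenue = -204_400  # 6-month revenue effect if B is actually worse

# Probability-weighted sum of the two outcomes (assumed weighting).
expected_contribution = (p_b_better * uplift_revenue
                         + (1 - p_b_better) * downside_revenue)
print(f"Expected contribution: € {expected_contribution:,.0f}")
```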
So, to sum up: there are a lot of things your testing tool doesn't tell you, and the six takeaways cover what you need to do yourself. It's not an easy path and there will be times you want to call it a day, but if you persevere you will make it to the top and fully enjoy what you have accomplished.