Mobile Game In-game Data Analysis
This presentation analyzes the data from an A/B test
conducted on Cookie Cats, a mobile connect-three-style
puzzle game.
In this test, the first gate in Cookie Cats was moved from
level 30 to level 40.
In particular, we will analyze the impact on player
retention.
The analysis is written in Python and the libraries
used are Pandas, NumPy, Matplotlib, and Seaborn.
 Game description
Cookie Cats is a hugely popular
mobile puzzle game developed by
Tactile Entertainment.
It's a classic "connect three" style
puzzle game where the player
must connect tiles of the same
color in order to clear the board
and win the level.
Problem Definition
 Did moving the first gate from level 30 to
level 40 affect the player retention rate?
Let’s Detect and Fix Potential Problems…
• There appear to be no missing values in the dataset.
• Since there are only 2 unique values in the version column, we can convert the data
type of this column to "category".
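A minimal sketch of these two checks with Pandas. The rows are invented stand-ins for the real dataset, and the column names "userid" and "sum_gamerounds" are assumptions alongside the "version" column mentioned above.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from the Kaggle dataset.
df = pd.DataFrame({
    "userid": [1, 2, 3, 4],
    "version": ["gate_30", "gate_40", "gate_30", "gate_40"],
    "sum_gamerounds": [3, 38, 165, 1],
})

# Count missing values per column; all zeros means there is nothing to fill in.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Confirm the column has only two distinct values before converting it.
n_versions = df["version"].nunique()
print(n_versions)

# A categorical dtype stores each label once and keeps group operations cheap.
df["version"] = df["version"].astype("category")
print(df["version"].dtype)
```

On the real data, the same three calls (isnull, nunique, astype) apply unchanged once the file is loaded into df.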
Outliers in Game Rounds
Summary statistics (describe) of the sum_gamerounds column in the dataset
There seems to be an outlier value as high as 49854, so high that it
dwarfs all the other values. This could be caused by:
1. an error in a manual data-entry process.
2. cheating by the player.
3. the hard work of a relentlessly hardcore player.
or some other cause.
We need to deal with this problem.
For now, the simplest option is to delete this record.
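One way to drop the single largest record could look like the sketch below. The small DataFrame is invented, with 49854 included to mimic the outlier. The names df_clean and outlier_index are ours, not from the original notebook.

```python
import pandas as pd

# Invented rows; the last one mimics the extreme value seen in the summary.
df = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5],
    "sum_gamerounds": [3, 38, 165, 1, 49854],
})

# describe() shows count, mean, std, min, quartiles and max for the column.
print(df["sum_gamerounds"].describe())

# idxmax() returns the index label of the row holding the maximum value.
outlier_index = df["sum_gamerounds"].idxmax()

# Drop that one row and renumber the index so it stays contiguous.
df_clean = df.drop(index=outlier_index).reset_index(drop=True)
print(df_clean["sum_gamerounds"].max())
```

This removes exactly one row. If several players shared the maximum, only the first would be dropped, so a threshold filter may be the safer choice in that case.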
Conduct Exploratory Data Analysis (EDA)
 Let's start by examining the graph of the
empirical cumulative distribution function (ECDF) of
"sum_gamerounds" to get a glimpse of
player behavior. We start by defining several
functions that will be used repeatedly later.
Each data point indicates the proportion of players who played no more than
the number of rounds at that point. For example, at game rounds = 500, the
corresponding y value is around 0.99. This means that 99% of the
players in the dataset played 500 rounds or fewer.
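A minimal sketch of an ECDF helper with NumPy. The function names ecdf and fraction_at_or_below and the sample round counts are assumptions for illustration, so the original notebook may define its helpers differently.

```python
import numpy as np

def ecdf(values):
    """Return sorted values x and, for each, the running fraction y = rank / n."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def fraction_at_or_below(values, threshold):
    """Fraction of observations that are less than or equal to threshold."""
    return float(np.mean(np.asarray(values) <= threshold))

# Invented round counts for ten players.
rounds = [0, 1, 3, 3, 10, 25, 40, 120, 500, 900]
x, y = ecdf(rounds)

# For tied x values, the last (largest) y among them is the ECDF value.
print(list(zip(x.tolist(), y.tolist())))
print(fraction_at_or_below(rounds, 500))
```

To draw the chart, x and y can be passed to Matplotlib, for example plt.plot(x, y, marker=".", linestyle="none").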
 From the above chart, we identify several issues that we might be interested in:
Around 4.43% of the players did not finish even one game round, and we might
want to ask: did they encounter any problem while playing the game?
Around 20% of the players played no more than 3 rounds and around 40%
stopped before level 11. If a round takes 3 minutes to finish on
average, we lose 20% of the players after about 9 minutes of play
and 40% of them after about 30 minutes.
Over 63% of the players did not reach level 30 or higher, so we would like to
ask:
What made them stop before reaching the first gate (at least in the "gate_30"
version)?
Is this churn rate what we expected? If not, what can we do to improve it?
How does this affect our A/B tests?
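The arithmetic behind these figures can be sketched as follows. The 3 minutes per round is the assumption stated above. The sample data and the helper name churn_summary are invented, so the shares printed here are not the real 4.43%, 20%, 40% and 63%.

```python
import numpy as np

MINUTES_PER_ROUND = 3  # assumed average round length, as in the text

def churn_summary(rounds, thresholds):
    """For each threshold, return (rounds, minutes of play, share of players at or below it)."""
    rounds = np.asarray(rounds)
    rows = []
    for t in thresholds:
        share = float(np.mean(rounds <= t))
        rows.append((t, t * MINUTES_PER_ROUND, share))
    return rows

# Invented round counts for ten players.
sample_rounds = [0, 2, 3, 7, 10, 15, 29, 45, 80, 300]

# 0 rounds: never finished a round; 3 and 10 rounds: early churn; 29: stopped before round 30.
summary = churn_summary(sample_rounds, thresholds=[0, 3, 10, 29])
for t, minutes, share in summary:
    print(f"<= {t} rounds (about {minutes} min): {share:.0%} of players")
```

Here the number of rounds played stands in for the level reached, which is only an approximation because a player can replay a level.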
 From this chart, we observe that before level 30, the proportion of players for
the _gate30 version is lower than that for the _gate40 version. Between
level 30 and level 40, the gap closes and the former surpasses the latter, as
annotated in the chart. This is quite interesting, and we may suspect that:
 Setting the gate at level 30 is better at retaining players because, compared
to the _gate40 data, a larger proportion of players reached higher
levels after the gate. However, we need to be cautious about drawing any
conclusion based on this information alone.
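One possible way to compare the two versions at a given level is the share of players in each version who played at least that many rounds. The sketch below uses invented rows and a hypothetical helper, share_reaching. It treats sum_gamerounds as a proxy for the level reached and is not necessarily how the chart was built.

```python
import pandas as pd

# Invented rows: five players per version.
df = pd.DataFrame({
    "version": ["gate_30"] * 5 + ["gate_40"] * 5,
    "sum_gamerounds": [5, 20, 35, 45, 60, 10, 28, 33, 38, 70],
})

def share_reaching(frame, min_rounds):
    """Share of players per version who played at least min_rounds rounds."""
    reached = frame["sum_gamerounds"] >= min_rounds
    # The mean of a boolean series is the fraction of True values in each group.
    return reached.groupby(frame["version"]).mean()

for level in (30, 40):
    print(level, share_reaching(df, level).to_dict())
```

Looking at several levels on either side of the gates shows where the two versions diverge, but a visible gap by itself does not show that the difference is statistically meaningful.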
We need to look at the retention rates to better formulate ideas about which
version is better.
The day-1 retention rate, the proportion of players who came back to play the game one
day after installation, was around 45% for both versions,
and the day-7 retention rate was around 19%. The day-1 figure is quite alarming because it means that,
on average, we lost over half of the players one day after they installed the game.
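Retention rates per version can be computed with a group-by, as in this minimal sketch. The column names "retention_1" and "retention_7" (boolean flags) are assumptions, since the slides do not name them, and the rows are invented.

```python
import pandas as pd

# Invented rows: four players per version with boolean retention flags.
df = pd.DataFrame({
    "version": ["gate_30"] * 4 + ["gate_40"] * 4,
    "retention_1": [True, False, True, False, True, False, False, False],
    "retention_7": [True, False, False, False, False, False, False, False],
})

# The mean of a boolean column is the share of players who came back.
rates = df.groupby("version")[["retention_1", "retention_7"]].mean()
print(rates)

# Overall rates across both versions, for the headline figures.
overall = df[["retention_1", "retention_7"]].mean()
print(overall)
```

The per-version table is what the A/B comparison needs, while the overall row corresponds to headline figures such as "around 45%" and "around 19%".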
For day-1 retention, the p-value is around 0.038, and this tells us:
The p-value is relatively small, so we might reasonably believe that the difference
between the retention rates of the 2 versions is real. Since the empirical difference is
positive, we can reasonably conclude that the gate_30 version has a better day-1 retention
rate than the gate_40 version does.
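The slides do not say which test produced this p-value. As one possible approach, a one-sided permutation test on the difference in retention rates could be sketched as below. The function name, the number of permutations, and the synthetic 0/1 data are all assumptions for illustration.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided p-value for mean(group_a) - mean(group_b) > 0 under label shuffling."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Shuffle the pooled outcomes and split them back into two groups.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            count += 1
    return observed, count / n_permutations

# Synthetic data: 1 = player returned, 0 = did not. Not the real dataset.
gate_30 = np.array([1] * 60 + [0] * 40)
gate_40 = np.array([1] * 40 + [0] * 60)

observed, p_value = permutation_p_value(gate_30, gate_40)
print(observed, p_value)
```

The p-value is the share of shuffles whose difference is at least as large as the observed one, so a small value means the observed gap is unlikely if the version label made no difference.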
Prediction:
For day-7 retention, the p-value is lower than 0.001, which is extremely small.
We can be very confident that the difference between the retention rates is real
and that the gate_30 version does have a better day-7 retention rate than the
gate_40 version.
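As a rough cross-check on a result like this, a pooled two-proportion z-test can be computed with the standard library alone. This is a large-sample approximation that we add for illustration, not necessarily the test used in the notebook, and the counts below are made up.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, one-sided p-value) for H1: rate_a > rate_b, using a pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided

# Made-up counts: retained players and group sizes for each version.
z, p = two_proportion_z_test(successes_a=1900, n_a=10000, successes_b=1820, n_b=10000)
print(z, p)
```

With samples as large as an A/B test on a mobile game usually provides, this approximation and a resampling-based test should point in the same direction.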
Tableau report: number of users who installed
the gate_30 version
Tableau report: number of users who installed
the gate_40 version
Resources
Dataset from Kaggle :
https://www.kaggle.com/jyunyolin/mobile-game-retention-analysis-a-b-testing/data
Code on GitHub :
https://github.com/roaa-qteishat/Game/blob/0c3d3077d4fd5dd283ac850c17b513d214964e42/Games.ipynb
Tableau:
https://public.tableau.com/app/profile/roaa1997/viz/cookiecatsgametableau/Cookiecats
Done by
 Roa’a Qteishat :roaa.qteishat@yahoo.com
 Heyam alsaidi : alsaidiheyam@gmail.com
