How to win at business
  - Earn X + ε per user (Lifetime Value)
  - Pay X to acquire a user (Cost per Acquisition)
  - Black Box
  - $\sum_{i=0}^{n} \varepsilon_i = \text{WIN}$ (where n is large and ε > 0)
- “What am I investigating?”
- “Where do I start?”
- “What data do I use?”
- “How do I model my data?”
- “What is the data telling me?”
- “What do I do with my new insights?”
- “How do I know my insights are working?”
Customer Lifetime Value
  How much is a user worth to me over his/her lifetime?
  CLV(C, S, R) = C * S * R
  - C: conversion to pay
  - S: average transaction size
  - R: average number of purchases over lifetime
How do we increase revenue?
  - Conversion = # paying users / # total users
  - Social gaming sites usually get 1% (low) to 5% (godly)!
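The conversion and CLV formulas above can be checked with a few lines of Python. This is a minimal sketch: the function names and every input number are hypothetical, chosen only to keep the arithmetic easy to follow, and are not OMGPOP figures.

```python
def conversion_rate(paying_users: int, total_users: int) -> float:
    """Conversion = # paying users / # total users."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return paying_users / total_users


def customer_lifetime_value(conversion: float, avg_transaction: float,
                            avg_purchases: float) -> float:
    """CLV(C, S, R) = C * S * R, the expected revenue per user (payers and nonpayers)."""
    return conversion * avg_transaction * avg_purchases


# Hypothetical inputs (assumptions, not real site data).
c = conversion_rate(paying_users=200, total_users=10_000)
clv = customer_lifetime_value(c, avg_transaction=10.0, avg_purchases=3.0)
print(f"conversion = {c:.2%}, CLV = ${clv:.2f} per user")
```

Because C multiplies the other two terms, doubling conversion doubles CLV when transaction size and purchase count hold steady, which is why the rest of the talk focuses on conversion.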
Welcome to OMGPOP
OMGPOP is a community-centric multiplayer gaming site
  - Real-Time Multiplayer Games
Community Oriented
Virtual Economy
  - We sell virtual items…
  - And accept many forms of payment
We are not a research lab
  - We are a venture-backed startup
Investors demand bottom-line results FAST. No credit for academic publications and citations.
Resources are SCARCE
We have to justify every minute spent on predictive analytics
“How long do we spend developing, testing, and measuring X feature?”
We have weeks, not months, to show results. The work needs to be immediately actionable, so we make lots of assumptions.
How do we increase conversion?
  Our site contains MANY features:
  - Chat
  - Games
  - Walls
  - Notifications
  - Surveys
  - Pictures
  Where do we focus our efforts? Which has the greatest ROI?
What causes a user to buy?
  Our Guiding Mantra: A user’s experience on the site is directly correlated with his/her probability to pay
  [Diagram: Before: P(Buy) → On Site Experience Changes → After: P(Buy’)]
What site experience is causing users to pay?
  Let’s translate into analytical questions:
  - “What are indicators of paying users?”
  - “What features are unique to paying users?”
  - “What unique experience do payers have that drives them to pay?”
  - “What features separate paying users from nonpaying users?”
We aggregated over 100 features
  gender, age, site_level, gameplays, logins_count, play_intensity, login_intensity,
  cents_first_purchase, number_virtual_goods_purchased, amount_on_first_purchase,
  ingame_items_purchased, total_coins_spent, total_coins_earned, coin_balance,
  number_of_friends, total_friends_invited, facebook_connected, candystand_user, aim_user,
  gifts_sent, gifts_received, ip_address, has_mobile_number, has_uploaded_photo,
  signup_date, pay_date, time_to_first_purchase_roundup, time_to_first_purchase_round,
  profile_items_purchased, balloono_items_purchased
We were suspicious of our gender data
  - According to self-reported data, 80% of users were male.
So we hired a 3rd party data service to validate
And we asked every user 4 questions about their gender
  Our female users lie to us:
  - 65% of women said “No, I’m not a girl”
  - 73% of women said “No, I’m not a woman”
  Our users don’t always tell the truth
We can use the gender questions to build a simple predictive model
  - Input raw data set
  - Choose label
  - Choose classifier
  - Train on X% of the data
  - Test on Y% of the data
  - Remove irrelevant features
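The train/test workflow listed above can be sketched in plain Python. Everything here is an assumption made for illustration: the toy rows, the two-question answer patterns, the 70/30 split, and the lookup-table "classifier" (a majority vote per answer pattern), which is not the model the team actually used.

```python
import random
from collections import Counter, defaultdict


def train_test_split(rows, train_fraction, seed=0):
    """Shuffle a copy of the rows and cut it into a train part and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_lookup_classifier(train_rows):
    """Learn the majority label for each answer pattern seen in training."""
    votes = defaultdict(Counter)
    overall = Counter()
    for answers, label in train_rows:
        votes[answers][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen patterns
    table = {a: c.most_common(1)[0][0] for a, c in votes.items()}
    return lambda answers: table.get(answers, default)


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(a) == y for a, y in rows) / len(rows)


# Hypothetical data: (answers to two yes/no gender questions, label).
rows = ([((1, 1), "female")] * 40 + [((0, 0), "male")] * 40
        + [((1, 0), "female")] * 10 + [((1, 0), "male")] * 10)

train, test = train_test_split(rows, train_fraction=0.7)
model = train_lookup_classifier(train)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

Accuracy is measured only on the held-out rows, so a feature that adds noise shows up as no gain (or a drop) when the model is retrained without it.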
Name of the game: train a model with the highest accuracy (confidence)
  [Diagram labels: ML, Features, Result, Data]
  Accuracy is determined by the data, so we want to remove data that doesn’t contribute relevant information (i.e. remove noise)
Choosing Features
  1. Use intuition to choose many possibly important features
  2. Remove features that you can’t trust
  3. Approximate the importance of features
  4. Train model
  5. Re-train model on subsets of the feature list and choose the features that yield the highest accuracy
Problems Comparing Features Between Our Users
  - Select features common to payers and nonpayers
  - Keep distributions intact
  Our experience: we had to compare users’ stats right before their first purchase (behavior on site changes after first purchase)
  Example users:
  - 100 plays, 20 days on site, paid $20
  - 10 plays, 1 day on site, didn’t pay
How to Format the Data for Your Model
[Diagram labels: Modeling, Features, Result, Data]
What does a Classification Model do?
  - Model ‘learns’ to classify apples and oranges
  - Training: Labeled Apples and Oranges → Classification Model (pruning, optimize parameters, weights, etc.)
  - Prediction: Unlabeled Fruit → Classification Model → % Chance of Being an Apple (or Orange)
Applying a Predictive Model
  Purpose: we are a startup; we need quick results, interpretation, and action
  Decision Tree
  Pros:
  - Easily understood / interpreted
  - Calculates quickly
  Cons:
  - Local max only (greedy)
  - Less accuracy
Wine Data
What is a Decision Tree?
[Decision tree: Gameplays > 100 → Payers; Gameplays < 100 → Nonpayers]
Purity
  Measure homogeneity of the labels
  Degree of homogeneity is measured through:
  - Entropy: $E = -\sum_j p_j \log p_j$
  - Gini index: $G = 1 - \sum_j p_j^2$
  - others
  where $p_j$ = probability of occurrence of class j in the sample
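Both impurity measures are short to compute from a list of labels. The sketch below assumes base-2 logarithms (the slides do not state a base) and uses made-up label lists; the function names are illustrative.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency p_j of each class j in the sample."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """E = -sum_j p_j * log2(p_j); 0 for a pure sample."""
    return sum(-p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """G = 1 - sum_j p_j^2; 0 for a pure sample."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


# Hypothetical samples.
pure = ["payer"] * 4
mixed = ["payer", "payer", "nonpayer", "nonpayer"]
print("pure :", entropy(pure), gini(pure))
print("mixed:", entropy(mixed), gini(mixed))
```

Both measures are zero for a sample with a single label and largest when the classes are evenly mixed, which is what makes them usable as a homogeneity score.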
Decision Tree Algorithm (simplified)
  1. Calculate impurity for the original sample (probability for each class)
     P(Payer) = P, P(Nonpayer) = N, using relative frequencies
     Entropy: $E = -P \log P - N \log N$
     Entropy of a single label is zero (if there is only one class C, then P(C) = 1 and log(1) = 0)
  2. Calculate information gain for each possible attribute split. Split the table on the attribute; the difference in the impurity measure is called the Information Gain:
     $IG = E(\text{Original Table}) - \sum_i \frac{n_i}{n} E(\text{Feature\_table}_i)$
     where $n_i$ is the number of rows in split table i and n is the number of rows in the original table
  3. Choose the attribute split that results in the highest information gain
  4. Remove the splitting attribute and recursively keep splitting on the highest information gain attribute. Done when there are no more attributes, the information gain is too tiny, or the max depth of the tree is reached
Calculate Impurity for Sample
  P(Payer) = P, P(Nonpayer) = N, using relative frequencies
  Entropy: $E = -P \log P - N \log N$
  Entropy of a single label is zero (if there is only one class C, then P(C) = 1 and log(1) = 0)
CALCULATE IMPURITY FOR EACH INDIVIDUAL FEATURE
  [Figure: split branches labeled No / Yes]
Decision Tree Algorithm (simplified)
  1. Calculate entropy for the original table (probability for each class)
     P(Payer) = P, P(Nonpayer) = N, using relative frequencies
     Entropy: $E = -P \log P - N \log N$
     Entropy of a single label is zero (if there is only one class C, then P(C) = 1 and log(1) = 0)
  2. Take the difference in entropy between our original table and the weighted sum of the split tables. The difference in the impurity measure is called the Information Gain:
     $IG = E(\text{Original Table}) - \sum_i \frac{n_i}{n} E(\text{Feature\_table}_i)$
     where $n_i$ is the number of rows in split table i and n is the number of rows in the original table
  3. Choose the attribute split that results in the highest information gain
  4. Keep splitting on the highest information gain attributes; repeat until a certain depth of tree has been reached, or until the information gain falls below a certain lower threshold
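Steps 1 to 3 of the algorithm above can be sketched as follows. This is a hedged illustration, not the RapidMiner implementation: it assumes base-2 logarithms, categorical features, and a tiny hypothetical table whose column names (`plays_over_100`, `facebook_connected`, `payer`) are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """E = -sum_j p_j * log2(p_j) over the classes present in labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, label="payer"):
    """IG = E(original table) - sum_i (n_i / n) * E(split table i)."""
    n = len(rows)
    groups = {}
    for row in rows:  # split the table on the feature's values
        groups.setdefault(row[feature], []).append(row[label])
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - weighted


# Hypothetical toy table (assumption, not real user data).
rows = [
    {"plays_over_100": True,  "facebook_connected": True,  "payer": True},
    {"plays_over_100": True,  "facebook_connected": False, "payer": True},
    {"plays_over_100": False, "facebook_connected": True,  "payer": False},
    {"plays_over_100": False, "facebook_connected": False, "payer": False},
]

features = ("plays_over_100", "facebook_connected")
gains = {f: information_gain(rows, f) for f in features}
best = max(gains, key=gains.get)  # step 3: highest information gain wins
print(gains, "-> split on", best)
```

On this toy table, splitting on `plays_over_100` yields two pure tables (gain 1.0) while `facebook_connected` leaves both tables evenly mixed (gain 0.0). Step 4 would repeat the same calculation inside each split table until a stopping rule is hit.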
Setting Up a Decision Tree Using RapidMiner
  Data: http://archive.ics.uci.edu/ml/datasets/Wine
  [Figure: decision tree with leaves labeled Merlot, Shiraz, Merlot, Cabernet, Merlot]
Some features
  - # friends
  - # plays
  - Win percentages
  - Coins earned
  - Photos uploaded
  - Coins spent
  - Purchases of different virtual items
  - # plays for each game
  - Fill rate for each game
  - Game lengths
  - Facebook / Myspace / AIM
  - Gifts sent / received
  - Location
  - etc.
Impurity can be used for a quick ‘approximate’ ranking of feature importance for segmenting
  - The higher the information gain of a split, the more relevant the feature is for segmenting
Ideas for what features separated nonpaying users from paying users?
Results showed four different ‘groups’ of users
  1. People who hadn’t interacted with goods or virtual currency
  2. People who got just a free virtual good
  3. People who bought 1-3 virtual goods and spent at least 1 virtual currency
  4. People who bought 7.5+ virtual items
  Group 1 had almost no people who spent real $$; group 4 had the most. Intuitive! The goal is then to take a smaller step and get people to interact with the virtual goods and currency from day one.
SVM Weights (type: C-SVC (LIBSVM), kernel: linear)
  [Chart of feature weights: total_coins_spent, coin_balance, gameplays, Balloono gameplays, number_virtual_goods_purchased]
[Diagram labels: Modeling, Features, Result, Data]
“So now what do I do? How do I take action?”
Extracting Insights From The Model
  Payers and nonpayers are having different experiences!
  - Most people purchase at the START of their experience (seen in distribution of payers)
  - People who spend $$ are those who spend their virtual currency buying virtual goods
We want the nonpaying user to have the same experience as the paying user
Press the Button
On a website…
  So you can’t click for the user, but...
  - You control the flows, i.e. you ‘direct’ people where to click, and have HUGE INFLUENCE over what users do on your site
  - Only one link? Nowhere else to go.
We want people to buy more virtual items and spend more coins at the start of their experience… and we can direct where people click… So?
On the first login, direct (FORCE) all new users to spend coins buying virtual items!
Theory
  - Someone can easily go through the site without EVER having spent a single coin.
  - Lubricate the purchasing process
  - Habituate users to spend / buy early and often
  - Getting people to spend more coins will increase conversion
  - Forcing is NOT the same as a user-elected action, but it likens their experiences!
A Quick Look Back
  - Extract ‘Insights’ from the model
  - See most relevant separating features
  - Bridge the gap between separating features
Data at the Helm
  - Hard to make ‘Guiding Decisions’
  - Data inspires confidence in decisions moving forward
  - Worst case: learn how the data changed in response to a change on the website, which is a new insight!
“So I implemented this change, how can I tell if it’s working?”

How OMGPOP Uses Predictive Analytics to Drive Change

  • 63.
    A/B Testing
      - Show some users layout ‘A’
      - Show other users layout ‘B’
      - Measure how many people who see layout ‘A’ do some action vs. people who see layout ‘B’
      - Choose the layout that had the highest conversion to action
      - Google Web Optimizer for HTML A/B testing
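The A/B procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not how Google Web Optimizer works internally: the hash-based 50/50 assignment, the experiment name, the function names, and the event counts are all hypothetical.

```python
import hashlib


def assign_layout(user_id: str, experiment: str = "layout-test") -> str:
    """Deterministically put a user in 'A' or 'B' so repeat visits see the same layout."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def conversion_by_layout(events):
    """events: iterable of (layout, did_action) pairs -> {layout: conversion to action}."""
    shown, acted = {}, {}
    for layout, did_action in events:
        shown[layout] = shown.get(layout, 0) + 1
        acted[layout] = acted.get(layout, 0) + int(did_action)
    return {layout: acted[layout] / shown[layout] for layout in shown}


# Hypothetical results: 1,000 users saw each layout.
events = ([("A", True)] * 30 + [("A", False)] * 970
          + [("B", True)] * 45 + [("B", False)] * 955)

rates = conversion_by_layout(events)
winner = max(rates, key=rates.get)
print(rates, "-> choose layout", winner)
```

With small samples the gap between two layouts can be noise, so a real test would also check that the difference is statistically meaningful before choosing a winner.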
  • 64.
    Implementing an A/B Test
      - GROUP A (control group, no changes): Signup → Play → Leaves
      - GROUP B (Force Buy): Signup → Put User Into Shop, Give Coins, Popup to Buy a Virtual Item → Play → Leaves
  • 65.
    Now click ‘Buy’ to buy this cool armor for your character!
  • 66.
    Which Measurement Tells Me that My Change is Successful?
      - Choose the test group that spent more time in the virtual item shop?
      - Choose the test group that bought more virtual items and spent more coins?
      - Choose the test group with the highest LTV!
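A small sketch shows why the choice of measurement matters: a proxy metric and LTV can point at different winners. The per-user records below are entirely hypothetical, and "LTV" is approximated here as average real dollars spent per user during the test, which is an assumption of this example.

```python
def summarize(group):
    """group: list of (virtual_items_bought, real_dollars_spent), one tuple per user."""
    n = len(group)
    return {
        "items_per_user": sum(items for items, _ in group) / n,
        "ltv": sum(dollars for _, dollars in group) / n,  # nonpayers count as $0
    }


# Hypothetical per-user records for two test groups.
groups = {
    "A": [(0, 0.0), (1, 0.0), (0, 20.0), (1, 0.0)],
    "B": [(3, 0.0), (2, 5.0), (4, 0.0), (3, 3.0)],
}

results = {name: summarize(users) for name, users in groups.items()}
winner_by_items = max(results, key=lambda g: results[g]["items_per_user"])
winner_by_ltv = max(results, key=lambda g: results[g]["ltv"])
print(results)
print("most items:", winner_by_items, "| highest LTV:", winner_by_ltv)
```

In this made-up data one group buys more virtual items while the other earns more real money per user, so judging by the proxy alone would pick the wrong group for the bottom line.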