1. 1
Reporting, Analytics, and Tableau at Spil Games
0-100 KPH in One Year
Presented by:
Rob Winters
30 October 2012
2. 2
• 200 Million UVs per Month (Google Analytics)
• 2-3 Billion pageviews per month, >1 Billion game plays
• Local portals in 19 languages with traffic from 219 countries a month
• Developer, Publisher, and Platform
• Target audience: Girls, Boys, Adult females
Who is Spil Games?
Titles
Portals
3. 3
• Slowing growth in core markets plus increased competition led to slower
EBITDA growth
• Focus on personalization and user-centricity (changing market
expectations)
• Change in revenue streams and products (Advertising to End User
Monetization)
Why Reporting and Analytics Became Important
These guys showed that data mattered
4. 4
August 2011: Starting from Ground Zero
Data “Piggy Bank”
• 5 GB of data
• Unindexed/Unusable
• Direct copies of some production data
Reporting
• Two Dashboards
• Manually generated weekly
“Analytics”
5. 5
Analytics
Reporting
Data Platform
Today we have a different landscape
Data Warehouse
• >700 GB compressed / >2.5 TB uncompressed
• Largest tables load >50M records/day
MapReduce
• >150M events/day
• >500M events/day by Q1 2013
Tableau Server
• Daily, Weekly, and Monthly Push Reports
• >192 views, 5.5 GB of extracts and data sources
• 100+ sessions per day
Tableau Desktop plus R
• Five desktop users of Tableau
• Driving company forecast process and planning
• Analysis forms backbone of new strategies
9. 9
• Reports are scheduled and pushed daily, weekly, and monthly from the Tableau Server
• Tools used:
• Tableau Server + tabcmd: pull reports
• sqlRun: ODBC command line tool used to build message bodies
• BLAT: command line email client
Push Reporting
SQL Processes
• Grab key values from DWH
• Log into the Tableau Server Postgres DB and confirm that the extract update ran
Build message body
• Modify DB outputs to match expected format (e.g. "." -> ",")
• Echo text into body text file
Pull reports from Tableau Server
• start /wait "Logoff" "Tabcmd" logout
• start /wait "Login" "Tabcmd" login -s https://reporting.spilgames.com -u USERNAME -p PASSWORD
• start /wait "Pull Report" "Tabcmd" get views/DailyEUMRevenueReport/DailyEUMRevenueReportPush.pdf -f "X:\Reporting data\Today's Reports\Daily EUM Report for %AWSDT%.pdf"
Send Emails and Archive
• Email list managed via Active Directory; emails pushed via SMTP
• Every push report archived to a shared drive as a permanent record
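The message-body and report-pull steps above can be sketched in Python. This is a minimal illustration, not the batch script used at Spil: the function names, the ISO date in the file name (the format behind %AWSDT% is not shown on the slide), and the output directory are assumptions. Only the server URL, the view path, and the tabcmd logout/login/get sequence come from the slide.

```python
import datetime as dt

SERVER = "https://reporting.spilgames.com"  # from the slide
VIEW = "views/DailyEUMRevenueReport/DailyEUMRevenueReportPush.pdf"  # from the slide


def to_decimal_comma(value: float, places: int = 2) -> str:
    """Format a number with a decimal comma, the "." -> "," step."""
    return f"{value:.{places}f}".replace(".", ",")


def build_body(kpis: dict[str, float]) -> str:
    """Build a plain-text message body from key values pulled from the DWH."""
    return "\n".join(f"{name}: {to_decimal_comma(v)}" for name, v in kpis.items())


def tabcmd_commands(user: str, password: str,
                    report_date: dt.date, out_dir: str) -> list[list[str]]:
    """Return the logout/login/get sequence as argument lists.

    The dated file name is a hypothetical stand-in for %AWSDT%.
    """
    pdf = f"{out_dir}/Daily EUM Report for {report_date:%Y-%m-%d}.pdf"
    return [
        ["tabcmd", "logout"],
        ["tabcmd", "login", "-s", SERVER, "-u", user, "-p", password],
        ["tabcmd", "get", VIEW, "-f", pdf],
    ]
```

Each argument list could be passed to subprocess.run(cmd, check=True) in order, which plays the role of "start /wait" by blocking until tabcmd finishes.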
11. 11
• 25 power users (10% of local office)
• 5 sites:
• Reporting
• Development
• Three sites for business partners
• 87 workbooks, 192 views
• >100 sessions per day
• One dashboard accounts for 50% of all views
Web reporting platform
Reporting Platform
12. 12
Change to the site / Issue
• Change: Added dashboards to various pages to show report update status, primary KPIs on the landing page, etc.
  Issue: Slow load speed of reports caused issues; different permissions between sites led to partners having issues
• Change: Custom HTML on the page to link to documentation, report request forms, and email the reporting team
  Issue: Due to Tableau’s “update” process, HTML would have to be manually replaced with each version change
• Change: Custom CSS to match branding
  Issue: Same issue as custom HTML
What we have done that DIDN’T work and other issues
13. 13
• Tableau butchers custom SQL. When possible, use views, tables, or projections
• Huge amounts of usage data are available on the server back-end; use them to your
advantage.
• Tabcmd can handle custom variables easily, opening the potential for users to
request highly personalized reports (or batch produce reports with a loop)
• Balance flexibility with data size, and use extracts for reports which require significant
dimensionality
• If using the server for multiple functions (ex. reporting AND analysis-sharing), make
separate sites to avoid confusion on data quality
• Use parameters and actions to make your report dynamic
• You can lead a horse to water, but you can’t make it drink
• Make it easy to search with tags
• Provide easy access to documentation and contact forms
• Resist the urge to make duplications of data for users wanting slightly modified
reports
Recommendations for Reporting
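The "batch produce reports with a loop" recommendation can be sketched as follows. This assumes the view accepts a filter passed on the URL query string; the view path, the field name "Market", and the output naming are hypothetical, and only the "tabcmd get ... -f" form comes from the push-reporting slide.

```python
from urllib.parse import quote, urlencode


def batch_get_commands(view_pdf: str, field: str,
                       values: list[str], out_dir: str) -> list[list[str]]:
    """Build one `tabcmd get` command per filter value.

    `field` is assumed to be a filterable field on the view.
    """
    commands = []
    for value in values:
        # Percent-encode spaces so the value survives in the URL.
        query = urlencode({field: value}, quote_via=quote)
        target = f"{out_dir}/{value}.pdf"
        commands.append(["tabcmd", "get", f"{view_pdf}?{query}", "-f", target])
    return commands


if __name__ == "__main__":
    for cmd in batch_get_commands("views/Example/Report.pdf", "Market",
                                  ["France", "United Kingdom"], "out"):
        print(" ".join(cmd))
```

Building the commands separately from running them makes the loop easy to inspect before any report is actually pulled.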
15. 15
Analytics at Spil: many tools make effective work
R plus Tableau combine to form a
well-trained athlete
1. Explore data in SQL and Tableau
2. Build models in R and evaluate
3. Test via A/B testing
4. Implement reporting with Tableau
Both form a critical part of
Spil’s analytics
But for simple problems,
Tableau is sufficient
16. 16
When we use Tableau:
• Multidimensional trending analysis (including comparing trends)
• Distribution analysis
• Visualization of small multiples
• Exploratory analysis
When we use R:
• Modeling/forecasts (ARIMA, regression, etc.)
• Seasonal decomposition
• Tree-based analysis
• Statistical analysis (correlations, t-tests, ANOVA)
• Data mining
We have found each tool is optimal for different purposes
17. 17
• Structuring your data BEFORE Tableau forces you to consider dimensions/attributes.
SQL and Hive are your friends.
• We are TOO good at seeing patterns, so “trust but verify” what you learn from
Tableau with more robust tools like SAS or R.
• Remember Occam’s Razor: Use the simplest possible visualization that can
accurately convey the information but no simpler
Analysis Advice
19. 19
Most content on the home page was geared to the under-12
audience, yet analysis showed older users were more valuable
Can we ensure that the
content interests for the
most valuable users are
met?
20. 20
Work flow:
1. A variety of base and calculated variables were created in SQL and
loaded into a reference table
2. Data was loaded into Tableau and explored to find “natural”/visual
relationships and break points
3. New variables were created or added based on visual exploration
4. Revised data set was loaded into R for modeling
• Step 1: Stepwise logistic models predicting probability of game play based on
variables from step 3
• Step 2: Build behavioral clustering models and compare to demographic
segmentation
• Step 3: Model other industry standard approaches (ex. slope one, cosine
similarity) in R and measure reduction in AIC
5. Users were assigned to appropriate clusters and distributions of various
variables explored in R
6. Models were tuned and made ready for production
Tableau and R were used simultaneously to accelerate analysis
and modeling process
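Of the approaches named in the modeling step, cosine similarity is the easiest to illustrate. The sketch below is in Python rather than R (the tool the slides name) and compares the player bases of two games; the data layout (a dict of user id to play count per game) is an assumption, not the structure used at Spil.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by user id."""
    shared = a.keys() & b.keys()
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty player base is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical play counts per user for two games.
    game_a = {"u1": 1, "u2": 1}
    game_b = {"u2": 1, "u3": 1}
    print(cosine_similarity(game_a, game_b))
```

Games whose player bases overlap heavily score close to 1, and games with no players in common score 0.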
21. 21
Segmentation: K-means clustering on 30+ factors,
dividing the user base into 30 different behavioral
segments plus demographic boosting
Ultimately, an ensemble of models was built to recommend
content
Content selection: Ensemble model based on
drivers predictive of play
• Cosine similarity of player bases and probability
of play
• Weighted slope one modeling of relative play
rates
• General user feedback from user ratings and
relative time on page
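Weighted slope one, one of the ensemble members listed above, can be sketched in a few lines. This is a generic textbook version in Python, not the production model: the input layout (user to game to relative play rate) and all names are assumptions.

```python
from collections import defaultdict
from typing import Optional


def weighted_slope_one(rates: dict[str, dict[str, float]],
                       user: str, target: str) -> Optional[float]:
    """Predict `user`'s play rate for `target` with weighted Slope One."""
    diff_sum = defaultdict(float)  # sum of (target - other) over users with both
    count = defaultdict(int)       # number of users with both games
    for plays in rates.values():
        if target not in plays:
            continue
        for other, rate in plays.items():
            if other != target:
                diff_sum[other] += plays[target] - rate
                count[other] += 1

    numerator, denominator = 0.0, 0
    for other, rate in rates[user].items():
        if other != target and count[other]:
            avg_dev = diff_sum[other] / count[other]
            # Weight each game's estimate by how many users support it.
            numerator += (avg_dev + rate) * count[other]
            denominator += count[other]
    return numerator / denominator if denominator else None
```

The weighting means a deviation observed across many users counts for more than one observed for a single user, which matters when some games have small player bases.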
23. 23
Bottom-up ARIMA forecast is generated for each core market/business
channel/traffic source split (approximately 500 forecasts)
1. Traffic (visits) is forecast using R’s auto-ARIMA functionality
• Multiple ARIMA models plus time series linear regressions are built and
compared based on AIC/AICc, with the best-fit model selected
• Forecasts are then rolled up to market/channel level (approx. 120 forecasts)
2. Primary interactions (casual and social gameplays) are forecast on a per-visit basis
based on historical patterns and known seasonality matrices
3. Secondary interactions (navigational pageviews) are forecast based on primary
interaction forecasts, historical data, and other regressors
4. Advertising impressions are loaded into the model on a market/channel/page type
basis to generate total impressions by type and location
5. eCPMs are forecast on a market/channel/page type basis
6. Forecast data is aggregated and loaded into the data warehouse for tracking
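The chain from visits to advertising revenue in steps 2 to 5 can be illustrated with simple arithmetic. The sketch below is a deliberately simplified assumption: it uses fixed per-visit and per-play ratios where the slides describe historical patterns and regressors, and every parameter name is hypothetical. Only the final line, revenue as impressions divided by 1000 times eCPM, is definitional.

```python
def forecast_revenue(visits: float, plays_per_visit: float, seasonality: float,
                     pageviews_per_play: float, impressions_per_pageview: float,
                     ecpm: float) -> dict[str, float]:
    """Derive interactions, impressions, and revenue from a visit forecast."""
    gameplays = visits * plays_per_visit * seasonality   # primary interactions
    pageviews = gameplays * pageviews_per_play           # secondary interactions
    impressions = pageviews * impressions_per_pageview   # ad impressions
    revenue = impressions / 1000 * ecpm                  # eCPM is per 1000 impressions
    return {
        "gameplays": gameplays,
        "pageviews": pageviews,
        "impressions": impressions,
        "revenue": revenue,
    }


if __name__ == "__main__":
    print(forecast_revenue(1000, 2.0, 1.25, 0.5, 4, 2.0))
```

Because each stage depends only on the one before it, a correction to the visit forecast flows through to revenue without any other input changing.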
Step One: Initial forecast is built using R
Completely parallel: Using a quad-core machine, total forecasting time is under one hour per month
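Model selection by AIC/AICc and the roll-up to market/channel level can be sketched as below. This is Python rather than R and does not fit any model: the log-likelihoods and parameter counts are assumed to come from fits done elsewhere, and all names are hypothetical. The AIC and AICc formulas are the standard ones.

```python
from collections import defaultdict


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction; needs n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def best_model(candidates: list[tuple[str, float, int]], n: int) -> str:
    """Pick the candidate (name, log_likelihood, k) with the lowest AICc."""
    return min(candidates, key=lambda c: aicc(c[1], c[2], n))[0]


def roll_up(split_forecasts: dict[tuple[str, str, str], float]) -> dict[tuple[str, str], float]:
    """Sum (market, channel, source) forecasts to (market, channel) level."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for (market, channel, _source), value in split_forecasts.items():
        totals[(market, channel)] += value
    return dict(totals)
```

Since each split is selected independently, the per-split work could be mapped across worker processes, which fits the "completely parallel" note above.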
24. 24
Step two: Exploratory variance analysis using Tableau
Why Tableau:
• Faster than R for rapid exploration
• Flexible adjustment of plot structure
while exploring data with leadership
• Clear visualization without planning
25. 25
Step Three: Modify forecast with business input and load adjusted
forecast to data warehouse
Channel: Family
Market Jan 12 Feb 12 Mar 12 Apr 12 May 12 Jun 12 Jul 12 Aug 12 Sep 12 Oct 12 Nov 12 Dec 12 Grand Total
Austria 1.0 1.1 1.0 1.0 1.0 1.0 1.0 .9 .8 1.0 1.0 1.0 11.8
Belgium 2.5 2.7 2.4 2.5 2.3 2.5 2.3 2.4 2.2 2.3 2.5 2.7 29.2
France 17.1 18.0 16.8 17.2 16.6 16.9 16.1 15.5 14.1 15.9 16.4 18.5 199.2
Germany 12.9 12.8 12.8 12.4 12.1 12.8 12.7 11.3 10.5 11.2 11.6 12.6 145.6
Italy 18.7 20.2 19.0 19.3 19.1 21.0 17.8 16.6 17.1 16.7 16.9 17.6 220.0
Netherlands 5.6 5.4 5.2 5.0 4.9 5.2 4.7 4.2 4.0 4.6 4.6 4.9 58.3
Poland 34.1 34.8 33.2 30.9 28.7 31.4 28.8 29.5 25.2 26.7 29.2 33.8 366.2
Portugal 2.0 2.0 2.3 2.3 2.2 2.6 2.6 2.5 2.1 1.9 1.9 2.3 26.9
Russia 5.8 5.7 6.2 5.4 5.1 4.2 3.5 3.7 3.4 4.1 4.5 4.7 56.2
Spain 5.2 5.4 5.4 5.6 5.3 5.8 5.1 5.1 4.9 4.6 4.4 5.3 62.1
Sweden 3.1 3.0 2.9 2.8 2.6 2.6 2.1 2.3 2.2 2.6 2.7 3.0 31.9
Switzerland .9 .9 .9 .9 .9 .9 .8 .8 .7 .9 .9 .9 10.5
Ukraine 1.5 1.6 1.7 1.5 1.3 1.1 .9 .9 .8 .9 .9 1.0 14.1
United Kingdom 2.5 2.7 2.6 2.7 2.6 2.6 2.7 2.4 2.0 2.4 2.4 2.8 30.6
United States 5.1 5.3 5.4 5.0 5.1 5.4 5.4 4.8 4.3 4.4 4.8 5.2 60.3
Canada 1.1 1.1 1.2 1.1 1.0 1.0 1.0 1.0 .9 1.0 1.0 1.1 12.5
Turkey 29.9 25.0 25.8 23.5 24.0 24.7 23.6 23.8 22.0 20.7 20.9 21.1 285.1
India .5 .5 .7 .8 1.0 .9 .6 .6 .6 .6 .6 .6 8.0
Indonesia .7 .5 .6 .7 .8 .9 1.0 1.0 .7 .6 .7 .8 9.0
Argentina 9.7 10.1 9.9 9.4 10.0 10.6 11.5 10.8 9.5 9.7 8.2 9.5 119.0
Brazil 27.2 23.7 22.8 21.7 22.8 24.0 24.7 22.7 20.5 20.6 19.2 22.5 272.6
Mexico 12.0 12.0 13.2 13.4 13.8 14.4 14.4 13.3 10.2 10.0 9.6 11.3 147.6
LATAM 17.6 16.6 16.8 16.4 16.4 18.2 18.6 18.0 15.3 15.3 14.2 16.5 199.8
ROW 15.7 13.8 14.5 13.5 13.7 14.8 14.4 13.2 11.2 10.9 11.2 12.6 159.6
Grand Total 232.6 225.1 223.3 214.9 213.5 225.8 216.4 207.3 185.2 189.4 190.3 212.4 2536.1
(Second block of the template, same markets: Grand Total .0 for every market and for every month)
(Third block of the template: values identical to the first block, Austria through Grand Total)
1. Forecast data is loaded into an Excel template (right)
so that adjustments can be entered
2. Channel/market leaders provide feedback on
initiatives and expected impact, along with non-
initiative adjustments (if needed)
3. Revised forecast data is committed and
uploaded; primary outputs (visits, pageviews,
gameplays, advertising revenues) are
recalculated
4. Final forecast is shared with Management Team
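The adjustment step can be sketched as adding per-month adjustments to the baseline and recomputing totals. Treating adjustments as additive is an assumption, though it is consistent with the template, where adjustments of .0 leave the forecast unchanged. The function name and data layout are hypothetical; the Austria row is taken from the table.

```python
def apply_adjustments(baseline: dict[str, list[float]],
                      adjustments: dict[str, list[float]]) -> dict[str, dict]:
    """Add monthly adjustments to the baseline and recompute row totals."""
    adjusted = {}
    for market, months in baseline.items():
        # Markets without an entry get no adjustment.
        deltas = adjustments.get(market, [0.0] * len(months))
        if len(deltas) != len(months):
            raise ValueError(f"adjustment length mismatch for {market}")
        row = [round(m + d, 1) for m, d in zip(months, deltas)]
        adjusted[market] = {"months": row, "total": round(sum(row), 1)}
    return adjusted


if __name__ == "__main__":
    austria = [1.0, 1.1, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.8, 1.0, 1.0, 1.0]
    print(apply_adjustments({"Austria": austria}, {}))
```

Keeping the baseline and the adjustments as separate inputs preserves the statistical forecast, so the effect of business input can be tracked later.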