FROM BIG DATA TO ACTIONABLE ANALYTICS
Tuomas Rinta, Development Director
Everyplay / Unity Technologies

So what is Everyplay?

Everyplay and numbers
• Live in about 1000 games across iOS and Android
• Nearly 100 million game sessions recorded daily
• About 2 billion events of usage data generated every week

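For capacity planning, the weekly and daily totals above are easier to reason about as average rates. The JavaScript below is only unit conversion, under the assumption that traffic is spread evenly over time; real peaks will be higher than these averages.

```javascript
// Back-of-the-envelope conversion of the totals above into average
// per-second rates. Averages only: actual traffic peaks will be higher.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function averagePerSecond(total, days) {
  return total / (days * SECONDS_PER_DAY);
}

// About 2 billion events per week
const eventsPerSecond = averagePerSecond(2e9, 7);
// Nearly 100 million game sessions per day
const sessionsPerSecond = averagePerSecond(100e6, 1);

console.log(Math.round(eventsPerSecond));   // 3307
console.log(Math.round(sessionsPerSecond)); // 1157
```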
Why do we care about big data?
• Mobile games, especially free-to-play, live and die by their metrics
• A service for game developers must have proven value, and every optimization counts

So let’s talk about how we use big data, and how we got started

Our goal
“How do we create a metrics-driven product based on big data?”

This needs to be as quick as possible:
Collect data → Analyze → Improve product → Create A/B tests → repeat

Challenges
• We ship an SDK, and our clients’ normal update cycle can be as long as 6-12 months, so it is not very dynamic
– This conflicts with the fast improvement cycle
– The technology must adapt to support big data
• The product evolves constantly
– Analytics requirements change constantly

The SDK is instrumented to send everything the user does to the servers.
[Diagram: Scribe, Amazon S3, real-time production system, batch data processing with Apache Pig]

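A minimal sketch of the client side of this pipeline, assuming the SDK buffers events and sends them in batches rather than one request per action. The EventBuffer class, its batch size, and the send callback are hypothetical and are not the Everyplay SDK’s actual API; Scribe and Amazon S3 are not modelled.

```javascript
// Hypothetical client-side event buffer: records every user action and
// hands events to a transport callback in batches.
class EventBuffer {
  constructor(send, maxBatchSize = 50) {
    this.send = send;               // hypothetical transport callback
    this.maxBatchSize = maxBatchSize;
    this.queue = [];
  }

  // Record one user action with a timestamp; flush when the batch is full.
  track(eventType, properties = {}) {
    this.queue.push({ event: eventType, created: new Date().toISOString(), properties });
    if (this.queue.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Hand all queued events to the transport as one batch, then clear the queue.
  flush() {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}

// Usage: collect batches in memory instead of sending them over the network.
const sentBatches = [];
const buffer = new EventBuffer((batch) => sentBatches.push(batch), 2);
buffer.track("openVideoEditor");
buffer.track("trimButtonPressed");   // second event fills the batch and triggers a flush
buffer.track("shareButtonPressed");
buffer.flush();
console.log(sentBatches.map((b) => b.length)); // [ 2, 1 ]
```

Batching keeps the number of network requests from the SDK low; the trade-off is that events reach the servers slightly later.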
Tackling evolving analytics

Issues with big data and analytics
• Analytics requirements change
• Redshift is based on PostgreSQL, so there needs to be a schema
– Schemas are the most restrictive factor with Redshift
• How does that work with evolving analytics?
• Everything would be easy if there weren’t billions of rows of data…

How should data be reported?
• Choosing how the end-user instrumentation sends events is crucial
• A bad event format can make analytics on big data nearly impossible
• You don’t always know beforehand what you will need

Two possible approaches

Separate events
Example of video sharing:
openVideoEditor
trimButtonPressed
undoTrimPressed
activateFacecamRecording
finishFacecamRecording
shareButtonPressed
• More flexible with a schema-based database
• Requires much more data processing
• Combining events can be a hassle

Conversions with properties
Example of video sharing:
{"event": "videoShareComplete",
 "properties": [
   {"didTrimVideo": true},
   {"isVideoTrimmed": false},
   {"didUseFacecam": true},
   {"isFacecamEnabled": true},
   {"totalDuration": 1241}
 ]
}
• Problematic with a schema-based database
• Easier and faster to process
• All relevant data is pre-aggregated

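A sketch of the extra processing the separate-events approach needs: folding one session’s events into a single conversion record shaped like the properties example. The folding rules (each undo reverts one trim, duration is the time between the first and last event) are assumptions, and isFacecamEnabled is left out because the slide does not say how it is derived.

```javascript
// Fold one session's separate events into a single conversion record.
// Event names come from the slide; the folding rules are assumptions.
function toShareConversion(events) {
  const countOf = (type) => events.filter((e) => e.event === type).length;

  const trims = countOf("trimButtonPressed");
  const undos = countOf("undoTrimPressed");
  // Assumed: duration is the time between the first and last event.
  const totalDuration = events[events.length - 1].timestamp - events[0].timestamp;

  return {
    event: "videoShareComplete",
    properties: [
      { didTrimVideo: trims > 0 },
      // Assumed: each undoTrimPressed reverts exactly one trim.
      { isVideoTrimmed: trims > undos },
      { didUseFacecam: countOf("activateFacecamRecording") > 0 },
      { totalDuration },
    ],
  };
}

// Hypothetical session; timestamps are relative and their unit is assumed.
const session = [
  { event: "openVideoEditor", timestamp: 0 },
  { event: "trimButtonPressed", timestamp: 300 },
  { event: "undoTrimPressed", timestamp: 500 },
  { event: "activateFacecamRecording", timestamp: 700 },
  { event: "finishFacecamRecording", timestamp: 1100 },
  { event: "shareButtonPressed", timestamp: 1241 },
];

const conversion = toShareConversion(session);
console.log(JSON.stringify(conversion));
// {"event":"videoShareComplete","properties":[{"didTrimVideo":true},{"isVideoTrimmed":false},{"didUseFacecam":true},{"totalDuration":1241}]}
```

Even this small example shows why combining separate events is a hassle: every derived property needs its own rule, and that rule has to be rerun over all stored events whenever it changes.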
“What about PostgreSQL and JSON?”
• Yes, PostgreSQL can parse JSON documents, which allows an arbitrary event data format
• However, when your data gets big, this comes with a warning…

Comparing queries on regular fields vs. JSON
Normal query:
select count(*) from events where
created > '2014-09-01' and
event_type = 'recordSessionClosed';
Vs. JSON-based:
select count(*) from events where
created > '2014-09-01' and
json_extract_path_text(event_json, 'event_type') = 'recordSessionClosed';

Results
[Chart: execution time (in seconds), normal query vs. JSON-based query]

So what’s the best solution?
• Combine single-event sending with extra JSON properties
• Querying the JSON properties is slow, so we store there only information that is rarely needed (drill-down information)

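A sketch of this hybrid layout, assuming a table with a few real columns plus one JSON text column named event_json, as in the earlier queries. The user_id and game_id columns, and the choice of which fields count as frequently queried, are hypothetical.

```javascript
// Hybrid event layout: frequently queried fields become real columns,
// everything else goes into one JSON text column for drill-down queries.
// Which fields are promoted to columns is an assumption here.
const COLUMN_FIELDS = ["event_type", "created", "user_id", "game_id"];

function toRow(event) {
  const row = {};
  const drillDown = {};
  for (const [key, value] of Object.entries(event)) {
    if (COLUMN_FIELDS.includes(key)) {
      row[key] = value;
    } else {
      drillDown[key] = value;
    }
  }
  // Give every row the same columns, using null for missing fields.
  for (const field of COLUMN_FIELDS) {
    if (!(field in row)) row[field] = null;
  }
  row.event_json = JSON.stringify(drillDown);
  return row;
}

const row = toRow({
  event_type: "recordSessionClosed",
  created: "2014-09-02T10:00:00Z",
  user_id: 42,
  didUseFacecam: true,
  totalDuration: 1241,
});
console.log(row.event_json); // {"didUseFacecam":true,"totalDuration":1241}
```

With this layout the common filter runs on a plain column (event_type = 'recordSessionClosed'), and json_extract_path_text(event_json, ...) is only needed for the rarer drill-down queries.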
How do we then analyse the data?
• Most solutions on the market fell short due to
– Pricing
– Features
– Availability
• It turned out to be easier to “roll your own”

Solving an actual problem
“What are the worst drop-off points for uploading a replay?”

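One way to answer this question is a funnel: count how many users reach each step of the upload flow, then rank the step-to-step transitions by drop-off rate. The step names after shareButtonPressed and all counts below are made up purely to exercise the function; in practice the counts would come from SQL queries.

```javascript
// Rank step-to-step transitions in a funnel by drop-off rate.
// Input: ordered steps with the number of users who reached each step.
function dropOffs(funnel) {
  const transitions = [];
  for (let i = 1; i < funnel.length; i++) {
    const prev = funnel[i - 1];
    const curr = funnel[i];
    transitions.push({
      from: prev.step,
      to: curr.step,
      lost: prev.users - curr.users,
      dropOffRate: prev.users === 0 ? 0 : 1 - curr.users / prev.users,
    });
  }
  // Worst drop-off point first.
  return transitions.sort((a, b) => b.dropOffRate - a.dropOffRate);
}

// Hypothetical funnel: step names after shareButtonPressed and all counts are made up.
const uploadFunnel = [
  { step: "openVideoEditor", users: 1000 },
  { step: "shareButtonPressed", users: 400 },
  { step: "uploadStarted", users: 300 },
  { step: "uploadFinished", users: 270 },
];

const worst = dropOffs(uploadFunnel);
console.log(`${worst[0].from} -> ${worst[0].to}`); // openVideoEditor -> shareButtonPressed
```

Ranking by rate rather than by absolute users lost highlights steps where a large share of users give up, even late in the flow when fewer users remain.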
Tools
• SQL
• JavaScript
• Google Charts visualisation library

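A sketch of the glue between SQL results and Google Charts: google.visualization.arrayToDataTable accepts an array of rows with a header row first. Loading the library and drawing the chart are shown only as comments, the chartElement name is hypothetical, and the funnel numbers are made up.

```javascript
// Convert funnel counts into the array-of-rows format that
// google.visualization.arrayToDataTable accepts (header row first).
function toChartRows(funnel) {
  const header = ["Step", "Users"];
  return [header, ...funnel.map((s) => [s.step, s.users])];
}

// Hypothetical counts, e.g. the result of one SQL query per funnel step.
const rows = toChartRows([
  { step: "openVideoEditor", users: 1000 },
  { step: "shareButtonPressed", users: 400 },
]);
console.log(rows.length); // 3

// In a browser page where Google Charts has been loaded, roughly:
// const data = google.visualization.arrayToDataTable(rows);
// new google.visualization.ColumnChart(chartElement).draw(data, { title: "Replay uploads" });
```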
Why JavaScript for processing?
• Dynamic, fast, relatively well known
• Excellent libraries for data visualisation
– Highcharts, Google Charts, D3.js, dygraphs
• Good for visualizing data, but that’s it

Keys to a successful data-driven product
• Plan ahead for analytics and leave room for an evolving product
• If metrics and analytics are not easily accessible to decision makers, they are worthless
– Self-updating dashboards are one of the main keys to success
• Build A/B testing and data-driven behaviour directly into your product; don’t hack it on later

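A minimal sketch of building A/B assignment into the product, assuming variants are chosen by hashing the user ID together with an experiment name, so a user always lands in the same variant without any stored state. FNV-1a is used only as a simple, dependency-free hash, and the experiment name is hypothetical.

```javascript
// 32-bit FNV-1a hash over the string's UTF-16 code units
// (identical to byte-wise FNV-1a for ASCII input).
function fnv1a(str) {
  let hash = 0x811c9dc5;                       // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);        // FNV prime, 32-bit multiply
  }
  return hash >>> 0;                           // as unsigned 32-bit
}

// Deterministic assignment: same user + experiment always gives the same variant,
// and different experiments split users independently.
function assignVariant(userId, experiment, variants) {
  const bucket = fnv1a(`${experiment}:${userId}`) % variants.length;
  return variants[bucket];
}

const variant = assignVariant("user-123", "shareButtonColor", ["control", "treatment"]);
console.log(variant === assignVariant("user-123", "shareButtonColor", ["control", "treatment"])); // true
```

Because assignment is a pure function of the IDs, the same variant can be recomputed in the SDK and in the analytics queries without storing it anywhere.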
Thank you!
Questions, comments?
Email: tuomas@unity3d.com
Twitter: @trinta
developers.everyplay.com

Q&A
THANK YOU
