This talk covers the journey of designing and implementing a data platform for the game analytics industry. I will walk through the modern data stack: what tools and approaches are available on the market, how leading game companies engineer their data analytics solutions, and how they make better games with data insights.
2. Disclaimer
All thoughts are my own. They are based on my experience and the environments I have worked in for over a decade.
3. Outline and Takeaway Points
Outline:
● About myself and Microsoft
● Data Analytics Framework History
● Game Analytics Intro
● Modern Data Stack Overview
● Architectures from the Industry
● The Coalition Data Stack
Takeaways:
● All modern game studios require analytics
● Privacy is critical
● Modern data analytics includes both open-source and commercial products
● Reference architecture for the game platform
● Challenges in designing an analytics solution
4. About myself
- 11+ years in Analytics
- Moscow, Montenegro, Winnipeg, Vancouver, Victoria, Seattle, Boston
- 5 years @Amazon, now @Microsoft, The Coalition
- Tableau, Snowflake, Microsoft, AWS user groups and meetups
7. ● Gears of War: Ultimate Edition (XB1, Win10 | 2015 | Metascore 82/73)
● Gears of War 4 (XB1, Win10 | 2016 | Metascore 84/86)
● Gears Pop! (Mobile | 2019)
● Gears of War 5 (XB1, Win10 | 2019 | Metascore 85/82)
● Gears Tactics (XB1, PC | 2020 | Metascore 81)
● Gears 5: Hivebusters (XB1, PC | 2020 | Metascore 82)
10. Gaming Data Consumers
● Leadership
● Producers
● Artists
● Game Play Engineers
● QA Engineers
● Community Managers
11. Microsoft Privacy and Online Safety
https://privacy.microsoft.com/en-ca/privacystatement
https://support.xbox.com/en-CA/help/family-online-safety/online-safety/privacy
12. 3 Game Analytics Goals
Strategic Analytics - targets the global view of how the game should evolve, based on analysis of user behavior and the business model.
Tactical Analytics - informs game design in the short term.
Operational Analytics - analysis and evaluation of the immediate situation.
13. Telemetry as a source of Player data
The word telemetry is derived from the Greek roots tele, "remote", and metron, "measure".
Games are state machines: a player creates a continual loop of actions and responses that keeps the game state changing. These loops often keep the user engaged over a period of time.
Telemetry helps discover who is performing what action, when, and where in the game. It cannot explain why.
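To make the "who, what, when, where" idea concrete, here is a minimal sketch of what a single telemetry event might look like as a JSON payload. The field names (`player_id`, `event_name`, `map_name`, `x`, `y`) and all values are hypothetical and chosen for illustration only; they are not an actual game schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TelemetryEvent:
    player_id: str   # who (hypothetical pseudonymous ID, not a real gamertag)
    event_name: str  # what
    timestamp: str   # when, ISO 8601 in UTC
    map_name: str    # where: which map
    x: float         # where: position on the map
    y: float


def make_event(player_id: str, event_name: str, map_name: str,
               x: float, y: float, now: Optional[datetime] = None) -> TelemetryEvent:
    """Build an event; `now` can be injected to make the timestamp reproducible."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return TelemetryEvent(player_id, event_name, moment.isoformat(), map_name, x, y)


def to_json(event: TelemetryEvent) -> str:
    """Serialize the event the way a game client might send it to the pipeline."""
    return json.dumps(asdict(event), sort_keys=True)


# Example: one weapon-use event with a fixed timestamp.
event = make_event("player-123", "WeaponUse", "map_01", 10.5, -3.0,
                   now=datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc))
payload = to_json(event)
```

Note that the record has no field for "why": intent has to come from other sources, such as surveys or playtests.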
14. 3 types of metrics
● Gameplay metrics: user behavior in the game
● Community metrics: user engagement in communities and social media
● Customer metrics: the user as a customer, acquisition and retention
15. Action Third-Person Shooters (TPS) Metrics
● Weapon use
● trajectory
● item/asset use
● character/kit choice
● level/map choice
● loss/win
● heatmaps
● team scores
● map lethality
● map balance
● vehicle use metrics
● special moves
● jumps
● and many more
Example: death map, Halo 3
https://coolinfographics.com/blog/2009/1/12/halo-3-heatmaps.html
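A heatmap such as the death map above can be derived from position telemetry by binning events into a grid. The following is a minimal sketch, assuming each death event carries hypothetical 2D map coordinates and that a square grid cell is an acceptable resolution; a real pipeline would render the counts over a map image.

```python
from collections import Counter
from typing import Iterable, Tuple


def death_heatmap(deaths: Iterable[Tuple[float, float]],
                  cell_size: float) -> Counter:
    """Count deaths per square grid cell with side length `cell_size`."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    grid: Counter = Counter()
    for x, y in deaths:
        # Floor division maps a coordinate to its cell index,
        # including negative coordinates.
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += 1
    return grid


# Hypothetical death positions on one map.
deaths = [(1.0, 1.0), (2.5, 3.0), (12.0, 1.0), (3.9, 0.1), (-0.5, 0.5)]
heatmap = death_heatmap(deaths, cell_size=5.0)
hottest_cell, hottest_count = heatmap.most_common(1)[0]
```

The same binning works for other spatial metrics in the list, for example weapon use or jumps per cell.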
16. Role of Data Engineer
My role as a Data Engineer (DE) is to make sure that we have an infrastructure in place to collect, transform, and consolidate data for customer, community, and gameplay metrics.
This infrastructure supports Strategic and Tactical Analytics during development and post-production.
17. Key Milestones in the Analytics Industry
● Relational Databases
● Custom software
● MPP Data warehouse
● Enterprise ETL
● Enterprise BI Tools
● Data Mining Tools
● Big Data: Hadoop, Hive, Spark (on-premise)
● Data Lake
● Data Science: R, Python
● Cloud Computing
● AWS Redshift, Azure SQL DW (Synapse), Google BigQuery, Snowflake, Databricks
● ML frameworks
● ETL -> ELT
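20. Traditional approach
[Diagram layers: Source Layer, Data Processing, Storage, Business]
● Source Layer: Game Client
● Data Processing: Batch (ETL)
● Storage: Data Warehouse
● Business: Business Intelligence
Business Intelligence:
● Ad-hoc queries
● Pixel Perfect Reports
● Cross tables (Pivot)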
21. Data Storage Layer | Data Warehouse
SMP - Symmetric Multi-Processing
● Traditionally single-server systems
● Data stored locally
● Processors share a single OS, memory, and I/O devices
● Scale-up only: physical limitations on scaling to accommodate the workload
MPP - Massively Parallel Processing
● Multi-node (multi-server) systems
● Data stored externally
● Scale-out: add more compute nodes, each with dedicated CPU, memory, and I/O subsystems
● No single point of contention
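The scale-out idea behind MPP can be sketched in a few lines: rows are distributed across nodes by a hash of a key, each node aggregates only its own slice, and the partial results are combined. This is a single-process simulation under stated assumptions (the row layout, the function names, and the choice of CRC32 as the distribution hash are all illustrative); a real MPP warehouse runs the per-node step in parallel on separate machines.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (player_id, score), a hypothetical row layout


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the distribution key (CRC32 is deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    """Scatter rows so that all rows with the same key land on the same node."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for player_id, score in rows:
        nodes[node_for(player_id, num_nodes)].append((player_id, score))
    return nodes


def local_sum(node_rows: List[Row]) -> Dict[str, int]:
    """The work one node does on its own slice, with no shared state."""
    totals: Dict[str, int] = defaultdict(int)
    for player_id, score in node_rows:
        totals[player_id] += score
    return dict(totals)


def total_score_by_player(rows: List[Row], num_nodes: int) -> Dict[str, int]:
    """Scatter, aggregate per node, then gather the partial results."""
    result: Dict[str, int] = {}
    for partial in (local_sum(n) for n in distribute(rows, num_nodes)):
        result.update(partial)  # key sets are disjoint across nodes
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
totals = total_score_by_player(rows, num_nodes=3)
```

Adding nodes changes where rows live but not the answer, which is what makes scale-out possible.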
24. Analytics architecture evolution
- Before 2010: mostly data warehouses (SMP, MPP).
- With the rise of Hadoop: a shift towards the data lake, which decouples compute and storage but lacks ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
- Lakehouse = Data Warehouse + Data Lake.
25. Lakehouse Options
● Transaction Support (ACID)
● Schema Enforcement
● Upserts/Deletes
Key solutions:
● Apache Hudi (Hadoop Upserts Deletes and Incrementals), originally from Uber Engineering
● Apache Iceberg, originally from Netflix
● Delta Lake, originally from Databricks
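The three lakehouse features listed above can be illustrated with a toy in-memory table. This is only a sketch of the concepts and does not use the Hudi, Iceberg, or Delta Lake APIs; the schema, column names, and function names are hypothetical. Validating the whole batch before applying it is a loose stand-in for transactional, all-or-nothing writes.

```python
from typing import Dict, Iterable, List

# Hypothetical table schema: column name -> required Python type.
SCHEMA = {"player_id": str, "map_name": str, "kills": int}

Table = Dict[str, dict]  # rows keyed by player_id


def enforce_schema(row: dict) -> dict:
    """Schema enforcement: reject rows with wrong columns or wrong types."""
    if set(row) != set(SCHEMA):
        raise ValueError(f"columns {sorted(row)} do not match the schema")
    for column, expected in SCHEMA.items():
        if not isinstance(row[column], expected):
            raise TypeError(f"{column} must be {expected.__name__}")
    return row


def upsert(table: Table, rows: List[dict]) -> Table:
    """Return a new table version; nothing is applied if any row is invalid."""
    validated = [enforce_schema(r) for r in rows]  # validate the whole batch first
    new_version = dict(table)                      # the old version stays readable
    for row in validated:
        new_version[row["player_id"]] = row        # insert or overwrite by key
    return new_version


def delete(table: Table, player_ids: Iterable[str]) -> Table:
    """Return a new table version without the given keys."""
    to_remove = set(player_ids)
    return {k: v for k, v in table.items() if k not in to_remove}


v1 = upsert({}, [{"player_id": "p1", "map_name": "map_01", "kills": 1}])
v2 = upsert(v1, [{"player_id": "p1", "map_name": "map_01", "kills": 5},
                 {"player_id": "p2", "map_name": "map_02", "kills": 2}])
v3 = delete(v2, ["p2"])
```

Plain files in a data lake give you none of this by default, which is the gap these table formats were built to close.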
27. Modern Key Layers and roles
[Diagram layers: Source Layer, Data Processing, Storage, Science & Experimentation, Business]
● Source Layer: Game Client
● Data Processing: Streaming, Batch (ETL/ELT)
● Science & Experimentation: Data Science, Machine Learning
● Business: Business Intelligence
Roles:
● Data Engineer
● ML Engineer
● Data Scientist
● BI Engineer
● Product Manager - manages the data product
28. Modern Data Stack with Open Source
[Diagram layers: Source Layer, Data Processing, Storage, Science & Experimentation, Business]
[Diagram labels: Game Client, Spark Pool, MLlib]
29. Modern Data Stack with Microsoft Azure
[Diagram layers: Source Layer, Data Processing, Storage, Science & Experimentation, Business]
[Diagram labels: Event Hub, Stream Analytics, Batch (ETL/ELT), Spark Pool, MLlib, Azure ML (not in Synapse), Serverless Pool, Dedicated SQL pool, Azure Synapse Studio, Azure Data Lake v2]
30. Modern Data Stack with Azure Data Explorer (ADX)
[Diagram layers: Source Layer, Data Processing, Storage, Science & Experimentation, Business]
[Diagram labels: ADX | Ingesting, Azure Data Explorer | Storage, ADX | Data Science, Kusto / ADX]
38. How it was
[Diagram layers: Source Layer, Data Processing, Storage, Science & Experimentation, Business]
[Diagram labels: Game Client/Server, On-Premise, Azure Cloud, The Coalition Data Lake]
39. How it is going
[Diagram layers: Source Layer, Data Processing, Storage, Science & Experimentation, Business]
[Diagram labels: Game Client/Server, Event Grid, Streaming, Spark Structured Streaming*, Azure Data Factory | Batch (ELT), The Coalition Data Lake, Azure Data Lake Storage V2 (Compute), Spark MLlib]
*Spark Structured Streaming: not in production. We are testing it.
**Spark MLlib and MLflow: part of the future vision.
40. Data Engineering Design Flow as a Funnel
Event Names:
● Weapon Use
● Damage
● Shooting
● Flock
● Map Name
● HeartBeat
● and so on
Raw Tables (Bronze)
● Method: Append
● Transformation: Minimal
Staging Tables (Silver)
● Method: Append
● Transformation: JSON Schema
Fact Tables (Gold)
● Method: Merge
● Transformation: Heavy
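The funnel can be sketched as three small steps over one batch of events. This is a plain-Python illustration under stated assumptions, not the production pipeline: the required fields, the event payloads, and the choice of "event count per map" as the gold fact are all hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical schema applied at the silver stage: field name -> type.
REQUIRED_FIELDS = {"event_name": str, "player_id": str, "map_name": str}


def to_bronze(raw_lines: List[str]) -> List[str]:
    """Bronze: append raw payloads as received, with minimal transformation."""
    return list(raw_lines)


def to_silver(bronze_batch: List[str]) -> List[dict]:
    """Silver: parse JSON and keep only records that match the schema."""
    silver = []
    for line in bronze_batch:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed payloads stay in bronze only
        if isinstance(record, dict) and all(
            isinstance(record.get(name), kind)
            for name, kind in REQUIRED_FIELDS.items()
        ):
            silver.append({name: record[name] for name in REQUIRED_FIELDS})
    return silver


def merge_into_gold(gold: Dict[Tuple[str, str], int],
                    silver_batch: List[dict]) -> None:
    """Gold: merge per-batch aggregates into the existing fact table."""
    batch_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in silver_batch:
        batch_counts[(record["map_name"], record["event_name"])] += 1
    for key, count in batch_counts.items():
        gold[key] = gold.get(key, 0) + count  # update if present, else insert


raw = [
    '{"event_name": "WeaponUse", "player_id": "p1", "map_name": "map_01"}',
    '{"event_name": "WeaponUse", "player_id": "p2", "map_name": "map_01"}',
    'not json',
    '{"event_name": "Damage", "player_id": "p1"}',
]
bronze = to_bronze(raw)
silver = to_silver(bronze)
gold: Dict[Tuple[str, str], int] = {}
merge_into_gold(gold, silver)
```

Each stage is narrower than the one before it: four raw lines, two valid records, one fact row.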
41. Key Challenges
● Cross-team collaboration: SDE and DE, BI and DE, DE and DS
● Low data volume before launch
● Schema evolution
● Cost and budgeting
● Security best practices (for example, credential handling)
● Privacy and compliance (GDPR, HIPAA, lack of data for ML/AI)
● Data quality at scale (Deequ, Great Expectations)
● Responsible AI
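To give a feel for the data quality challenge, here is a minimal sketch of the kind of checks that tools such as Deequ and Great Expectations automate at scale. It is written in plain Python and does not use either library's API; the column names, the allowed event names, and the function names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical set of event names the pipeline expects.
ALLOWED_EVENTS = {"WeaponUse", "Damage", "HeartBeat"}


def completeness(rows: List[dict], column: str) -> float:
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


def is_unique(rows: List[dict], column: str) -> bool:
    """True if no non-null value of `column` appears more than once."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))


def only_allowed(rows: List[dict], column: str, allowed: Iterable[str]) -> bool:
    """True if every row's value for `column` is in the allowed set."""
    allowed_set = set(allowed)
    return all(r.get(column) in allowed_set for r in rows)


def run_checks(rows: List[dict]) -> Dict[str, bool]:
    """Run all checks and report pass/fail per check name."""
    return {
        "event_id_complete": completeness(rows, "event_id") == 1.0,
        "event_id_unique": is_unique(rows, "event_id"),
        "event_name_allowed": only_allowed(rows, "event_name", ALLOWED_EVENTS),
    }


good_rows = [{"event_id": 1, "event_name": "Damage"},
             {"event_id": 2, "event_name": "HeartBeat"}]
bad_rows = [{"event_id": 1, "event_name": "Damage"},
            {"event_id": 1, "event_name": "Teleport"},
            {"event_id": None, "event_name": "Damage"}]
good_report = run_checks(good_rows)
bad_report = run_checks(bad_rows)
```

A failed check would typically block the batch from being promoted or raise an alert, so that bad data does not reach the reports.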
42. Summary
● There is no bad solution or vendor
● Focus on the business outcome (working backwards)
● Engineering excellence (dev/prod, CI/CD)
● You can build a solution using code (Python, Java, Scala, SQL, and so on) or a GUI (with some restrictions)
● Security and privacy best practices
43. For more information visit: https://www.thecoalitionstudio.com/join-us/
The Coalition is looking for talented and diverse people to join our squad, with exciting opportunities across our Art, Design, Engineering, and Production teams.