The Game of Big Data!
Analytics Infrastructure at KIXEYE
Randy Shoup 
@randyshoup
linkedin.com/in/randyshoup

QCon New York, June 13 2014
Free-to-Play Real-time Strategy Games
•  Web and mobile
•  Strategy and tactics
•  Really real-time ☺
•  Deep relationships with players
•  Constantly evolving gameplay, feature set, economy, balance
•  >500 employees worldwide
Intro: Analytics at KIXEYE
User Acquisition

Game Analytics

Retention and Monetization

Analytic Requirements
User Acquisition
Goal: ELTV > acquisition cost
•  User’s estimated lifetime value is more than it costs to acquire that user

Mechanisms
•  Publisher Campaigns
•  On-Platform Recommendations
Game Analytics
Goal: Measure and Optimize “Fun”
•  Difficult to define
•  Includes gameplay, feature set, performance, bugs
•  All metrics are just proxies for fun (!)

Mechanisms
•  Game balance
•  Match balance
•  Economy management
•  Player typology
Retention and Monetization
Goal: Sustainable Business
•  Monetization drivers
•  Revenue recognition

Mechanisms
•  Pricing and Bundling
•  Tournament (“Event”) Design
•  Recommendations
Analytic Requirements
•  Data Integrity and Availability
•  Cohorting
•  Controlled experiments
•  Deep ad-hoc analysis
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
V1 Analytic System
Grew Organically
•  Built originally for user acquisition
•  Progressively grown to much more

Idiosyncratic mix of languages, systems, tools
•  Log files -> Chukwa -> Hadoop -> Hive -> MySQL
•  PHP for reports and ETL
•  Single massive table with everything
V1 Analytic System
Many Issues
•  Very slow to query
•  No data standardization or validation
•  Very difficult to add a new game, report, ETL
•  Extremely difficult to backfill on error or outage
•  Difficult for analysts to use; impossible for PMs, designers, etc.

… but we survived (!)
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
Goals of Deep Thought
Independent Scalability
•  Logically separate, independently scalable tiers
Stability and Outage Recovery
•  Tiers can completely fail with no data loss
•  Every step idempotent and replayable

Standardization
•  Standardized event types, fields, queries, reports
Goals of Deep Thought
In-Stream Event Processing
•  Sessionalization, Dimensionalization, Cohorting
Queryability
•  Structures are simple to reason about
•  Simple things are simple
•  Analysts, Data Scientists, PMs, Game Designers, etc.

Extensibility
•  Easy to add new games, events, fields, reports
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
Core Capabilities
•  Sessionalization
•  Dimensionalization
•  Cohorting
Sessionalization
All events are part of a “session”
•  Explicit start event, optional stop event
•  Game-defined semantics

Event Batching
•  Events arrive in batch, associated with session
•  Pipeline computes batch-level metrics, disaggregates events
•  Can optionally attach batch-level metrics to each event
Sessionalization
Time-Series Aggregations
•  Configurable metrics
•  1-day X, 7-day X, lifetime X
•  Total attacks, total time played
•  Accumulated in-stream
•  V1 aggregate + batch delta
•  Faster to calculate in-stream vs. Map-Reduce
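
One way to picture the “V1 aggregate + batch delta” idea is a small accumulator that folds each batch’s delta into the stored value instead of recomputing it with Map-Reduce. This is a minimal, hypothetical sketch (the metric keys and class names are assumptions, not KIXEYE’s code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maintain lifetime / windowed metrics in-stream by
// applying each batch's delta to the previously stored aggregate.
public class TimeSeriesAggregator {

    // aggregate values keyed by "playerId:metric", e.g. "42:lifetime_attacks"
    private final Map<String, Long> aggregates = new HashMap<>();

    /** Apply a batch-level delta to the stored aggregate (old value + delta). */
    public long applyBatchDelta(String playerId, String metric, long batchDelta) {
        String key = playerId + ":" + metric;
        long updated = aggregates.getOrDefault(key, 0L) + batchDelta;
        aggregates.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        TimeSeriesAggregator agg = new TimeSeriesAggregator();
        agg.applyBatchDelta("42", "lifetime_attacks", 7);                      // first batch
        System.out.println(agg.applyBatchDelta("42", "lifetime_attacks", 3));  // 10
    }
}
```
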
Dimensionalization
Pipeline assigns unique numeric id to string enums
•  E.g., “twigs” resource → id 1234

Automatic mapping and assignment
•  Games log strings
•  Pipeline generates and maps ids
•  No configuration necessary

Fast dimensional queries
•  Join on integers, not strings
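
As an illustration of the id assignment (not the actual pipeline code), a minimal dimension registry could hand out a monotonically increasing id the first time a canonicalized string is seen:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: games log raw strings, the pipeline assigns each
// distinct (case-insensitive) value a stable numeric id so queries can join
// on integers instead of strings.
public class DimensionRegistry {

    private final ConcurrentHashMap<String, Integer> idsByValue = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    /** Return the id for a value, assigning a new one the first time it is seen. */
    public int idFor(String rawValue) {
        String canonical = rawValue.trim().toLowerCase();
        return idsByValue.computeIfAbsent(canonical, v -> nextId.getAndIncrement());
    }

    public static void main(String[] args) {
        DimensionRegistry resources = new DimensionRegistry();
        System.out.println(resources.idFor("twigs"));  // 1
        System.out.println(resources.idFor("TWIGS"));  // 1 (merged with "twigs")
        System.out.println(resources.idFor("stone"));  // 2
    }
}
```
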
Dimensionalization
Metadata enumeration and manipulation
•  Easily enumerate all values for a field
•  Merge multiple values
•  “TWIGS” == “Twigs” == “twigs”

Metadata tagging
•  Can assign arbitrary tags to metadata
•  E.g., “Panzer 05” is {tank, mechanized infantry, event prize}
•  Enables custom views
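
A tagging layer like the one described above can be sketched as a simple map from a dimension value’s id to a set of tags (the ids and tag names below are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: attach arbitrary tags to a dimension value (by its
// numeric id) so custom views can later group values by tag.
public class MetadataTags {

    private final Map<Integer, Set<String>> tagsByValueId = new HashMap<>();

    public void tag(int valueId, String... tags) {
        tagsByValueId.computeIfAbsent(valueId, id -> new HashSet<>())
                     .addAll(Arrays.asList(tags));
    }

    public Set<String> tagsFor(int valueId) {
        return tagsByValueId.getOrDefault(valueId, Set.of());
    }

    public static void main(String[] args) {
        MetadataTags tags = new MetadataTags();
        int panzer05 = 1234;   // id previously assigned by the dimension pipeline
        tags.tag(panzer05, "tank", "mechanized infantry", "event prize");
        System.out.println(tags.tagsFor(panzer05));
    }
}
```
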
Cohorting
Group players along any dimension / metric
•  Well beyond classic age-based cohorts

Core analytical building block
•  Experiment groups
•  User acquisition campaign tracking
•  Prospective modeling
•  Retrospective analysis
Cohorting
Set-based
•  Overlapping groups: >100, >200, etc.
•  Exclusive groups: (100-200), (200-500), etc.

Time-based
•  E.g., people who played in last 3 days
•  E.g., “whale” == ($$ > X) in last N days
•  Autoexpire from a group without explicit intervention
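
The set-based and time-based rules above can be expressed as tiny predicates; the thresholds and window below are illustrative assumptions, but they show why time-based cohorts expire on their own:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of cohort rules: set-based membership from a spend
// threshold, and time-based membership that lapses automatically once the
// last qualifying activity falls outside the window.
public class CohortRules {

    /** Set-based: overlapping spend cohorts such as ">100", ">200". */
    public static boolean inSpendCohort(double lifetimeSpend, double threshold) {
        return lifetimeSpend > threshold;
    }

    /** Time-based: e.g. "played in the last 3 days"; no explicit exit event needed. */
    public static boolean inRecentlyActiveCohort(Instant lastPlayed, Instant now, Duration window) {
        return !lastPlayed.isBefore(now.minus(window));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(inSpendCohort(250.0, 200.0));                         // true
        System.out.println(inRecentlyActiveCohort(
                now.minus(Duration.ofDays(2)), now, Duration.ofDays(3)));        // true
        System.out.println(inRecentlyActiveCohort(
                now.minus(Duration.ofDays(5)), now, Duration.ofDays(3)));        // false: auto-expired
    }
}
```
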
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
Implementation of Pipeline
•  Ingestion
•  Event Log
•  Transformation
•  Data Storage
•  Analysis and Visualization

Pipeline: Logging Service → Kafka → Importer / Session Store → Hadoop 2 → Hive / Redshift
Ingestion: Logging Service
HTTP / JSON Endpoint
•  Play framework
•  Non-blocking, event-driven

Responsibilities
•  Message integrity via checksums
•  Durability via local disk persistence
•  Async batch writes to Kafka topics
•  {valid, invalid, unauth}
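
Leaving the Play controller itself aside, the checksum-then-route step might look roughly like this with the current Kafka Java producer client. The topic names and the CRC32 checksum are assumptions for illustration, not the actual wire format:

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.CRC32;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical sketch: verify the batch checksum sent by the client, then
// asynchronously publish the raw JSON to a "valid" or "invalid" topic.
public class EventBatchIngester {

    private final Producer<String, String> producer;

    public EventBatchIngester(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    /** Route the batch by message integrity; send() is an async batch write. */
    public void ingest(String gameId, String batchJson, long claimedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(batchJson.getBytes(StandardCharsets.UTF_8));
        String topic = (crc.getValue() == claimedChecksum) ? "events_valid" : "events_invalid";
        producer.send(new ProducerRecord<>(topic, gameId, batchJson));
    }

    public void close() {
        producer.close();
    }
}
```
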
Event Log: Kafka
Persistent, replayable pipe of events
•  Events stored for 7 days

Responsibilities
•  Durability via replication and local disk streaming
•  Replayability via commit log
•  Scalability via partitioned brokers
•  Segment data for different types of processing
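
Replayability in practice means a consumer can rewind its offsets and reprocess everything Kafka still retains. A minimal sketch with the current Kafka consumer API (the 2014 deployment predates this client, so treat it purely as an illustration):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Hypothetical sketch: rewind to the oldest retained offset (here, up to
// 7 days back) and reprocess the commit log after a downstream failure.
public class EventReplayer {

    public static void replayFromBeginning(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "replay-" + System.currentTimeMillis());
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Single partition shown for brevity.
            List<TopicPartition> partitions = Collections.singletonList(new TopicPartition(topic, 0));
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);   // rewind to the start of the retained window

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```
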
Transformation: Importer
Consume Kafka topics, rebroadcast
•  E.g., consume batches, rebroadcast events

Responsibilities
•  Batch validation against JSON schema
•  Syntactic validation
•  Semantic validation (is this event possible?)
•  Batches -> events
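
The validate-then-disaggregate step could be sketched like this; the field names (session_id, events) are assumptions, not the real schema, and Jackson stands in for whatever JSON-schema tooling the importer actually uses:

```java
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical sketch: check that a batch carries the fields the pipeline
// needs, then disaggregate it into individual events for rebroadcast.
public class BatchImporter {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Returns the individual events, or an empty list if the batch is invalid. */
    public static List<JsonNode> explodeBatch(String batchJson) throws Exception {
        JsonNode batch = MAPPER.readTree(batchJson);

        // Syntactic validation: required fields and types.
        if (!batch.has("session_id") || !batch.has("events") || !batch.get("events").isArray()) {
            return new ArrayList<>();
        }
        // Semantic validation ("is this event possible?") would also go here.

        List<JsonNode> events = new ArrayList<>();
        for (JsonNode event : batch.get("events")) {
            events.add(event);   // each event stays associated with its session downstream
        }
        return events;
    }
}
```
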
Transformation: Importer
Responsibilities (cont.)
•  Sessionalization
•  Assign event to session
•  Calculate time-series aggregates
•  Dimensionalization
•  String enum -> numeric id
•  Merge / coalesce different string representations into single id
•  Player metadata
•  Join player metadata from session store
Transformation: Importer
Responsibilities (cont.)
•  Cohorting
•  Process enter-cohort, exit-cohort events
•  Process A / B testing events
•  Evaluate cohort rules (e.g., spend thresholds)
•  Decorate events with cohort tags
Transformation: Session Store
Key-value store (Couchbase)
•  Fast, constant-time access to sessions, players

Responsibilities
•  Store Sessions, Players, Dimensions, Config
•  Lookup
•  Idempotent update
•  Store accumulated session-level metrics
•  Store player history
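
Idempotent update is the property that lets every step be replayed safely. A simplified sketch, with a made-up VersionedStore interface standing in for the Couchbase client (and only guarding against replay of the most recent batch):

```java
// Hypothetical sketch of an idempotent, compare-and-swap session update.
public class SessionStoreUpdater {

    public interface VersionedStore {
        Versioned get(String key);
        boolean casPut(String key, SessionDoc doc, long expectedCas);   // false if someone wrote in between
    }

    public record SessionDoc(long totalAttacks, String lastAppliedBatchId) {}
    public record Versioned(SessionDoc doc, long cas) {}

    /** Apply one batch's attack count to a session, exactly once. */
    public static void applyBatch(VersionedStore store, String sessionKey,
                                  String batchId, long attacksInBatch) {
        while (true) {
            Versioned current = store.get(sessionKey);
            if (batchId.equals(current.doc().lastAppliedBatchId())) {
                return;   // replayed batch: already applied, nothing to do
            }
            SessionDoc updated = new SessionDoc(
                    current.doc().totalAttacks() + attacksInBatch, batchId);
            if (store.casPut(sessionKey, updated, current.cas())) {
                return;   // CAS succeeded; a concurrent writer forces a re-read instead
            }
        }
    }
}
```
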
Storage: Hadoop 2
Camus MR
•  Kafka -> HDFS every 3 minutes

append_events table
•  Append-only log of events
•  Each event has session-version for deduplication
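
The session-version is what makes the append-only log safe to replay: when append_events is folded into base_events, only the newest version of each event wins. A toy version of that rule (field names assumed; in reality this runs as a Map-Reduce job, not in-memory Java):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the deduplication rule: keep the copy of each event
// with the highest session-version, so replays never produce duplicates.
public class EventDeduplicator {

    public record EventRow(String eventId, long sessionVersion, String json) {}

    public static Map<String, EventRow> latestByEventId(Iterable<EventRow> appendLog) {
        Map<String, EventRow> latest = new HashMap<>();
        for (EventRow row : appendLog) {
            latest.merge(row.eventId(), row,
                    (a, b) -> a.sessionVersion() >= b.sessionVersion() ? a : b);
        }
        return latest;
    }
}
```
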
Storage: Hadoop 2
append_events -> base_events MR
•  Logical update of base_events
•  Update events with new metadata
•  Swap old partition for new partition
•  Replayable from beginning without duplication
Storage: Hadoop 2
base_events table
•  Denormalized table of all events
•  Stores original JSON + decoration
•  Custom SerDes to query / extract JSON fields without materializing entire rows
•  Standardized event types → lots of functionality for free
Analysis and Visualization
Hive Warehouse
•  Normalized event-specific, game-specific stores
•  Aggregate metric data for reporting, analysis
•  Maintained through custom ETL
•  MR
•  Hive queries
Analysis and Visualization
Amazon Redshift
•  Fast ad-hoc querying

Tableau
•  Simple, powerful reporting
Come Join Us!
KIXEYE is hiring in 

SF, Seattle, Victoria,
Brisbane, Amsterdam

rshoup@kixeye.com
@randyshoup
Deep Thought Team:
•  Mark Weaver, Josh McDonald, Ben Speakmon, Snehal Nagmote, Mark Roberts, Kevin Lee, Woo Chan Kim, Tay Carpenter, Tim Ellis, Kazue Watanabe, Erica Chan, Jessica Cox, Casey DeWitt, Steve Morin, Lih Chen, Neha Kumari
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/kixeye-big-data
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/UkwXZ2.

Randy Shoup describes KIXEYE's analytics infrastructure from Kafka queues through Hadoop 2 to Hive and Redshift, built for flexibility, experimentation, iteration, componentization, testability, reliability and replay-ability. Filmed at qconnewyork.com.

As CTO, Randy brings 25 years of engineering leadership to KIXEYE, ranging from tiny startups to Internet-scale companies. Prior to KIXEYE, he was Director of Engineering at Google, leading several teams building Google App Engine, the world's largest Platform as a Service.
