QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE

Randy Shoup
Randy ShoupChief Architect at eBay
The Game of Big Data
Analytics Infrastructure at KIXEYE
Randy Shoup
@randyshoup
linkedin.com/in/randyshoup
QCon New York, June 13 2014
Free-to-Play Real-time Strategy
Games • Web and mobile
• Strategy and tactics
• Really real-time 
• Deep relationships with
players
• Constantly evolving
gameplay, feature set,
economy, balance
• >500 employees
worldwide
Intro: Analytics at KIXEYE
User Acquisition
Game Analytics
Retention and Monetization
Analytic Requirements
User Acquisition
Goal: ELTV > acquisition cost
• User’s estimated lifetime value is more than it
costs to acquire that user
Mechanisms
• Publisher Campaigns
• On-Platform Recommendations
Game Analytics
Goal: Measure and Optimize “Fun”
• Difficult to define
• Includes gameplay, feature set, performance,
bugs
• All metrics are just proxies for fun (!)
Mechanisms
• Game balance
• Match balance
• Economy management
• Player typology
Retention and Monetization
Goal: Sustainable Business
• Monetization drivers
• Revenue recognition
Mechanisms
• Pricing and Bundling
• Tournament (“Event”) Design
• Recommendations
Analytic Requirements
• Data Integrity and Availability
• Cohorting
• Controlled experiments
• Deep ad-hoc analysis
“Deep Thought”
V1 Analytic System
Goals
Core Capabilities
Implementation
“Deep Thought”
V1 Analytic System
Goals
Core Capabilities
Implementation
V1 Analytic System
Grew Organically
• Built originally for user acquisition
• Progressively grown to much more
Idiosyncratic mix of languages, systems,
tools
• Log files -> Chukwa -> Hadoop -> Hive ->
MySQL
• PHP for reports and ETL
• Single massive table with everything
V1 Analytic System
Many Issues
• Very slow to query
• No data standardization or validation
• Very difficult to add a new game, report, ETL
• Extremely difficult to backfill on error or
outage
• Difficult for analysts to use; impossible for
PMs, designers, etc.
… but we survived (!)
“Deep Thought”
V1 Analytic System
Goals
Core Capabilities
Implementation
Goals of Deep Thought
Independent Scalability
• Logically separate, independently scalable
tiers
Stability and Outage Recovery
• Tiers can completely fail with no data loss
• Every step idempotent and replayable
Standardization
• Standardized event types, fields, queries,
reports
Goals of Deep Thought
In-Stream Event Processing
• Sessionalization, Dimensionalization,
Cohorting
Queryability
• Structures are simple to reason about
• Simple things are simple
• Analysts, Data Scientists, PMs, Game
Designers, etc.
Extensibility
• Easy to add new games, events, fields,
reports
“Deep Thought”
V1 Analytic System
Goals
Core Capabilities
Implementation
Core Capabilities
• Sessionalization
• Dimensionalization
• Cohorting
Sessionalization
All events are part of a “session”
• Explicit start event, optional stop event
• Game-defined semantics
Event Batching
• Events arrive in batch, associated with
session
• Pipeline computes batch-level metrics,
disaggregates events
• Can optionally attach batch-level metrics to
each event
Sessionalization
Time-Series Aggregations
• Configurable metrics
• 1-day X, 7-day X, lifetime X
• Total attacks, total time played
• Accumulated in-stream
• Old aggregate + batch delta
• Faster to calculate in-stream vs. Map-Reduce
Dimensionalization
Pipeline assigns unique numeric id to string
enums
• E.g., “twigs” resource  id 1234
Automatic mapping and assignment
• Games log strings
• Pipeline generates and maps ids
• No configuration necessary
Fast dimensional queries
• Join on integers, not strings
Dimensionalization
Metadata enumeration and manipulation
• Easily enumerate all values for a field
• Merge multiple values
• “TWIGS” == “Twigs” == “twigs”
Metadata tagging
• Can assign arbitrary tags to metadata
• E.g., “Panzer 05” is {tank, mechanized infantry, event
prize}
• Enables custom views
Cohorting
Group players along any dimension / metric
• Well beyond classic age-based cohorts
Core analytical building block
• Experiment groups
• User acquisition campaign tracking
• Prospective modeling
• Retrospective analysis
Cohorting
Set-based
• Overlapping groups: >100, >200, etc.
• Exclusive groups: (100-200), (200-500), etc.
Time-based
• E.g., people who played in last 3 days
• E.g., “whale” == ($$ > X) in last N days
• Autoexpire from a group without explicit
intervention
“Deep Thought”
V1 Analytic System
Goals
Core Capabilities
Implementation
Implementation of Pipeline
• Ingestion
• Event Log
• Transformation
• Data Storage
• Analysis and Visualization
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Ingestion: Logging Service
HTTP / JSON Endpoint
• Play framework
• Non-blocking, event-driven
Responsibilities
• Message integrity via checksums
• Durability via local disk
persistence
• Async batch writes to Kafka
topics
• {valid, invalid, unauth}
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Event Log: Kafka
Persistent, replayable pipe of
events
• Events stored for 7 days
Responsibilities
• Durability via replication and local
disk streaming
• Replayability via commit log
• Scalability via partitioned brokers
• Segment data for different types
of processing
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Transformation: Importer
Consume Kafka topics,
rebroadcast
• E.g., consume batches,
rebroadcast events
Responsibilities
• Batch validation against JSON
schema
• Syntactic validation
• Semantic validation (is this event
possible?)
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Transformation: Importer
Responsibilities (cont.)
• Sessionalization
• Assign event to session
• Calculate time-series aggregates
• Dimensionalization
• String enum -> numeric id
• Merge / coalesce different string
representations into single id
• Player metadata
• Join player metadata from session
store
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Transformation: Importer
Responsibilities (cont.)
• Cohorting
• Process enter-cohort, exit-cohort
events
• Process A / B testing events
• Evaluate cohort rules (e.g., spend
thresholds)
• Decorate events with cohort tags
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Transformation: Session Store
Key-value store (Couchbase)
• Fast, constant-time access to
sessions, players, etc.
Responsibilities
• Store Sessions, Players, Dimensions,
Config
• Lookup
• Idempotent update
• Store accumulated session-level
metrics
• Store player history
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Storage: Hadoop 2
Camus MR
• Kafka -> HDFS every 3 minutes
append_events table
• Append-only log of events
• Each event has session-version
for deduplication
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Storage: Hadoop 2
append_events -> base_events
MR
• Logical update of base_events
• Update events with new metadata
• Swap old partition for new partition
• Replayable from beginning
without duplication
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Storage: Hadoop 2
base_events table
• Denormalized table of all events
• Stores original JSON +
decoration
• Custom Serdes to query /
extract JSON fields without
materializing entire rows
• Standardized event types 
lots of functionality for free
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Analysis and Visualization
Hive Warehouse
• Normalized event-specific,
game-specific stores
• Aggregate metric data for
reporting, analysis
• Maintained through custom ETL
• MR
• Hive queries
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Analysis and Visualization
Amazon Redshift
• Fast ad-hoc querying
Tableau
• Simple, powerful reporting
Logging Service
Kafka
Importer / Session Store
Hadoop 2
Hive / Redshift
Come Join Us!
KIXEYE is hiring in
SF, Seattle, Victoria,
Brisbane, Amsterdam
rshoup@kixeye.com
@randyshoup
Deep Thought
Team:
• Mark Weaver
• Josh McDonald
• Ben Speakmon
• Snehal Nagmote
• Mark Roberts
• Kevin Lee
• Woo Chan Kim
• Tay Carpenter
• Tim Ellis
• Kazue Watanabe
• Erica Chan
• Jessica Cox
• Casey DeWitt
• Steve Morin
• Lih Chen
• Neha Kumari
1 of 36

Recommended

Concurrency at Scale: Evolution to Micro-Services by
Concurrency at Scale:  Evolution to Micro-ServicesConcurrency at Scale:  Evolution to Micro-Services
Concurrency at Scale: Evolution to Micro-ServicesRandy Shoup
9.4K views33 slides
Effective Microservices In a Data-centric World by
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldRandy Shoup
1.8K views54 slides
The Future of Services: Building Asynchronous, Resilient and Elastic Systems by
The Future of Services: Building Asynchronous, Resilient and Elastic SystemsThe Future of Services: Building Asynchronous, Resilient and Elastic Systems
The Future of Services: Building Asynchronous, Resilient and Elastic SystemsLightbend
3.4K views22 slides
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve... by
The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...Randy Shoup
30.2K views39 slides
The Hardest Part of Microservices: Your Data - Christian Posta, Red Hat by
The Hardest Part of Microservices: Your Data - Christian Posta, Red HatThe Hardest Part of Microservices: Your Data - Christian Posta, Red Hat
The Hardest Part of Microservices: Your Data - Christian Posta, Red HatAmbassador Labs
3.4K views37 slides
Going Reactive in the Land of No by
Going Reactive in the Land of NoGoing Reactive in the Land of No
Going Reactive in the Land of NoLightbend
391 views26 slides

More Related Content

What's hot

A Microservice Journey by
A Microservice JourneyA Microservice Journey
A Microservice JourneyChristian Posta
2.4K views71 slides
Microservices Journey Fall 2017 by
Microservices Journey Fall 2017Microservices Journey Fall 2017
Microservices Journey Fall 2017Christian Posta
1.1K views81 slides
Lowering the risk of monolith to microservices by
Lowering the risk of monolith to microservicesLowering the risk of monolith to microservices
Lowering the risk of monolith to microservicesChristian Posta
1.9K views69 slides
Microservices and Integration: what's next with Istio service mesh by
Microservices and Integration: what's next with Istio service meshMicroservices and Integration: what's next with Istio service mesh
Microservices and Integration: what's next with Istio service meshChristian Posta
3.3K views60 slides
Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ... by
Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ...Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ...
Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ...Ambassador Labs
3.9K views24 slides
Microservices Journey NYC by
Microservices Journey NYCMicroservices Journey NYC
Microservices Journey NYCChristian Posta
2.3K views81 slides

What's hot(20)

Microservices Journey Fall 2017 by Christian Posta
Microservices Journey Fall 2017Microservices Journey Fall 2017
Microservices Journey Fall 2017
Christian Posta1.1K views
Lowering the risk of monolith to microservices by Christian Posta
Lowering the risk of monolith to microservicesLowering the risk of monolith to microservices
Lowering the risk of monolith to microservices
Christian Posta1.9K views
Microservices and Integration: what's next with Istio service mesh by Christian Posta
Microservices and Integration: what's next with Istio service meshMicroservices and Integration: what's next with Istio service mesh
Microservices and Integration: what's next with Istio service mesh
Christian Posta3.3K views
Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ... by Ambassador Labs
Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ...Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ...
Microservices Practitioner Summit Jan '15 - Microservice Ecosystems At Scale ...
Ambassador Labs3.9K views
Microservices Journey Summer 2017 by Christian Posta
Microservices Journey Summer 2017Microservices Journey Summer 2017
Microservices Journey Summer 2017
Christian Posta2.3K views
Continuous Delivery: How RightScale Releases Weekly by RightScale
Continuous Delivery: How RightScale Releases WeeklyContinuous Delivery: How RightScale Releases Weekly
Continuous Delivery: How RightScale Releases Weekly
RightScale920 views
Azure Stream Analytics by Davide Mauri
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
Davide Mauri893 views
SQL Server Disaster Recovery on Azure - SQL Saturday 921 by Marco Obinu
SQL Server Disaster Recovery on Azure - SQL Saturday 921SQL Server Disaster Recovery on Azure - SQL Saturday 921
SQL Server Disaster Recovery on Azure - SQL Saturday 921
Marco Obinu99 views
Blockchain for the DBA and Data Professional by Karen Lopez
Blockchain for the DBA and Data ProfessionalBlockchain for the DBA and Data Professional
Blockchain for the DBA and Data Professional
Karen Lopez136 views
Being Elastic -- Evolving Programming for the Cloud by Randy Shoup
Being Elastic -- Evolving Programming for the CloudBeing Elastic -- Evolving Programming for the Cloud
Being Elastic -- Evolving Programming for the Cloud
Randy Shoup1.2K views
20160524 ibm fast data meetup by shinolajla
20160524 ibm fast data meetup20160524 ibm fast data meetup
20160524 ibm fast data meetup
shinolajla1.2K views
Microservices with Spring Cloud, Netflix OSS and Kubernetes by Christian Posta
Microservices with Spring Cloud, Netflix OSS and Kubernetes Microservices with Spring Cloud, Netflix OSS and Kubernetes
Microservices with Spring Cloud, Netflix OSS and Kubernetes
Christian Posta1.5K views
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ... by confluent
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
confluent2.8K views
20160520 The Future of Services by shinolajla
20160520 The Future of Services20160520 The Future of Services
20160520 The Future of Services
shinolajla859 views
Kafka Summit SF 2017 - Running Kafka for Maximum Pain by confluent
Kafka Summit SF 2017 - Running Kafka for Maximum PainKafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
confluent52 views
The hardest part of microservices: your data by Christian Posta
The hardest part of microservices: your dataThe hardest part of microservices: your data
The hardest part of microservices: your data
Christian Posta21.4K views
The Reactive Principles: Eight Tenets For Building Cloud Native Applications by Lightbend
The Reactive Principles: Eight Tenets For Building Cloud Native ApplicationsThe Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
Lightbend1.1K views

Viewers also liked

QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas... by
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...Randy Shoup
2K views35 slides
Service Architectures At Scale - QCon London 2015 by
Service Architectures At Scale - QCon London 2015Service Architectures At Scale - QCon London 2015
Service Architectures At Scale - QCon London 2015Randy Shoup
2.4K views37 slides
Why Enterprises Are Embracing the Cloud by
Why Enterprises Are Embracing the CloudWhy Enterprises Are Embracing the Cloud
Why Enterprises Are Embracing the CloudRandy Shoup
1.7K views6 slides
DevOpsDays Silicon Valley 2014 - The Game of Operations by
DevOpsDays Silicon Valley 2014 - The Game of OperationsDevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of OperationsRandy Shoup
3.7K views32 slides
A CTO's Guide to Scaling Organizations by
A CTO's Guide to Scaling OrganizationsA CTO's Guide to Scaling Organizations
A CTO's Guide to Scaling OrganizationsRandy Shoup
6.9K views21 slides
The Importance of Culture: Building and Sustaining Effective Engineering Org... by
The Importance of Culture:  Building and Sustaining Effective Engineering Org...The Importance of Culture:  Building and Sustaining Effective Engineering Org...
The Importance of Culture: Building and Sustaining Effective Engineering Org...Randy Shoup
5.7K views29 slides

Viewers also liked(8)

QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas... by Randy Shoup
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fas...
Randy Shoup2K views
Service Architectures At Scale - QCon London 2015 by Randy Shoup
Service Architectures At Scale - QCon London 2015Service Architectures At Scale - QCon London 2015
Service Architectures At Scale - QCon London 2015
Randy Shoup2.4K views
Why Enterprises Are Embracing the Cloud by Randy Shoup
Why Enterprises Are Embracing the CloudWhy Enterprises Are Embracing the Cloud
Why Enterprises Are Embracing the Cloud
Randy Shoup1.7K views
DevOpsDays Silicon Valley 2014 - The Game of Operations by Randy Shoup
DevOpsDays Silicon Valley 2014 - The Game of OperationsDevOpsDays Silicon Valley 2014 - The Game of Operations
DevOpsDays Silicon Valley 2014 - The Game of Operations
Randy Shoup3.7K views
A CTO's Guide to Scaling Organizations by Randy Shoup
A CTO's Guide to Scaling OrganizationsA CTO's Guide to Scaling Organizations
A CTO's Guide to Scaling Organizations
Randy Shoup6.9K views
The Importance of Culture: Building and Sustaining Effective Engineering Org... by Randy Shoup
The Importance of Culture:  Building and Sustaining Effective Engineering Org...The Importance of Culture:  Building and Sustaining Effective Engineering Org...
The Importance of Culture: Building and Sustaining Effective Engineering Org...
Randy Shoup5.7K views
From the Monolith to Microservices - CraftConf 2015 by Randy Shoup
From the Monolith to Microservices - CraftConf 2015From the Monolith to Microservices - CraftConf 2015
From the Monolith to Microservices - CraftConf 2015
Randy Shoup8.3K views
Pragmatic Microservices by Randy Shoup
Pragmatic MicroservicesPragmatic Microservices
Pragmatic Microservices
Randy Shoup2.2K views

Similar to QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE

AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner... by
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
918 views54 slides
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner... by
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
1.9K views54 slides
Snowplow: open source game analytics powered by AWS by
Snowplow: open source game analytics powered by AWSSnowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWSGiuseppe Gaviani
1.9K views34 slides
Christian Corsano, Io Interactive by
Christian Corsano, Io InteractiveChristian Corsano, Io Interactive
Christian Corsano, Io InteractiveWhite Nights Conference
396 views19 slides
Sensu Monitoring by
Sensu MonitoringSensu Monitoring
Sensu MonitoringMohanasundaram Ponnusamy
1.2K views29 slides
AWS Webcast - Introduction to Amazon Kinesis by
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
10.7K views36 slides

Similar to QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE(20)

AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner... by Amazon Web Services
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner... by Amazon Web Services
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
Amazon Web Services1.9K views
Snowplow: open source game analytics powered by AWS by Giuseppe Gaviani
Snowplow: open source game analytics powered by AWSSnowplow: open source game analytics powered by AWS
Snowplow: open source game analytics powered by AWS
Giuseppe Gaviani1.9K views
AWS Webcast - Introduction to Amazon Kinesis by Amazon Web Services
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
Amazon Web Services10.7K views
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev by Altinity Ltd
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd3.8K views
Azure Messaging Crossroads by Sean Feldman
Azure Messaging CrossroadsAzure Messaging Crossroads
Azure Messaging Crossroads
Sean Feldman224 views
Chef Actions: Delightful near real-time activity tracking! by James Casey
Chef Actions: Delightful near real-time activity tracking!Chef Actions: Delightful near real-time activity tracking!
Chef Actions: Delightful near real-time activity tracking!
James Casey1.7K views
PlayFab analytics gdc by Crystin Cox
PlayFab analytics gdcPlayFab analytics gdc
PlayFab analytics gdc
Crystin Cox497 views
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis by Amazon Web Services
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Amazon Web Services10.1K views
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale by Seunghyun Lee
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Seunghyun Lee2.8K views
AWS APAC Webinar Week - Real Time Data Processing with Kinesis by Amazon Web Services
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
Amazon Web Services4.5K views
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events by sqlserver.co.il
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended EventsSQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
sqlserver.co.il963 views
Scalable and Reliable Logging at Pinterest by Krishna Gade
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
Krishna Gade2.4K views
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest by Hakka Labs
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
Hakka Labs502 views
Realtime Analytics on AWS by Sungmin Kim
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
Sungmin Kim502 views
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu... by Amazon Web Services
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
Amazon Web Services2.1K views

More from Randy Shoup

Anatomy of Three Incidents -- Commonalities and Lessons by
Anatomy of Three Incidents -- Commonalities and LessonsAnatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsRandy Shoup
209 views42 slides
One Terrible Day at Google, and How It Made Us Better by
One Terrible Day at Google, and How It Made Us BetterOne Terrible Day at Google, and How It Made Us Better
One Terrible Day at Google, and How It Made Us BetterRandy Shoup
181 views30 slides
Scaling Your Architecture for the Long Term by
Scaling Your Architecture for the Long TermScaling Your Architecture for the Long Term
Scaling Your Architecture for the Long TermRandy Shoup
693 views35 slides
Minimal Viable Architecture - Silicon Slopes 2020 by
Minimal Viable Architecture - Silicon Slopes 2020Minimal Viable Architecture - Silicon Slopes 2020
Minimal Viable Architecture - Silicon Slopes 2020Randy Shoup
965 views43 slides
An Agile Approach to Machine Learning by
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine LearningRandy Shoup
604 views45 slides
Moving Fast at Scale by
Moving Fast at ScaleMoving Fast at Scale
Moving Fast at ScaleRandy Shoup
740 views44 slides

More from Randy Shoup(20)

Anatomy of Three Incidents -- Commonalities and Lessons by Randy Shoup
Anatomy of Three Incidents -- Commonalities and LessonsAnatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and Lessons
Randy Shoup209 views
One Terrible Day at Google, and How It Made Us Better by Randy Shoup
One Terrible Day at Google, and How It Made Us BetterOne Terrible Day at Google, and How It Made Us Better
One Terrible Day at Google, and How It Made Us Better
Randy Shoup181 views
Scaling Your Architecture for the Long Term by Randy Shoup
Scaling Your Architecture for the Long TermScaling Your Architecture for the Long Term
Scaling Your Architecture for the Long Term
Randy Shoup693 views
Minimal Viable Architecture - Silicon Slopes 2020 by Randy Shoup
Minimal Viable Architecture - Silicon Slopes 2020Minimal Viable Architecture - Silicon Slopes 2020
Minimal Viable Architecture - Silicon Slopes 2020
Randy Shoup965 views
An Agile Approach to Machine Learning by Randy Shoup
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
Randy Shoup604 views
Moving Fast at Scale by Randy Shoup
Moving Fast at ScaleMoving Fast at Scale
Moving Fast at Scale
Randy Shoup740 views
Breaking Codes, Designing Jets, and Building Teams by Randy Shoup
Breaking Codes, Designing Jets, and Building TeamsBreaking Codes, Designing Jets, and Building Teams
Breaking Codes, Designing Jets, and Building Teams
Randy Shoup521 views
Scaling Your Architecture with Services and Events by Randy Shoup
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and Events
Randy Shoup809 views
Learning from Learnings: Anatomy of Three Incidents by Randy Shoup
Learning from Learnings: Anatomy of Three IncidentsLearning from Learnings: Anatomy of Three Incidents
Learning from Learnings: Anatomy of Three Incidents
Randy Shoup745 views
Minimum Viable Architecture - Good Enough is Good Enough by Randy Shoup
Minimum Viable Architecture - Good Enough is Good EnoughMinimum Viable Architecture - Good Enough is Good Enough
Minimum Viable Architecture - Good Enough is Good Enough
Randy Shoup2K views
Managing Data at Scale - Microservices and Events by Randy Shoup
Managing Data at Scale - Microservices and EventsManaging Data at Scale - Microservices and Events
Managing Data at Scale - Microservices and Events
Randy Shoup2.2K views
Service Architectures at Scale by Randy Shoup
Service Architectures at ScaleService Architectures at Scale
Service Architectures at Scale
Randy Shoup1.4K views
Monoliths, Migrations, and Microservices by Randy Shoup
Monoliths, Migrations, and MicroservicesMonoliths, Migrations, and Microservices
Monoliths, Migrations, and Microservices
Randy Shoup1.8K views
Evolving Architecture and Organization - Lessons from Google and eBay by Randy Shoup
Evolving Architecture and Organization - Lessons from Google and eBayEvolving Architecture and Organization - Lessons from Google and eBay
Evolving Architecture and Organization - Lessons from Google and eBay
Randy Shoup1.4K views
Moving Fast At Scale by Randy Shoup
Moving Fast At ScaleMoving Fast At Scale
Moving Fast At Scale
Randy Shoup2.7K views
DevOps - It's About How We Work by Randy Shoup
DevOps - It's About How We WorkDevOps - It's About How We Work
DevOps - It's About How We Work
Randy Shoup1.4K views
Ten Lessons of the DevOps Transition by Randy Shoup
Ten Lessons of the DevOps TransitionTen Lessons of the DevOps Transition
Ten Lessons of the DevOps Transition
Randy Shoup1.4K views
Managing Data in Microservices by Randy Shoup
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
Randy Shoup18.4K views
Minimum Viable Architecture -- Good Enough is Good Enough in a Startup by Randy Shoup
Minimum Viable Architecture -- Good Enough is Good Enough in a StartupMinimum Viable Architecture -- Good Enough is Good Enough in a Startup
Minimum Viable Architecture -- Good Enough is Good Enough in a Startup
Randy Shoup17.2K views
Flowcon2013 - Virtuous Cycles of Velocity: What I Learned About Going Fast at... by Randy Shoup
Flowcon2013 - Virtuous Cycles of Velocity: What I Learned About Going Fast at...Flowcon2013 - Virtuous Cycles of Velocity: What I Learned About Going Fast at...
Flowcon2013 - Virtuous Cycles of Velocity: What I Learned About Going Fast at...
Randy Shoup1.4K views

Recently uploaded

PharoJS - Zürich Smalltalk Group Meetup November 2023 by
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023Noury Bouraqadi
113 views17 slides
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV by
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTVSplunk
86 views20 slides
Business Analyst Series 2023 - Week 3 Session 5 by
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5DianaGray10
165 views20 slides
MemVerge: Past Present and Future of CXL by
MemVerge: Past Present and Future of CXLMemVerge: Past Present and Future of CXL
MemVerge: Past Present and Future of CXLCXL Forum
110 views26 slides
Micron CXL product and architecture update by
Micron CXL product and architecture updateMicron CXL product and architecture update
Micron CXL product and architecture updateCXL Forum
27 views7 slides
AMD: 4th Generation EPYC CXL Demo by
AMD: 4th Generation EPYC CXL DemoAMD: 4th Generation EPYC CXL Demo
AMD: 4th Generation EPYC CXL DemoCXL Forum
126 views6 slides

Recently uploaded(20)

PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi113 views
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV by Splunk
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
Splunk86 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10165 views
MemVerge: Past Present and Future of CXL by CXL Forum
MemVerge: Past Present and Future of CXLMemVerge: Past Present and Future of CXL
MemVerge: Past Present and Future of CXL
CXL Forum110 views
Micron CXL product and architecture update by CXL Forum
Micron CXL product and architecture updateMicron CXL product and architecture update
Micron CXL product and architecture update
CXL Forum27 views
AMD: 4th Generation EPYC CXL Demo by CXL Forum
AMD: 4th Generation EPYC CXL DemoAMD: 4th Generation EPYC CXL Demo
AMD: 4th Generation EPYC CXL Demo
CXL Forum126 views
GigaIO: The March of Composability Onward to Memory with CXL by CXL Forum
GigaIO: The March of Composability Onward to Memory with CXLGigaIO: The March of Composability Onward to Memory with CXL
GigaIO: The March of Composability Onward to Memory with CXL
CXL Forum126 views
Samsung: CMM-H Tiered Memory Solution with Built-in DRAM by CXL Forum
Samsung: CMM-H Tiered Memory Solution with Built-in DRAMSamsung: CMM-H Tiered Memory Solution with Built-in DRAM
Samsung: CMM-H Tiered Memory Solution with Built-in DRAM
CXL Forum105 views
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur by Fwdays
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
Fwdays40 views
Spesifikasi Lengkap ASUS Vivobook Go 14 by Dot Semarang
Spesifikasi Lengkap ASUS Vivobook Go 14Spesifikasi Lengkap ASUS Vivobook Go 14
Spesifikasi Lengkap ASUS Vivobook Go 14
Dot Semarang35 views
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor... by Vadym Kazulkin
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...
How to reduce cold starts for Java Serverless applications in AWS at JCON Wor...
Vadym Kazulkin70 views
Astera Labs: Intelligent Connectivity for Cloud and AI Infrastructure by CXL Forum
Astera Labs:  Intelligent Connectivity for Cloud and AI InfrastructureAstera Labs:  Intelligent Connectivity for Cloud and AI Infrastructure
Astera Labs: Intelligent Connectivity for Cloud and AI Infrastructure
CXL Forum125 views
Photowave Presentation Slides - 11.8.23.pptx by CXL Forum
Photowave Presentation Slides - 11.8.23.pptxPhotowave Presentation Slides - 11.8.23.pptx
Photowave Presentation Slides - 11.8.23.pptx
CXL Forum126 views
[2023] Putting the R! in R&D.pdf by Eleanor McHugh
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdf
Eleanor McHugh38 views
The details of description: Techniques, tips, and tangents on alternative tex... by BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada110 views
Transcript: The Details of Description Techniques tips and tangents on altern... by BookNet Canada
Transcript: The Details of Description Techniques tips and tangents on altern...Transcript: The Details of Description Techniques tips and tangents on altern...
Transcript: The Details of Description Techniques tips and tangents on altern...
BookNet Canada119 views
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... by Fwdays
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
Fwdays40 views
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy by Fwdays
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
Fwdays40 views

QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE

  • 1. The Game of Big Data Analytics Infrastructure at KIXEYE Randy Shoup @randyshoup linkedin.com/in/randyshoup QCon New York, June 13 2014
  • 2. Free-to-Play Real-time Strategy Games • Web and mobile • Strategy and tactics • Really real-time  • Deep relationships with players • Constantly evolving gameplay, feature set, economy, balance • >500 employees worldwide
  • 3. Intro: Analytics at KIXEYE User Acquisition Game Analytics Retention and Monetization Analytic Requirements
  • 4. User Acquisition Goal: ELTV > acquisition cost • User’s estimated lifetime value is more than it costs to acquire that user Mechanisms • Publisher Campaigns • On-Platform Recommendations
  • 5. Game Analytics Goal: Measure and Optimize “Fun” • Difficult to define • Includes gameplay, feature set, performance, bugs • All metrics are just proxies for fun (!) Mechanisms • Game balance • Match balance • Economy management • Player typology
  • 6. Retention and Monetization Goal: Sustainable Business • Monetization drivers • Revenue recognition Mechanisms • Pricing and Bundling • Tournament (“Event”) Design • Recommendations
  • 7. Analytic Requirements • Data Integrity and Availability • Cohorting • Controlled experiments • Deep ad-hoc analysis
  • 8. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 9. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 10. V1 Analytic System Grew Organically • Built originally for user acquisition • Progressively grown to much more Idiosyncratic mix of languages, systems, tools • Log files -> Chukwa -> Hadoop -> Hive -> MySQL • PHP for reports and ETL • Single massive table with everything
  • 11. V1 Analytic System Many Issues • Very slow to query • No data standardization or validation • Very difficult to add a new game, report, ETL • Extremely difficult to backfill on error or outage • Difficult for analysts to use; impossible for PMs, designers, etc. … but we survived (!)
  • 12. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 13. Goals of Deep Thought Independent Scalability • Logically separate, independently scalable tiers Stability and Outage Recovery • Tiers can completely fail with no data loss • Every step idempotent and replayable Standardization • Standardized event types, fields, queries, reports
  • 14. Goals of Deep Thought In-Stream Event Processing • Sessionalization, Dimensionalization, Cohorting Queryability • Structures are simple to reason about • Simple things are simple • Analysts, Data Scientists, PMs, Game Designers, etc. Extensibility • Easy to add new games, events, fields, reports
  • 15. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 16. Core Capabilities • Sessionalization • Dimensionalization • Cohorting
  • 17. Sessionalization All events are part of a “session” • Explicit start event, optional stop event • Game-defined semantics Event Batching • Events arrive in batch, associated with session • Pipeline computes batch-level metrics, disaggregates events • Can optionally attach batch-level metrics to each event
  • 18. Sessionalization Time-Series Aggregations • Configurable metrics • 1-day X, 7-day X, lifetime X • Total attacks, total time played • Accumulated in-stream • Old aggregate + batch delta • Faster to calculate in-stream vs. Map-Reduce
  • 19. Dimensionalization Pipeline assigns unique numeric id to string enums • E.g., “twigs” resource  id 1234 Automatic mapping and assignment • Games log strings • Pipeline generates and maps ids • No configuration necessary Fast dimensional queries • Join on integers, not strings
  • 20. Dimensionalization Metadata enumeration and manipulation • Easily enumerate all values for a field • Merge multiple values • “TWIGS” == “Twigs” == “twigs” Metadata tagging • Can assign arbitrary tags to metadata • E.g., “Panzer 05” is {tank, mechanized infantry, event prize} • Enables custom views
  • 21. Cohorting Group players along any dimension / metric • Well beyond classic age-based cohorts Core analytical building block • Experiment groups • User acquisition campaign tracking • Prospective modeling • Retrospective analysis
  • 22. Cohorting Set-based • Overlapping groups: >100, >200, etc. • Exclusive groups: (100-200), (200-500), etc. Time-based • E.g., people who played in last 3 days • E.g., “whale” == ($$ > X) in last N days • Autoexpire from a group without explicit intervention
  • 23. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 24. Implementation of Pipeline • Ingestion • Event Log • Transformation • Data Storage • Analysis and Visualization Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 25. Ingestion: Logging Service HTTP / JSON Endpoint • Play framework • Non-blocking, event-driven Responsibilities • Message integrity via checksums • Durability via local disk persistence • Async batch writes to Kafka topics • {valid, invalid, unauth} Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 26. Event Log: Kafka Persistent, replayable pipe of events • Events stored for 7 days Responsibilities • Durability via replication and local disk streaming • Replayability via commit log • Scalability via partitioned brokers • Segment data for different types of processing Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 27. Transformation: Importer Consume Kafka topics, rebroadcast • E.g., consume batches, rebroadcast events Responsibilities • Batch validation against JSON schema • Syntactic validation • Semantic validation (is this event possible?) Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 28. Transformation: Importer Responsibilities (cont.) • Sessionalization • Assign event to session • Calculate time-series aggregates • Dimensionalization • String enum -> numeric id • Merge / coalesce different string representations into single id • Player metadata • Join player metadata from session store Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 29. Transformation: Importer Responsibilities (cont.) • Cohorting • Process enter-cohort, exit-cohort events • Process A / B testing events • Evaluate cohort rules (e.g., spend thresholds) • Decorate events with cohort tags Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 30. Transformation: Session Store Key-value store (Couchbase) • Fast, constant-time access to sessions, players, etc. Responsibilities • Store Sessions, Players, Dimensions, Config • Lookup • Idempotent update • Store accumulated session-level metrics • Store player history Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 31. Storage: Hadoop 2 Camus MR • Kafka -> HDFS every 3 minutes append_events table • Append-only log of events • Each event has session-version for deduplication Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 32. Storage: Hadoop 2 append_events -> base_events MR • Logical update of base_events • Update events with new metadata • Swap old partition for new partition • Replayable from beginning without duplication Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 33. Storage: Hadoop 2 base_events table • Denormalized table of all events • Stores original JSON + decoration • Custom Serdes to query / extract JSON fields without materializing entire rows • Standardized event types  lots of functionality for free Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 34. Analysis and Visualization Hive Warehouse • Normalized event-specific, game-specific stores • Aggregate metric data for reporting, analysis • Maintained through custom ETL • MR • Hive queries Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 35. Analysis and Visualization Amazon Redshift • Fast ad-hoc querying Tableau • Simple, powerful reporting Logging Service Kafka Importer / Session Store Hadoop 2 Hive / Redshift
  • 36. Come Join Us! KIXEYE is hiring in SF, Seattle, Victoria, Brisbane, Amsterdam rshoup@kixeye.com @randyshoup Deep Thought Team: • Mark Weaver • Josh McDonald • Ben Speakmon • Snehal Nagmote • Mark Roberts • Kevin Lee • Woo Chan Kim • Tay Carpenter • Tim Ellis • Kazue Watanabe • Erica Chan • Jessica Cox • Casey DeWitt • Steve Morin • Lih Chen • Neha Kumari