Your SlideShare is downloading. ×
0
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE

538

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/UkwXZ2. …

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/UkwXZ2.

Randy Shoup describes KIXEYE's analytics infrastructure from Kafka queues through Hadoop 2 to Hive and Redshift, built for flexibility, experimentation, iteration, componentization, testability, reliability and replay-ability. Filmed at qconnewyork.com.

As CTO, Randy brings 25 years of engineering leadership to KIXEYE, ranging from tiny startups to Internet-scale companies. Prior to KIXEYE, he was Director of Engineering at Google, leading several teams building Google App Engine, the world's largest Platform as a Service.

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
538
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. The Game of Big Data! Analytics Infrastructure at KIXEYE Randy Shoup @randyshoup linkedin.com/in/randyshoup QCon New York, June 13 2014
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /kixeye-big-data
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Free-to-Play Real-time Strategy Games •  Web and mobile •  Strategy and tactics •  Really real-time J •  Deep relationships with players •  Constantly evolving gameplay, feature set, economy, balance •  >500 employees worldwide
  • 5. Intro: Analytics at KIXEYE User Acquisition Game Analytics Retention and Monetization Analytic Requirements
  • 6. User Acquisition Goal: ELTV > acquisition cost •  User’s estimated lifetime value is more than it costs to acquire that user Mechanisms •  Publisher Campaigns •  On-Platform Recommendations
  • 7. Game Analytics Goal: Measure and Optimize “Fun” •  Difficult to define •  Includes gameplay, feature set, performance, bugs •  All metrics are just proxies for fun (!) Mechanisms •  Game balance •  Match balance •  Economy management •  Player typology
  • 8. Retention and Monetization Goal: Sustainable Business •  Monetization drivers •  Revenue recognition Mechanisms •  Pricing and Bundling •  Tournament (“Event”) Design •  Recommendations
  • 9. Analytic Requirements •  Data Integrity and Availability •  Cohorting •  Controlled experiments •  Deep ad-hoc analysis
  • 10. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 11. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 12. V1 Analytic System Grew Organically •  Built originally for user acquisition •  Progressively grown to much more Idiosyncratic mix of languages, systems, tools •  Log files -> Chukwa -> Hadoop -> Hive -> MySQL •  PHP for reports and ETL •  Single massive table with everything
  • 13. V1 Analytic System Many Issues •  Very slow to query •  No data standardization or validation •  Very difficult to add a new game, report, ETL •  Extremely difficult to backfill on error or outage •  Difficult for analysts to use; impossible for PMs, designers, etc. … but we survived (!)
  • 14. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 15. Goals of Deep Thought Independent Scalability •  Logically separate, independently scalable tiers Stability and Outage Recovery •  Tiers can completely fail with no data loss •  Every step idempotent and replayable Standardization •  Standardized event types, fields, queries, reports
  • 16. Goals of Deep Thought In-Stream Event Processing •  Sessionalization, Dimensionalization, Cohorting Queryability •  Structures are simple to reason about •  Simple things are simple •  Analysts, Data Scientists, PMs, Game Designers, etc. Extensibility •  Easy to add new games, events, fields, reports
  • 17. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 18. Core Capabilities •  Sessionalization •  Dimensionalization •  Cohorting
  • 19. Sessionalization All events are part of a “session” •  Explicit start event, optional stop event •  Game-defined semantics Event Batching •  Events arrive in batch, associated with session •  Pipeline computes batch-level metrics, disaggregates events •  Can optionally attach batch-level metrics to each event
  • 20. Sessionalization Time-Series Aggregations •  Configurable metrics •  1-day X, 7-day X, lifetime X •  Total attacks, total time played •  Accumulated in-stream •  V1 aggregate + batch delta •  Faster to calculate in-stream vs. Map-Reduce
  • 21. Dimensionalization Pipeline assigns unique numeric id to string enums •  E.g., “twigs” resource  id 1234 Automatic mapping and assignment •  Games log strings •  Pipeline generates and maps ids •  No configuration necessary Fast dimensional queries •  Join on integers, not strings
  • 22. Dimensionalization Metadata enumeration and manipulation •  Easily enumerate all values for a field •  Merge multiple values •  “TWIGS” == “Twigs” == “twigs” Metadata tagging •  Can assign arbitrary tags to metadata •  E.g., “Panzer 05” is {tank, mechanized infantry, event prize} •  Enables custom views
  • 23. Cohorting Group players along any dimension / metric •  Well beyond classic age-based cohorts Core analytical building block •  Experiment groups •  User acquisition campaign tracking •  Prospective modeling •  Retrospective analysis
  • 24. Cohorting Set-based •  Overlapping groups: >100, >200, etc. •  Exclusive groups: (100-200), (200-500), etc. Time-based •  E.g., people who played in last 3 days •  E.g., “whale” == ($$ > X) in last N days •  Autoexpire from a group without explicit intervention
  • 25. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 26. Implementation of Pipeline •  Ingestion •  Event Log •  Transformation •  Data Storage •  Analysis and Visualization Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 27. Ingestion: Logging Service HTTP / JSON Endpoint •  Play framework •  Non-blocking, event-driven Responsibilities •  Message integrity via checksums •  Durability via local disk persistence •  Async batch writes to Kafka topics •  {valid, invalid, unauth} Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 28. Event Log: Kafka Persistent, replayable pipe of events •  Events stored for 7 days Responsibilities •  Durability via replication and local disk streaming •  Replayability via commit log •  Scalability via partitioned brokers •  Segment data for different types of processing Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 29. Transformation: Importer Consume Kafka topics, rebroadcast •  E.g., consume batches, rebroadcast events Responsibilities •  Batch validation against JSON schema •  Syntactic validation •  Semantic validation (is this event possible?) •  Batches -> events Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 30. Transformation: Importer Responsibilities (cont.) •  Sessionalization •  Assign event to session •  Calculate time-series aggregates •  Dimensionalization •  String enum -> numeric id •  Merge / coalesce different string representations into single id •  Player metadata •  Join player metadata from session store Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 31. Transformation: Importer Responsibilities (cont.) •  Cohorting •  Process enter-cohort, exit-cohort events •  Process A / B testing events •  Evaluate cohort rules (e.g., spend thresholds) •  Decorate events with cohort tags Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 32. Transformation: Session Store Key-value store (Couchbase) •  Fast, constant-time access to sessions, players Responsibilities •  Store Sessions, Players, Dimensions, Config •  Lookup •  Idempotent update •  Store accumulated session-level metrics •  Store player history Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 33. Storage: Hadoop 2 Camus MR •  Kafka -> HDFS every 3 minutes append_events table •  Append-only log of events •  Each event has session-version for deduplication Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 34. Storage: Hadoop 2 append_events -> base_events MR •  Logical update of base_events •  Update events with new metadata •  Swap old partition for new partition •  Replayable from beginning without duplication Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 35. Storage: Hadoop 2 base_events table •  Denormalized table of all events •  Stores original JSON + decoration •  Custom Serdes to query / extract JSON fields without materializing entire rows •  Standardized event types  lots of functionality for free Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 36. Analysis and Visualization Hive Warehouse •  Normalized event-specific, game- specific stores •  Aggregate metric data for reporting, analysis •  Maintained through custom ETL •  MR •  Hive queries Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 37. Analysis and Visualization Amazon Redshift •  Fast ad-hoc querying Tableau •  Simple, powerful reporting Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 38. Come Join Us! KIXEYE is hiring in SF, Seattle, Victoria, Brisbane, Amsterdam rshoup@kixeye.com @randyshoup Deep Thought Team: •  Mark Weaver •  Josh McDonald •  Ben Speakmon •  Snehal Nagmote •  Mark Roberts •  Kevin Lee •  Woo Chan Kim •  Tay Carpenter •  Tim Ellis •  Kazue Watanabe •  Erica Chan •  Jessica Cox •  Casey DeWitt •  Steve Morin •  Lih Chen •  Neha Kumari
  • 39. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/kixeye- big-data

×