Big Data + Social + Games @ IsCool, 16/03/2012
Who is IsCool Entertainment?
• Social game publisher based in Paris, France
• #1 French publisher in terms of audience (450k Daily Active Users) & revenue
• 2.8 million fans
• 80 employees
• 9.1 million € revenue in 2010
• 4 live applications on Facebook
Agenda
• What do we do? Social Gaming
• What kind of (Big) Analytics do we do? Lots
• How do we do it? Hadoop, Python, R, Tableau, Gephi and stuff…
Florian Douetteau, CTO, @fdouetteau
IsCool Games
• IsCool: Delirious Collectible Game
• Absolute Solitaire: The best solitaire game available online
• Temple Of Mahjong: Collect, Play, Exchange
• Belote Multijoueur: Play, Win, Meet
Games & Virtual Goods
• Play the Game & Gain some virtual goods
• Play again & Gain more
• Collaborate with other players & Gain More
• ….
• Possibly buy: to grow quicker, to help others
Virtual Goods, Virtual Economy
• Virtual Goods must not be too easy to get: the game would not be fun! No monetization
• Virtual Goods must not be hard to get: people would churn because of frustration!
• Virtual Goods can usually be traded between players ("Let’s Trade 1 Watch against 3 Hammers")
• Virtual and actual “Price” of a good
Why is this Big Data?
Number of object transactions per day:
NYSE      3,600,000,000
IsCool    2,150,000,000
Nasdaq    1,600,000,000
Nikkei    1,500,000,000
Footsie     860,000,000
CAC 40      142,500,000
• 18 million user-generated actions per day
• 7 billion per year
• 9.8 TB of data to analyze
The Real Big Data Challenge: Collaborate for collective insights
• Programmers’ Perspective: Log Files & Work? Realtime?
• Game Designer Perspective: Nice Charts? What metrics?
• Data scientist?
• BI Veteran: Schema Definition?
• Business Guy Perspective: Revenue Forecast?
Specifics of Game Analytics
• Virtual Goods: We are the Factory AND the Shop, and most of the products are free.
• Social Networks: Network effects are key
• Games: The product changes EVERY day! Sudden wave of unexpected players from Guatemala! People try to cheat!
Use Case 1: Understanding Users
1: Defining engagement
• Tenure length
• Visit frequency
• Virality
• Traffic
• Paying user conversion
• ARPPU
• Score
• Use of feature A, B, C…
Key drivers???
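A few of the engagement measures listed above can be made concrete with a minimal sketch. The record layout (`first_seen`, `last_seen`, `visit_days`, `revenue_eur`) and the sample values are assumptions made for illustration, not IsCool's actual schema or data. ARPPU is taken in its usual sense of average revenue per paying user.

```python
from datetime import date

# Hypothetical per-user records; field names and values are illustrative only.
users = [
    {"id": "u1", "first_seen": date(2012, 1, 1), "last_seen": date(2012, 3, 1),
     "visit_days": 40, "revenue_eur": 12.0},
    {"id": "u2", "first_seen": date(2012, 2, 1), "last_seen": date(2012, 2, 11),
     "visit_days": 5, "revenue_eur": 0.0},
    {"id": "u3", "first_seen": date(2012, 1, 15), "last_seen": date(2012, 3, 15),
     "visit_days": 30, "revenue_eur": 8.0},
    {"id": "u4", "first_seen": date(2012, 3, 1), "last_seen": date(2012, 3, 5),
     "visit_days": 2, "revenue_eur": 0.0},
]


def tenure_days(user):
    """Tenure length: days between first and last visit, inclusive."""
    return (user["last_seen"] - user["first_seen"]).days + 1


def visit_frequency(user):
    """Visit frequency: share of tenure days on which the user showed up."""
    return user["visit_days"] / tenure_days(user)


# Paying user conversion: fraction of users who spent anything.
payers = [u for u in users if u["revenue_eur"] > 0]
conversion = len(payers) / len(users)

# ARPPU: average revenue per paying user (0.0 when nobody has paid).
arppu = sum(u["revenue_eur"] for u in payers) / len(payers) if payers else 0.0

for u in users:
    print(u["id"], tenure_days(u), round(visit_frequency(u), 2))
print("conversion:", conversion, "ARPPU:", arppu)
```

Virality, traffic and feature usage would need event-level data (invitations sent, sessions, feature clicks), so they are left out of this sketch.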
Case Study 1 - Segment User Behaviours
2: Describing engagement patterns: running a segment analysis
Use Case 2: Understanding Users as a whole
• 10 Million Nodes
• Around 1 000 Billion Edges
• How does the graph evolve in time?
• What are the communities?
Understanding Users as a Whole
• Lots of small clusters (mostly 2 players)
• Some mid-size communities
• A very large community
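The slides do not say which community detection method was used. As a much simpler stand-in, the sketch below groups players into connected components with a union-find structure and reports the component sizes, which is enough to show the "many pairs, a few larger groups" shape on a toy graph. The function name `component_sizes` and the edge list are hypothetical.

```python
from collections import Counter


def component_sizes(edges):
    """Return connected-component sizes, largest first, for an undirected edge list."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


# Toy friendship graph (illustrative only): two pairs and one chain of five players.
toy_edges = [
    ("a", "b"),
    ("c", "d"),
    ("e", "f"), ("f", "g"), ("g", "h"), ("h", "i"),
]
print(component_sizes(toy_edges))
```

Connected components only separate players with no link at all. Splitting a very large component into communities needs a real community detection algorithm, which is outside this sketch.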
Use Case 3: Analyze the Long-Term Effects of a Feature
A/B Tests
• Some features can be A/B tested
• …and some cannot!
How to measure the uplift?
• Are players using the new feature…
• More engaged?
• Generating more virality?
• etc.
Complexity
• Multiple variables to observe (other features, history)
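For the features that can be A/B tested, one common way to quantify uplift is the relative difference in conversion rates together with a two-proportion z-test. The sketch below assumes that approach. The slides do not state how uplift was actually measured, and the function name and counts are hypothetical.

```python
import math


def uplift_and_z(conv_a, n_a, conv_b, n_b):
    """Relative uplift of B over A and the two-proportion z statistic.

    conv_* are counts of users who converted, n_* are group sizes.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    uplift = (p_b - p_a) / p_a
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    return uplift, z


# Made-up example: control converts 200 of 10,000, variant 250 of 10,000.
uplift, z = uplift_and_z(200, 10_000, 250, 10_000)
print(f"relative uplift: {uplift:.1%}, z = {z:.2f}")
```

This only covers a single metric in a clean split. The complexity noted above (other features, player history) is exactly what such a simple test ignores.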
… How over the last 3 years
• Tools changed
• Scale changed
• Focus changed
Analyzing the Offer
• Online Analytics Platform
• Commercial / Open Source ETL
• Commercial BI Visualization Software
• Commercial / Open Source databases (column stores)
• …
What we learned
Diversity
• There’s no Hadoop+R Magic (Expertise, Entry Costs, Maintenance)
• There’s no XYZ Magical Product
Relativity
• Windows / Linux? Cloud or on-premise?
• Do you have internal data mining experts (yes/no)?
• Do you have internal scalability experts (yes/no)?
• What is the _real_ budget? 0K? 10K? 100K? 1000K?
Superficiality
• Ability to display is more important than the result.
Mixed Approach
SaaS Analytics Platforms
• For common business metrics (virality, traffic, engagement)
• Corporate Level Visibility
• Day-to-day
Internal Datawarehousing
• Detailed Business Metrics
• Virtual Economy Modeling
• Long term behaviours
• Business Level Visibility
• Week-to-Week
Datamining tools
• Ad-hoc analytics
• Graph Analytics
Datawarehouse for the Big Data era
Hadoop/Hive (through Amazon’s Elastic Map Reduce)
• Used to reduce the amount of information: 10 GB a day => 1 GB a day
• High cost of development for "business" related processing
Open Source ETL (PyBabe)
• Pure Python ETL
• Good integration with AWS/S3
• Easy to integrate in our development environment
Columnar Database (Infinidb, Open Source)
• Free (as beer)
• Good performance for analytics tasks on a few hundred million lines ( SELECT … GROUP BY … ORDER … )
• Limited features and performance compared to commercial Column Stores
Dashboarding (Tableau Software)
• + Direct connection to the database
• + Excel fan biz guy can use it with no training!
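The "SELECT … GROUP BY … ORDER …" pattern mentioned for the columnar database can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in, not InfiniDB, and the `transactions` table, its columns and its rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the analytics store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (day TEXT, item TEXT, quantity INTEGER)")

# Made-up virtual goods transactions.
rows = [
    ("2012-03-15", "hammer", 3),
    ("2012-03-15", "watch", 1),
    ("2012-03-16", "hammer", 6),
    ("2012-03-16", "watch", 2),
    ("2012-03-16", "nail", 4),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Typical analytics query: aggregate per item, most traded first.
totals = conn.execute(
    """
    SELECT item, SUM(quantity) AS total
    FROM transactions
    GROUP BY item
    ORDER BY total DESC
    """
).fetchall()

for item, total in totals:
    print(item, total)
conn.close()
```

A column store runs the same kind of query but reads only the columns involved, which is where the performance on a few hundred million lines comes from.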