• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Big Data At Riot Games - Hadoop Summit'12

Presented at Hadoop Summit ’12: http://www.slideshare.net/Hadoop_Summit/big-data-at-riot-games

  • Andy was a designer and analyst who began our data warehouse. He was the only resource focused on building out the DW and our analytical capability for the first year of its existence. He’s also a really nice guy! He made an excellent point, and one that I want to carry through this presentation.
  • There were times with 20% month-over-month growth in a single environment. We went from 2 environments with ~200K CCU to 16 environments and 1.3 million CCU in the space of 12 months. Resources were focused on getting our operational systems to scale along with demand.
  • One table has hand-entered values that live only in MySQL. Hive cannot generate primary keys out of the box, so we need to associate facts with dimensions in further steps. For instance, we introduce new champions and skins (MySQL), and the Elo range expands, game types change, etc. (in Hive).
  • Before we talk about our first use case, we need to give you a little bit of context about the game and gameplay (super high level). Session-based team play: the basic idea is like “capture the flag” (MOBA!). If you die, you respawn after a certain amount of time (that time grows as the game progresses). Lots of strategy to the game.
  • Each player “summons” a Champion that he plays. Each champion has very different abilities.
  • All players begin at level 1 in a gameplay session and can progress to a maximum of level 18. They gain abilities, and they gain gold and use that gold to equip their champion.
  • Shen is not a Yordle. Shen is a ninja.
  • Early this year, Shen was underpowered. We decided to fix him. However, we accidentally made him highly overpowered. We recognized this quickly, and a fix was in place within 2 days.
  • For international player populations on the North American environment

Big Data At Riot Games - Hadoop Summit'12: Presentation Transcript

  • RIOT GAMES: TRACKING YORDLES THROUGH 630 MILLION MINUTES OF HARDCORE GAMING A DAY. BARRY LIVINGSTON & DANI RAYAN
  • INTRODUCTION
  • ABOUT THE SPEAKERS
  • THIS PRESENTATION IS ABOUT… • The history of Riot’s data warehouse • Why we incorporated Hadoop • Our high-level architecture • Use cases Hadoop has enabled • Lessons learned • Where we’re headed
  • WHO? • Developer and publisher of League of Legends • Founded 2006 by gamers for gamers • Player-experience focused, which requires data
  • 4.2 MILLION DAILY • 32.5 MILLION REGISTERED • 1.3 MILLION CONCURRENT • 11.5 MILLION MONTHLY
  • HISTORY
  • MEET ANDY HO • “With enough data, even simple questions become difficult questions”
  • SCRAPPY START-UP PHASE • One initial beta environment for North America • Queries done directly off production MySQL slaves • This is obviously not a good practice
  • AROUND OUR INITIAL LAUNCH • Moved to a dedicated, single MySQL instance for the DW • Data ETL’d from production slaves into this instance (by Andy) • Queries run in MySQL (by Andy) • Reporting done in Excel (by Andy) • This worked great!
  • THEN WE STARTED GROWING • Resources were focused elsewhere – We had competition – Focused on producing features and scaling our systems • Opened EU environment June 2010 • Needed something speedy – Created a parallel installation – This was bad – But we could still get the answers we wanted
  • AND THEN – CRAZY GROWTH! [Chart: TOTAL ACTIVE PLAYERS (# unique logins) over time, with points labelled 1.5MM (JULY 2011) and 4.2M (NOV. 2011)]
  • THE BREAKING POINT • NA data warehouse reached a breaking point 9 months ago – 24 hours of data took 24.5 hours to ETL • We couldn’t handle… – multiple environments in a vertically scaled MySQL instance – even a single environment in a vertically scaled MySQL instance • We needed to change!
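A quick worked illustration of why “24 hours of data took 24.5 hours to ETL” is a breaking point rather than a mere slowdown. It assumes, which the slide does not state, that the daily data volume stays constant: each day then adds half an hour of backlog that can never be paid down.

```latex
\text{backlog after } d \text{ days} = d \times (24.5\,\text{h} - 24\,\text{h}) = 0.5\,d\ \text{h},
\qquad 0.5\,d = 24 \;\Rightarrow\; d = 48
```

Under that assumption, the warehouse would be a full day behind production after 48 days.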
  • SOLUTION
  • WHY HADOOP? • COST EFFECTIVE: Expanding rapidly, so CAPEX was a concern • SCALABLE: Handles massive and diverse data sets (both structured and unstructured) • OPEN SOURCE: Our engineers can dive into problems • SPEED OF EXECUTION: We needed to move fast!
  • HIGH-LEVEL ARCHITECTURE – CURRENT [Diagram: Audit, Plat and LoL sources in NORTH AMERICA, EUROPE and KOREA → Pentaho + Custom ETL + Sqoop → Hive Data Warehouse → Pentaho → MySQL → Tableau; consumers shown are Business Analysts and Analysts]
  • WHAT MAKES UP OUR ETL • Pentaho + Custom ETL + Sqoop • All of these orchestrated by Pentaho + custom ETL • We use Sqoop for staging data only • Then dynamically partition data into Hive tables
  • WHAT MAKES UP OUR ETL (STEP 1) [Diagram: Data → Temp Staging Area → Hive Data Warehouse] • Data written into a temp staging area • Prevents analysts from running queries against partially written tables • Helps us leverage Hive’s merging and compression settings
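The temp-staging step can be illustrated with a small local-filesystem analogy. This is a sketch under stated assumptions, not Riot’s ETL: the function name `load_via_staging`, the tab-separated row format and the `part-00000` file name are all hypothetical. It shows why staging keeps analysts away from half-written data: the finished file is moved into the table directory with a single `os.replace`, which is an atomic rename when both paths are on the same filesystem (on HDFS, renames play the same role).

```python
import os
import tempfile


def load_via_staging(rows, warehouse_dir, table):
    """Write rows into a hidden staging directory, then publish them atomically.

    Readers listing warehouse_dir/table never see a partially written file:
    the file appears there only when os.replace() moves the finished copy in.
    """
    table_dir = os.path.join(warehouse_dir, table)
    os.makedirs(table_dir, exist_ok=True)

    # Stage inside warehouse_dir so the final rename stays on one filesystem.
    staging_dir = tempfile.mkdtemp(dir=warehouse_dir, prefix=".staging_")
    staged_path = os.path.join(staging_dir, "part-00000")
    with open(staged_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(str(col) for col in row) + "\n")

    # Publish: a single rename, atomic within one filesystem.
    final_path = os.path.join(table_dir, "part-00000")
    os.replace(staged_path, final_path)
    os.rmdir(staging_dir)  # the staging directory is now empty
    return final_path
```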
  • WHAT MAKES UP OUR ETL (STEP 2) [Diagram: Temp Staging Area → Hive Data Warehouse partitions A–E] • Hive dynamically inserts data into the appropriate partitions, according to the value generated for the partition key in the target table • Non-existent partitions will be created by Hive
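Dynamic partitioning itself happens inside Hive (for example an `INSERT OVERWRITE TABLE ... PARTITION (dt) SELECT ...` with `hive.exec.dynamic.partition=true`, plus `hive.exec.dynamic.partition.mode=nonstrict` when no partition value is given statically). The Python sketch below only models its routing semantics; the function name, the dict-of-lists “table” and the `dt` column used in the example are hypothetical.

```python
def dynamic_partition_insert(rows, partition_key, table):
    """Route each row to the partition named by its partition-key value.

    `table` is a dict mapping partition value -> list of rows. Missing
    partitions are created on first use, as Hive does for dynamic partitions.
    The partition column is dropped from the stored row because Hive keeps it
    in the partition path rather than in the data files.
    Returns the set of partition values created by this call.
    """
    created = set()
    for row in rows:
        value = row[partition_key]
        if value not in table:
            table[value] = []  # non-existent partition: create it
            created.add(value)
        table[value].append({k: v for k, v in row.items() if k != partition_key})
    return created
```

For example, inserting rows dated `2012-06-12` and `2012-06-13` into a table that only has a `2012-06-12` partition appends to the existing one and creates `2012-06-13`.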
  • WHAT MAKES UP OUR ETL (STEP 3) [Diagram: each partition A–E split into sub-partitions, A1–A3 through E1–E3] • Layered partitioning = very helpful for region-based partitioning • Helps maintain one table definition across regions
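Layered partitioning maps naturally onto nested `key=value` directories, which is Hive’s naming convention for partition directories. The sketch below is illustrative only: the table name `games`, the partition keys `region` and `dt`, and both function names are hypothetical, since the slide does not say which keys were layered. It builds such paths and shows partition pruning, the reason one table definition can serve every region.

```python
def partition_path(table, **keys):
    """Build a Hive-style layered partition path, outermost key first."""
    return "/".join([table] + [f"{k}={v}" for k, v in keys.items()])


def prune(paths, **filters):
    """Keep only the partition paths whose key=value segments match every filter.

    This is the idea behind partition pruning: a query restricted to one
    region reads only that region's directories, while all regions share a
    single table definition.
    """
    kept = []
    for path in paths:
        segments = dict(seg.split("=", 1) for seg in path.split("/")[1:])
        if all(segments.get(k) == str(v) for k, v in filters.items()):
            kept.append(path)
    return kept
```

For instance, `partition_path("games", region="NA", dt="2012-06-13")` gives `games/region=NA/dt=2012-06-13`, and pruning on `region="EU"` keeps only the EU directories.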
  • WHAT MAKES UP OUR ETL [Diagram: Data → Temp Staging Area → Hive Data Warehouse] • To optimize disk IO for user queries, we enabled compression
  • WHY COMPRESSION? • We have 24 cores and disk IO is always the bottleneck, so compression is essential • WHY SNAPPY-COMPRESSED SEQUENCEFILE BLOCKS? – Lots of “why Snappy” discussion on the interwebs already – SequenceFile can be split by Hadoop, so multiple maps can run in parallel – Block compression yields a better compression ratio while keeping the file splittable; the block size is configurable
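The splittability argument can be shown with a toy block format. This is a sketch, not the SequenceFile format (which also has a header, sync markers and key/value framing): `zlib` stands in for Snappy only because Python’s standard library has no Snappy codec, and the function names and JSON record encoding are assumptions. Each block is compressed independently, so a reader assigned block *i* can decode it without touching the others, which is what lets Hadoop hand different blocks to different map tasks. In Hadoop 1-era Hive configuration, the real switches are properties such as `hive.exec.compress.output`, `mapred.output.compression.codec` (e.g. `org.apache.hadoop.io.compress.SnappyCodec`) and `mapred.output.compression.type=BLOCK`.

```python
import json
import zlib


def write_blocks(records, records_per_block):
    """Compress records in independent blocks (zlib stands in for Snappy).

    Because no block depends on another, each one can be decompressed on its
    own, so the "file" stays splittable across parallel readers.
    """
    blocks = []
    for i in range(0, len(records), records_per_block):
        chunk = records[i:i + records_per_block]
        payload = "\n".join(json.dumps(r) for r in chunk).encode("utf-8")
        blocks.append(zlib.compress(payload))
    return blocks


def read_block(blocks, index):
    """What a single map task would do: decode only the block it was assigned."""
    payload = zlib.decompress(blocks[index]).decode("utf-8")
    return [json.loads(line) for line in payload.split("\n")]
```

Larger blocks usually compress better but give fewer units of parallelism, which is the trade-off behind making the block size configurable.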
  • WHAT WE DO IN HIVE • We ETL data from OLTP MySQL slaves daily
  • WHAT WE DO IN HIVE • Our analysts run Hive queries every day, translating to 1000s of MapReduce jobs daily
  • WHAT WE DO IN HIVE • We have some pretty large tables, e.g., one with 50,795,997,734 rows • We use metrics derived from Hive queries to improve our matchmaking system and player behavior
  • WHAT DID WE LEARN FROM ETL? • If you use custom ETL, keep an eye on block distribution • DRY: re-inventing the wheel is not a good idea – Invest time in researching proper tools that suit your needs – Tons of options for ETL and workflow management – Just because company X is using a particular ETL or workflow management tool doesn’t mean it will work effectively for you
  • WHY TABLEAU? • We needed to democratize access for non-technical folks – Design – Execs – Player Support • Great visualization capability • Easy to work with • Has a Hive connector*
  • LEAGUE OF LEGENDS GAMEPLAY BASICS
  • USE CASE #1: THE STORY OF SHEN
  • WAIT, SO WHAT’S A YORDLE? • Yordles = very cute race of champions in League of Legends • We track Yordles (and the rest of our champions) because game balance is exceptionally important
  • DESIGN BALANCE IS IMPORTANT • Highly competitive game • Updated every 2-3 weeks – New champions – New items • Game is a living, breathing service that’s always in motion • Have to maintain a level playing field
  • QUICKLY REACTING TO CHANGES [Chart: wins and total plays over time]
  • HOW DID WE CREATE THAT? [Slide of tool logos] *All logos are trademarks of their respective owners
  • WHY NOT JUST HIVE? • Hive is for massive jobs
  • HIVE TO MYSQL TRANSFORMATION • Many of our stakeholders use Tableau • Transformed required data into cubes for direct Tableau consumption using Pentaho • Initially experimented with the Hive-to-Tableau connector – Had issues, e.g., triggering MR jobs for every change and a non-persistent Hive Server
  • WE WANTED TO KNOW MORE ABOUT… • Which champions and skins are popular across all regions? • What are the win rates of champions across all regions? • Are better players choosing different champions?
  • WE CREATED CUBES OF AGGREGATED DATA [Chart: win rates by champion]
  • HOW WE DID IT: TRANSFORMATION++ [Diagram: Hive → MySQL → Tableau] • Massive tables reside in Hive • Hive transformation creates dimension tables • Data transformed into cubes for Tableau consumption • Some dimension tables moved to join with other fact tables in Hive
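A minimal sketch of the kind of aggregation that turns a massive per-game fact table into a small cube Tableau can read from MySQL. The row shape (`champion`, `region`, `win`), the function name and the sample data are hypothetical; in the pipeline on these slides the heavy lifting happens in Hive and Pentaho rather than Python. Win rate is computed as wins / plays for each (champion, region) cell.

```python
from collections import defaultdict


def win_rate_cube(games):
    """Aggregate per-game rows into a small (champion, region) cube.

    Each input row is one champion appearance in one game:
    {"champion": str, "region": str, "win": bool}.
    The output is small enough to load into MySQL for Tableau.
    """
    plays = defaultdict(int)
    wins = defaultdict(int)
    for g in games:
        key = (g["champion"], g["region"])
        plays[key] += 1
        wins[key] += int(g["win"])
    return {
        key: {"plays": plays[key], "wins": wins[key], "win_rate": wins[key] / plays[key]}
        for key in plays
    }
```

A sudden jump in one champion’s cell, as in the Shen story, would show up directly in this cube.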
  • WHY DID WE GO THIS ROUTE? • Hive is not good for slowly changing dimensions – No automatic primary key generation – Can’t regenerate a dimension table quickly enough, since that requires a full-table scan • MySQL is not awesome for joining massive tables • Decided to use the best of both worlds • Also leveraged map-side joins and the distributed cache
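The two techniques named on this slide, surrogate-key generation and joins against small dimension tables, can be sketched together. Both functions and the champion data are hypothetical illustrations. `assign_surrogate_keys` continues numbering from the current maximum key, so existing members keep their ids when new champions appear. `map_side_join` is the broadcast-join idea behind Hive map joins (the `MAPJOIN` hint, or automatic conversion via `hive.auto.convert.join`): the small dimension is held in memory, as it would be after shipping it through the distributed cache, and each fact row is looked up as it streams past, so the large fact table never has to be shuffled.

```python
def assign_surrogate_keys(dimension, natural_keys):
    """Give new dimension members the next integer key.

    `dimension` maps a natural key (e.g. champion name) -> surrogate id.
    Existing members keep their ids; new ones continue from the max id.
    """
    next_id = max(dimension.values(), default=0) + 1
    for nk in natural_keys:
        if nk not in dimension:
            dimension[nk] = next_id
            next_id += 1
    return dimension


def map_side_join(fact_rows, dimension, fact_key, out_key):
    """Broadcast-style join: the small dimension sits in memory and each
    streamed fact row is enriched with its surrogate key (None if unmatched)."""
    for row in fact_rows:
        joined = dict(row)
        joined[out_key] = dimension.get(row[fact_key])
        yield joined
```

This only works while the dimension fits in each task’s memory, which is why it suits dimension tables rather than two massive fact tables.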
  • USE CASE #2: MATCHMAKING AND REGIONAL METRICS
  • FIRST, SOME CONTEXT • League of Legends is global in scale, with players logging in from >145 countries on a typical day • No-fee play means a very low barrier to play • Players often play on multiple environments regularly (e.g., EU players on NA environments and vice versa) • Same features and mechanics deployed in all territories • It’s vitally important that we understand game performance metrics by geography and region
  • MATCHMAKING • One of the most important features outside of gameplay • Like a dating service, the objective is to match people up • A number of different queues that players can line up in, depending on the type of match they’re looking for • Critical that this system is balanced and able to create good matches quickly
  • MATCHMAKING – IS IT WORKING? • Matchmaking algorithm based on a modified Elo system • Inspecting the “curve” of these scores: – Should show a similar distribution in all regions – May show interesting trends, such as win/loss ratios
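The slide says only that matchmaking uses a *modified* Elo system, and the modifications are not described, so the sketch below is the textbook two-player Elo update, not Riot’s algorithm. The expected score is E_A = 1 / (1 + 10^((R_B - R_A) / 400)), and after a game R_A' = R_A + K(S_A - E_A), with S_A = 1 for a win and 0 for a loss. The K-factor of 32 and the function names are assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one game between A and B.

    k (the K-factor) controls how far ratings move per game; 32 is an
    assumed value, not Riot's.
    """
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Two evenly rated players at 1200 who play once end at 1216 and 1184; an upset against a much stronger opponent moves both ratings further.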
  • MATCHMAKING – IS IT WORKING? [Chart: Elo distribution graph, % of players vs. Elo score]
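The “% of players vs. Elo score” curve is a normalized histogram, computed per region so that regions of very different sizes can be compared on one chart. A sketch, with a hypothetical bucket width of 100 points, hypothetical function names and `(region, rating)` input pairs assumed:

```python
from collections import Counter


def elo_distribution(ratings, bucket_width=100):
    """Percent of players in each Elo bucket (bucket = lower bound of its range)."""
    if not ratings:
        return {}
    counts = Counter(int(r // bucket_width) * bucket_width for r in ratings)
    total = len(ratings)
    return {bucket: 100.0 * n / total for bucket, n in sorted(counts.items())}


def distributions_by_region(players, bucket_width=100):
    """players: iterable of (region, rating) pairs -> {region: distribution}."""
    by_region = {}
    for region, rating in players:
        by_region.setdefault(region, []).append(rating)
    return {region: elo_distribution(rs, bucket_width) for region, rs in by_region.items()}
```

If matchmaking behaves the same everywhere, the per-region curves should look alike; a region whose curve is shifted or skewed is worth investigating.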
  • WHAT WAS NEEDED TO GENERATE IT? • 1. Had to join massive tables with session, player and game data • 2. Needed to look up and range-query IP addresses in the same join – Required for many region-based metrics
  • LIMITATIONS OF HIVE • No good indexing mechanism in our version • Not efficient for lookup and range queries • This made region-based queries computationally difficult
  • SOLUTION • Hive + GeoIP UDFs: leveraged open-source libraries available online • UDFs = user-defined functions that one can add to the Hive interpreter
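The slide does not name the GeoIP library used, so this is only a sketch of one common way to answer an IP-to-region lookup without a range join: sort the ranges once, then binary-search each address. The class name, the country labels and the ranges (taken from the RFC 5737 documentation blocks) are hypothetical. A Hive UDF wrapping logic like this turns a range join over two tables into an O(log n) lookup per row.

```python
import bisect
import ipaddress


class IpRangeTable:
    """Sorted, non-overlapping IPv4 ranges -> country code."""

    def __init__(self, ranges):
        # ranges: iterable of (first_ip, last_ip, country) strings
        rows = sorted(
            (int(ipaddress.IPv4Address(a)), int(ipaddress.IPv4Address(b)), c)
            for a, b, c in ranges
        )
        self.starts = [r[0] for r in rows]
        self.rows = rows

    def lookup(self, ip):
        """Return the country whose range contains ip, or None if none does."""
        n = int(ipaddress.IPv4Address(ip))
        # Rightmost range whose start is <= n; then check its end.
        i = bisect.bisect_right(self.starts, n) - 1
        if i >= 0 and n <= self.rows[i][1]:
            return self.rows[i][2]
        return None
```

Addresses that fall between ranges, or before the first one, return `None`.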
  • LESSONS
  • THE FUTURE
  • OUR IMMEDIATE GOALS • Shorten time to insight • Increase depth of insight • Enable data analysis for client-side features • Log ingestion and analysis • Flexible auditing framework • International data infrastructure
  • CHALLENGE: MAKE IT GLOBAL • Data centers across the globe, because latency has a huge effect on gameplay; as a result, log data is scattered around the world • Large presence in Asia; some areas (e.g., PH) have bandwidth challenges or expensive bandwidth
  • CHALLENGE: WE HAVE BIG DATA • STRUCTURED DATA: 500G daily • APPLICATION AND OPERATIONAL LOGS: 4.5TB daily • OFFICIAL LOL SITE TRAFFIC: 6MM hits daily • RIOT YOUTUBE CHANNEL: 1.7MM subscribers, 270+MM views • + chat logs + detailed gameplay event tracking + so on…
  • OUR AUDACIOUS GOALS • Build a world-class data and analytics organization – Deeply understand players across the globe – Apply that understanding to improve games for players – Deeply understand our entire ecosystem, including social media • Have the ability to identify, understand and react to meaningful trends in real time • Have a deep, real-time understanding of our systems from player-experience and operational standpoints
  • SHAMELESS HIRING PLUG • Like most everybody else at this conference… we’re hiring! • The Riot Manifesto: Player experience first; Challenge convention; Focus on talent and team; Take play seriously; Stay hungry, stay humble
  • SHAMELESS HIRING PLUG • And yes, you can play games at work. It’s encouraged!
  • THANK YOU! QUESTIONS? BARRY LIVINGSTON & DANI RAYAN • blivingston@riotgames.com • drayan@riotgames.com