BIG DATA @ RIOT GAMES
USING HADOOP TO IMPROVE THE PLAYER EXPERIENCE
BARRY LIVINGSTON & SANDEEP SHRESTHA | JULY 2013
SPEAKERS
CONTEXT
HIGH LEVEL ARCHITECTURE
PLAYER EXPERIENCE USE CASES
SUMMARY
QUICK DATA WAREHOUSE HISTORY
FIRST, A BIT OF CONTEXT…
WHAT IS LEAGUE OF LEGENDS?
2009 LAUNCH
TEAM ORIENTED
100+ CHAMPS
MODERN FANTASY
LEAGUE OF LEGENDS GAMEPLAY - CHAMPIONS
LEAGUE OF LEGENDS GAMEPLAY - GAMEPLAY
A QUICK HISTORY
INITIAL LAUNCH / SCRAPPY START UP PHASE
‣ Had a single, dedicated MySQL instance for the DW
‣ Data was ETL’d from production slaves into this instance
‣ Queries were run in MySQL
‣ Reporting was done in Excel
▾ All ETLs, queries and reporting were done by one person
HISTORY: START-UP
THIS WORKED GREAT!
THEN – CRAZY GROWTH
HISTORY: START-UP → CRAZY GROWTH
[Chart: TOTAL ACTIVE PLAYERS, # unique logins over time, up to June 2012]
THE BREAKING POINT
HISTORY: START-UP → CRAZY GROWTH → BREAKING POINT
‣ Data warehouse reached a breaking point
▾ 24 hours of data took 24.5 hours to ETL
‣ We couldn’t handle…
▾ Multiple environments in a vertically scaled MySQL instance
▾ A single environment in a vertically scaled MySQL instance
‣ We needed to change
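Why "24 hours of data took 24.5 hours to ETL" is a breaking point rather than just a slow job: if daily volume and ETL speed stayed constant (growth only made it worse), every daily run finishes half an hour later than the one before, so the lag grows without bound. A minimal sketch of that arithmetic; the function name and parameters are illustrative only:

```python
def etl_lag_hours(days: int,
                  data_hours_per_run: float = 24.0,
                  etl_hours_per_run: float = 24.5) -> float:
    """Cumulative lag in hours after `days` back-to-back daily ETL runs.

    Assumes constant data volume and ETL throughput, so every run adds
    (etl_hours_per_run - data_hours_per_run) hours of backlog.
    """
    backlog_per_run = etl_hours_per_run - data_hours_per_run
    # An ETL that is faster than real time never builds a backlog.
    return max(0.0, days * backlog_per_run)


if __name__ == "__main__":
    for d in (1, 7, 30, 48):
        print(f"after {d:2d} days the warehouse is {etl_lag_hours(d):4.1f} hours behind")
```

Under these assumptions the warehouse is a full day behind after 48 days.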
  
	
  
INTRODUCTION OF HADOOP
HISTORY: START-UP → CRAZY GROWTH → BREAKING POINT → HADOOP
‣ Hadoop has a number of great qualities
▾ Cost effective
▾ Scalable
▾ Open source
▾ We could execute quickly
HIGH LEVEL ARCHITECTURE – JUNE 2012
[Diagram: high-level architecture. Components shown: Tableau; Hive Data Warehouse; Pentaho + Custom ETL + Sqoop; MySQL + Pentaho; Analysts; Business Analyst. Sources in each region (NORTH AMERICA, EUROPE, KOREA): Audit, Plat, LoL]
BUT, THIS WASN’T GOOD ENOUGH
‣ The time to arrive at insight was too long!
‣ Our solution required too much data team involvement
▾ Schema changes
▾ ETL tweaks
▾ Hive metadata updates
‣ Hive is painful for ad-hoc or interactive analysis
▾ Especially for non-technical folks
GOALS
‣ Democratize data access
▾ Enable self-service data collection and analysis
‣ Create actionable insights
‣ Increase speed to insight
USE CASE: GAME CLIENT PERFORMANCE
CLIENT FOOTPRINT
‣ Significant portion of our software runs directly on players’ machines
▾ High-performance graphics
▾ Responsiveness
‣ There is logic in these components that's ONLY exercised on the client side
‣ Understanding the performance, reliability and stability of these features is paramount to improving the player experience
PATCHER
LOBBY CLIENT
GAME CLIENT
ITEM SHOP
CHALLENGE: THE GAME IS ALIVE
The game is a living, breathing service that’s always in motion
‣ New champions
‣ New items
‣ New effects/particles
‣ Changes in environment
‣ Changes in design and design balance
UPDATE EVERY 2-3 WEEKS
CHALLENGE: WE’RE GLOBAL
CHALLENGE: PC VARIABILITY
‣ Hardware and OS profiles are significantly different, even within regions
▾ OS and patch level
▾ CPU
▾ Memory
▾ Video card
▾ Video card memory
▾ Drivers
CHALLENGE: GRAPHIC SETTINGS
CHALLENGE: CLIENT-SIDE LOGIC
IMPROVING THE PLAYER EXPERIENCE
‣ We need to gather information across all of these dimensions in order to UNDERSTAND the player experience
‣ We use this info to:
▾ React quickly to changes
▾ Optimize performance
▾ Optimize designs
▾ Improve our testing
• Like creating our compatibility testing lab
REACTING QUICKLY
GAME LOAD SCREEN
IMPROVING LOAD TIME
OPTIMIZING DESIGN AND PERFORMANCE
HOW DID WE SOLVE THIS
WE HAVE AN ARMY OF TEEMOS WATCHING PLAYERS’ MACHINES THROUGH THEIR TELESCOPES?!
(NOT REALLY, BUT WE DID CONSIDER IT)
HONU: GENERATE - COLLECT - ANALYZE
‣ Riot’s self-service end-to-end Big Data pipeline
▾ Cloud-ready (AWS compatible)
▾ Internal data-center ready
▾ Persistent storage: HDFS/S3
▾ Batch processing: Apache Hadoop/AWS EMR
▾ Data publish: Apache Hive
EVENT GENERATION
‣ Honu SDKs: Java, C++, Erlang
‣ Collector discovery
‣ Failover
‣ Load balancing
‣ Buffering/Batching
‣ Dispatching
‣ Thrift transport
HONU CLIENT SDK
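The SDK responsibilities listed above (buffering/batching, load balancing, failover, dispatching) can be illustrated with a small Python sketch. Everything here is hypothetical: the real Honu SDKs are Java, C++ and Erlang libraries that use a Thrift transport, and the class name, methods and collector addresses below are invented for illustration. The transport is injected as a plain function so the batching and failover logic can be exercised without a network.

```python
import itertools
from typing import Callable, Dict, List, Sequence

Event = Dict[str, object]
# Hypothetical transport: delivers one batch to one collector address and
# raises ConnectionError on failure (the real SDKs use Thrift).
Transport = Callable[[str, List[Event]], None]


class BufferedEventClient:
    """Hypothetical Honu-style client: buffers events, sends them in batches,
    rotates across collectors (load balancing) and fails over on errors."""

    def __init__(self, collectors: Sequence[str], transport: Transport,
                 batch_size: int = 100) -> None:
        if not collectors:
            raise ValueError("need at least one collector")
        # A real client would refresh this list from a discovery service.
        self._collectors = list(collectors)
        self._rotation = itertools.cycle(range(len(self._collectors)))
        self._transport = transport
        self._batch_size = batch_size
        self._buffer: List[Event] = []

    @property
    def pending(self) -> int:
        return len(self._buffer)

    def log(self, event: Event) -> None:
        """Buffer one event; flush once a full batch has accumulated."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send buffered events, trying each collector at most once."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        start = next(self._rotation)  # round-robin starting point
        for offset in range(len(self._collectors)):
            address = self._collectors[(start + offset) % len(self._collectors)]
            try:
                self._transport(address, batch)
                return
            except ConnectionError:
                continue  # fail over to the next collector
        # Every collector failed: put the batch back so a later flush retries it.
        self._buffer = batch + self._buffer
        raise ConnectionError("all collectors unavailable")


if __name__ == "__main__":
    sent = []

    def fake_transport(address: str, batch: List[Event]) -> None:
        if address == "collector-a":
            raise ConnectionError("collector-a is down")
        sent.append((address, len(batch)))

    client = BufferedEventClient(["collector-a", "collector-b"], fake_transport, batch_size=2)
    client.log({"pingAvg": 220.9542})
    client.log({"pingAvg": 180.0})  # fills the batch and triggers a flush
    print(sent)
```

In the usage example the first collector is down, so the batch fails over and is delivered to `collector-b`. Keeping events buffered when every collector fails trades memory for durability; a production client would also cap the buffer.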
SELECT f['serverId'], avg(f['pingAvg']) FROM game_client_stats GROUP BY f['serverId'];
GAME_CLIENT_STATS fields: pingAvg, serverId, system, source, app, timestamp
Sample values: 1234567890 | 99.123.456.78 | game_client | 220.9542 | 12.345.678.90 | Intel64 | …
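The query reads event fields out of a map column, `f['…']`. Assuming, as the query suggests, that each event's fields live in a `map<string,string>` column named `f`, the following Python sketch mimics the grouped average. It also shows one consequence of that layout: map keys are matched exactly, so a key whose case differs from the field name (`pingAVG` vs `pingAvg`) reads as NULL and is skipped by `avg()`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def avg_ping_by_server(rows: Iterable[Dict[str, str]]) -> Dict[Optional[str], float]:
    """Python analogue of:

        SELECT f['serverId'], avg(f['pingAvg'])
        FROM game_client_stats GROUP BY f['serverId'];

    Each row stands in for the map<string,string> column `f`.
    """
    totals: Dict[Optional[str], float] = defaultdict(float)
    counts: Dict[Optional[str], int] = defaultdict(int)
    for f in rows:
        # A missing map key is NULL in Hive; NULL serverIds form their own group.
        server = f.get("serverId")
        ping = f.get("pingAvg")
        if ping is None:
            continue  # avg() skips NULLs
        totals[server] += float(ping)
        counts[server] += 1
    # Note: a group whose pings are all NULL is omitted here, whereas Hive
    # would still return that group with a NULL average.
    return {server: totals[server] / counts[server] for server in totals}


if __name__ == "__main__":
    rows = [
        {"serverId": "s1", "pingAvg": "100"},
        {"serverId": "s1", "pingAvg": "200"},
        {"serverId": "s2", "pingAvg": "50"},
        {"serverId": "s2", "pingAVG": "999"},  # wrong key case: treated as NULL
    ]
    print(avg_ping_by_server(rows))  # {'s1': 150.0, 's2': 50.0}
```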
  
EVENT COLLECTION
‣ Honu collector
‣ Online system
‣ High availability: 100% uptime
‣ Horizontally scalable
‣ Elastic
‣ Fault tolerant
‣ Netflix OSS Eureka discovery service
HONU COLLECTOR
‣ Collect events from multiple clients (Thrift/NIO)
‣ Save all events to one compressed file locally
‣ Upload that file every XX minutes to HDFS/S3
‣ Send a message to Queue/SQS for Demux
[Diagram: Honu Collectors, SQS, S3]
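A minimal Python sketch of that collector loop, under stated assumptions: events arrive as JSON-serializable dicts, `upload` and `notify` are injected callables standing in for the HDFS/S3 copy and the SQS message (no real AWS client is used), and the roll interval is a parameter because the slide leaves it as "XX minutes". The class name, file naming scheme and bucket URL are hypothetical.

```python
import gzip
import json
import os
import tempfile
import time
from typing import Any, Callable, Dict, Iterable, Optional


class RollingCollector:
    """Hypothetical sketch: append incoming events to one local compressed
    file; once the file is old enough, close it, upload it, and queue a
    message telling demux where to find it."""

    def __init__(self,
                 upload: Callable[[str], str],   # e.g. copy to HDFS/S3, return remote location
                 notify: Callable[[str], None],  # e.g. enqueue the location on SQS
                 roll_seconds: float,
                 spool_dir: Optional[str] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upload = upload
        self._notify = notify
        self._roll_seconds = roll_seconds
        self._spool_dir = spool_dir or tempfile.mkdtemp(prefix="collector-")
        self._clock = clock
        self._file: Optional[Any] = None
        self._path = ""
        self._opened_at = 0.0
        self._sequence = 0

    def _open_new_file(self) -> None:
        self._sequence += 1
        name = f"events-{self._sequence:06d}.jsonl.gz"
        self._path = os.path.join(self._spool_dir, name)
        self._file = gzip.open(self._path, "wt", encoding="utf-8")
        self._opened_at = self._clock()

    def receive(self, events: Iterable[Dict[str, Any]]) -> None:
        """Append a batch of events, one JSON object per line."""
        if self._file is None:
            self._open_new_file()
        for event in events:
            self._file.write(json.dumps(event) + "\n")
        # Simplification: the age check only runs when events arrive; a real
        # collector would also roll on a timer.
        if self._clock() - self._opened_at >= self._roll_seconds:
            self.roll()

    def roll(self) -> None:
        """Close the current file, upload it, and notify demux."""
        if self._file is None:
            return
        self._file.close()
        self._file = None
        remote = self._upload(self._path)
        self._notify(remote)


if __name__ == "__main__":
    now = [0.0]
    queue = []
    collector = RollingCollector(
        upload=lambda path: "s3://example-bucket/" + os.path.basename(path),
        notify=queue.append,
        roll_seconds=600,  # stands in for the slide's "every XX minutes"
        clock=lambda: now[0],
    )
    collector.receive([{"messageType": "Foo", "fact": "Hello World!"}])
    now[0] = 601.0
    collector.receive([{"messageType": "Foo", "fact": "Hello Dradis!"}])
    print(queue)  # ['s3://example-bucket/events-000001.jsonl.gz']
```

Writing many small client batches into one compressed file and shipping whole files is what lets the downstream demux work on large, batch-friendly inputs instead of individual events.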
  
EVENT ORGANIZATION
‣ Honu demux
‣ Multi-stage batch processing pipeline
‣ Elastic producer-consumer
‣ Apache Hadoop MapReduce
‣ Standalone MapReduce mode
‣ Apache Hive integration
HONU DEMUX
‣ Multi-stage batch processing pipeline
‣ Bucket events to separate tables
‣ Write Hive partition files
‣ Add partitions to Hive metastore
‣ Merge partitions
[Diagram: SQS, S3, four Standalone Demux workers, HIVE, MERGE]
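The bucketing and partition-registration steps can be sketched in Python as well. This is an assumption-laden illustration, not Honu's code: it routes each event to a table named after its `messageType` (as the Dradis example in this deck suggests) and into hourly partitions with hypothetical partition columns `dt` and `hour`, then emits the HiveQL `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statements that register partitions in the metastore. Writing the partition files and merging small partitions are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Tuple

# (table, dt, hour): the dt/hour partition columns are an assumed layout.
PartitionKey = Tuple[str, str, str]


def bucket_events(events: Iterable[Dict[str, Any]]) -> Dict[PartitionKey, List[Dict[str, Any]]]:
    """Group events by destination table (from messageType) and hourly partition."""
    buckets: Dict[PartitionKey, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        table = str(event["messageType"]).lower()  # Hive table names are case-insensitive
        ts = datetime.fromtimestamp(int(event["timestamp"]), tz=timezone.utc)
        buckets[(table, ts.strftime("%Y-%m-%d"), ts.strftime("%H"))].append(event)
    return dict(buckets)


def add_partition_statements(keys: Iterable[PartitionKey]) -> List[str]:
    """HiveQL to register each partition in the metastore once its files are written."""
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{dt}', hour='{hour}');"
        for table, dt, hour in sorted(set(keys))
    ]


if __name__ == "__main__":
    events = [
        {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"},
        {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!"},
        {"messageType": "Bar", "timestamp": 1369008000},
    ]
    buckets = bucket_events(events)
    for statement in add_partition_statements(buckets):
        print(statement)
    # ALTER TABLE bar ADD IF NOT EXISTS PARTITION (dt='2013-05-20', hour='00');
    # ALTER TABLE foo ADD IF NOT EXISTS PARTITION (dt='2013-05-20', hour='15');
```

`IF NOT EXISTS` keeps the registration step idempotent, which matters when several standalone demux workers may touch the same hourly partition.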
  
HONU PIPELINE
GENERATE: HONU CLIENT SDK → COLLECT: HONU COLLECTORS → ORGANIZE: HONU DEMUX
USE CASE: PLAYER BEHAVIOR
PLAYER BEHAVIOR
PLAYER BEHAVIOR INITIATIVES
TRIBUNAL JUSTICE
‣ Community regulated
‣ In-game chat log
‣ Player stats
‣ Inventory
‣ Game info
PLAYER BEHAVIOR INITIATIVES
HONOR SYSTEM
‣ Recognize positive experience
‣ Improve sportsmanship
STARTUP TIPS
TEAMS THAT USE SMART PINGS TO ALERT OTHER PLAYERS TO THREATS ARE MORE LIKELY TO WIN GAMES
PLAYERS WHO FOLLOW THE SUMMONER'S CODE WIN 27% MORE GAMES
THE TRIBUNAL BANS PLAYERS FOR NEGATIVE BEHAVIOR SUCH AS VERBAL HARASSMENT
PLAYERS WHO COOPERATE WITH THEIR TEAM WIN 31% MORE GAMES
HOW WE SOLVED IT – EXTEND HONU
GENERATE: HONU CLIENT SDK → COLLECT: HONU COLLECTORS → ORGANIZE: HONU DEMUX
HONU TOOLS: DRADIS
‣ HTTP-based data collection
‣ Large volume of data from untrusted sources
‣ C10K
‣ Nginx + Netty
‣ 4+ billion API calls/day
‣ Peak 100K+ calls/sec
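A quick back-of-the-envelope check of the Dradis figures: spreading 4 billion calls evenly over a day gives the average rate, and comparing it with the quoted peak shows how much burst headroom the Nginx + Netty tier needs. Both inputs are lower bounds ("4+", "100K+"), so the ratio is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    average = average_per_second(4e9)  # "4+ billion API calls/day"
    peak = 100_000                     # "Peak 100K+ calls/sec"
    print(f"average ~ {average:,.0f} calls/sec")   # average ~ 46,296 calls/sec
    print(f"peak/average ~ {peak / average:.1f}x")  # peak/average ~ 2.2x
```

So the peak is a bit over twice the daily average, which is why the tier is sized for peak concurrency (the C10K problem) rather than for the average load.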
  
	
  
HONU TOOLS: DRADIS
‣ JSON messages:
▾ curl -d '[{"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"}, {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!", "fiction": "Hello Honu!"}]'
‣ Hive query:
▾ SELECT * FROM foo WHERE f['fact'] = 'Hello Dradis!'
Table: Foo
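The same round trip, sketched in Python under explicit assumptions: the endpoint URL is a placeholder because the slide's curl command omits it, the request is only built (`urllib.request.urlopen` would send it), and the mapping from JSON message to Hive row (messageType picks the table, remaining fields land in the map column `f`) is inferred from the query's `f['fact']` rather than taken from Dradis itself. `curl -d` on its own sends a form-encoded Content-Type, so the sketch sets `application/json` explicitly.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# Hypothetical endpoint: the curl example on the slide does not show the URL.
DRADIS_URL = "http://dradis.example.com/events"

MESSAGES: List[Dict[str, Any]] = [
    {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"},
    {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!",
     "fiction": "Hello Honu!"},
]


def build_request(messages: List[Dict[str, Any]], url: str = DRADIS_URL) -> urllib.request.Request:
    """Build a POST carrying the messages as one JSON array (not sent here)."""
    body = json.dumps(messages).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def to_hive_row(message: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Assumed mapping: messageType selects the table; every other field becomes
    a string entry in the map column `f`, so extra fields need no schema change."""
    table = str(message["messageType"]).lower()  # Hive table names are case-insensitive
    f = {key: str(value) for key, value in message.items() if key != "messageType"}
    return table, f


if __name__ == "__main__":
    request = build_request(MESSAGES)
    print(request.get_method(), request.full_url)
    rows = [to_hive_row(m) for m in MESSAGES]
    # Analogue of: SELECT * FROM foo WHERE f['fact'] = 'Hello Dradis!'
    matches = [f for table, f in rows if table == "foo" and f.get("fact") == "Hello Dradis!"]
    print(matches)
```

The second message's extra `fiction` field simply becomes another map key, which is what lets teams add fields without schema changes or data team involvement.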
  
HONU TOOLS: ECHO SERVICE
‣ Web UI to easily and immediately visualize the data that has been sent to Honu collectors
‣ Self-service end-to-end pipeline
HONU TOOLS: METADATA SERVICE
‣ Data discovery
‣ Schema management
‣ Counter, time
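The slide only names the metadata service's responsibilities. As a hypothetical illustration of the schema-management and counting side (not the service's actual design), here is a Python sketch that scans schemaless events and records, per messageType, which fields appear, with what value types, and how often. A field that shows up in only some events, like `fiction` in the Dradis example, is the kind of thing such bookkeeping would surface for data discovery.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def infer_schema(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Per messageType, record each field seen, the Python types of its values,
    and how many events carried it (hypothetical metadata bookkeeping)."""
    schema: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)
    for event in events:
        table = str(event["messageType"])
        for field, value in event.items():
            if field == "messageType":
                continue
            entry = schema[table].setdefault(field, {"types": set(), "count": 0})
            entry["types"].add(type(value).__name__)
            entry["count"] += 1
    return dict(schema)


if __name__ == "__main__":
    events = [
        {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"},
        {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!",
         "fiction": "Hello Honu!"},
    ]
    for field, info in sorted(infer_schema(events)["Foo"].items()):
        print(field, sorted(info["types"]), info["count"])
```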
  
HONU TOOLS: REAL-TIME SLICING/DICING
‣  Integration with Platfora
‣  End-user ad-hoc analysis tool
‣  Interactive visual feedback
‣ Realtime exploration/graphing @ 10^9 data points
HONU TOOLS: WORKFLOW MANAGEMENT
ENTERPRISE WORKFLOW MANAGEMENT: MATT GOEKE @ LATER TODAY
[Diagram: Client, Mobile, WWW]
HONU STATS
‣  7+ billion events/day
‣  Tested @ 70+ billion events/day
‣  100+ tables
▾  10+ tables @ 100M – 1B rows/day
‣  7 Petabytes Game Event Dataset
‣  Semi-global deployment
‣  0 downtime
‣ Runs in cloud (AWS) + datacenter
SUMMARY
GOALS
ü Democra@ze	
  Data	
  Access	
  
ü Enable	
  Self-­‐service	
  Data	
  Collec@on	
  and	
  Analysis	
  
ü Create	
  Ac@onable	
  Insights	
  
ü Increase	
  Speed	
  to	
  Insight	
  
[Diagram: HONU, HONU CLIENT SDK]
FUTURE
‣ Improve self-service workflow & tooling
▾ Metadata management
▾ Discovery of captured data
▾ Workflow management
▾ Platfora to all teams
‣ Realtime event aggregation
‣ Global data infrastructure
‣ Replace legacy audit/event logging services
HANDLE INCREASING DATA VELOCITY
Metric | JUNE 2012 | JULY 2013
MySQL tables | 180 | 1200
Pipeline events/day | 0 | 7+ billion
Workflows | Cronjob + Pentaho | Oozie
Environment | Datacenter | DC + AWS
SLA | 1 day | 2 hours
Event tracking | 2+ weeks (DB update); dependencies: DBA teams + ETL teams + Tools teams; downtime (3h min.) | 10 minutes; self-service; no downtime
DECREASE TEEMO DEATHS?
SHAMELESS HIRING PLUG
Like most everybody else at this conference… we’re hiring!
PLAYER EXPERIENCE FIRST
CHALLENGE CONVENTION
FOCUS ON TALENT AND TEAM
TAKE PLAY SERIOUSLY
STAY HUNGRY, STAY HUMBLE
THE RIOT MANIFESTO
AND YES, YOU CAN PLAY GAMES AT WORK
IT’S ENCOURAGED!
THANK YOU! QUESTIONS?
BARRY LIVINGSTON
blivingston@riotgames.com
SANDEEP SHRESTHA
sshrestha@riotgames.com

Big Data at Riot Games – Using Hadoop to Understand Player Experience - StampedeCon 2013

  • 1. BIG DATA @ RIOT GAMES USING HADOOP TO IMPROVE THE PLAYER EXPERIENCE BARRY LIVINGSTON & SANDEEP SHRESTHA | JULY 2013
  • 3. CONTEXT · HIGH LEVEL ARCHITECTURE · PLAYER EXPERIENCE USE CASES · SUMMARY · QUICK DATA WAREHOUSE HISTORY
  • 4. FIRST, A BIT OF CONTEXT…
  • 5.
  • 6. WHAT IS LEAGUE OF LEGENDS? 2009 LAUNCH · TEAM ORIENTED · 100+ CHAMPS · MODERN FANTASY
  • 7. WHAT IS LEAGUE OF LEGENDS?
  • 8. LEAGUE OF LEGENDS GAMEPLAY - CHAMPIONS
  • 9. LEAGUE OF LEGENDS GAMEPLAY - GAMEPLAY
  • 11. INITIAL LAUNCH / SCRAPPY START-UP PHASE ‣ Had a single, dedicated MySQL instance for the DW ‣ Data was ETL’d from production slaves into this instance ‣ Queries were run in MySQL ‣ Reporting was done in Excel ▾ All ETLs, queries and reporting were done by one person (History timeline: START-UP) THIS WORKED GREAT!
  • 12. THEN – CRAZY GROWTH [Chart: total active players (# unique logins) over time, up to June 2012] (History timeline: START-UP → CRAZY GROWTH)
  • 13. THE BREAKING POINT (History timeline: START-UP → CRAZY GROWTH → BREAKING POINT) ‣ Data warehouse reached a breaking point ▾ 24 hours of data took 24.5 hours to ETL ‣ We couldn’t handle… ▾ Multiple environments in a vertical MySQL instance ▾ A single environment in a vertical MySQL instance ‣ We needed to change
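The 24-versus-24.5-hour figure on slide 13 means the warehouse could never catch up. The rough Python model below makes the drift concrete. It assumes each day's batch starts only after the previous one finishes, which the slides do not actually describe. Under that assumption the lag grows by half an hour per day, so the warehouse ends up a full day behind after 48 days.

```python
import math
from typing import Optional


def etl_lag_hours(days: int, data_hours: float = 24.0, etl_hours: float = 24.5) -> float:
    """Hours the warehouse trails real time after `days` sequential daily batches."""
    # Each batch covers `data_hours` of data but needs `etl_hours` to load;
    # any shortfall carries over into the next day (illustrative model only).
    return days * max(0.0, etl_hours - data_hours)


def days_until_full_day_behind(data_hours: float = 24.0, etl_hours: float = 24.5) -> Optional[int]:
    """Days until the accumulated lag reaches 24 hours, or None if ETL keeps up."""
    shortfall = etl_hours - data_hours
    if shortfall <= 0:
        return None  # ETL is at least as fast as data arrives; no backlog builds up
    return math.ceil(24.0 / shortfall)


if __name__ == "__main__":
    print(etl_lag_hours(30))               # 15.0
    print(days_until_full_day_behind())    # 48
```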
  • 14. INTRODUCTION OF HADOOP (History timeline: START-UP → CRAZY GROWTH → BREAKING POINT → HADOOP) ‣ Hadoop has a number of great qualities ▾ Cost effective ▾ Scalable ▾ Open source ▾ We could execute quickly
  • 15. HIGH LEVEL ARCHITECTURE – JUNE 2012 [Diagram components: regional sources in EUROPE, KOREA and NORTH AMERICA, each with Audit, Plat and LoL sources; Pentaho + custom ETL + Sqoop; Hive data warehouse; MySQL; Pentaho; Tableau; analysts and business analysts]
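In the June 2012 architecture, regional MySQL data moves into Hive through Pentaho, custom ETL and Sqoop. The Python sketch below illustrates only the Sqoop leg: it assembles Sqoop 1 `import` command lines for a per-region, per-database layout. The host names, credentials path, table naming scheme and the single `dt` partition key are all hypothetical. The flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--hive-import`, `--hive-table`, `--hive-partition-key`, `--hive-partition-value`, `--num-mappers`) are standard Sqoop 1 import options.

```python
import shlex
from typing import List

# Hypothetical regional MySQL replicas; the real hosts are not given in the slides.
REGION_HOSTS = {
    "eu": "mysql-eu.example.internal",
    "kr": "mysql-kr.example.internal",
    "na": "mysql-na.example.internal",
}


def sqoop_import_cmd(region: str, database: str, table: str, day: str,
                     mappers: int = 4) -> List[str]:
    """Build one `sqoop import` argv that lands a MySQL table in a daily Hive partition.

    The <region>_<database>_<table> Hive naming and the `dt` partition key are assumptions.
    """
    host = REGION_HOSTS[region]
    hive_table = f"{region}_{database}_{table}".lower()
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", "etl_reader",                        # hypothetical account
        "--password-file", "/user/etl/.mysql.password",    # hypothetical HDFS path
        "--table", table,
        "--hive-import",                                   # load into Hive after the HDFS copy
        "--hive-table", hive_table,
        "--hive-partition-key", "dt",
        "--hive-partition-value", day,
        "--num-mappers", str(mappers),                     # parallel map tasks for the copy
    ]


if __name__ == "__main__":
    for region in REGION_HOSTS:
        print(shlex.join(sqoop_import_cmd(region, "lol", "matches", "2012-06-01")))
```

Returning an argv list rather than a single string keeps quoting safe if the commands are later launched with `subprocess.run`.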
  • 16. BUT, THIS WASN’T GOOD ENOUGH ‣ The time to arrive at insight was too long! ‣ Our solution required too much data team involvement ▾ Schema changes ▾ ETL tweaks ▾ Hive metadata updates ‣ Hive is painful for ad-hoc or interactive analysis ▾ Especially for non-technical folks
  • 17. GOALS ‣ Democratize data access ▾ Enable self-service data collection and analysis ‣ Create actionable insights ‣ Increase speed to insight
  • 18. USE CASE: GAME CLIENT PERFORMANCE
  • 19. CLIENT FOOTPRINT ‣ A significant portion of our software runs directly on players’ machines ▾ High performance graphics ▾ Responsiveness ‣ There is logic in these components that’s ONLY exercised on the client side ‣ Understanding the performance, reliability and stability of these features is paramount to improving the player experience
  • 24. CHALLENGE: THE GAME IS ALIVE The game is a living, breathing service that’s always in motion ‣ New champions ‣ New items ‣ New effects/particles ‣ Changes in environment ‣ Changes in design and design balance [Graphic: UPDATE EVERY 2-3 WEEKS]
  • 26. CHALLENGE: PC VARIABILITY ‣ Hardware and OS profiles are significantly different even within regions ▾ OS and patch level ▾ CPU ▾ Memory ▾ Video card ▾ Video card memory ▾ Drivers
  • 29. IMPROVING THE PLAYER EXPERIENCE ‣ We need to gather information across all of these dimensions in order to UNDERSTAND the player experience ‣ We use this info to: ▾ React quickly to changes ▾ Optimize performance ▾ Optimize designs ▾ Improve our testing (like creating our compatibility testing lab)
  • 33–36. OPTIMIZING DESIGN AND PERFORMANCE
  • 37. HOW DID WE SOLVE THIS? WE HAVE AN ARMY OF TEEMOS WATCHING PLAYERS’ MACHINES THROUGH THEIR TELESCOPES?! (NOT REALLY, BUT WE DID CONSIDER IT)
  • 38. HONU: GENERATE - COLLECT - ANALYZE ‣ Riot’s self-service end-to-end Big Data pipeline ▾ Cloud-ready (AWS compatible) ▾ Internal data-center ready ▾ Persistent storage: HDFS/S3 ▾ Batch processing: Apache Hadoop/AWS EMR ▾ Data publish: Apache Hive
  • 39. EVENT GENERATION ‣ Honu SDKs: Java, C++, Erlang ‣ Collector discovery ‣ Failover ‣ Load balancing ‣ Buffering/Batching ‣ Dispatching ‣ Thrift transport
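Slide 39 lists the client-side concerns the Honu SDKs handle, but the SDK code itself (Java, C++, Erlang) is not shown. The Python sketch below therefore illustrates only the general pattern of buffering/batching with round-robin load balancing and failover across a list of discovered collectors. `BatchingEmitter`, the collector names and the `send` callable are hypothetical. In particular, `send` stands in for the real Thrift transport.

```python
import itertools
from typing import Callable, Dict, List, Sequence

Event = Dict[str, object]
# Hypothetical stand-in for the Thrift transport: raises OSError if the collector is unreachable.
SendFn = Callable[[str, List[Event]], None]


class BatchingEmitter:
    """Buffer events client-side and ship them in batches to one of several collectors."""

    def __init__(self, collectors: Sequence[str], send: SendFn, batch_size: int = 100) -> None:
        if not collectors:
            raise ValueError("need at least one collector")
        self._collectors = list(collectors)      # e.g. addresses returned by collector discovery
        self._next_start = itertools.cycle(range(len(self._collectors)))
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[Event] = []

    def emit(self, event: Event) -> None:
        """Queue one event; dispatch automatically once a full batch is buffered."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send the buffered batch, starting at the next collector in rotation."""
        if not self._buffer:
            return
        n = len(self._collectors)
        start = next(self._next_start)           # round-robin load balancing
        for offset in range(n):
            collector = self._collectors[(start + offset) % n]
            try:
                self._send(collector, self._buffer)
            except OSError:
                continue                         # failover: try the next collector
            self._buffer = []
            return
        # No collector accepted the batch: keep it buffered so a later flush can retry.
        raise ConnectionError("no collector accepted the batch")


if __name__ == "__main__":
    delivered = []

    def fake_send(collector: str, batch: List[Event]) -> None:
        if collector == "collector-a":
            raise OSError("collector-a is down")
        delivered.append((collector, len(batch)))

    emitter = BatchingEmitter(["collector-a", "collector-b"], fake_send, batch_size=2)
    for i in range(4):
        emitter.emit({"messageType": "game_client_stats", "pingAvg": 40 + i})
    print(delivered)  # [('collector-b', 2), ('collector-b', 2)]
```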
  • 40. HONU CLIENT SDK SELECT avg(f['pingAvg']) FROM game_client_stats GROUP BY f['serverId']; [Table GAME_CLIENT_STATS: columns pingAvg, serverId, system, source, app, timestamp; sample values include 220.9542, 12.345.678.90, 99.123.456.78, game_client, Intel64 … and 1234567890]
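The slide 40 query reads values out of a map-typed column `f`, so new event fields can be queried without a Hive schema change. The Python sketch below reproduces what that query computes, which is the average ping per server. It assumes `f` behaves like a `map<string,string>` keyed by field name; the slide implies this but does not show the DDL. Note that Hive map lookups are case-sensitive, so the key used in a query must match the key the client actually sent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional


def avg_ping_by_server(rows: Iterable[Mapping[str, Mapping[str, str]]]) -> Dict[Optional[str], Optional[float]]:
    """Python analogue of:
        SELECT f['serverId'], avg(f['pingAvg']) FROM game_client_stats GROUP BY f['serverId'];
    Each row carries its payload in the map column `f`, with values stored as strings.
    """
    totals: Dict[Optional[str], float] = defaultdict(float)
    counts: Dict[Optional[str], int] = defaultdict(int)
    groups = set()
    for row in rows:
        f = row["f"]
        server = f.get("serverId")     # a missing key behaves like a NULL group key
        groups.add(server)
        ping = f.get("pingAvg")
        if ping is not None:           # like Hive's avg(), NULL values are skipped
            totals[server] += float(ping)
            counts[server] += 1
    # A group whose rows all lack pingAvg averages to NULL (None), as in Hive.
    return {s: (totals[s] / counts[s] if counts[s] else None) for s in groups}


if __name__ == "__main__":
    sample = [
        {"f": {"serverId": "s1", "pingAvg": "40"}},
        {"f": {"serverId": "s1", "pingAvg": "60"}},
        {"f": {"serverId": "s2", "pingAvg": "220.9542"}},
        {"f": {"serverId": "s3"}},
    ]
    print(avg_ping_by_server(sample))
```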
  • 41. EVENT COLLECTION ‣ Honu collector ‣ Online system ‣ High availability – 100% uptime ‣ Horizontally scalable ‣ Elastic ‣ Fault tolerant ‣ Netflix OSS Eureka discovery service
  • 42. HONU COLLECTOR ‣ Collect events from multiple clients (Thrift/NIO) ‣ Save all events to one compressed file locally ‣ Upload that file every XX minutes to HDFS/S3 ‣ Send a message to Queue/SQS for Demux [Diagram components: Honu Collectors, SQS, S3]
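Slide 42 describes the collector's write path in four steps: append every incoming event to one local compressed file, upload that file on a fixed interval ("every XX minutes" on the slide), tell demux about the new file through a queue, and start a fresh file. The Python sketch below follows those steps under two assumptions. First, it uses newline-delimited JSON inside gzip, although the slides do not describe the file format. Second, the `upload` and `enqueue` callables are hypothetical stand-ins for the HDFS/S3 client and the SQS queue.

```python
import gzip
import json
import os
import tempfile
import time
from typing import Callable, Dict, List

Event = Dict[str, object]


class RollingCollector:
    """Spool incoming events into one local gzip file; periodically upload it and notify demux."""

    def __init__(self,
                 upload: Callable[[str], str],
                 enqueue: Callable[[Dict[str, str]], None],
                 roll_seconds: float,
                 spool_dir: str,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upload = upload              # hypothetical: copy local file to HDFS/S3, return its URI
        self._enqueue = enqueue            # hypothetical: post a message to the demux queue (e.g. SQS)
        self._roll_seconds = roll_seconds  # the slide's "every XX minutes", expressed in seconds
        self._spool_dir = spool_dir
        self._clock = clock
        self._open_spool_file()

    def _open_spool_file(self) -> None:
        fd, self._path = tempfile.mkstemp(suffix=".jsonl.gz", dir=self._spool_dir)
        os.close(fd)                        # gzip.open reopens the path itself
        self._file = gzip.open(self._path, "wt", encoding="utf-8")
        self._opened_at = self._clock()

    def collect(self, events: List[Event]) -> None:
        """Append a batch received from a client, one JSON document per line."""
        for event in events:
            self._file.write(json.dumps(event) + "\n")
        # A production collector would roll on a timer; checking here keeps the sketch small.
        if self._clock() - self._opened_at >= self._roll_seconds:
            self.roll()

    def roll(self) -> None:
        """Close the current file, upload it, tell demux where it is, and start a new one."""
        self._file.close()                  # finalize the gzip stream before upload
        uri = self._upload(self._path)
        self._enqueue({"file": uri})
        os.remove(self._path)
        self._open_spool_file()
```

Writing everything into a single compressed file and announcing it with one queue message keeps the number of objects in HDFS/S3 small, which suits batch consumers like demux.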
  • 43. EVENT ORGANIZATION ‣ Honu demux ‣ Multi-stage batch processing pipeline ‣ Elastic producer-consumer ‣ Apache Hadoop map reduce ‣ Standalone map reduce mode ‣ Apache Hive integration
  • 44. HONU DEMUX ‣ Multi-stage batch processing pipeline ‣ Bucket events to separate tables ‣ Write Hive partition files ‣ Add partitions to Hive metastore ‣ Merge partitions [Diagram components: SQS, Demux, several standalone Demux workers, S3, Hive, Merge]
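The demux steps on slide 44 are bucketing events into per-table outputs, writing Hive partition files, and registering the partitions in the metastore. The Python sketch below covers the first and last of these; actual file writing and the final merge step are left out. The `messageType`-to-table mapping mirrors the Dradis example on slide 53. The hourly `dt`/`hr` partition layout, the warehouse root and the helper names are assumptions, because the slides do not say how Honu partitions its tables. The generated statement uses standard HiveQL `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION` syntax.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Event = Dict[str, object]
WAREHOUSE_ROOT = "s3://example-bucket/honu"   # hypothetical location


def partition_of(event: Event) -> Tuple[str, str, str]:
    """Return (table, dt, hr) for an event; the table name comes from messageType."""
    table = str(event["messageType"]).lower()
    ts = datetime.fromtimestamp(int(event["timestamp"]), tz=timezone.utc)
    return table, ts.strftime("%Y-%m-%d"), ts.strftime("%H")


def demux(events: List[Event]) -> Dict[Tuple[str, str, str], List[Event]]:
    """Bucket events so each (table, dt, hr) group can be written as one partition file."""
    buckets: Dict[Tuple[str, str, str], List[Event]] = defaultdict(list)
    for event in events:
        buckets[partition_of(event)].append(event)
    return dict(buckets)


def add_partition_ddl(table: str, dt: str, hr: str) -> str:
    """HiveQL that registers an already-written partition directory in the metastore."""
    location = f"{WAREHOUSE_ROOT}/{table}/dt={dt}/hr={hr}"
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt='{dt}', hr='{hr}') LOCATION '{location}';")


if __name__ == "__main__":
    batch = [
        {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"},
        {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!"},
    ]
    for table, dt, hr in demux(batch):
        print(add_partition_ddl(table, dt, hr))
```

`IF NOT EXISTS` makes registration idempotent, so a demux stage can be retried without failing on partitions it already added.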
  • 48. PLAYER BEHAVIOR INITIATIVES: TRIBUNAL JUSTICE ‣ Community regulated ‣ In-game chat log ‣ Player stats ‣ Inventory ‣ Game info
  • 49. PLAYER BEHAVIOR INITIATIVES: HONOR SYSTEM ‣ Recognize positive experience ‣ Improve sportsmanship
  • 50. STARTUP TIPS: TEAMS THAT USE SMART PINGS TO ALERT OTHER PLAYERS TO THREATS ARE MORE LIKELY TO WIN GAME · PLAYERS WHO FOLLOW THE SUMMONER'S CODE WIN 27% MORE GAMES · THE TRIBUNAL BANS PLAYERS FOR NEGATIVE BEHAVIOR SUCH AS VERBAL HARASSMENT · PLAYERS WHO COOPERATE WITH THEIR TEAM WIN 31% MORE GAMES
  • 51. HOW WE SOLVED IT – EXTEND HONU [Diagram: GENERATE (Honu client SDK) → COLLECT (Honu collectors) → ORGANIZE (Honu demux)]
  • 52. HONU TOOLS: DRADIS ‣ HTTP-based data collection ‣ Large volume of data from untrusted sources ‣ C10K ‣ Nginx + Netty ‣ 4+ billion API calls/day ‣ Peak 100K+ calls/sec
  • 53. HONU TOOLS: DRADIS ‣ JSON messages: ▾ curl -d '[{"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"}, {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!", "fiction": "Hello Honu!"}]' ‣ Hive query: ▾ SELECT * FROM foo WHERE f['fact'] = 'Hello Dradis!' [Table: Foo]
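Slide 53 shows a JSON batch posted to Dradis becoming rows of Hive table `foo`, which can then be filtered with `f['fact']`. The Python sketch below models that mapping. It assumes, without confirmation from the slides, that `messageType` selects the table and that every other field, `timestamp` included, goes into the string map `f`. Because Dradis accepts data from untrusted sources, the sketch drops malformed entries instead of failing the whole batch. Both helper names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, Dict[str, str]]


def rows_by_table(body: str) -> Dict[str, List[Row]]:
    """Turn a Dradis-style JSON array into {table: [{"f": {...}}, ...]}.

    Non-JSON bodies, non-array payloads, and entries without a string messageType are ignored.
    """
    try:
        messages = json.loads(body)
    except json.JSONDecodeError:
        return {}
    if not isinstance(messages, list):
        return {}
    tables: Dict[str, List[Row]] = defaultdict(list)
    for msg in messages:
        if not isinstance(msg, dict) or not isinstance(msg.get("messageType"), str):
            continue  # untrusted input: skip the bad entry, keep the rest
        table = msg["messageType"].lower()
        f = {k: str(v) for k, v in msg.items() if k != "messageType"}
        tables[table].append({"f": f})
    return dict(tables)


def select_where(rows: List[Row], key: str, value: str) -> List[Row]:
    """Python analogue of: SELECT * FROM foo WHERE f['<key>'] = '<value>'."""
    return [row for row in rows if row["f"].get(key) == value]


if __name__ == "__main__":
    body = ('[{"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello World!"},'
            ' {"messageType": "Foo", "timestamp": 1369064555, "fact": "Hello Dradis!",'
            ' "fiction": "Hello Honu!"}]')
    print(select_where(rows_by_table(body)["foo"], "fact", "Hello Dradis!"))
```

The second message carries an extra `fiction` field without any schema change, which is the flexibility a map column provides.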
  • 54–56. HONU TOOLS: ECHO SERVICE ‣ Web UI to easily and immediately visualize the data that has been sent to Honu collectors ‣ Self-service end-to-end pipeline
  • 57. HONU TOOLS: METADATA SERVICE ‣ Data discovery ‣ Schema management ‣ Counter, time
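Slide 57 names only the metadata service's concerns: data discovery, schema management, counters and time. One plausible reading, offered purely as an assumption, is a catalog that records, for each table, which fields have been seen, how often, and when last. The minimal Python sketch below implements that reading with hypothetical class and function names. It reuses the `messageType`-to-table convention from the Dradis example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Mapping


@dataclass
class FieldStats:
    count: int = 0       # number of events that carried this field
    last_seen: int = 0   # newest event timestamp (epoch seconds) carrying it


@dataclass
class TableMetadata:
    fields: Dict[str, FieldStats] = field(default_factory=dict)

    def observe(self, payload: Mapping[str, object], timestamp: int) -> None:
        """Update per-field counters and last-seen time for one event payload."""
        for name in payload:
            stats = self.fields.setdefault(name, FieldStats())
            stats.count += 1
            stats.last_seen = max(stats.last_seen, timestamp)


def build_catalog(events: Iterable[Mapping[str, object]]) -> Dict[str, TableMetadata]:
    """Group events by table (from messageType) and record which fields each table has seen."""
    catalog: Dict[str, TableMetadata] = {}
    for event in events:
        table = str(event["messageType"]).lower()
        payload = {k: v for k, v in event.items() if k not in ("messageType", "timestamp")}
        catalog.setdefault(table, TableMetadata()).observe(payload, int(event["timestamp"]))
    return catalog
```

A catalog like this lets someone find a table and its fields without a data team hand-maintaining a schema, which is the self-service goal from slide 17.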
  • 58. HONU TOOLS: REAL-TIME SLICING/DICING ‣ Integration with Platfora ‣ End-user ad-hoc analysis tool ‣ Interactive visual feedback ‣ Real-time exploration/graphing at 10^9 data points
  • 59. HONU TOOLS: REAL-TIME SLICING/DICING
  • 60. HONU TOOLS: WORKFLOW MANAGEMENT ENTERPRISE WORKFLOW MANAGEMENT – MATT GOEKE, LATER TODAY [Diagram labels: Client, Mobile, WWW]
  • 61. HONU STATS ‣  7+ billion events/day ‣  Tested @ 70+ billion events/day ‣  100+ tables ▾  10+ tables @ 100M – 1B rows/day ‣  7 Petabytes Game Event Dataset ‣  Semi-global deployment ‣  0 downtime ‣  Runs in cloud (AWS) + datacenter
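Daily totals are easier to compare with the per-second figures on slide 52 once they are converted to averages. The quick Python check below is pure arithmetic on the slides' numbers. It assumes load is spread evenly over 24 hours, which real traffic is not. The results are roughly 81,000 events/sec in production and 810,000 events/sec at the tested volume. Dradis averages about 46,000 calls/sec, so its 100K+ peak is a little over twice the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average rate implied by a daily total, assuming load spread evenly over 24 hours."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    honu_events = per_second(7e9)     # 7+ billion events/day in production
    honu_tested = per_second(70e9)    # tested at 70+ billion events/day
    dradis_calls = per_second(4e9)    # Dradis: 4+ billion API calls/day
    print(f"Honu average:        {honu_events:,.0f} events/sec")
    print(f"Honu tested:         {honu_tested:,.0f} events/sec")
    print(f"Dradis average:      {dradis_calls:,.0f} calls/sec")
    print(f"Dradis peak/average: {100_000 / dradis_calls:.1f}x")
```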
  • 63. GOALS ü Democra@ze  Data  Access   ü Enable  Self-­‐service  Data  Collec@on  and  Analysis   ü Create  Ac@onable  Insights   ü Increase  Speed  to  Insight   HONU HONU CLIENT SDK
  • 64. FUTURE ‣ Improve self-service workflow & tooling ▾ Metadata management ▾ Discovery of captured data ▾ Workflow management ▾ Platfora to all teams ‣ Real-time event aggregation ‣ Global data infrastructure ‣ Replace legacy audit/event logging services
  • 65. HANDLE INCREASING DATA VELOCITY (JUNE 2012 → JULY 2013)
      MySQL tables: 180 → 1200
      Pipeline events/day: 0 → 7+ billion
      Workflows: Cronjob + Pentaho → Oozie
      Environment: Datacenter → DC + AWS
      SLA: 1 day → 2 hours
      Event tracking: 2+ weeks (DB update), dependencies on DBA teams + ETL teams + Tools teams, downtime (3h min.) → 10 minutes, self-service, no downtime
  • 67. SHAMELESS HIRING PLUG Like most everybody else at this conference… we’re hiring! THE RIOT MANIFESTO: PLAYER EXPERIENCE FIRST · CHALLENGE CONVENTION · FOCUS ON TALENT AND TEAM · TAKE PLAY SERIOUSLY · STAY HUNGRY, STAY HUMBLE
  • 68. SHAMELESS HIRING PLUG AND YES, YOU CAN PLAY GAMES AT WORK. IT’S ENCOURAGED!
  • 69. THANK YOU! QUESTIONS? BARRY LIVINGSTON blivingston@riotgames.com SANDEEP SHRESTHA sshrestha@riotgames.com