Scalability & Big Data challenges in Real-Time Multiplayer games

With the recent emergence of top-grossing real-time games in the East (Honour of Kings) and West (Clash Royale, Golf Clash, etc.), the doors have opened for new genres of social games on mobile. However, building a real-time multiplayer game for mobile comes with many interesting technical and design challenges. Join us in this session to see how the team at Space Ape Games is approaching these challenges head-on with the help of Akka and AWS.

Published in: Technology

  1. Scalability & Big Data challenges in Real-Time Multiplayer games
  2. Real-Time games in Top 100 Grossing (2017): 2014 (3), 2015 (6), 2016 (8), 2017 (13), 2018 ???
  3. Enabling Factors [figure] Source: PC Mag
  4. Enabling Factors [figure] Source: OpenSignal
  5. Enabling Factors
  6. QUIZ TIME: In 2017, which of these games made the most revenue?
      ● The world’s most popular MOBA on PC
      ● The world’s most popular First Person Shooter
      ● Some game by Blizzard
      ● Some game by EA
      ● A Chinese 5v5 mobile game you never heard of
      ● Some game by King
  8. >$400M monthly revenue (source: Bloomberg); >80M DAU (source: Tencent)
  9. 10-20 inputs/s, sensitive to lag (> 300 ms)
  10. unpredictable network, limited bandwidth
  11. Decisions, decisions...
      ● Build vs Buy?
      ● Self-hosted vs Cloud?
      ● Global deployment vs Centralised?
      ● TCP vs UDP?
      ● Server Authoritative vs Lock-Step?
  12. Constraints/Trade-offs
      ● Latency (RTT)
      ● Cost
      ● Complexity
      ● Scalability
      ● Operational overhead
  13. Global Deployment vs Centralised
  14. 10-20 inputs/s, sensitive to lag (> 300 ms)
  15. optimize for this
  16. Global Deployment
      ● Players are geo-routed to the closest multiplayer server.
      ● They are matched with other players in the same geo-region for the best UX.
      ● No need for players to “choose server”; it should just work.
  17. Global Deployment
      ● Should leaderboards be global or regional?
      ● Should guilds/alliances be global or regional?
      ● Should chatrooms be global or regional?
      ● Should liveops events be global or regional?
      ● Should players be allowed to play with others in another region, i.e. with distant relatives/friends?
      ● Should players be allowed to switch default region, e.g. after moving to Europe after Brexit?
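The slides do not say how players are geo-routed. As one possible approach, the sketch below assumes the client pings each region a few times at start-up and picks the region with the lowest median round-trip time. The region names, the sample values and the `pick_region` helper are all hypothetical.

```python
from statistics import median


def pick_region(rtt_samples_ms):
    """Return the region with the lowest median round-trip time."""
    if not rtt_samples_ms:
        raise ValueError("no regions were measured")
    # The median is less sensitive to a single slow ping than the mean.
    return min(rtt_samples_ms, key=lambda region: median(rtt_samples_ms[region]))


# Hypothetical ping results (ms) gathered by a client at start-up.
samples = {
    "eu-west": [38, 41, 120],
    "us-east": [95, 99, 102],
    "ap-southeast": [210, 230, 225],
}
chosen = pick_region(samples)
print(chosen)  # eu-west
```

Matching players within the chosen region then keeps every participant of a match close to the same server.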
  18. Server Authoritative vs Lock-Step
  20. Server Authoritative
      ● Server decides game logic.
      ● Client sends all inputs to server.
      ● Client receives game state (either full, or delta) from server.
      ● Client keeps internal state for game world, which mirrors server state.
      ● Client doesn’t modify world state directly; it only displays it, with some prediction to mask network latency.
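A minimal sketch of the server-authoritative flow described above, assuming a toy world in which each player has a single integer position. The class names and the delta format are illustrative and not taken from the talk; client-side prediction is left out.

```python
class AuthoritativeServer:
    def __init__(self):
        self.state = {}       # player id -> position (the authoritative world)
        self._last_sent = {}  # snapshot the clients already hold

    def apply_input(self, player, dx):
        # Only the server mutates the world; clients merely send inputs.
        self.state[player] = self.state.get(player, 0) + dx

    def delta(self):
        # Send only the entries that changed since the previous broadcast.
        changed = {p: x for p, x in self.state.items()
                   if self._last_sent.get(p) != x}
        self._last_sent = dict(self.state)
        return changed


class MirrorClient:
    def __init__(self):
        self.world = {}  # local mirror of the server state, used for display

    def on_delta(self, delta):
        self.world.update(delta)


server, client = AuthoritativeServer(), MirrorClient()
server.apply_input("C1", 2)
server.apply_input("C2", -1)
first = server.delta()    # both players changed
client.on_delta(first)
server.apply_input("C2", -1)
second = server.delta()   # only C2 changed
client.on_delta(second)
```

After both deltas the client mirror equals the server state, even though the second message carried only one entry.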
  21-28. [Sequence diagram, built up over eight slides: Client 1 and Client 2 send control inputs (C1 control 1, C2 control 1 to 3) to the Server, which replies with game states 1 to 5.]
  29. Pros (Server Authoritative)
      ● Always in-sync.
      ● Hard to cheat: no memory hacks, etc.
      ● Easy (and quick) to join mid-match.
      ● Server can detect lagged/DC’d client and take over with AI.
  30. Cons (Server Authoritative)
      ● High server load.
      ● High bandwidth usage.
      ● Synchronization on the client is complicated.
      ● Little experience in the company with server-side .NET stack (bus factor of 1).
      ● .NET Core was/is still a moving target.
  31. high server load and bandwidth needs; client has to receive more data
  33. Lock-Step*
      ● Client sends all inputs to server.
      ● Server collects all inputs, and buffers them.
      ● Server sends all buffered inputs to all clients X times a second.
      ● Client executes all inputs in the same order.
      ● Because everyone is 'guaranteed' to have executed the same input at the same frame in the same order, we get synchronicity.
      ● Use prediction to mask network latency.
      * traditional RTS games tend to use peer-to-peer model
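The lock-step loop above can be sketched as follows. This is an illustration under simplifying assumptions, not the team's implementation: inputs are integer moves, the simulation is trivially deterministic, and `tick()` stands in for the "X times a second" timer.

```python
class LockStepServer:
    def __init__(self):
        self._buffer = []  # inputs received since the last broadcast
        self.frame_no = 0

    def receive(self, player, dx):
        self._buffer.append((player, dx))

    def tick(self):
        """Flush the buffered inputs as one numbered frame."""
        self.frame_no += 1
        frame = (self.frame_no, tuple(self._buffer))
        self._buffer = []
        return frame


class LockStepClient:
    def __init__(self):
        self.positions = {}
        self.last_frame = 0

    def on_frame(self, frame):
        frame_no, inputs = frame
        if frame_no != self.last_frame + 1:
            raise ValueError("frames must be applied in order, without gaps")
        # Every client replays the same inputs in the same order.
        for player, dx in inputs:
            self.positions[player] = self.positions.get(player, 0) + dx
        self.last_frame = frame_no


server = LockStepServer()
server.receive("C1", 1)
server.receive("C2", 1)
frame1 = server.tick()
server.receive("C2", 2)
frame2 = server.tick()

client_a, client_b = LockStepClient(), LockStepClient()
for frame in (frame1, frame2):
    client_a.on_frame(frame)
    client_b.on_frame(frame)
```

The server never computes game state here; it only orders and relays inputs, which is where the lighter server load comes from.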
  34-38. [Sequence diagram, built up over five slides: Client 1 and Client 2 send control inputs (C1 control 1, C2 control 1 to 3) to the Server, which sends the buffered inputs, instead of game state, back to the clients.]
      RTT: time between sending an input and receiving it back from the server.
      RTT = latency x 2 + X, where Xmin = 0 and Xmax = frame time.
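A small worked example of the RTT formula, reading X as the time an input waits in the server's buffer before the next broadcast (this reading is an interpretation of the slide). The 50 ms latency and 10 broadcasts per second are illustrative numbers, not figures from the talk.

```python
def rtt_bounds_ms(one_way_latency_ms, broadcasts_per_second):
    """Return (best, worst) RTT in ms for RTT = latency x 2 + X."""
    frame_time_ms = 1000.0 / broadcasts_per_second
    best = 2 * one_way_latency_ms   # X = 0: input arrives just before a broadcast
    worst = best + frame_time_ms    # X = frame time: input just missed a broadcast
    return best, worst


best, worst = rtt_bounds_ms(50, 10)
print(best, worst)  # 100 200.0
```

Broadcasting more often shrinks the frame time and with it the worst-case RTT, at the price of more packets.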
  39. Pros (Lock-Step)
      ● Light server load.
      ● Lower bandwidth usage.
      ● Simpler server implementation.
  40. Cons (Lock-Step)
      ● Needs deterministic game engine.
      ● Unity has a long-standing determinism problem with floating point.
      ● Hackable, requires some form of server-side validation.
      ● All clients must take over lagged/DC’d client with AI.
      ● Slower to join mid-match, need to process all inputs.
      ● Need to ensure all clients in a match are compatible.
  41. fixed-point math, server validation, ...
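To illustrate why fixed-point math helps with determinism: values are stored as scaled integers, so every client performs the same integer operations and gets the same bits. The sketch below assumes a 16.16 format; the format and the helper names are illustrative choices, and a game engine would use fixed-width integers rather than Python's unbounded ones.

```python
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS  # fixed-point representation of 1.0


def to_fixed(value):
    """Convert a float to 16.16 fixed point (done once, e.g. when loading data)."""
    return int(round(value * ONE))


def fixed_mul(a, b):
    # Integer multiply, then drop the extra fraction bits.
    # For negative products the shift rounds toward negative infinity.
    return (a * b) >> FRACTION_BITS


def to_float(a):
    """Convert back to float, for display only."""
    return a / ONE


velocity = to_fixed(1.5)
dt = to_fixed(0.25)
step = fixed_mul(velocity, dt)
print(to_float(step))  # 0.375
```

The simulation itself stays in integers; floats appear only at the edges, for input data and rendering.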
  42. bandwidth
  43. Build vs Buy
  44. Pros (buying a third-party solution)
      ● Easy to use.
      ● Already use it for prototype games.
      ● Multi-region, lobby, etc. come out-of-the-box.
      ● Had a long time to optimize their solution.
  45. Cons (buying a third-party solution)
      ● Quite expensive, pay for provisioned peak monthly CCU.
      ● “Can we bet the future of our company on a third-party?”
      ● Unknown global distribution at scale.
      ● Accessibility of support.
      ● Limited extensibility.
      ● Runs on Windows.
  46. So, we decided to build our own networking stack
  47. +
  48. Actor Model
      A model for describing computation, coined by Carl Hewitt & co in 1973. Later popularised by Erlang. [photo: Carl Hewitt]
  49. Actor Model
      Everything is an actor. Every actor has a mailbox. An actor is the fundamental unit that embodies the 3 essential things for computation:
      ● processing
      ● storage
      ● communications
  51. Actor Model
      Actors don’t share memory, they communicate only via messages. When an actor receives a message, it can:
      ● create new actors
      ● send messages to other actors
      ● do work
      Not sharing memory prevents cascade failures when an actor crashes. [image caption: Johnny?]
  52. Ericsson AXD301
  54. Actor Model (single-threaded)
      Inside an actor, messages are processed one-at-a-time, in a single-threaded fashion. No need for locks! Simplifies concurrency, no deadlocks, race conditions, etc.
  56. Actor Model
      Lifts concurrency management to the mailbox. Allows you to “think globally, but act locally”. Easier to think about a complex system in terms of states and transitions, than to manage state mutations.
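A toy illustration of the mailbox idea, not Akka itself: the actor's state is touched only by the loop that drains its mailbox, so concurrent senders need no lock around that state. Note one simplification that differs from a real actor system: this sketch dedicates one thread to the actor, whereas an actor runtime multiplexes many actors over a small thread pool. `CounterActor` is a hypothetical name.

```python
import queue
import threading


class CounterActor:
    """Owns its state; the only way to change it is to post a message."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # private state, never shared with senders
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to finish its mailbox, then return the final count."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return self._count

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self._count += message  # one message at a time: no lock needed


actor = CounterActor()
senders = [
    threading.Thread(target=lambda: [actor.tell(1) for _ in range(1000)])
    for _ in range(4)
]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
total = actor.stop()
print(total)  # 4000
```

The only synchronised structure is the mailbox queue, which is the sense in which concurrency management is lifted to the mailbox.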
  57-68. [Animated diagram: MATCH 1 holds a current frame and a frame history (frame 1, frame 2, frame 3, ...). The steps connection open, authenticate, send/receive, buffering and broadcast! are added one by one, while C1 input, C2 input, C3 joined and C3 input arrive at the match. After broadcast!, frame 4 appears in the history. Slide 66 reads: concurrency.]
  69-71. [Diagram: a growing grid of MATCH boxes, from 5 to 15.]
  72-78. [Diagram: MATCH with current frame, frame history (frame 1, frame 2, frame 3) and C1 input, C2 input, C3 joined, handled by a Socket actor and a Match actor; the label Root Aggregate is added on slides 75-76.]
      act locally, think globally: how actors interact with each other, aka the “protocol”
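One way to picture the "protocol" between a socket actor and a match actor is as a small set of message types that the match processes strictly in arrival order. The sketch below is a guess at the shape of such a protocol: the message kinds (`joined`, `input`, `tick`) and the rule that inputs from unknown players are dropped are assumptions, not details from the talk.

```python
from collections import deque


class MatchActor:
    def __init__(self):
        self.mailbox = deque()
        self.players = set()
        self.current_frame = []  # events buffered for the next broadcast
        self.history = []        # frames that were already broadcast

    def tell(self, message):
        self.mailbox.append(message)

    def drain(self):
        """Process queued messages one at a time; return the frames to broadcast."""
        broadcasts = []
        while self.mailbox:
            kind, payload = self.mailbox.popleft()
            if kind == "joined":
                self.players.add(payload)
                self.current_frame.append(("joined", payload))
            elif kind == "input" and payload[0] in self.players:
                self.current_frame.append(("input", payload))
            elif kind == "tick":
                frame = tuple(self.current_frame)
                self.history.append(frame)
                self.current_frame = []
                broadcasts.append(frame)
            # Anything else (e.g. input from a player who has not joined) is dropped.
        return broadcasts


match = MatchActor()
match.tell(("joined", "C1"))
match.tell(("joined", "C2"))
match.tell(("input", ("C1", "left")))
match.tell(("input", ("C3", "fire")))  # dropped: C3 has not joined yet
match.tell(("joined", "C3"))
match.tell(("input", ("C3", "fire")))
match.tell(("tick", None))
frames = match.drain()
```

Because all state changes happen inside `drain`, a player joining mid-frame and an input arriving at the same moment cannot interleave.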
  79. The secret to building high performance systems is simplicity. Complexity kills performance.
  80. Performance Matters
      ● Higher CCU per server
      ● Fewer servers
      ● Lower cost
      ● Less operational overhead
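The link between CCU per server and fleet size is simple arithmetic. The numbers below are hypothetical and chosen only to show the effect; the talk gives no capacity figures.

```python
def servers_needed(peak_ccu, ccu_per_server):
    """Smallest number of servers that can hold peak_ccu concurrent users."""
    if ccu_per_server <= 0:
        raise ValueError("ccu_per_server must be positive")
    return -(-peak_ccu // ccu_per_server)  # integer ceiling division


# Hypothetical: 100,000 peak CCU at two different per-server capacities.
before = servers_needed(100_000, 2_000)
after = servers_needed(100_000, 10_000)
print(before, after)  # 50 10
```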
  81. Performance Matters
      “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” (Donald Knuth)
  83. Actor Model >
      Threads are heavy OS constructs. Each thread is allocated 1MB of stack space by default. Context switching is expensive at scale.
      Actors are cheap. An actor system can optimise its use of threads to minimise context switching.
  84. Netty
      Non-blocking I/O framework for the JVM. Highly performant. Simplifies implementation of socket servers (TCP/UDP). UDP support is “meh”...
  85. Performance Tuning
      ● Custom network protocol (bandwidth).
      ● Buffer pooling (GC pressure).
      ● Minimise Netty object creations (GC pressure).
      ● Using direct buffers (GC pressure).
      ● Disable Nagle's algorithm (latency).
      ● Epoll.
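Of the tuning items above, disabling Nagle's algorithm is the easiest to show in isolation. Nagle's algorithm delays small TCP writes so they can be coalesced, which hurts a game that sends many tiny inputs. The sketch uses Python's standard `socket` module purely for illustration; in Netty the corresponding setting is the `ChannelOption.TCP_NODELAY` channel option. The helper name is hypothetical.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 sends small writes immediately instead of batching them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(nodelay_enabled)  # True
```

The trade-off is more, smaller packets on the wire, which is usually acceptable when latency matters more than throughput.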
  86. Automated Load Testing
      AWS Lambda functions to run bot clients (written with Akka):
      ● Cheaper
      ● Faster to boot up
      ● Easy to update
      Each Lambda invocation could simulate up to 100 bots.
  87. [figure: from US-EAST (Lambda) to EU-WEST (game server)]
  88. optimize for tail latencies
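To make "optimize for tail latencies" concrete, the sketch below computes nearest-rank percentiles over a set of RTT samples. The samples are made up for illustration and are not measurements from the talk.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct% of the sample count
    return ordered[rank - 1]


# Made-up RTTs (ms): most requests are fast, a few are very slow.
rtts_ms = [90] * 95 + [100] * 3 + [800, 900]
p50 = percentile(rtts_ms, 50)
p99 = percentile(rtts_ms, 99)
mean = sum(rtts_ms) / len(rtts_ms)
print(p50, p99, mean)  # 90 800 105.5
```

The median and the mean both look healthy here, while the 99th percentile exposes the slow requests that some players would actually experience.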
  89. http://bit.ly/2xgGHXZ
  90. Thank You!
  91. QUESTIONS?
