Akka for Real-Time Multiplayer Mobile Games
Who am I?
Real-Time Games in Top 100 Grossing (2017)
● 2014: 3
● 2015: 6
● 2016: 8
● 2017: 13
● 2018: ???
Enabling Factors
[Chart, source: PC Mag]
[Chart, source: OpenSignal]
> $400M monthly revenue (source: Bloomberg)
> 80M DAU (source: Tencent)
10-20 inputs/s, sensitive to lag (> 300 ms)
unpredictable networks, limited bandwidth
Decisions, decisions...
Build vs Buy?
Global deployment vs Centralized?
TCP vs UDP?
Server Authoritative vs Lock-Step?
Constraints/Trade-offs
Latency (RTT)
Cost
Complexity
Scalability
Operational overhead
Global Deployment vs Centralised
10-20 inputs/s, sensitive to lag (> 300 ms)
(optimize for this)
Global Deployment
● Players are geo-routed to the closest multiplayer server.
● Matched with other players in the same geo-region for the best UX.
● No need for players to “choose a server”; it should just work.
Global Deployment
● Should leaderboards be global or regional?
● Should guilds/alliances be global or regional?
● Should chatrooms be global or regional?
● Should liveops events be global or regional?
● Should players be allowed to play with others in another region?
e.g. playing with distant relatives/friends.
● Should players be allowed to switch their default region?
e.g. moved to Europe after Brexit.
Server Authoritative vs Lock-Step
Server Authoritative
● Server decides game logic.
● Client sends all inputs to server.
● Client receives game state (either full, or delta) from server.
● Client keeps internal state for the game world, which mirrors the server state.
● Client doesn’t modify world state directly; it only displays it, with some
prediction to mask network latency.
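To make the message flow concrete, here is a rough Scala sketch of the message shapes; the names are illustrative assumptions, not ApeSync’s actual protocol:

```scala
// Illustrative message shapes for a server-authoritative protocol.
// All names here are assumptions, not ApeSync's actual types.
final case class ClientInput(clientId: Int, seqNr: Long, payload: Array[Byte]) // client -> server

sealed trait StateUpdate                                                        // server -> client
final case class FullState(tick: Long, world: Array[Byte])    extends StateUpdate
final case class DeltaState(tick: Long, changed: Array[Byte]) extends StateUpdate
```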
[Sequence diagrams: Client 1 and Client 2 send their control inputs (C1 control 1; C2 controls 1, 2, 3) to the Server; after each update the Server broadcasts the new game state (game state 1, 2, 3, ...) to both clients.]
Pros
● Always in-sync.
● Hard to cheat - no memory hacks, etc.
● Easy (and quick) to join mid-match.
● Server can detect lagged/DC’d client and take over with AI.
Cons
● High server load.
● High bandwidth usage.
● Synchronization on the client is complicated.
● Little experience in the company with a server-side .NET stack
(bus factor of 1).
● .NET Core was/is still a moving target.
[Annotated diagram: high server load and bandwidth needs; the client has to receive more data.]
Lock-Step*
● Client sends all inputs to server.
● Server collects all inputs, and buffers them.
● Server sends all buffered inputs to all clients X times a second.
● Client executes all inputs in the same order.
● Because everyone is 'guaranteed' to have executed the same inputs at
the same frame in the same order, we get synchronicity (see the client-side sketch below).
● Use prediction to mask network latency.
* traditional RTS games tend to use peer-to-peer model
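A minimal client-side sketch, assuming a deterministic `simulate` step; all types and names here are illustrative, not Space Ape’s actual code:

```scala
import scala.collection.mutable

// Illustrative types; not ApeSync's actual API.
final case class Input(clientId: Int, payload: Array[Byte])
final case class Frame(number: Long, inputs: Vector[Input]) // one server broadcast
final case class GameState(data: Map[String, Long])

// Sketch: apply frames strictly in order, so every client executes the same
// inputs at the same simulation step -- which is what gives synchronicity.
final class LockStepClient(initial: GameState,
                           simulate: (GameState, Vector[Input]) => GameState) {
  private var state     = initial
  private var nextFrame = 0L
  private val pending   = mutable.TreeMap.empty[Long, Frame] // frames may arrive out of order

  def onFrame(frame: Frame): Unit = {
    pending(frame.number) = frame
    while (pending.contains(nextFrame)) { // execute only contiguous frames
      state = simulate(state, pending.remove(nextFrame).get.inputs)
      nextFrame += 1
    }
  }
}
```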
[Sequence diagrams: with lock-step the Server relays inputs instead of game state. Client 1 and Client 2 send their controls (C1 control 1; C2 controls 1, 2, 3); the Server broadcasts each frame's buffered inputs to both clients.]
RTT: the time between sending an input and receiving it back from the server.
RTT = latency x 2 + X, where Xmin = 0 and Xmax = frame time
(an input waits up to one frame interval in the server's buffer before the next broadcast).
Illustrative numbers: with 60 ms one-way latency and a 100 ms frame time (10 broadcasts/s), RTT ranges from 120 ms to 220 ms.
Pros
● Light server load.
● Lower bandwidth usage.
● Simpler server implementation.
Cons
● Needs a deterministic game engine.
● Unity has a long-standing determinism problem with floating point.
● Hackable; requires some form of server-side validation.
● All clients must take over a lagged/DC’d client with AI.
● Slower to join mid-match: need to process all inputs.
● Need to ensure all clients in a match are compatible.
(fixed-point math, server validation, ...)
(bandwidth)
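A common mitigation for floating-point non-determinism is fixed-point math. A toy Q16.16 value in Scala might look like this (an illustration, not Space Ape’s implementation):

```scala
// Toy Q16.16 fixed-point number: integer ops are bit-identical on every
// device, unlike IEEE floats under different runtimes/CPUs.
final case class Fixed(raw: Int) extends AnyVal {
  def +(o: Fixed): Fixed = Fixed(raw + o.raw)
  def -(o: Fixed): Fixed = Fixed(raw - o.raw)
  def *(o: Fixed): Fixed = Fixed(((raw.toLong * o.raw) >> 16).toInt)
  def toDouble: Double   = raw / 65536.0
}

object Fixed {
  def fromDouble(d: Double): Fixed = Fixed(math.round(d * 65536).toInt)
}
// Fixed.fromDouble(1.5) * Fixed.fromDouble(2.0) == 3.0 on every client
```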
Build vs Buy
ApeSync
[Diagrams: inside MATCH 1, the match owns the current frame plus a frame history (frame 1, 2, 3, ...). Each client's connection is opened, authenticated, and then used to send/receive. Incoming messages (C1 input, C2 input, C3 joined, C3 input) are buffered into the current frame; on each tick the frame is closed, appended to the history, and broadcast to all clients, and a new frame (frame 4, ...) starts. Inputs keep arriving while frames are being closed and broadcast: concurrency.]
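A minimal classic-Akka sketch of that match loop, assuming illustrative message names (the real ApeSync actor is more involved):

```scala
import akka.actor.{Actor, ActorRef, Timers}
import scala.concurrent.duration._

// Illustrative messages; not ApeSync's actual protocol.
sealed trait MatchEvent
final case class ClientInput(clientId: Int, payload: Array[Byte]) extends MatchEvent
final case class ClientJoined(clientId: Int)                      extends MatchEvent
final case class Join(clientId: Int, socket: ActorRef)
final case class Frame(number: Long, events: Vector[MatchEvent])
case object Tick

// One actor per match: a single-threaded island of state, so buffering
// inputs and broadcasting frames needs no locks despite concurrent senders.
class MatchActor(broadcastsPerSecond: Int) extends Actor with Timers {
  private var clients = Map.empty[Int, ActorRef] // clientId -> Socket actor
  private var buffer  = Vector.empty[MatchEvent] // the current (open) frame
  private var history = Vector.empty[Frame]      // closed frames, for late joiners
  private var frameNo = 0L

  timers.startTimerAtFixedRate(Tick, Tick, (1000 / broadcastsPerSecond).millis)

  def receive: Receive = {
    case Join(id, socket) =>
      clients += id -> socket
      history.foreach(socket ! _)    // late joiner replays the frame history
      buffer :+= ClientJoined(id)    // "C3 joined" is just another buffered event
    case in: ClientInput =>
      buffer :+= in                  // buffering
    case Tick =>                     // broadcast!
      val frame = Frame(frameNo, buffer)
      history :+= frame
      buffer = Vector.empty
      frameNo += 1
      clients.values.foreach(_ ! frame)
  }
}
```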
[Diagram: many MATCH instances running concurrently on one server.]
[Diagrams: mapping the match onto actors. Each client connection is handled by a Socket actor; each MATCH is a Match actor that owns the current frame and frame history as its Root Aggregate, so all match state is mutated by a single actor.]
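The Socket actor can then stay a thin per-connection shim; a sketch under the same illustrative assumptions, with `write` standing in for the Netty channel plumbing:

```scala
import akka.actor.{Actor, ActorRef}

// Illustrative per-connection Socket actor: it owns the network channel and
// nothing else; all match state lives in the Match actor (the root aggregate).
// FromNetwork/ToNetwork and `write` are stand-ins for the real codec/channel.
final case class FromNetwork(bytes: Array[Byte])
final case class ToNetwork(bytes: Array[Byte])

class SocketActor(matchActor: ActorRef, write: Array[Byte] => Unit) extends Actor {
  def receive: Receive = {
    case FromNetwork(bytes) => matchActor ! bytes // decoded input goes to the aggregate
    case ToNetwork(bytes)   => write(bytes)       // broadcast frame goes back out
  }
}
```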
[Diagram: the Match actor applies each buffered message (C1 input, C2 input, C3 joined) to its own state only.]
act locally
think globally
how actors interact with each other, aka the “protocol”
the secret to building high-performance systems is simplicity
complexity kills performance
Performance Matters
● Higher CCU per server
● Fewer servers
● Lower cost
● Less operational overhead
Performance Matters
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” (Donald Knuth)
Actor Model
● Threads are heavy OS constructs.
● Each thread is allocated 1 MB of stack space by default.
● Context switching is expensive at scale.
● Actors are cheap.
● The actor system can optimise its use of threads to minimise context switching.
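A toy illustration of why this matters (classic Akka): a million actors can share a small thread pool, whereas a million threads would need roughly a terabyte of stack alone at 1 MB each.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Each actor is just an object plus a mailbox (a few hundred bytes),
// scheduled onto a small shared thread pool by the dispatcher.
class Counter extends Actor {
  private var n = 0
  def receive: Receive = { case i: Int => n += i }
}

object Demo extends App {
  val system = ActorSystem("demo")
  // A million actors, given enough heap -- but no per-entity threads.
  val actors = Vector.fill(1000000)(system.actorOf(Props[Counter]()))
  actors.foreach(_ ! 1) // millions of messages, a handful of threads
}
```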
Netty
● Non-blocking I/O framework for the JVM.
● Very performant.
● Simplifies implementation of socket servers (TCP/UDP).
● UDP support is “meh”...
Performance Tuning
● Custom network protocol (bandwidth).
● Minimise Netty object creation (GC pressure).
Performance Tuning
● Buffer pooling (GC pressure).
● Using direct buffers (GC pressure).
Performance Tuning
● Disable Nagle’s algorithm (latency).
● TCP tuning (including BBR, etc., with differing results).
● Epoll.
● ENA-enabled EC2 instances.
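Roughly how those knobs map onto a Netty server bootstrap; a sketch where `GameChannelInitializer` is a hypothetical stand-in for the real codec/handler pipeline (epoll needs the netty-transport-native-epoll dependency on Linux):

```scala
import io.netty.bootstrap.ServerBootstrap
import io.netty.buffer.{ByteBufAllocator, PooledByteBufAllocator}
import io.netty.channel.{ChannelInitializer, ChannelOption}
import io.netty.channel.epoll.{EpollEventLoopGroup, EpollServerSocketChannel}
import io.netty.channel.socket.SocketChannel

// Hypothetical: the real pipeline would install the custom frame codec and
// the handler that bridges into the actor system.
class GameChannelInitializer extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit = () // codec + handlers go here
}

object TunedServer extends App {
  val boss    = new EpollEventLoopGroup(1) // native epoll transport (Linux only)
  val workers = new EpollEventLoopGroup()

  new ServerBootstrap()
    .group(boss, workers)
    .channel(classOf[EpollServerSocketChannel])
    .childOption[java.lang.Boolean](ChannelOption.TCP_NODELAY,
      java.lang.Boolean.TRUE)                   // disable Nagle (latency)
    .childOption[ByteBufAllocator](ChannelOption.ALLOCATOR,
      PooledByteBufAllocator.DEFAULT)           // pooled direct buffers (GC pressure)
    .childHandler(new GameChannelInitializer)
    .bind(7777).sync().channel().closeFuture().sync()
}
```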
Performance Tuning
● Custom reliable UDP protocol, optimised for countries with poor networking conditions.
Automated Load Testing
AWS Lambda functions to run bot clients (written with Akka):
● Cheaper
● Faster to boot up
● Easy to update
Each Lambda invocation could simulate up to 100 bots.
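A sketch of what such a bot-runner function could look like; `BotClient` and the event shape are assumptions for illustration, and only `RequestHandler` is the real AWS Lambda Java interface:

```scala
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}

// Hypothetical bot client; in the real system the bots were Akka-based
// clients speaking the ApeSync protocol against the game server.
final class BotClient(id: Int, host: String) {
  def playMatch(): Unit = () // connect, authenticate, send inputs, record RTTs
}

// One Lambda invocation simulates up to ~100 bots against the target server.
class BotRunner extends RequestHandler[java.util.Map[String, String], String] {
  override def handleRequest(event: java.util.Map[String, String], ctx: Context): String = {
    val host = event.getOrDefault("host", "game.example.com")
    val bots = (1 to event.getOrDefault("bots", "100").toInt).map(new BotClient(_, host))
    bots.foreach(_.playMatch())
    s"simulated ${bots.size} bots against $host"
  }
}
```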
[Latency charts from load tests: from US-EAST (Lambda) to EU-WEST (game server). Optimize for tail latencies.]
Automated Load Testing (in the future)
● Linux traffic control to simulate different network conditions
● Load test on every commit
● ML for TCP/UDP tuning?
http://bit.ly/2xgGHXZ
Monitoring
Performance in the Wild (in poor networking conditions)
[Chart: 95th-percentile RTT vs. max playable RTT; ApeSync outperforms Photon on performance.]
Performance in the Wild (in poor networking conditions)
[Chart: fewer disconnects.]
Performance in the Wild
● Improved KPIs: D1 retention, session time, etc.
● 14% cheaper vs. Photon, based on current cost projections.
We are hiring! :-)
http://www.spaceapegames.com/careers
talk to this guy for more detail => Alessandro Simi
Thank You!
