Building fast, scalable game server in
node.js

Charlie Crane
@xiecc
Who am I

 NASDAQ: NTES
 Senior engineer, architect(8 years) in NetEase Inc. ,

developed many web, game products
 Recently created the open source game server framework in

node.js: pomelo
Top cities for npm modules
https://gist.github.com/substack/7373338
Agenda
 State of pomelo
 Scalability
 Framework
 Performance
 Realtime application
The state of pomelo

Fast, scalable, distributed game server
framework for node.js
Open sourced in 2012.11.20
Newest version: V0.7
https://github.com/NetEase/pomelo
Pomelo position
 Game server framework
 Mobile game
 Web game
 Social game
 MMO RPG(middle scale)

Realtime Multiple User Interaction
 Realtime application server framework
State of pomelo
 Pomelo is not a

single project
 Almost 40 repos in total
Framework ---
https://github.com/NetEase/pomelo
Success cases
Card Game — ShenMu

http://shenmu.iccgame.com/index.shtml
Success cases
Combat Strategy Game — Ancient songs

http://www.ixuanyou.com/
Success cases
Realtime Communication Tools —
ZhongQingBangBang

Q&A

http://wap.youth.cn/bangbang/
Success cases
 The magic card hunter https://itunes.apple.com/cn/app/id6649640688
 Beauty Fried golden flower http://app.xiaomi.com/detail/41461
 Texas Hold ’ em poker http://www.yiihua.com/
 Metal Slug http://www.17erzhan.com/ww2/ww2index.php
 Coal instant messaging http://www.mtjst.com
 Yong Le golden flower http://pan.baidu.com/s/1pCmfD
 17 Bullfight http://app.91.com/Soft/Detail.aspx?Platform=iPhone
…
Job huntings
Agenda
 The state of pomelo
 Scalability
 Framework
 Performance

 Realtime appliation
Demo first
http://pomelo.netease.com/lordofpomelo
Scalability---Web and Game server
 Web server

unlimited scalability
 Game server

World record: world of tanks(bigworld), 74,536 online users
MMORPG:7k-8k maximum

Why?
Why does game server not scale?
 Reason 1: Long connection
 Response time

Web response time: 2000ms
Game、realtime web: <100ms
 Message direction

web: request/reponse
game: request/push/broadcast, bi-direction
How to solve
 Node.js to rescue

Perfect for:
Network-insentive application
Real-time application
Holding connections
Mqtt: 120,000 connections, 1 process, 10K/connection
Socket.io: 25,000 connections, 1 process, 50K/connection
Why does game server not scale
 Reason 2: stateful
 Web app -- stateless
 No coordinates, no adjacency
 Partition randomly, stateless

 Game server -- stateful
 Have coordinate, adjacency
 Partition by area, stateful
How to solve
 No good solution in technical level
 Depending on game design
 Don’t make one area too crowded
 Balance of area
Too crowded area
Disperse player
Why does game server not scale
Reason 3: Broadcast

MESSAGES IN

1

MESSAGES OUT

1

1

1
Why does game server not scale

1

2

MESSAGES IN

MESSAGES OUT

4

2
2

1
Why does game server not scale
MESSAGES IN

MESSAGES OUT

4

1

16

4

4

4

1

1
4

1
Why does game server not scale
MESSAGES IN

MESSAGES OUT

100

1

10000

100

100

100

1

1
100

1
Why does game server not scale
MESSAGES IN

MESSAGES OUT

1000

1

1000

1000000

1000

1000

1

1
1000

1
1、How to solve --- broadcast
 AOI --- area of interested

module: pomelo-aoi
2、How to solve -- broadcast
 Split process, seperate load , frontend is stateless

broadcast

Frontend
(connector)

Single Process

Game logic

Backend
(area)
3、How to solve – broadcast
 Game design, don’t make one place(in area) too crowded
 Map size
 Character density
 Disperse born place
Disperse born place
4、How to solve -- pushScheduler
app.set(‘pushSchedulerConfig’, {
scheduler: new MessageScheduler(app);
});
 Direct:send message directly
 Buffer:

buffer message in array, and flush it every 50ms
 Other Strategies: flexible, depending on client number
Why does game server not scale
 Reason 4: Tick (game loop)
 setInterval(tick, 100)

 What does every tick do?
 Update every entity in the scene(disappear,move, revive)
 Refresh monster
 Driving ai logic(monster, player)

Tick must be far less than 100ms
Problem of tick
 The entity number should be limited
 Pay attention to update algorithm: AI etc.

 GC, full gc should be limited
 V8 is good at GC when memory is under 500M, even full GC
 Memory must be limited
 Try to divide process
At last--- runtime architecture
How to solve complexity
 Too … complicated?

solution:

framework
Agenda
 The state of pomelo
 Scalability
 Framework
 Performance
 Realtime application
Pomelo Framework
The essence of pomelo:
A distributed, scalable, realtime application
framework.
Framework --- design goal
 Abstract of servers(processes)
 Auto extend server types
 Auto extend servers

 Abstract of request/response and broadcast/push
 Zero config request
 Simple broadcast api

 Servers communication---rpc framework
Framework -- some features
 Server scalability

 Network
 Plugins
Server scalability
Server Scalability-- dynamic
extensions
 pomelo add host=[host] port=[port] id=[id]

serverType=[serverType]
 node app.js env=production host=[host]

port=[port] id=[id] serverType=[serverType]
Network
Pomelo
component
Loader

MQTTConnector

Mobile client

Connector

SIOConnector

HybridConnector

Socket.io client

Socket, websocket
client
Network--Message encode, decode
Network--original protobuf
 Original protobuf
Network--pomelo protobuf
 Our protobuf
Network -- Proto compress
 Config:
Network---protobuf definition
 Handler return json, auto encoded to protobuf
Network--pomelo protobuf
https://github.com/pomelonode/pomelo-protobuf
As an independent project, supporting unity3d, iOS, android, C
Network -- Pomelo Netflow – V0.2
 Version 0.2, 400 online users

Out: 3645Kb/S in: 270Kb/s
Network -- Pomelo Netflow – V0.3
 Version 0.3, 400 online users

Out: 925KB/S, in: 217KB/S, 25% Netflow of 0.2
Plugins
 A mechanism to extend framework

 Example:
 pomelo-status-plugin , online status plugin, based on redis
Plugins example -- Reliability
 pomelo-masterha-plugin, based on zookeeper, master HA
 Servers auto reconnect

Zookeeper

Master

Slave

Slave
Agenda
 State of pomelo
 Scalability
 Framework
 Performance

 Realtime application
Performance --- overview
 The performance varies largely in different games,

applications
 The variation
 Application type: Game or realtime application
 Game type: round or realtime fight, room or infinite
 Game design: Map size, character density, balance of areas
 Test parameters: Think time, test action
Performance --- reference data
 Online users per process
 next-gen: support 20,000 concurrent players in one area

 World record – not one area
 world of tanks(bigworld), 74,536 online users

 But in real world(MMO rpg)
 The real online data: maximum 800 concurrent users per area, 7,000

concurrent users per group game servers
Performance--Server configuration
Openstack virtual machine
Performance --- stress testing
 Use Case 1: fight
 Stress on single area(area 3), increasing step by step,

login every 2 seconds
 Game logic: attack monstors or other players every

2~5 seconds
Performance --- stress testing
Performance--result
 486 onlines
Performance -- top
 Using toobusy, limit cpu 80%
Performance – stress test
 Use case 2 – move
 Stress on single area(area 3), increasing step by step,

login every 2 seconds
 Game logic: move in the map every 2~5 seconds
Performance – stress test
 800 online users
Performance --- stress testing
 Use Case 3: fight & move
 Stress on single area(area 3), increasing step by step,

login every 2 seconds
 Game logic: 50% attack monstors or other players

every 2~5 seconds, 50% move around
 Result: 558 onlines in one area
Agenda
 State of pomelo
 Scalability
 Framework
 Performance

 Realtime application
Realtime app and game
 Many success stories in realtime application
 Message push platform in NetEase
 ZhongQinBangBang

 Not that frequent messages as game
 Do do need that realtime (less than 50ms), we can use

different pushScheduler strategy
 Much more scalable than game
Message push platform
Supporting more than 10,000,000 online users
 Products using message push platform
 love.163.com
 music.163.com

 news.163.com
 note.youdao.com
 game.163.com
 yuehui.163.com

 yuedu.163.com
Message push platform
AMQP
Receiver
Receiver

publish

Receiver
Receiver

Redis,
Device

APNS publisher
APNS publisher

Channel push, socket

Connector

Connector

Connector

LVS
socket.io

Web client

mqtt
Android
client

mqtt

iOS client
Message push systems
 High load, More than 10,000,000 online users

 Realtime, broadcast to 5 million users in 1minute
 Reliability, qos=1 in any conditions
 Save power and network

 Support all the clients
 iOS(APNS, mqtt)
 android(mqtt)
 browser(socket.io)

 windows desktop application(mqtt)
Q&A
@xiecc
https://github.com/NetEase/pomelo

Building fast,scalable game server in node.js

  • 1.
    Building fast, scalablegame server in node.js Charlie Crane @xiecc
  • 2.
    Who am I NASDAQ: NTES  Senior engineer, architect(8 years) in NetEase Inc. , developed many web, game products  Recently created the open source game server framework in node.js: pomelo
  • 3.
    Top cities fornpm modules https://gist.github.com/substack/7373338
  • 4.
    Agenda  State ofpomelo  Scalability  Framework  Performance  Realtime application
  • 5.
    The state ofpomelo Fast, scalable, distributed game server framework for node.js Open sourced in 2012.11.20 Newest version: V0.7 https://github.com/NetEase/pomelo
  • 6.
    Pomelo position  Gameserver framework  Mobile game  Web game  Social game  MMO RPG(middle scale) Realtime Multiple User Interaction  Realtime application server framework
  • 7.
    State of pomelo Pomelo is not a single project  Almost 40 repos in total
  • 8.
  • 9.
  • 10.
    Success cases Card Game— ShenMu http://shenmu.iccgame.com/index.shtml
  • 11.
    Success cases Combat StrategyGame — Ancient songs http://www.ixuanyou.com/
  • 12.
    Success cases Realtime CommunicationTools — ZhongQingBangBang Q&A http://wap.youth.cn/bangbang/
  • 13.
    Success cases  Themagic card hunter https://itunes.apple.com/cn/app/id6649640688  Beauty Fried golden flower http://app.xiaomi.com/detail/41461  Texas Hold ’ em poker http://www.yiihua.com/  Metal Slug http://www.17erzhan.com/ww2/ww2index.php  Coal instant messaging http://www.mtjst.com  Yong Le golden flower http://pan.baidu.com/s/1pCmfD  17 Bullfight http://app.91.com/Soft/Detail.aspx?Platform=iPhone …
  • 14.
  • 15.
    Agenda  The stateof pomelo  Scalability  Framework  Performance  Realtime appliation
  • 16.
  • 17.
    Scalability---Web and Gameserver  Web server unlimited scalability  Game server World record: world of tanks(bigworld), 74,536 online users MMORPG:7k-8k maximum Why?
  • 18.
    Why does gameserver not scale?  Reason 1: Long connection  Response time Web response time: 2000ms Game、realtime web: <100ms  Message direction web: request/reponse game: request/push/broadcast, bi-direction
  • 19.
    How to solve Node.js to rescue Perfect for: Network-insentive application Real-time application Holding connections Mqtt: 120,000 connections, 1 process, 10K/connection Socket.io: 25,000 connections, 1 process, 50K/connection
  • 20.
    Why does gameserver not scale  Reason 2: stateful  Web app -- stateless  No coordinates, no adjacency  Partition randomly, stateless  Game server -- stateful  Have coordinate, adjacency  Partition by area, stateful
  • 21.
    How to solve No good solution in technical level  Depending on game design  Don’t make one area too crowded  Balance of area
  • 22.
  • 23.
  • 24.
    Why does gameserver not scale Reason 3: Broadcast MESSAGES IN 1 MESSAGES OUT 1 1 1
  • 25.
    Why does gameserver not scale 1 2 MESSAGES IN MESSAGES OUT 4 2 2 1
  • 26.
    Why does gameserver not scale MESSAGES IN MESSAGES OUT 4 1 16 4 4 4 1 1 4 1
  • 27.
    Why does gameserver not scale MESSAGES IN MESSAGES OUT 100 1 10000 100 100 100 1 1 100 1
  • 28.
    Why does gameserver not scale MESSAGES IN MESSAGES OUT 1000 1 1000 1000000 1000 1000 1 1 1000 1
  • 29.
    1、How to solve--- broadcast  AOI --- area of interested module: pomelo-aoi
  • 30.
    2、How to solve-- broadcast  Split process, seperate load , frontend is stateless broadcast Frontend (connector) Single Process Game logic Backend (area)
  • 31.
    3、How to solve– broadcast  Game design, don’t make one place(in area) too crowded  Map size  Character density  Disperse born place
  • 32.
  • 33.
    4、How to solve-- pushScheduler app.set(‘pushSchedulerConfig’, { scheduler: new MessageScheduler(app); });  Direct:send message directly  Buffer: buffer message in array, and flush it every 50ms  Other Strategies: flexible, depending on client number
  • 34.
    Why does gameserver not scale  Reason 4: Tick (game loop)  setInterval(tick, 100)  What does every tick do?  Update every entity in the scene(disappear,move, revive)  Refresh monster  Driving ai logic(monster, player) Tick must be far less than 100ms
  • 35.
    Problem of tick The entity number should be limited  Pay attention to update algorithm: AI etc.  GC, full gc should be limited  V8 is good at GC when memory is under 500M, even full GC  Memory must be limited  Try to divide process
  • 36.
    At last--- runtimearchitecture
  • 37.
    How to solvecomplexity  Too … complicated? solution: framework
  • 38.
    Agenda  The stateof pomelo  Scalability  Framework  Performance  Realtime application
  • 39.
    Pomelo Framework The essenceof pomelo: A distributed, scalable, realtime application framework.
  • 40.
    Framework --- designgoal  Abstract of servers(processes)  Auto extend server types  Auto extend servers  Abstract of request/response and broadcast/push  Zero config request  Simple broadcast api  Servers communication---rpc framework
  • 41.
    Framework -- somefeatures  Server scalability  Network  Plugins
  • 42.
  • 43.
    Server Scalability-- dynamic extensions pomelo add host=[host] port=[port] id=[id] serverType=[serverType]  node app.js env=production host=[host] port=[port] id=[id] serverType=[serverType]
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
    Network -- Protocompress  Config:
  • 49.
    Network---protobuf definition  Handlerreturn json, auto encoded to protobuf
  • 50.
    Network--pomelo protobuf https://github.com/pomelonode/pomelo-protobuf As anindependent project, supporting unity3d, iOS, android, C
  • 51.
    Network -- PomeloNetflow – V0.2  Version 0.2, 400 online users Out: 3645Kb/S in: 270Kb/s
  • 52.
    Network -- PomeloNetflow – V0.3  Version 0.3, 400 online users Out: 925KB/S, in: 217KB/S, 25% Netflow of 0.2
  • 53.
    Plugins  A mechanismto extend framework  Example:  pomelo-status-plugin , online status plugin, based on redis
  • 54.
    Plugins example --Reliability  pomelo-masterha-plugin, based on zookeeper, master HA  Servers auto reconnect Zookeeper Master Slave Slave
  • 55.
    Agenda  State ofpomelo  Scalability  Framework  Performance  Realtime application
  • 56.
    Performance --- overview The performance varies largely in different games, applications  The variation  Application type: Game or realtime application  Game type: round or realtime fight, room or infinite  Game design: Map size, character density, balance of areas  Test parameters: Think time, test action
  • 57.
    Performance --- referencedata  Online users per process  next-gen: support 20,000 concurrent players in one area  World record – not one area  world of tanks(bigworld), 74,536 online users  But in real world(MMO rpg)  The real online data: maximum 800 concurrent users per area, 7,000 concurrent users per group game servers
  • 58.
  • 59.
    Performance --- stresstesting  Use Case 1: fight  Stress on single area(area 3), increasing step by step, login every 2 seconds  Game logic: attack monstors or other players every 2~5 seconds
  • 60.
  • 61.
  • 62.
    Performance -- top Using toobusy, limit cpu 80%
  • 63.
    Performance – stresstest  Use case 2 – move  Stress on single area(area 3), increasing step by step, login every 2 seconds  Game logic: move in the map every 2~5 seconds
  • 64.
    Performance – stresstest  800 online users
  • 65.
    Performance --- stresstesting  Use Case 3: fight & move  Stress on single area(area 3), increasing step by step, login every 2 seconds  Game logic: 50% attack monstors or other players every 2~5 seconds, 50% move around  Result: 558 onlines in one area
  • 66.
    Agenda  State ofpomelo  Scalability  Framework  Performance  Realtime application
  • 67.
    Realtime app andgame  Many success stories in realtime application  Message push platform in NetEase  ZhongQinBangBang  Not that frequent messages as game  Do do need that realtime (less than 50ms), we can use different pushScheduler strategy  Much more scalable than game
  • 68.
    Message push platform Supportingmore than 10,000,000 online users  Products using message push platform  love.163.com  music.163.com  news.163.com  note.youdao.com  game.163.com  yuehui.163.com  yuedu.163.com
  • 69.
    Message push platform AMQP Receiver Receiver publish Receiver Receiver Redis, Device APNSpublisher APNS publisher Channel push, socket Connector Connector Connector LVS socket.io Web client mqtt Android client mqtt iOS client
  • 70.
    Message push systems High load, More than 10,000,000 online users  Realtime, broadcast to 5 million users in 1minute  Reliability, qos=1 in any conditions  Save power and network  Support all the clients  iOS(APNS, mqtt)  android(mqtt)  browser(socket.io)  windows desktop application(mqtt)
  • 71.