1. The 7 Lessons for Highly Effective Real-time Server
Great Technology For Great Games
DK Moon
dkmoon@ifunfactory.com
2. Personal Relationship with Nexon
✓ Worked on six MMORPG game servers at Nexon from 1999 through 2005.
✓ Left for a better opportunity in the States.
✓ Returned, having found no better place. (unfortunately…)
✓ Worked on a mobile game server framework from 2011 through 2013.
✓ Left again to start a business in the game server software industry.
3. About This Talk
✓ Case studies on server-related issues and the lessons learned from them.
✓ Reference game #1: PC MMORPG
• Peak CCU: 140K
• Peak CCU/Server: 15K
✓ Reference game #2: Mobile real-time MO game
• Being developed by Nexon/IFF and published by Tencent
• Passed 3 beta tests (CCU confidential)
5. Game Service Architecture (Data Plane)
• Presentation Layer: Game Client
• Access Layer: Load balancer / Switch
• Logic/Application Layer: Game Server
• Cache Layer: Cache
• Persistence Layer: DB Server
8. Lesson #1: Know Your Traffic Pattern
✓ Case: saturated NIC & oversubscribed network
• In the past, a network interface card (NIC) could be saturated.
• This no longer happens, but the core switch can still be choked.
✓ Observation
• Real-time MMO: 200-300 Bps.
• REST-based mobile: 700-800 Bps w/ spikes.
✓ Suggestion
• Meter your traffic.
• Prefer a binary message format.
• Also, correlate traffic to CPU usage.
[Diagram: core switch]
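The two suggestions "meter your traffic" and "prefer a binary message format" can be sketched together. This is a minimal illustration, not the speaker's code: the movement message layout, `MOVE_FORMAT`, `encode_json`, `encode_binary`, and `TrafficMeter` are all hypothetical names, and the meter assumes that Bps on the slide means bytes per second.

```python
import json
import struct
import time

# Hypothetical movement message: player id, x, y, heading.
# "<IffH" = little-endian uint32, float32, float32, uint16 (14 bytes total).
MOVE_FORMAT = "<IffH"


def encode_json(player_id, x, y, heading):
    """Text encoding, as a REST-style service might send it."""
    return json.dumps({"id": player_id, "x": x, "y": y, "h": heading}).encode("utf-8")


def encode_binary(player_id, x, y, heading):
    """Fixed-layout binary encoding of the same fields."""
    return struct.pack(MOVE_FORMAT, player_id, x, y, heading)


class TrafficMeter:
    """Accumulates sent bytes and reports the average rate in bytes per second."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._bytes = 0

    def record(self, payload):
        self._bytes += len(payload)

    def bytes_per_second(self):
        elapsed = self._clock() - self._start
        return self._bytes / elapsed if elapsed > 0 else 0.0
```

Calling `record()` on every outgoing payload of a connection gives a per-user figure that can be compared against the ranges on the slide. The binary form of this sample message is 14 bytes, while the JSON form is several times larger.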
11. Lesson #2: Avoid Copying Packets
✓ Case: Memory copy functions always at the top of profiling results.
✓ Target CPU usage: 50-70%
• Too low: inefficient concurrency in the program.
• Too high: unnecessary memory copying or looping.
✓ Suggestion
• Pass packets around by pointer.
• Minimizing packet sizes helps here, too.
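"Pass packets around by pointer" is phrased for C or C++. A rough Python analog is to hand out `memoryview` slices over one receive buffer instead of slicing `bytes`, which copies. The sketch below assumes a hypothetical 4-byte header. `HEADER_SIZE`, `split_packet`, and `checksum` are illustrative names only.

```python
HEADER_SIZE = 4  # hypothetical header: 2-byte length + 2-byte message type


def split_packet(buffer):
    """Return (header, body) views over the same buffer without copying bytes."""
    view = memoryview(buffer)
    return view[:HEADER_SIZE], view[HEADER_SIZE:]


def checksum(body):
    """Example consumer that reads the body directly through the view."""
    return sum(body) & 0xFF
```

Because both views refer to the original buffer, a later write to the buffer is visible through them. Whether this is a feature or a hazard depends on how long handlers keep the view.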
14. Lesson #3: Focus on Client Obfuscation
✓ Case: Adopted lightweight byte-by-byte XOR encryption.
The encryption algorithm itself was never broken.
Instead, there were lots of packet replay attacks and client hack attempts.
✓ Observation
• Hackers do not bother reverse-engineering the encryption algorithm.
• Instead, they hack into the client and let it do the encryption job.
✓ Suggestion
• Pick a lightweight encryption algorithm as long as it can prevent
packet forgery. (A complex algorithm uses up too much CPU.)
• Focus more on preventing game client manipulation.
• Also, prepare for packet replay attacks.
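One common way to prepare for packet replay is a per-session sequence number covered by a message authentication tag. The sketch below is an assumption for illustration, not the scheme used in the reference games: it swaps in HMAC-SHA256 from the Python standard library rather than XOR, and `seal`, `ReplayGuard`, and the 8-byte sequence prefix are hypothetical. As the slide notes, it does not help once the client itself is manipulated, since the client holds the key.

```python
import hashlib
import hmac
import struct

TAG_SIZE = 32  # SHA-256 digest length in bytes


def seal(key, seq, payload):
    """Prefix an 8-byte sequence number and append an HMAC over both."""
    header = struct.pack("<Q", seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class ReplayGuard:
    """Server-side check: accept only authentic packets with increasing sequence."""

    def __init__(self, key):
        self._key = key
        self._last_seq = -1

    def open(self, packet):
        """Return the payload, or None if the packet is forged or replayed."""
        if len(packet) < 8 + TAG_SIZE:
            return None
        body, tag = packet[:-TAG_SIZE], packet[-TAG_SIZE:]
        expected = hmac.new(self._key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged or corrupted
        (seq,) = struct.unpack("<Q", body[:8])
        if seq <= self._last_seq:
            return None  # replayed or out of order
        self._last_seq = seq
        return body[8:]
```

A captured packet sent a second time carries a sequence number the server has already seen, so `open()` rejects it even though its tag is valid.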
17. Lesson #4: Use Multicasting with Limited Scope
✓ Case: Broadcasting hinders scalability in both CPU and network bandwidth.
Refactored multiple times for visibility-based multicasting.
✓ Observation
• Along with memory copying, the broadcasting loop is a key reason for
high CPU usage.
• Such broadcasting triggers packet copies, too (because different
players use different encryption seeds).
✓ Suggestion
• Avoid broadcasting.
• Manage the player list in a way that makes multicasting easy.
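One way to keep the player list "easy to multicast" is to bucket players into grid cells and send an event only to the sender's cell and its eight neighbors. This is a minimal sketch under that assumption. The talk does not say which structure was used, and `VisibilityGrid`, `CELL_SIZE`, `move`, and `recipients` are hypothetical names.

```python
from collections import defaultdict

CELL_SIZE = 50.0  # hypothetical visibility cell edge, in world units


class VisibilityGrid:
    """Tracks which grid cell each player is in, for visibility-scoped sends."""

    def __init__(self):
        self._cells = defaultdict(set)  # (cx, cy) -> player ids in that cell
        self._where = {}                # player id -> (cx, cy)

    @staticmethod
    def _cell(x, y):
        return (int(x // CELL_SIZE), int(y // CELL_SIZE))

    def move(self, player_id, x, y):
        """Update the player's cell; cheap when the player stays in one cell."""
        new_cell = self._cell(x, y)
        old_cell = self._where.get(player_id)
        if old_cell == new_cell:
            return
        if old_cell is not None:
            self._cells[old_cell].discard(player_id)
        self._cells[new_cell].add(player_id)
        self._where[player_id] = new_cell

    def recipients(self, player_id):
        """Players in the sender's cell and the 8 surrounding cells."""
        cx, cy = self._where[player_id]
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self._cells.get((cx + dx, cy + dy), set())
        found.discard(player_id)
        return found
```

The send loop then runs over `recipients(sender)` rather than over every connected player, so its cost follows local player density instead of total CCU.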
20. Lesson #5: Don’t Use DB as Synchronization Point
✓ Case: Heavily relied on DB transactions for inter-server synchronization.
The DB gets overloaded.
Servers get serialized waiting for DB I/O.
✓ Observation
• Many programmers overuse DB transactions for synchronization.
• A DB is heavyweight because it guarantees properties like ACID, so
programmers pay an extremely high cost when they use it for synchronization.
✓ Suggestion
• Use inter-server RPC or a memory cache for synchronization.
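The RPC suggestion can be read as: let one server own a piece of shared state in memory, and have the other servers ask it, instead of racing on a DB row. The sketch below shows only the owner side and is an assumption for illustration. `PartyOwner` and `rpc_join` are hypothetical names, and the RPC transport that would invoke the handler is left out.

```python
import threading


class PartyOwner:
    """Runs on the one server that owns party membership; state lives in memory."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._members = set()
        self._lock = threading.Lock()

    def rpc_join(self, player_id):
        """Handler other servers would call via RPC; returns True on success."""
        # The in-process lock is the synchronization point, so no DB
        # transaction is needed to decide who gets the last slot.
        with self._lock:
            if player_id in self._members or len(self._members) >= self._capacity:
                return False
            self._members.add(player_id)
            return True
```

Persisting the result to the DB can then happen afterwards as a plain write, outside the critical path that decides the outcome.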
23. Lesson #6: Caching is Not For Free
✓ Case: Intensive use of Redis for data sharing among the servers.
Redis quickly becomes a bottleneck.
✓ Observation
• A cache like Redis is certainly much lighter than a DB.
• But it also runs jobs to maintain persistency/consistency/availability.
• Direct server-to-server RPC may be a better solution in some cases.
✓ Suggestion
• Mind the persistency/consistency/availability settings of the cache program.
• Consider server-to-server RPC unless caching is unavoidable.
26. Lesson #7: Start with Tools for Server Visibility
✓ Case: Opened a service without any monitoring / operation tools.
✓ Observation
• The day a service opens is the most hectic.
• Also, the service is at its buggiest on that day.
• Tools to understand what’s happening inside the server are a must-have.
• Operation tools are mandatory unless you don’t want to sleep.
✓ Suggestion
• Spend enough time developing tools.
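A visibility tool can start very small: a handful of counters and gauges inside the server process, dumped as JSON for whatever dashboard or log collector is available. This is a minimal sketch, not a recommendation of a specific tool. `ServerStats`, `incr`, `set_gauge`, `snapshot`, and the metric names are hypothetical.

```python
import json
import time
from collections import Counter


class ServerStats:
    """In-process counters and gauges with a JSON snapshot for operators."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._started = clock()
        self._counters = Counter()
        self._gauges = {}

    def incr(self, name, amount=1):
        """Count events such as packets received or DB errors."""
        self._counters[name] += amount

    def set_gauge(self, name, value):
        """Record a current level such as CCU or queue depth."""
        self._gauges[name] = value

    def snapshot(self):
        """Serialize the current state, e.g. for an admin endpoint or a log line."""
        return json.dumps({
            "uptime_sec": round(self._clock() - self._started, 3),
            "counters": dict(self._counters),
            "gauges": dict(self._gauges),
        }, sort_keys=True)
```

Emitting `snapshot()` on a timer from day one gives a baseline to compare against when launch-day traffic arrives.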
27. Recap
✓ Understand your traffic pattern and try to minimize it.
✓ Avoid packet copies inside the server.
✓ Focus on client obfuscation instead of complex network encryption.
✓ Prefer multicasting to broadcasting. In particular, consider visibility.
✓ Avoid using the DB as a synchronization point. It will collapse.
✓ Caching is not free. Do not rely on it too much.
✓ Develop proper tools for server visibility before opening a service.
28. Survey Result for Fun
✓ What OS for game server?
Choice Count Rate
NA
No answer
29. Survey Result for Fun
✓ What language for game server?
Choice Count Rate