Data Storage Solutions for SNS Games
Dinh Nguyen Anh Dung – P2S – G6 – VNG
CONTENT
• SNS games and SQL-based databases
• NoSQL technology and Couchbase
• NoSQL does not come without challenges
• SNS Storage Engine (SSE)
SNS games AND SQL-based databases
SNS games characteristics
• Huge number of concurrent requests, all requiring
  low response time
• Accounts can be stored separately
  – No need for centralized storage
  – In most cases, no need for strict constraints on
    data relationships
Native limitations of SQL-based DBMS
• Fundamentally centralized
  – Can only scale up (vertically)
• Schema
  – High risk (and cost) for updates
• Normalized data
  – Unnecessary overhead: table joins, locking, data
    constraint checks, …
Native limitations of SQL-based DBMS
  [Figure not reproduced] Source: NoSQL - WhitePaper
Native limitations of SQL-based DBMS
• SQL processing overhead on both the DBMS and
  client sides
• Most data accesses end up on the hard disk
  – Very challenging to meet low response times
  – Internal caching does not help much
• Hard to distribute data across multiple servers
NoSQL technology and Couchbase
NoSQL technology
• Persistent distributed hash-table
• Active set resides on RAM
  – Extremely fast response time
• Horizontal scaling (scale out)
• Raw and direct data access
  – set, get, add, inc, dec: no SQL processing overhead
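The raw operations listed above can be sketched with a small in-memory stand-in. This is only an illustration in Python: the class `TinyKV` is hypothetical and is not the Couchbase client API. One simplification to note: here `inc` on a missing key starts from 0, which a real server may not do.

```python
class TinyKV:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Unconditional write: overwrites whatever was there before.
        self._data[key] = value
        return True

    def get(self, key, default=None):
        return self._data.get(key, default)

    def add(self, key, value):
        # Conditional write: only succeeds if the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def inc(self, key, delta=1):
        # Counter update performed by the store itself, so the client
        # does not have to read, modify and write back.
        # Simplification: a missing key is treated as 0.
        self._data[key] = self._data.get(key, 0) + delta
        return self._data[key]

    def dec(self, key, delta=1):
        return self.inc(key, -delta)


kv = TinyKV()
kv.set("Jack.Gold", 50123)
kv.inc("Jack.Gold", 77)
first_add = kv.add("Jack.Coin", 700)   # key is new, so this succeeds
second_add = kv.add("Jack.Coin", 1)    # key exists, so this is refused
```

Each call touches exactly one key, with no query parsing, joins or constraint checks in between.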
NoSQL technology
  Key          Value
  Jack.Gold    50123
  Jack.Exp     4670
  Jack.Coin    700
  Peter.Gold   7050
  Peter.Exp    20005
  Peter.Coin   1

  The same keys spread across three tables:

  Key          Value     Key          Value     Key          Value
  Peter.Gold   7050      Jack.Gold    50123     Peter.Coin   1
  Jack.Exp     4670                             Jack.Coin    700
  Peter.Exp    20005
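How keys end up in separate tables can be sketched with a simple hash-and-modulo rule. This is a minimal illustration, not how Couchbase actually maps keys; the node names and the helpers `node_for` and `partition` are assumptions made for the example, and the resulting split will not necessarily match the tables above.

```python
import zlib

NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def node_for(key, nodes=NODES):
    # crc32 is stable across runs, unlike Python's built-in hash() on str.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def partition(items, nodes=NODES):
    # Build one key-value table per node.
    tables = {node: {} for node in nodes}
    for key, value in items.items():
        tables[node_for(key, nodes)][key] = value
    return tables


ACCOUNTS = {
    "Jack.Gold": 50123, "Jack.Exp": 4670, "Jack.Coin": 700,
    "Peter.Gold": 7050, "Peter.Exp": 20005, "Peter.Coin": 1,
}
TABLES = partition(ACCOUNTS)
```

Any client can compute `node_for(key)` on its own, so no central lookup is needed to find where a key lives.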
Active set on RAM
        CLIENT
          |
  ACTIVE SET ON RAM
          |  (lazy write)
         HDD
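The diagram can be read as a write-behind cache: clients only talk to RAM, and changes reach the disk later. A minimal sketch under that reading follows; the class `WriteBehindCache` and its `flush` method are hypothetical, and a plain dict stands in for the HDD.

```python
class WriteBehindCache:
    """Hypothetical sketch: serve from RAM, persist to disk lazily."""

    def __init__(self, disk):
        self.disk = disk     # dict standing in for the HDD
        self.ram = {}        # active set
        self.dirty = set()   # keys changed in RAM but not yet on disk

    def get(self, key):
        if key not in self.ram:
            # Miss: pull the value into the active set first.
            self.ram[key] = self.disk[key]
        return self.ram[key]

    def set(self, key, value):
        # The client gets its answer as soon as RAM is updated.
        self.ram[key] = value
        self.dirty.add(key)

    def flush(self):
        # Lazy write: would run in the background in a real server.
        written = len(self.dirty)
        for key in self.dirty:
            self.disk[key] = self.ram[key]
        self.dirty.clear()
        return written
```

The trade-off is visible in the sketch: anything still in `dirty` is lost if the process dies before `flush` runs.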
Couchbase server
• Based on Membase technology
• Distributed
• Replication
• Native PHP client available since version 1.8
• Bucket types
  – Couchbase (persistent)
  – Memcached (memory only)
NoSQL does not come without challenges
Our first SNS game with Couchbase
Architecture and design issues
• Transition from relational database design to
  key-value design
  – Account data => keys: how?
• Only minimal support for locking and
  concurrency control
  – add: fails if the key already exists (usable as a mutex)
  – cas: a read returns a CAS value; a write fails if that
    CAS value is outdated
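The two primitives above can be sketched as follows. `CasStore` is a hypothetical in-memory stand-in, not the Couchbase client, and `spend_gold` is an illustrative helper showing the usual read, modify, cas, retry loop.

```python
class CasStore:
    """Hypothetical store where every value carries a CAS token."""

    def __init__(self):
        self._data = {}      # key -> (value, cas_token)
        self._next_cas = 1

    def _store(self, key, value):
        # Every successful write gets a fresh token.
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1

    def add(self, key, value):
        # Fails if the key exists: the first caller wins (mutex).
        if key in self._data:
            return False
        self._store(key, value)
        return True

    def delete(self, key):
        return self._data.pop(key, None) is not None

    def gets(self, key):
        # Returns (value, cas_token).
        return self._data[key]

    def cas(self, key, value, token):
        # Write only if nobody has written since `token` was read.
        if key not in self._data or self._data[key][1] != token:
            return False
        self._store(key, value)
        return True


def spend_gold(store, key, amount, max_retries=5):
    """Optimistic update: retry when another writer got in first."""
    for _ in range(max_retries):
        gold, token = store.gets(key)
        if gold < amount:
            return False
        if store.cas(key, gold - amount, token):
            return True
    return False
```

With `add`, a key such as `lock:Jack` (a made-up name) acts as a lock: whoever adds it first holds it, and deleting it releases it.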
Architecture and design issues
• No transaction support
  – Data corruption becomes all too easy!
• No high-level data structures (e.g. list, queue, …)
• No tools for viewing / editing raw data
Pitfalls
• Too much freedom for developers
  – Anyone can add / modify any key at any time
• "Epic key" design mindset
  – One key for everything: bad performance, and
    concurrency control is a true nightmare
• Abusing the power of set
  – It never fails! Developers LOVE it!
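Why "one key for everything" plus an unconditional set is dangerous can be shown with a small lost-update scenario. This is an illustrative sketch: a plain dict stands in for the store, and the two "handlers" are simulated sequentially.

```python
import copy

# One big value holds the whole account.
store = {"Jack": {"gold": 50123, "exp": 4670}}

# Two request handlers read the same account at almost the same time.
snapshot_a = copy.deepcopy(store["Jack"])
snapshot_b = copy.deepcopy(store["Jack"])

snapshot_a["gold"] -= 100   # handler A: buy an item
snapshot_b["exp"] += 30     # handler B: finish a quest

store["Jack"] = snapshot_a  # set never fails...
store["Jack"] = snapshot_b  # ...and silently discards handler A's change
```

After both writes the account has the new exp but the old gold: the purchase was paid for with nothing, and no error was ever raised.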
SSE – SNS Storage Engine
Our second SNS game with Couchbase
What is SSE ?
• A thin “layer” between developers and the
  almighty Couchbase
  – SSE is simply a PHP library
• Provides better support for locking and
  concurrency control
  – Basic support for begin – update – commit
• Provides high-level data structures
  – Collection, queue, stack, integer (gold), inc-only
    integer (exp), binary flags (quest), …
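What a high-level data structure on top of plain keys might look like is sketched below. SSE itself is a PHP library whose API is not shown in these slides, so the Python classes `IncOnlyInt` and `Flags` are purely hypothetical illustrations of the idea; a dict stands in for the store.

```python
class IncOnlyInt:
    """Integer that may only grow, e.g. experience points."""

    def __init__(self, store, key):
        self.store, self.key = store, key

    def get(self):
        return self.store.get(self.key, 0)

    def inc(self, delta):
        # The type itself rejects invalid updates, so a buggy caller
        # cannot lower the value by accident.
        if delta < 0:
            raise ValueError("inc-only integer cannot decrease")
        self.store[self.key] = self.get() + delta
        return self.store[self.key]


class Flags:
    """Bit flags packed into one integer, e.g. quest completion."""

    def __init__(self, store, key):
        self.store, self.key = store, key

    def set(self, bit):
        self.store[self.key] = self.store.get(self.key, 0) | (1 << bit)

    def is_set(self, bit):
        return bool((self.store.get(self.key, 0) >> bit) & 1)


store = {}
exp = IncOnlyInt(store, "Jack.Exp")
exp.inc(4670)
quests = Flags(store, "Jack.Quest")
quests.set(0)
quests.set(3)
```

Developers work with `exp` and `quests` rather than raw keys, which is one way to limit what a careless write can break.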
What is SSE ?
• Minimizes the risk from weak concurrency support
  – Ability to roll back pending writes
• Schema
  – Limits the freedom of developers!
  – No more nightmares for backup and raw data
    viewing / editing
• Buffers to eliminate repeated reads / writes
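Buffering and rollback of pending writes can be sketched as a small session object. This is an assumption about the general shape of such a layer, not SSE's actual design; the class `Session` and its methods are hypothetical, and a real commit would also need the locking or CAS checks that this sketch leaves out.

```python
class Session:
    """Hypothetical begin / update / commit wrapper over a store."""

    def __init__(self, store):
        self.store = store       # dict standing in for the backend
        self.reads = {}          # read buffer
        self.pending = {}        # write buffer, not yet visible to others
        self.backend_reads = 0

    def get(self, key):
        if key in self.pending:
            return self.pending[key]
        if key not in self.reads:
            # Only the first read of a key goes to the backend.
            self.reads[key] = self.store.get(key)
            self.backend_reads += 1
        return self.reads[key]

    def set(self, key, value):
        # Repeated writes to one key collapse into a single pending write.
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)
        self.reads.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Nothing reached the backend, so dropping the buffer is enough.
        self.pending.clear()
```

A request handler would create one session, make all its changes through it, and call `commit` once at the end or `rollback` on error.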
Raw account view / editing tool
Multi-instance architecture
• Replication costs too much performance
• One failed node means a failed cluster
• Adding nodes requires a rebalance
  – Only worthwhile for clusters with a large
    number of nodes (more than 20)
Multi-instance architecture
• One instance for the index (user-to-instance
  mapping)
  – Use APC on logic servers to cache the mapping and
    reduce load on the index instance
• Many data instances
  – Dynamically adjust the weight of each instance based
    on its average load
  – A node failure only affects part of the user base
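The routing described above can be sketched as follows. The class `Router`, the instance names and the weighted random choice for new users are assumptions made for illustration; one dict stands in for the index instance and another for the APC cache on a logic server.

```python
import random


class Router:
    """Hypothetical user-to-instance router with a local cache."""

    def __init__(self, weights, rng=None):
        self.weights = dict(weights)   # instance name -> weight
        self.index = {}                # stands in for the index instance
        self.local_cache = {}          # stands in for APC
        self.index_lookups = 0
        self.rng = rng or random.Random()

    def instance_for(self, user_id):
        if user_id in self.local_cache:
            return self.local_cache[user_id]   # no index traffic at all
        self.index_lookups += 1
        instance = self.index.get(user_id)
        if instance is None:
            # New user: pick an instance at random, favouring those
            # with a higher weight (i.e. lower load).
            names = list(self.weights)
            instance = self.rng.choices(
                names, weights=[self.weights[n] for n in names]
            )[0]
            self.index[user_id] = instance
        self.local_cache[user_id] = instance
        return instance
```

Changing `weights` only steers where new users go; existing users stay on the instance recorded in the index.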
Multi-instance architecture

  Game Logic    Game Logic    Game Logic    Game Logic
     APC           APC           APC           APC

  Index         Data          Data          Data
  Instance      Instance 1    Instance 2    Instance 3
Disadvantages
• Lower multi-get performance
• Accesses are not well balanced between instances
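One plausible reason for the slower multi-get, presumably the extra round trips, can be sketched by grouping a request by instance. The helper `plan_multi_get` and the sample mapping are hypothetical.

```python
from collections import defaultdict


def plan_multi_get(user_ids, user_to_instance):
    # Split one logical multi-get into one batch per data instance.
    batches = defaultdict(list)
    for user_id in user_ids:
        batches[user_to_instance[user_id]].append(user_id)
    return dict(batches)


mapping = {"jack": "data-1", "peter": "data-2", "mary": "data-1"}
plan = plan_multi_get(["jack", "peter", "mary"], mapping)
```

Reading three accounts here needs two separate requests instead of one, and the number of requests grows with the number of instances the users are spread over.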
How good is SSE for us ?
• No more data loss due to concurrency
• No more data corruption
• No mysterious bugs due to unintended writes
• Workload of server developers reduced by a factor
  of more than 3