DragonCraft
Architectural Overview
       Freeverse Inc.
Jesse Sanford, Joshua Kehn
Freeverse / ngmoco:) / DeNA
● Web guys brought in to design RESTful,
  HTTP-based games for handheld clients.
● Platform concurrently being built by the Ngmoco
  team out in San Francisco.
● First games in the company's history built
  entirely on
   ○ EC2
   ○ Node.js
   ○ MongoDB
● There are a lot of firsts here!
Why Node.js?
● Already using JavaScript. Knowledge share!
● Fast-growing ecosystem.
● Reasonable to bring libraries from client to
  server and vice versa.
● Lots of JavaScript patterns and best practices
  to follow.
● Growing talent pool.
Why MongoDB?
● Ever-changing schemas make document
  stores attractive.
● Easy path to horizontal scalability.
● 10gen is very easy to work with.
● Lots of best-practice patterns for running on
  EC2 infrastructure.
● JavaScript is a friendly interface.
Handling the change.
● Lots of patience.
● Many proofs of concept.
● Dedicated POCs for different puzzle pieces
  (platform services, game libraries).
● Developer training and evangelists.
● Performance testing and open-source library
  vetting.
● Lots of patience. Seriously.
Building from scratch.
●   Lots of testing.
●   Pre-flight environment for content.
●   Duplicate of production for release staging.
●   Full-stack developer sandboxes on every
    workstation.
    ○ Individual MongoDB and Node.js instances running.
    ○ Full client stack available as a browser-based
      handheld simulator for the client interface.
"Physical" Infrastructure
● EC2 fabric managed by RightScale.
● Extensive library of "RightScripts" and
  "ServerTemplates".
● Different deployments for each environment.
● Deployments are a mix of single-service
  machines and arrays of machines.
● Arrays load-balanced by HAProxy, not ELBs.
● Mongo clusters are the largest expense.
Logical Diagram
Physical/Logical Diagram
Mongo Infrastructure
● Mongo cluster per environment.
● 3 config nodes split between 2 availability
  zones.
● Currently only 1 shard.
● 3 DB nodes split between 2 availability
  zones.
● mongos processes running directly on app
  servers.
Mongo Infrastructure cont.
● Config nodes on t1.micros.
● DB nodes on m1.xlarges.
● DB nodes running RAID 10 on EBS.
● XFS with LVM.
● Snapshots taken after forcing fsync and lock
  on the DB and then an XFS freeze.
● Backups always done on a secondary.
Shrinking Mongo
● Staging and testing environments too costly.
● Logically the application knows no
  mongod/mongos differences.
● Still a single shard.
● Spinning up instances is quick.
● Only used for smoke testing at the end of
  every dev cycle.
● Moving to single master -> slave replication.
● Cost savings of 60% in these environments.
Other Services
● HAProxy - 2 m1.small
● Memcached - 1 m1.large
● PHP+Apache (CMS), Flume/Syslog - 1 m1.large
● Ejabberd - 1 m1.large
● Beanstalkd - 1 m1.large
● Node.js - (currently 3) c1.xlarge
Log4js-syslog, Flume
● Centralized logging from all application
  servers in the cluster.
● Configurable log levels at both the
  application layer and via filters on the stream
  after that.
● Flume speaks syslog fluently.
● Flume allows us to point the firehose
  wherever we want.
● It's trivial to ingest the Flume output from S3
  into Hadoop/Elastic MapReduce.
Daida, Beanstalkd
● Needed a fast worker queue for push
  messaging and out-of-band computation.
● Considered Redis and Resque.
● Considered RabbitMQ/AMQP.
● Beanstalkd was built for work queues.
● Beanstalkd is very simple.
● No real support for HA.
● Workers needed to be written in JavaScript.
● No upfront knowledge about the runtime
  activities of workers.
Daida, Beanstalkd cont.
● Developers define jobs (the payload contains
  the variables needed for the job to execute).
● Developers schedule jobs.
● Developers create "strategies" which know
  how to execute the jobs.
● At runtime, using some functional magic,
  Daida closes the developer-defined strategy
  around the payload variables that came with
  the job.
● This is somewhat similar to the job being run
  by a worker inside a container with the
  payload baked in.
Daida handler example.
 var handlers = {
    bar: function(data, cb) {
       var callback = cb || function() { /* noOp */ }; // default if no callback was passed
       console.log('test job passed data: ' + JSON.stringify(data));
       callback(); // always make sure to call back!
    },

    foo: function(data, cb) {
       var callback = cb || function() { /* noOp */ };
       console.log('foo job passed name: ' + data.name);
       callback(); // again, never forget to call back!
    }
  };
  exports.handlers = handlers;

  exports.bar = handlers.bar;
  exports.foo = handlers.foo;

// taken from https://github.com/ngmoco/daida.js
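The "functional magic" from the previous slide is essentially a closure: the strategy is bound to the job's payload before being handed to a worker. A hypothetical sketch of that pattern (not the actual Daida API; `bindStrategy` and `greet` are illustrative names):

```javascript
// Hypothetical sketch of closing a strategy over a job payload (not the real Daida API).
function bindStrategy(strategy, payload) {
  // Returns a thunk the worker can run without knowing anything about the job.
  return function (callback) {
    return strategy(payload, callback || function () { /* noOp */ });
  };
}

// A developer-defined strategy, in the same (data, cb) shape as the handlers above.
function greet(data, cb) {
  var result = 'hello ' + data.name;
  cb(null, result);
  return result;
}

// The payload travels with the job at schedule time; at run time it is closed over:
var job = bindStrategy(greet, { name: 'sanford' });
```

The worker then just invokes `job(done)`; it never inspects the payload itself.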
Ejabberd
● Best multi-user chat solution for the money.
● Considered IRC and other more custom
  solutions.
● JavaScript handhelds can use JavaScript chat
  client libraries!
● Capable of being run over plain HTTP
  (Comet/long-poll/BOSH).
● Widely used.
● Fine-grained control over users and rooms.
● A little complex for our needs.
● Erlang/OTP is solid.
Other Nodejs libraries
●   Connect
●   Express
●   mongoose
●   oauth
●   connect-auth
●   connect-cookie-session
Megaphone load tester
● Written in Erlang/OTP to make use of its
  lightweight processes and distributed nature.
● SSL-capable HTTP reverse proxy.
● Records sessions from handhelds.
● The proxy is transparent and the handhelds
  stay dumb.
● Choose which sessions to replay.
● Write small scripts to manipulate req/resp
  during replay. OAuth handshakes?
● Interact with the replay in a console.
● Record results of the replay.
Megaphone load tester cont.
● Replay in bulk! (Load test.)
● A centralized console can spawn HTTP replay
  processes on many headless machines.
  Similar to headless JMeter.
● A single session (some number of individual
  requests) is sent to the client process when
  spawned.
● Responses are sent back to the centralized
  databases as clients receive them.
● The same session can be sent to multiple
  clients and played back concurrently.
Ex.: session handler script for manipulating requests at runtime.

%% This module contains the functions to manipulate req/resp for the dcraft_session1 playback
-module(dcraft_session1).
-include("blt_otp.hrl").
-export([create_request/1, create_request/2, create_request/3]).
-record(request, {url, verb, body_vars}).
-record(response, {request_number, response_obj}).

create_request(Request) -> create_request(Request, []).
create_request(Request, Responses) -> create_request(Request, Responses, 0).

create_request(#request{url="http://127.0.0.1:8080/1.2.1/dragoncraft/player/sanford/mission/"++OldMissionId} = Request, Responses, _RequestNumber) ->
    ?DEBUG_MSG("~p Request for wall Found!~n", [?MODULE]),
    [LastResponseRecord|_RestResponses] = Responses,
    {{_HttpVer, _ResponseCode, _ResponseDesc}, _Headers, ResponseBodyRaw} = LastResponseRecord#response.response_obj,
    {ok, ResponseBodyObj} = json:decode(ResponseBodyRaw),
    ResponseKVs = element(1, ResponseBodyObj),
    [_Response_KV1 | [Response_KV2 | _Response_KV_Rest]] = ResponseKVs,
    Response_KV2_Val = element(2, Response_KV2),
    ResponseDataObj = element(1, Response_KV2_Val),
    [ResponseDataKV | _ResponseDataKVRest] = ResponseDataObj,
    ResponseData_KV_Val = element(2, ResponseDataKV), % <<"identifier">>
    MissionId = binary_to_list(ResponseData_KV_Val),
    Replaced = re:replace(Request#request.url, OldMissionId, MissionId++"/wall"),
    [ReHead|ReRest] = Replaced,
    [ReTail] = ReRest,
    ?DEBUG_MSG("~p replaced head is ~p and tail ~p ~n", [?MODULE, ReHead, ReTail]),
    NewUrl = binary_to_list(ReHead)++binary_to_list(ReTail),
    Request#request{url=NewUrl};
create_request(Request, _Responses, _RequestNumber) -> Request.
Other notables
● Recently started using Python's Fabric library
  for rolling releases.
● Node's cluster module for multi-process Node.
● Node IPC with Linux signals to raise and lower
  logging levels and trigger content updates.
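The signal-driven log-level flip can be sketched with Node's standard signal handling. The level names and the signal choice are illustrative (SIGUSR1 is reserved by the Node debugger, so SIGUSR2 is the usual pick):

```javascript
// Sketch: flip log verbosity at runtime via a POSIX signal, no restart needed.
// Level names and the SIGUSR2 choice are illustrative assumptions.
const LEVELS = ['error', 'warn', 'info', 'debug'];
let current = LEVELS.indexOf('info');

process.on('SIGUSR2', function () {
  current = (current + 1) % LEVELS.length;  // cycle to the next level
  console.log('log level now ' + LEVELS[current]);
});

function logLevel() { return LEVELS[current]; }
```

From a shell, `kill -USR2 <pid>` then bumps the level in the running process, which is handy when chasing a production issue without redeploying.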
Screenshots
Demo
Links
●   http://dragoncraftthegame.com/
●   http://freeverse.com/
●   http://blog.ngmoco.com/
●   https://developer.mobage.com/
●   http://dena.jp/intl/
Repos
● https://github.com/ngmoco/daida.js.git
● https://github.com/ngmoco/daida-beanstalk.git
● https://github.com/ngmoco/daida-local.git
● https://github.com/ngmoco/Megaphone
  (coming very soon)
