How we Build Vox April 4 2007
Six Apart Movable Type TypePad LiveJournal Vox
How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
Web 2.0 Overused… But useful
Vox talks to web services
APIs: Tools. We use our own custom libraries: no Net::Amazon, Net::Flickr, etc. Why? We don’t want to load 7 XML parsers. All of our tools use XML::LibXML.
APIs: Open Media Profile
<entry>
  <title>Foo bar baz</title>
  <link href="http://example.com/show/video/123" />
  <link rel="alternate" type="application/atom+xml"
        href="http://example.com/atom/123" />
  <id>tag:example.com,2006:video-123</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <content type="text"> Vox rocks blah blah blah ... </content>
  <category term="Vox" scheme="http://example.com/tags/Vox/" label="Vox" />
  <category term="cat" scheme="http://example.com/tags/Cat/" label="cat" />
  <link rel="license" type="text/html"
        href="http://creativecommons.org/licenses/by/2.5/" />
  <media:content url="http://example.com/data/123.flv" fileSize="123456"
                 type="video/x-flv" />
  <media:player url="http://example.com/data/123.swf" height="200" width="400" />
  <media:thumbnail url="http://example.com/thumb/1223.jpg" width="75" height="50" />
</entry>
GData, OpenSearch, Media RSS…
Open Media Profile: GData, OpenSearch, Media RSS
APIs: Outbound. Atom Publishing Protocol. Everything is Atom/RSS. Cool URIs:
/library/posts/atom.xml
/library/posts/2007/03/atom.xml
/library/posts/2007/03/tags/yapc/atom.xml
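The feed URIs on this slide follow a regular pattern: each extra path segment narrows the feed. A minimal sketch of how such URIs could be assembled, using only core Perl. The `feed_uri` helper and its argument names are hypothetical and are not taken from Vox's code.

```perl
use strict;
use warnings;

# Hypothetical helper: build a feed URI in the style shown on the slide.
# Each optional filter (year and month, tag) narrows the feed.
sub feed_uri {
    my (%args) = @_;
    my @parts = ('library', 'posts');
    if (defined $args{year} && defined $args{month}) {
        push @parts, sprintf('%04d', $args{year}), sprintf('%02d', $args{month});
    }
    push @parts, 'tags', $args{tag} if defined $args{tag};
    return '/' . join('/', @parts, 'atom.xml');
}

print feed_uri(), "\n";
print feed_uri(year => 2007, month => 3), "\n";
print feed_uri(year => 2007, month => 3, tag => 'yapc'), "\n";
```

Called this way, the helper reproduces the three URIs listed on the slide.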
Ajax. JSON serialization: lightweight, normal data types (no need to invent syntax). Catalyst + JSON-RPC: everything is an API. Our own core JS libraries.
http://search.cpan.org/~miyagawa/Catalyst-Plugin-JSONRPC-0.01/
http://code.sixapart.com/svn/js/trunk/
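To make "normal data types" concrete, here is a rough sketch of a JSON-RPC 1.0 style round trip using the core JSON::PP module. It does not use Catalyst or the JSON-RPC plugin linked above. The `add_tag` method, its parameters, and the dispatch table are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical(1) sorts keys so the encoded output is stable.
my $json = JSON::PP->new->canonical(1);

# A JSON-RPC 1.0 style request: method name, positional params, request id.
# The method name and params are hypothetical.
my $request = $json->encode({
    method => 'add_tag',
    params => [ 123, 'yapc' ],
    id     => 1,
});
print "$request\n";

# The server side decodes the request into ordinary Perl hashes and arrays.
my $call = $json->decode($request);
my %dispatch = (
    add_tag => sub {
        my ($asset_id, $tag) = @_;
        return "asset $asset_id tagged $tag";
    },
);
my $result = $dispatch{ $call->{method} }->(@{ $call->{params} });

# The response echoes the request id so the client can match it to the call.
my $response = $json->encode({
    result => $result,
    error  => undef,
    id     => $call->{id},
});
print "$response\n";
```

Both sides only ever touch hashes, arrays, strings and numbers, which is the point of the slide: no custom syntax to parse.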
How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
Large-scale We started with this:
We added some stuff.
Data::ObjectDriver. Movable Type and TypePad: custom ORM. We wanted more: built-in caching, built-in partitioning.
Data::ObjectDriver: Caching. Built-in support for memcached. All primary key data maintained for you. Completely automatic.
One line of code (basically):
Data::ObjectDriver::Driver::Cache::Memcached->new(
    cache    => Cache::Memcached->new({ servers => [ ... ] }),
    fallback => Data::ObjectDriver::Driver::DBI->new(
        dsn => 'dbi:SQLite:dbname=global.db',
    ),
);
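The cache-plus-fallback arrangement can be pictured with plain hashes. This is only a sketch of the general read-through idea, not Data::ObjectDriver's implementation: `%cache` stands in for memcached, `%database` for the DBI fallback, and dropping the cached copy on update is an assumed policy.

```perl
use strict;
use warnings;

# Stand-ins: %database plays the DBI fallback, %cache plays memcached.
my %database = (1 => { id => 1, name => 'ben' });
my %cache;
my $db_hits = 0;

# Primary-key lookup: try the cache, fall back to the database on a miss,
# then remember the row for next time.
sub lookup {
    my ($id) = @_;
    return $cache{$id} if exists $cache{$id};
    $db_hits++;
    my $row = $database{$id} or return;
    $cache{$id} = { %$row };    # store a copy, much as a real cache serializes
    return $cache{$id};
}

# Writes go to the database and drop the cached copy (assumed policy).
sub update {
    my ($id, %changes) = @_;
    my $row = $database{$id} or return;
    @{$row}{ keys %changes } = values %changes;
    delete $cache{$id};
    return 1;
}

lookup(1) for 1 .. 3;
print "database hits after 3 lookups: $db_hits\n";

update(1, name => 'Ben');
my $fresh = lookup(1);
print "$fresh->{name} (database hits: $db_hits)\n";
```

Three lookups cost one database hit, and the lookup after the update costs one more because the stale entry was dropped.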
Data::ObjectDriver: Partitioning. Sharded data. Based on arbitrary criteria. Completely transparent.
One line of code:
Data::ObjectDriver::Driver::Cache::Cache->new(
    cache    => Cache::Memcached->new({ servers => [ ... ] }),
    fallback => Data::ObjectDriver::Driver::SimplePartition->new(
        using => 'Recipe',
    ),
);
Partitioning: traffic
Example: loading a user’s posts
my $user = ArcheType::M::User->lookup_by_email('ben@sixapart.com');
my @assets = $user->assets({ type => 'Post' });
Example. Loading $user hits the global database. Loading @assets then does the following:
  Gets the partition number of $user
  Connects to that partition
  Runs a query like:
    SELECT user_id, asset_id FROM asset WHERE user_id = ? AND type = 6
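The lookup sequence above can be sketched with in-memory data. This is an illustration under assumptions, not the ArcheType or Data::ObjectDriver code: the hashes stand in for the global database and the partitions, the rows are made up, and type 6 is assumed to mean "Post" because that is what the query on the slide filters on.

```perl
use strict;
use warnings;

# Stand-ins for real databases. The global database maps each user to a
# partition; each partition holds that user's asset rows. All data is made up.
my %global = (
    'ben@sixapart.com' => { user_id => 1, partition => 2 },
);
my %partitions = (
    1 => [],
    2 => [
        { user_id => 1, asset_id => 10, type => 6 },   # 6 assumed to mean Post
        { user_id => 1, asset_id => 11, type => 3 },
        { user_id => 1, asset_id => 12, type => 6 },
    ],
);

# Loading the user hits only the global database.
sub lookup_user_by_email {
    my ($email) = @_;
    return $global{$email};
}

# Loading assets picks the user's partition, then filters rows the way
# "WHERE user_id = ? AND type = ?" would.
sub assets_for {
    my ($user, $type) = @_;
    my $rows = $partitions{ $user->{partition} };
    return grep {
        $_->{user_id} == $user->{user_id} && $_->{type} == $type
    } @$rows;
}

my $user   = lookup_user_by_email('ben@sixapart.com');
my @assets = assets_for($user, 6);
print join(',', map { $_->{asset_id} } @assets), "\n";
```

The caller never names a partition, which is what "completely transparent" amounts to in practice.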
ID Allocation: Issues. Partitioned databases -> no more auto_increment. Master/master. Were UUIDs the answer? No.
ID Allocation: yuidd. IDs unique across a datacenter. 64-bit integers (fit in a BIGINT column). yuidd is the server; Data::YUID::Client is the client. Asynchronous, non-blocking, simple, fast.
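One common way to get datacenter-unique 64-bit IDs without auto_increment is to pack a timestamp, a host number, and a per-second serial into a single integer. The sketch below shows that general technique only. The bit layout (36 bits of time, 16 bits of host ID, 12 bits of serial) is an assumption for illustration and is not claimed to be yuidd's actual format; `make_id_generator` is a hypothetical name. It needs a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Assumed layout, not yuidd's documented format:
#   36 bits Unix time | 16 bits host ID | 12 bits per-second serial
sub make_id_generator {
    my ($host_id, $clock) = @_;
    $clock ||= sub { time };    # injectable clock, handy for testing
    my ($last_time, $serial) = (0, 0);
    return sub {
        my $now = $clock->();
        if ($now == $last_time) {
            $serial++;
            die "serial overflow within one second\n" if $serial >= 4096;
        }
        else {
            ($last_time, $serial) = ($now, 0);
        }
        return ($now << 28) | (($host_id & 0xFFFF) << 12) | $serial;
    };
}

my $next_id = make_id_generator(7);
print $next_id->(), "\n" for 1 .. 3;
```

Two hosts never collide because their host bits differ, and one host never repeats within a second because the serial increments. The sketch does not handle a clock that moves backwards.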
Job Queueing. Offload processing from Apache: it’s big and heavy.
Job Queueing: TheSchwartz. We’ll probably rename it. Asynchronous, reliable job queue. N databases. Pool of workers to handle the jobs.
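The shape of an asynchronous, reliable queue can be shown in a few lines. This is an in-memory sketch with hypothetical names (`insert_job`, `work_once`, a `send_email` worker); it is not TheSchwartz's API, and a real queue would keep jobs in a database so they survive restarts.

```perl
use strict;
use warnings;

# In-memory stand-in for a database-backed queue.
my @queue;
my %workers;    # function name => code ref
my @failed;

sub insert_job {
    my ($funcname, $arg) = @_;
    push @queue, { funcname => $funcname, arg => $arg, attempts => 0 };
    return;
}

# Take one job and run it. A job that dies goes back on the queue until
# it has used up its attempts, which is where the reliability comes from.
sub work_once {
    my ($max_attempts) = @_;
    my $job = shift @queue or return 0;
    $job->{attempts}++;
    my $ok = eval { $workers{ $job->{funcname} }->($job->{arg}); 1 };
    if (!$ok) {
        if ($job->{attempts} < $max_attempts) { push @queue,  $job }
        else                                  { push @failed, $job }
    }
    return 1;
}

my @sent;
my $flaky_calls = 0;
$workers{send_email} = sub {
    my ($to) = @_;
    # Simulate a temporary failure on the first attempt for one address.
    die "temporary failure\n" if $to eq 'flaky@example.com' && !$flaky_calls++;
    push @sent, $to;
};

# The web request only enqueues; the slow work happens in the worker loop.
insert_job(send_email => 'a@example.com');
insert_job(send_email => 'flaky@example.com');
1 while work_once(3);
print "sent: @sent\n";
```

The request handler returns as soon as `insert_job` does, so the heavy Apache process is freed while a separate pool of workers drains the queue.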
How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
Fast. Need both large-scale and fast.
Catalyst. Vox uses Catalyst. Does what we want, allows us to do everything else. Want to use our own ORM, etc.
Is Catalyst fast? A common question on the mailing list! It’s fast enough (more on that later).
Template Toolkit Pretty fast… But we’re probably overloading it.
Template Toolkit: profile
[info] Request took 0.244932s (4.083/s)
.----------------------------------------------------------------+-----------.
| Action                                                         | Time      |
+----------------------------------------------------------------+-----------+
| /auto                                                          | 0.005569s |
|  -> /set_locale                                                | 0.000854s |
|  -> /set_locale                                                | 0.000648s |
| /home/root                                                     | 0.072194s |
|  -> /home/home_loggedout                                       | 0.071337s |
|  -> /home/load_thisisgoods                                     | 0.009077s |
|  -> /home/load_specials                                        | 0.014897s |
|  -> /home/load_featured_voxers                                 | 0.044168s |
| /end                                                           | 0.143675s |
|  -> Vox::App::V::TT->process                                   | 0.140877s |
'----------------------------------------------------------------+-----------'
Template Toolkit: profile. Wow! Template Toolkit takes nearly 60% of the request time (0.141s of 0.245s). 4 times as long as 10-15 network requests. Oh well.
Template Toolkit: versioned caching. On-disk cache. Versioned with application version. Automatic cache bust.
Versioned caching
Template->new({
    ...,
    COMPILE_DIR => '/tmp/tt-cache-' . Vox->VERSION,
});
Template Toolkit: syscalls Lots of syscalls for files that don’t exist! But we patched it.
Caching. Data caching: automatic caching using Data::ObjectDriver. Saves millions of lookups per day from reaching the database.
Caching: lists. Lists of things: tags on an asset. Tag objects are automatically cached. Cache asset => list of tag IDs.
Like this: asset<assetid>-tags => [ <tagid1>, <tagid2>, … ]
Caching: lists. Grab the list of tag IDs from memcached. Use get_multi to get back the tags.
Get back the tag objects: get_multi <tagid1> <tagid2> …
That’s not all! In bulk: get_multi asset<assetid1>-tags asset<assetid2>-tags …
Caching: lists. In a database: N one-to-many queries. In memcached: 2 queries. Use the right caching strategy.
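The two-query pattern can be sketched with a hash in place of memcached. Everything here is illustrative: the local `get_multi` only mimics fetching many keys in one round trip, the key names follow the `asset<assetid>-tags` pattern from the slides, the `tag<id>` key format is an assumption, and the data is made up.

```perl
use strict;
use warnings;

# A plain hash standing in for memcached.
my %cache = (
    'asset10-tags' => [ 1, 2 ],
    'asset11-tags' => [ 2, 3 ],
    'tag1'         => { id => 1, name => 'vox' },
    'tag2'         => { id => 2, name => 'cat' },
    'tag3'         => { id => 3, name => 'yapc' },
);
my $round_trips = 0;

sub asset_key { my ($id) = @_; return "asset${id}-tags" }
sub tag_key   { my ($id) = @_; return "tag${id}" }

# Local helper that mimics one multi-key fetch: one round trip, many keys.
sub get_multi {
    my @keys = @_;
    $round_trips++;
    my %found;
    for my $key (@keys) {
        $found{$key} = $cache{$key} if exists $cache{$key};
    }
    return \%found;
}

sub tags_for_assets {
    my @asset_ids = @_;
    # Round trip 1: the tag-ID lists for every asset at once.
    my $lists = get_multi(map { asset_key($_) } @asset_ids);
    # Round trip 2: every distinct tag object those lists refer to.
    my %seen;
    my @tag_ids = grep { !$seen{$_}++ } map { @$_ } values %$lists;
    my $tags = get_multi(map { tag_key($_) } @tag_ids);

    my %result;
    for my $asset_id (@asset_ids) {
        my $ids = $lists->{ asset_key($asset_id) } || [];
        $result{$asset_id} = [ map { $tags->{ tag_key($_) }{name} } @$ids ];
    }
    return \%result;
}

my $result = tags_for_assets(10, 11);
print "asset $_: @{ $result->{$_} }\n" for sort keys %$result;
print "round trips: $round_trips\n";
```

However many assets are on the page, the cache is consulted twice: once for the ID lists and once for the tag objects.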
Perlbal. Reverse-proxy setup. Like Apache 2/mod_proxy in front of mod_perl… But much better!
Perlbal: webserver mode. Serves CSS, images, JavaScript. Static stuff. Really fast.
Perlbal: Serving JS and CSS. We use a lot of JS and CSS! 20 JS files per page, 10 CSS files per page. SLOW.
Perlbal: Serving JS and CSS. Added file concatenation support in a plugin (it’s now core).
Used to be this:
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Core.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/DOM.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/DOM/Proxy.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/JSON.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Timer.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Observer.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Cache.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Client.jsc"></script>
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Template.jsc"></script>
...
And now it’s this!
<script type="text/javascript" src="/.shared:v25.3:vox:en/js/Core.jsc?/js/DOM.jsc,/js/DOM/Proxy.jsc,/js/JSON.jsc,/js/Timer.jsc,/js/Observer.jsc,/js/Cache.jsc,/js/Client.jsc,/js/Template.jsc,/js/Autolayout.jsc,/js/Component.jsc,/js/Dialog.jsc,/js/App.jsc,/js/List.jsc,/js/ArcheType.jsc,/js/ArcheType/Client.jsc,/js/ArcheType/Controller.jsc,/js/ArcheType/Autocomplete.jsc"></script>
Perlbal: Concatenation. Much faster. Much less latency. Perlbal handles Last-Modified/If-Modified-Since.
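Generating the combined URL on the page side is straightforward. The sketch below follows the URL shape shown on the slide (first file as the path, the remaining files comma-separated after "?"). The helper names `concat_url` and `split_concat_url` are hypothetical, and the parsing half only illustrates what a server would need to recover; it is not Perlbal's code.

```perl
use strict;
use warnings;

# Hypothetical helper: fold several files into one request, in the URL
# shape shown on the slide.
sub concat_url {
    my ($base, @files) = @_;
    die "need at least one file\n" unless @files;
    my $first = shift @files;
    my $url   = $base . $first;
    $url .= '?' . join(',', @files) if @files;
    return $url;
}

# Inverse: recover the list of files from a concatenated request.
sub split_concat_url {
    my ($base, $url) = @_;
    my ($path, $rest) = split /\?/, $url, 2;
    $path =~ s/^\Q$base\E//;
    return ($path, defined $rest ? split(/,/, $rest) : ());
}

my $base = '/.shared:v25.3:vox:en';
my $url  = concat_url($base, '/js/Core.jsc', '/js/DOM.jsc', '/js/JSON.jsc');
print "$url\n";
print "$_\n" for split_concat_url($base, $url);
```

One request replaces many, so the browser pays the connection and latency cost once instead of once per file.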
Lots of good stuff! Tools to play with: Perlbal, MogileFS, Memcached, TheSchwartz, yuidd, JavaScript libraries, etc.
All available at… http://code.sixapart.com/
