How we build Vox

Slides from "How we build Vox" at Six Apart, presented by Benjamin Trott, the company's CTO and co-founder, at YAPC::Asia 2007 in Tokyo (http://tokyo2007.yapcasia.org/). The talk was given in English and interpreted into Japanese by Tatsuhiko Miyagawa.

Published in: Technology

  1. How we Build Vox April 4 2007
  2. Six Apart
      - Movable Type
      - TypePad
      - LiveJournal
      - Vox
  3. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  4. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  5. Web 2.0
      - Overused…
      - But useful
  6. Vox talks to web services
  7. APIs: Tools
      - We use our own custom libraries
          - No Net::Amazon, Net::Flickr, etc
          - Why?
          - We don’t want to load 7 XML parsers
          - All of our tools use XML::LibXML
  8. APIs: Open Media Profile
          <entry>
            <title>Foo bar baz</title>
            <link href="http://example.com/show/video/123" />
            <link rel="alternate" type="application/atom+xml"
                  href="http://example.com/atom/123" />
            <id>tag:example.com,2006:video-123</id>
            <updated>2003-12-13T18:30:02Z</updated>
            <content type="text">
              Vox rocks blah blah blah ...
            </content>
            <category term="Vox" scheme="http://example.com/tags/Vox/" label="Vox" />
            <category term="cat" scheme="http://example.com/tags/Cat/" label="cat" />
            <link rel="license" type="text/html"
                  href="http://creativecommons.org/licenses/by/2.5/" />
            <media:content url="http://example.com/data/123.flv" fileSize="123456"
                           type="video/x-flv" />
            <media:player url="http://example.com/data/123.swf" height="200" width="400" />
            <media:thumbnail url="http://example.com/thumb/1223.jpg" width="75" height="50" />
          </entry>
  9. GData, OpenSearch, Media RSS… (diagram labels: GData, OpenSearch, Media RSS)
  10. Open Media Profile (diagram labels: GData, OpenSearch, Media RSS)
  11. APIs: Outbound
      - Atom Publishing Protocol
      - Everything is Atom/RSS
      - Cool URIs
          - /library/posts/atom.xml
          - /library/posts/2007/03/atom.xml
          - /library/posts/2007/03/tags/yapc/atom.xml
  12. Ajax
      - JSON serialization
          - Lightweight
          - Normal data types (no need to invent syntax)
      - Catalyst + JSON-RPC
      - Everything is an API
      - Our own core JS libraries
      - http://search.cpan.org/~miyagawa/Catalyst-Plugin-JSONRPC-0.01/
      - http://code.sixapart.com/svn/js/trunk/
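The "Catalyst + JSON-RPC" and "Everything is an API" points can be illustrated with a minimal sketch of a JSON-RPC style dispatcher. This is not the Catalyst plugin's code: the method name `echo.sum`, the `%methods` table, and `handle_jsonrpc` are hypothetical, and only the core `JSON::PP` module is assumed.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical method table; the names are illustrative only.
my %methods = (
    'echo.sum' => sub { my ($x, $y) = @_; return $x + $y },
);

# Decode a JSON-RPC style request, dispatch it, and encode the reply.
sub handle_jsonrpc {
    my ($body) = @_;
    my $req  = decode_json($body);
    my $code = $methods{ $req->{method} };
    my %res  = ( id => $req->{id}, result => undef, error => undef );
    if ($code) { $res{result} = $code->( @{ $req->{params} || [] } ) }
    else       { $res{error}  = "unknown method: $req->{method}" }
    return JSON::PP->new->canonical->encode(\%res);
}

print handle_jsonrpc('{"method":"echo.sum","params":[2,3],"id":1}'), "\n";
```

Because requests and replies are plain JSON data types, the same handler could serve browser Ajax calls and other API clients alike.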
  13. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  14. Large-scale
      - We started with this:
  15. We added some stuff.
  16. Data::ObjectDriver
      - Movable Type and TypePad: custom ORM
      - We wanted more:
          - Built-in caching
          - Built-in partitioning
  17. Data::ObjectDriver: Caching
      - Built-in support for memcached
      - All primary key data maintained for you
      - Completely automatic
  18. One line of code (basically):
          Data::ObjectDriver::Driver::Cache::Memcached->new(
              cache    => Cache::Memcached->new({ servers => [ ... ] }),
              fallback => Data::ObjectDriver::Driver::DBI->new(
                  dsn => 'dbi:SQLite:dbname=global.db',
              ),
          );
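What a caching driver with a fallback does on a primary-key lookup can be sketched roughly as follows. This is an illustration under stated assumptions, not Data::ObjectDriver's implementation: plain hashes stand in for memcached and the database, and the `user:<id>` key scheme and the helper names are hypothetical.

```perl
use strict;
use warnings;

# In-memory stand-ins: %db plays the database, %cache plays memcached.
my %db = ( 1 => { id => 1, name => 'ben' } );
my %cache;
my $db_hits = 0;

# Slow path: the "fallback" driver.
sub db_lookup {
    my ($id) = @_;
    $db_hits++;
    return $db{$id};
}

# Primary-key lookup that tries the cache first, then falls back.
sub lookup {
    my ($id) = @_;
    my $key = "user:$id";    # hypothetical key scheme
    return $cache{$key} if exists $cache{$key};
    my $row = db_lookup($id);
    $cache{$key} = $row if defined $row;
    return $row;
}

# Writes go to the database and invalidate the cached copy.
sub update {
    my ($id, %fields) = @_;
    $db{$id} = { %{ $db{$id} || { id => $id } }, %fields };
    delete $cache{"user:$id"};
}

lookup(1) for 1 .. 3;
print "database hits: $db_hits\n";
```

Only the first of the three lookups reaches the stand-in database; the rest are served from the cache until `update` invalidates the entry.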
  19. Data::ObjectDriver: Partitioning
      - Sharded data
      - Based on arbitrary criteria
      - Completely transparent
  20. One line of code:
          Data::ObjectDriver::Driver::Cache::Cache->new(
              cache    => Cache::Memcached->new({ servers => [ ... ] }),
              fallback => Data::ObjectDriver::Driver::SimplePartition->new(
                  using => 'Recipe',
              ),
          );
  21. Partitioning: traffic
  22. Example: loading user’s posts
          my $user = ArcheType::M::User->lookup_by_email(
              'ben@sixapart.com'
          );
          my @assets = $user->assets({ type => 'Post' });
  23. Example
      - Loading $user hits the global database
      - Loading @assets then does:
          - Get the partition number of $user
          - Connect to that partition
          - Run a query like:
          SELECT user_id, asset_id
          FROM asset
          WHERE user_id = ?
          AND type = 6
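The two-step lookup described above (global database first, then the user's partition) can be sketched like this. Hashes stand in for the database servers; the user ID, the partition numbers, and the `assets_for` helper are hypothetical and only illustrate the routing.

```perl
use strict;
use warnings;

# Global database: maps each user to a partition number (assumed layout).
my %global_users = (
    'ben@sixapart.com' => { user_id => 42, partition => 2 },
);

# One hash per partition, standing in for separate database servers.
my %partitions = (
    1 => { assets => [] },
    2 => { assets => [
        { user_id => 42, asset_id => 1001, type => 'Post'  },
        { user_id => 42, asset_id => 1002, type => 'Photo' },
    ] },
);

sub lookup_by_email { return $global_users{ $_[0] } }

# Route the query to the user's own partition, then filter there.
sub assets_for {
    my ($user, %where) = @_;
    my $shard = $partitions{ $user->{partition} };
    return grep {
        $_->{user_id} == $user->{user_id} && $_->{type} eq $where{type}
    } @{ $shard->{assets} };
}

my $user   = lookup_by_email('ben@sixapart.com');
my @assets = assets_for($user, type => 'Post');
print scalar(@assets), " post(s) found on partition $user->{partition}\n";
```

The caller never names a partition; the routing is derived from the user record, which is what keeps the partitioning transparent to application code.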
  24. ID Allocation: Issues
      - Partitioned databases -> no more auto_increment
      - Master/master
      - Were UUIDs the answer? No.
  25. ID Allocation: yuidd
      - IDs unique across a datacenter
      - 64-bit integers (fit in a BIGINT column)
      - yuidd is the server
      - Data::YUID::Client is the client
      - asynchronous, non-blocking, simple, fast
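One common way to build 64-bit IDs that stay unique across several generators is to pack a timestamp, a host number, and a serial counter into one integer. The sketch below assumes such a layout purely for illustration; the slides do not describe yuidd's actual bit layout, and `next_id` is a hypothetical helper. It also assumes a Perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Assumed bit layout (NOT taken from yuidd): 36 bits of seconds,
# 12 bits of host ID, 16 bits of per-second serial = 64 bits.
my ($last_time, $serial) = (0, 0);

sub next_id {
    my ($host_id, $now) = @_;
    $now = time unless defined $now;
    if ($now == $last_time) { $serial++ }
    else                    { ($last_time, $serial) = ($now, 0) }
    die "serial overflow\n" if $serial > 0xFFFF;
    return ($now << 28) | (($host_id & 0xFFF) << 16) | $serial;
}

printf "%d\n", next_id(1);
```

Because the result is a plain integer, it fits a BIGINT column and sorts roughly by creation time, which a random UUID would not.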
  26. Job Queueing
      - Offload processing from Apache
      - It’s big and heavy
  27. Job Queueing: TheSchwartz
      - We’ll probably rename it.
      - asynchronous, reliable job queue
      - N databases
      - Pool of workers to handle the jobs
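The idea of a reliable queue drained by workers can be sketched in a few lines. This is a toy model, not TheSchwartz's API: an array stands in for the job database, and the `resize_image` job type, `insert_job`, and `work_once` are hypothetical names.

```perl
use strict;
use warnings;

my @queue;          # stands in for the job database
my %workers = (     # hypothetical job type => handler
    resize_image => sub {
        my ($arg) = @_;
        die "bad size\n" if $arg->{w} <= 0;
        return 1;
    },
);

sub insert_job {
    my ($type, $arg) = @_;
    push @queue, { type => $type, arg => $arg, tries => 0 };
}

# One worker pass: take jobs, run them, re-queue failures up to a limit.
sub work_once {
    my ($max_tries) = @_;
    my (@done, @failed);
    while (my $job = shift @queue) {
        $job->{tries}++;
        if (eval { $workers{ $job->{type} }->( $job->{arg} ); 1 }) {
            push @done, $job;
        }
        elsif ($job->{tries} < $max_tries) {
            push @queue, $job;    # retry later
        }
        else {
            push @failed, $job;
        }
    }
    return (\@done, \@failed);
}

insert_job(resize_image => { w => 100 });
insert_job(resize_image => { w => 0 });
my ($done, $failed) = work_once(3);
printf "%d done, %d failed\n", scalar @$done, scalar @$failed;
```

The web request only pays for `insert_job`; the heavy work happens in the worker loop, outside Apache.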
  28. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  29. Fast
      - Need both large-scale and fast
  30. Catalyst
      - Vox uses Catalyst
      - Does what we want, allows us to do everything else
      - Want to use our own ORM, etc
  31. Is Catalyst fast?
      - A common question on the mailing list!
      - It’s fast enough (more on that later).
  32. Template Toolkit
      - Pretty fast…
      - But we’re probably overloading it.
  33. Template Toolkit: profile
          [info] Request took 0.244932s (4.083/s)
          .----------------------------------------------------------------+-----------.
          | Action                                                         | Time      |
          +----------------------------------------------------------------+-----------+
          | /auto                                                          | 0.005569s |
          | -> /set_locale                                                 | 0.000854s |
          | -> /set_locale                                                 | 0.000648s |
          | /home/root                                                     | 0.072194s |
          | -> /home/home_loggedout                                        | 0.071337s |
          | -> /home/load_thisisgoods                                      | 0.009077s |
          | -> /home/load_specials                                         | 0.014897s |
          | -> /home/load_featured_voxers                                  | 0.044168s |
          | /end                                                           | 0.143675s |
          | -> Vox::App::V::TT->process                                    | 0.140877s |
          '----------------------------------------------------------------+-----------'
  34. Template Toolkit: profile
      - Wow! Template Toolkit takes 60% of the request time.
      - 4 times as long as 10-15 network requests.
      - Oh well.
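The "60%" figure can be checked against the profile numbers with a short calculation. The timings below are copied from the profile output; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $total    = 0.244932;    # "Request took" line
my $template = 0.140877;    # Vox::App::V::TT->process
my $end      = 0.143675;    # the whole /end action

printf "TT->process: %.1f%% of the request\n", 100 * $template / $total;
printf "/end action: %.1f%% of the request\n", 100 * $end / $total;
```

Both shares come out a little under 60%, so the slide's figure is a fair rounding of this one sample request.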
  35. Template Toolkit: versioned caching
      - On-disk cache
      - Versioned with application version
      - Automatic cache bust
  36. Versioned caching
          Template->new({
              ...,
              COMPILE_DIR => '/tmp/tt-cache-' . Vox->VERSION,
          });
  37. Template Toolkit: syscalls
      - Lots of syscalls for files that don’t exist!
      - But we patched it.
  38. Caching
      - Data caching
      - Automatic caching using Data::ObjectDriver
      - Saves millions of lookups per day from reaching the database
  39. Caching: lists
      - Lists of things: tags on an asset.
      - Tag objects are automatically cached
      - Cache asset => list of tag IDs
  40. Like this:
          asset<assetid>-tags => [ <tagid1>, <tagid2>, … ]
  41. Caching: lists
      - Grab list of tag IDs from memcached
      - Use get_multi to get back the tags
  42. Get back the tag objects:
          get_multi <tagid1> <tagid2> …
  43. That’s not all! In bulk:
          get_multi asset<assetid1>-tags asset<assetid2>-tags …
  44. Caching: lists
      - In a database: N one-to-many queries
      - In memcached: 2 queries
      - Use the right caching strategy
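The two-query pattern for cached lists can be sketched as follows. A hash stands in for memcached, and `get_multi` here is a local function that only mimics a batched fetch; the key helpers, the sample tags, and `tags_for_assets` are hypothetical.

```perl
use strict;
use warnings;

# %cache stands in for memcached; get_multi mimics one batched fetch.
my %cache = (
    'asset10-tags' => [ 1, 2 ],
    'asset11-tags' => [ 2, 3 ],
    'tag1'         => { name => 'perl' },
    'tag2'         => { name => 'vox' },
    'tag3'         => { name => 'yapc' },
);
my $round_trips = 0;

sub list_key { return "asset$_[0]-tags" }
sub tag_key  { return "tag$_[0]" }

sub get_multi {
    my @keys = @_;
    $round_trips++;
    my %found = map { $_ => $cache{$_} } grep { exists $cache{$_} } @keys;
    return \%found;
}

# Two round trips, however many assets are requested.
sub tags_for_assets {
    my @asset_ids = @_;
    my $lists = get_multi(map { list_key($_) } @asset_ids);
    my %seen;
    my @tag_ids = grep { !$seen{$_}++ } map { @$_ } values %$lists;
    my $tags = get_multi(map { tag_key($_) } @tag_ids);
    my %result;
    for my $id (@asset_ids) {
        my $ids = $lists->{ list_key($id) } || [];
        $result{$id} = [ map { $tags->{ tag_key($_) }{name} } @$ids ];
    }
    return \%result;
}

my $by_asset = tags_for_assets(10, 11);
print "round trips: $round_trips\n";
```

The first fetch returns every asset's list of tag IDs, and the second returns all distinct tag objects, so the cost does not grow with the number of assets.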
  45. Perlbal
      - Reverse-proxy setup
      - Like Apache 2/mod_proxy in front of mod_perl…
      - But much better!
  46. Perlbal: webserver mode
      - Serves CSS, images, JavaScript
      - Static stuff
      - Really fast
  47. Perlbal: Serving JS and CSS
      - We use a lot of JS and CSS!
      - 20 JS files per page, 10 CSS files per page
      - SLOW
  48. Perlbal: Serving JS and CSS
      - Added file concatenation support in a plugin
      - (it’s now core)
  49. Used to be this:
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Core.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/DOM.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/DOM/Proxy.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/JSON.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Timer.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Observer.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Cache.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Client.jsc"></script>
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Template.jsc"></script>
          ...
  50. And now it’s this!
          <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Core.jsc?/js/DOM.jsc,/js/DOM/Proxy.jsc,/js/JSON.jsc,/js/Timer.jsc,/js/Observer.jsc,/js/Cache.jsc,/js/Client.jsc,/js/Template.jsc,/js/Autolayout.jsc,/js/Component.jsc,/js/Dialog.jsc,/js/App.jsc,/js/List.jsc,/js/ArcheType.jsc,/js/ArcheType/Client.jsc,/js/ArcheType/Controller.jsc,/js/ArcheType/Autocomplete.jsc"></script>
  51. Perlbal: Concatenation
      - Much faster
      - Much less latency
      - Perlbal handles Last-Modified/If-Modified-Since
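Serving several files as one response, with a single Last-Modified value for the bundle, can be sketched like this. It is not Perlbal's code: a hash stands in for the filesystem, timestamps are plain integers rather than HTTP dates, and `serve_concat` and the file names are hypothetical.

```perl
use strict;
use warnings;

# In-memory stand-in for static files: path => { mtime, body }.
my %files = (
    '/js/Core.js' => { mtime => 100, body => "var Core = {};\n" },
    '/js/DOM.js'  => { mtime => 250, body => "var DOM = {};\n" },
);

# Serve several files as one response; the newest mtime drives caching.
sub serve_concat {
    my ($paths, $if_modified_since) = @_;
    my @found = map { $files{$_} || die "no such file: $_\n" } @$paths;
    my $latest = 0;
    for my $f (@found) {
        $latest = $f->{mtime} if $f->{mtime} > $latest;
    }
    my %res = ( status => 200, last_modified => $latest, body => '' );
    if (defined $if_modified_since && $if_modified_since >= $latest) {
        $res{status} = 304;    # client copy is still current
    }
    else {
        $res{body} = join '', map { $_->{body} } @found;
    }
    return \%res;
}

my $res = serve_concat([ '/js/Core.js', '/js/DOM.js' ]);
print "$res->{status} $res->{last_modified}\n";
```

Using the newest modification time for the whole bundle means a change to any one file invalidates the browser's cached copy of the concatenated response.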
  52. Lots of good stuff!
      - Tools to play with:
          - Perlbal
          - MogileFS
          - Memcached
          - TheSchwartz
          - yuidd
          - JavaScript libraries
          - etc.
  53. All available at…
      - http://code.sixapart.com/
