
How we build Vox


Slides about "How we build Vox" in Six Apart, presented by Benjamin Trott, the CTO and co-founder of the company in


Transcript of "How we build Vox"

  1. How we Build Vox, April 4 2007
  2. Six Apart
     - Movable Type
     - TypePad
     - LiveJournal
     - Vox
  3. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  4. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  5. Web 2.0
     - Overused…
     - But useful
  6. Vox talks to web services
  7. APIs: Tools
     - We use our own custom libraries
       - No Net::Amazon, Net::Flickr, etc.
       - Why?
       - We don’t want to load 7 XML parsers
       - All of our tools use XML::LibXML
  8. APIs: Open Media Profile
       <entry>
         <title>Foo bar baz</title>
         <link href="http://example.com/show/video/123" />
         <link rel="alternate" type="application/atom+xml"
               href="http://example.com/atom/123" />
         <id>tag:example.com,2006:video-123</id>
         <updated>2003-12-13T18:30:02Z</updated>
         <content type="text">
           Vox rocks blah blah blah ...
         </content>
         <category term="Vox" scheme="http://example.com/tags/Vox/" label="Vox" />
         <category term="cat" scheme="http://example.com/tags/Cat/" label="cat" />
         <link rel="license" type="text/html"
               href="http://creativecommons.org/licenses/by/2.5/" />
         <media:content url="http://example.com/data/123.flv" fileSize="123456"
                        type="video/x-flv" />
         <media:player url="http://example.com/data/123.swf" height="200" width="400" />
         <media:thumbnail url="http://example.com/thumb/1223.jpg" width="75" height="50" />
       </entry>
  9. GData, OpenSearch, Media RSS… [diagram labels: GData, OpenSearch, Media RSS]
  10. Open Media Profile [diagram labels: GData, OpenSearch, Media RSS]
  11. APIs: Outbound
      - Atom Publishing Protocol
      - Everything is Atom/RSS
      - Cool URIs
        - /library/posts/atom.xml
        - /library/posts/2007/03/atom.xml
        - /library/posts/2007/03/tags/yapc/atom.xml
  12. Ajax
      - JSON serialization
        - Lightweight
        - Normal data types (no need to invent syntax)
      - Catalyst + JSON-RPC
      - Everything is an API
      - Our own core JS libraries
      - http://search.cpan.org/~miyagawa/Catalyst-Plugin-JSONRPC-0.01/
      - http://code.sixapart.com/svn/js/trunk/
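To make the "Catalyst + JSON-RPC" bullet concrete, here is a minimal sketch of a JSON-RPC style round trip using only core Perl (JSON::PP). The method name `asset.tags`, the handler table, and the payload shape are hypothetical illustrations. This is not the Catalyst::Plugin::JSONRPC API, nor Vox's actual endpoints.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical handler table: method name => code ref. Not Vox's real API.
my %handlers = (
    'asset.tags' => sub { my ($asset_id) = @_; return [ 'vox', 'yapc' ] },
);

# Decode a JSON-RPC style request, dispatch it, and encode the reply.
sub handle_jsonrpc {
    my ($request_json) = @_;
    my $json = JSON::PP->new->canonical;
    my $req  = $json->decode($request_json);
    my $code = $handlers{ $req->{method} };
    my $res;
    if ($code) {
        my $result = $code->( @{ $req->{params} || [] } );
        $res = { id => $req->{id}, result => $result, error => undef };
    }
    else {
        $res = { id => $req->{id}, result => undef, error => "no such method: $req->{method}" };
    }
    return $json->encode($res);
}

my $reply = handle_jsonrpc('{"id":1,"method":"asset.tags","params":[42]}');
print "$reply\n";
```

Because requests and replies are plain hashes, arrays, strings and numbers, the same payload can be consumed directly by browser JavaScript, which is the "normal data types" point on the slide.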
  13. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  14. Large-scale
      - We started with this:
  15. We added some stuff.
  16. Data::ObjectDriver
      - Movable Type and TypePad: custom ORM
      - We wanted more:
        - Built-in caching
        - Built-in partitioning
  17. Data::ObjectDriver: Caching
      - Built-in support for memcached
      - All primary key data maintained for you
      - Completely automatic
  18. One line of code (basically):
        Data::ObjectDriver::Driver::Cache::Memcached->new(
            cache    => Cache::Memcached->new({ servers => [ ... ] }),
            fallback => Data::ObjectDriver::Driver::DBI->new(
                dsn => 'dbi:SQLite:dbname=global.db',
            ),
        );
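The cache-with-fallback arrangement above can be sketched conceptually in plain Perl. This is only an illustration of the read-through pattern, assuming hashes in place of memcached and the database; `lookup`, `update`, and the data are hypothetical names, not Data::ObjectDriver internals.

```perl
use strict;
use warnings;

my %cache;                                      # stands in for memcached
my %db = ( 1 => { id => 1, name => 'ben' } );   # stands in for the fallback database
my $db_hits = 0;

# Look up a row by primary key: try the cache, fall back to the database,
# then populate the cache so the next lookup skips the database.
sub lookup {
    my ($id) = @_;
    return $cache{$id} if exists $cache{$id};
    $db_hits++;
    my $row = $db{$id} or return;
    $cache{$id} = $row;
    return $row;
}

# Writes go to the database and invalidate the cached copy.
sub update {
    my ($id, %fields) = @_;
    $db{$id} = { %{ $db{$id} || { id => $id } }, %fields };
    delete $cache{$id};
    return;
}

lookup(1) for 1 .. 3;
print "database hits: $db_hits\n";
```

Only the first of the three lookups reaches the database; keeping invalidation next to the write path is what lets the caching stay "completely automatic" from the caller's point of view.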
  19. Data::ObjectDriver: Partitioning
      - Sharded data
      - Based on arbitrary criteria
      - Completely transparent
  20. One line of code:
        Data::ObjectDriver::Driver::Cache::Cache->new(
            cache    => Cache::Memcached->new({ servers => [ ... ] }),
            fallback => Data::ObjectDriver::Driver::SimplePartition->new(
                using => 'Recipe',
            ),
        );
  21. Partitioning: traffic
  22. Example: loading user’s posts
        my $user = ArcheType::M::User->lookup_by_email(
            'ben@sixapart.com'
        );
        my @assets = $user->assets({ type => 'Post' });
  23. Example
      - Loading $user hits the global database
      - Loading @assets then does:
        - Get the partition number of $user
        - Connect to that partition
        - Run a query like:
            SELECT user_id, asset_id
            FROM asset
            WHERE user_id = ?
            AND type = 6
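The three steps on the slide above (global lookup, pick the partition, query there) can be sketched as follows. The data structures and the `assets_for` helper are hypothetical stand-ins; real partitions would be separate database servers reached through Data::ObjectDriver, and only the `type = 6` value comes from the slide's query.

```perl
use strict;
use warnings;

# Global database: maps each user to the partition holding their data.
my %global_users = (
    'ben@sixapart.com' => { user_id => 1, partition => 2 },
);

# One hash per partition stands in for a separate database server.
my %partitions = (
    1 => { asset => [] },
    2 => {
        asset => [
            { user_id => 1, asset_id => 10, type => 6 },
            { user_id => 1, asset_id => 11, type => 3 },
        ],
    },
);

sub assets_for {
    my ($email, $type) = @_;
    my $user = $global_users{$email} or return;     # 1. hit the global database
    my $part = $partitions{ $user->{partition} };   # 2. pick that user's partition
    my @rows = grep {                               # 3. run the query there
        $_->{user_id} == $user->{user_id} && $_->{type} == $type
    } @{ $part->{asset} };
    return @rows;
}

my @posts = assets_for( 'ben@sixapart.com', 6 );
print scalar(@posts), " post(s) found\n";
```

The caller never names a partition, which is the sense in which the slides call partitioning "completely transparent".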
  24. ID Allocation: Issues
      - Partitioned databases -> no more auto_increment
      - Master/master
      - Were UUIDs the answer? No.
  25. ID Allocation: yuidd
      - IDs unique across a datacenter
      - 64-bit integers (fit in a BIGINT column)
      - yuidd is the server
      - Data::YUID::Client is the client
      - Asynchronous, non-blocking, simple, fast
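One common way to get unique 64-bit integers without auto_increment is to pack a timestamp, a host number, and a per-second serial into one value. The sketch below assumes a Perl built with 64-bit integers and uses a made-up bit layout; it is not yuidd's documented format, and `next_id` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical layout (an assumption, not yuidd's actual format):
#   high 32 bits: seconds since the epoch
#   next 16 bits: host id
#   low  16 bits: serial number within the current second
my ( $last_time, $serial ) = ( 0, 0 );

sub next_id {
    my ( $host_id, $now ) = @_;
    $now = time unless defined $now;
    if ( $now == $last_time ) {
        $serial++;
        die "serial overflow within one second\n" if $serial > 0xFFFF;
    }
    else {
        ( $last_time, $serial ) = ( $now, 0 );
    }
    my $id = ( $now << 32 ) | ( ( $host_id & 0xFFFF ) << 16 ) | $serial;
    return $id;
}

my @ids = map { next_id(7) } 1 .. 5;
print join( ", ", @ids ), "\n";
```

Because the host number is part of the value, two generators never collide with each other, and no database round trip is needed to allocate an ID. This sketch does not handle a clock that moves backwards.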
  26. Job Queueing
      - Offload processing from Apache
      - It’s big and heavy
  27. Job Queueing: TheSchwartz
      - We’ll probably rename it.
      - Asynchronous, reliable job queue
      - N databases
      - Pool of workers to handle the jobs
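The idea of an asynchronous, reliable queue with a pool of workers can be illustrated with a small in-memory sketch. It assumes an array in place of the job databases and runs the worker in the same process; `insert_job`, `work_until_done`, and the `resize_photo` job are hypothetical names, not TheSchwartz's API.

```perl
use strict;
use warnings;

my @queue;      # stands in for the job tables in the N databases
my %workers;    # function name => code ref that performs the job

sub insert_job {
    my ( $func, $arg ) = @_;
    push @queue, { func => $func, arg => $arg, failures => 0 };
    return;
}

# Run queued jobs. A job that dies is put back on the queue and retried
# up to $max_retries times, which is the "reliable" part of the design.
sub work_until_done {
    my ($max_retries) = @_;
    my @done;
    while ( my $job = shift @queue ) {
        my $ok = eval { $workers{ $job->{func} }->( $job->{arg} ); 1 };
        if ($ok) {
            push @done, $job;
        }
        elsif ( ++$job->{failures} <= $max_retries ) {
            push @queue, $job;
        }
        else {
            warn "giving up on $job->{func}: $@";
        }
    }
    return @done;
}

my $attempts = 0;
$workers{resize_photo} = sub { die "transient failure\n" if ++$attempts < 2; return };
insert_job( resize_photo => 'photo-123.jpg' );
my @done = work_until_done(3);
print "completed ", scalar(@done), " job(s) after $attempts attempt(s)\n";
```

In a real deployment the web process would only call the insert step and return immediately, while separate worker processes drain the queue outside Apache.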
  28. How we Build Vox: a Web 2.0, Large-scale, Fast, Internationalized website
  29. Fast
      - Need both large-scale and fast
  30. Catalyst
      - Vox uses Catalyst
      - Does what we want, allows us to do everything else
      - Want to use our own ORM, etc.
  31. Is Catalyst fast?
      - A common question on the mailing list!
      - It’s fast enough (more on that later).
  32. Template Toolkit
      - Pretty fast…
      - But we’re probably overloading it.
  33. Template Toolkit: profile
        [info] Request took 0.244932s (4.083/s)
        .--------------------------------+-----------.
        | Action                         | Time      |
        +--------------------------------+-----------+
        | /auto                          | 0.005569s |
        |  -> /set_locale                | 0.000854s |
        |  -> /set_locale                | 0.000648s |
        | /home/root                     | 0.072194s |
        |  -> /home/home_loggedout       | 0.071337s |
        |  -> /home/load_thisisgoods     | 0.009077s |
        |  -> /home/load_specials        | 0.014897s |
        |  -> /home/load_featured_voxers | 0.044168s |
        | /end                           | 0.143675s |
        |  -> Vox::App::V::TT->process   | 0.140877s |
        '--------------------------------+-----------'
  34. Template Toolkit: profile
      - Wow! Template Toolkit takes nearly 60% of the request time (0.141s of 0.245s).
      - 4 times as long as 10-15 network requests.
      - Oh well.
  35. Template Toolkit: versioned caching
      - On-disk cache
      - Versioned with application version
      - Automatic cache bust
  36. Versioned caching
        Template->new({
            ...,
            COMPILE_DIR => '/tmp/tt-cache-' . Vox->VERSION,
        });
  37. Template Toolkit: syscalls
      - Lots of syscalls for files that don’t exist!
      - But we patched it.
  38. Caching
      - Data caching
      - Automatic caching using Data::ObjectDriver
      - Saves millions of lookups per day from reaching the database
  39. Caching: lists
      - Lists of things: tags on an asset.
      - Tag objects are automatically cached
      - Cache asset => list of tag IDs
  40. Like this:
        asset<assetid>-tags => [ <tagid1>, <tagid2>, … ]
  41. Caching: lists
      - Grab list of tag IDs from memcached
      - Use get_multi to get back the tags
  42. Get back the tag objects:
        get_multi <tagid1> <tagid2> …
  43. That’s not all! In bulk:
        get_multi asset<assetid1>-tags asset<assetid2>-tags …
  44. Caching: lists
      - In a database: N one-to-many queries
      - In memcached: 2 queries
      - Use the right caching strategy
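The two-query pattern described in the list-caching slides can be sketched like this. A hash stands in for memcached and `get_multi` here is a local imitation of a multi-key fetch, so the key names, tag data, and helper functions are all illustrative assumptions.

```perl
use strict;
use warnings;

# A hash stands in for memcached: list keys hold tag IDs, tag keys hold tag objects.
my %memcached = (
    'asset1-tags' => [ 10, 11 ],
    'asset2-tags' => [ 11, 12 ],
    10            => { id => 10, name => 'vox' },
    11            => { id => 11, name => 'yapc' },
    12            => { id => 12, name => 'perl' },
);
my $round_trips = 0;

sub list_key { my ($asset_id) = @_; return "asset${asset_id}-tags" }

# Imitates fetching many keys in a single round trip.
sub get_multi {
    my @keys = @_;
    $round_trips++;
    my %found = map { $_ => $memcached{$_} } grep { exists $memcached{$_} } @keys;
    return \%found;
}

# Two round trips however many assets are requested:
# one for the tag-ID lists, one for the tag objects themselves.
sub tags_for_assets {
    my @asset_ids = @_;
    my $lists  = get_multi( map { list_key($_) } @asset_ids );
    my %wanted = map { $_ => 1 } map { @$_ } values %$lists;
    my $tags   = get_multi( keys %wanted );
    my %names;
    for my $asset_id (@asset_ids) {
        my $ids = $lists->{ list_key($asset_id) } || [];
        $names{$asset_id} = [ map { $tags->{$_}{name} } @$ids ];
    }
    return \%names;
}

my $names = tags_for_assets( 1, 2 );
print "asset 1: @{ $names->{1} }\n";
print "round trips: $round_trips\n";
```

A tag shared by several assets (ID 11 here) is fetched only once, and the number of round trips stays at two no matter how many assets are on the page.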
  45. Perlbal
      - Reverse-proxy setup
      - Like Apache 2/mod_proxy in front of mod_perl…
      - But much better!
  46. Perlbal: webserver mode
      - Serves CSS, images, JavaScript
      - Static stuff
      - Really fast
  47. Perlbal: Serving JS and CSS
      - We use a lot of JS and CSS!
      - 20 JS files per page, 10 CSS files per page
      - SLOW
  48. Perlbal: Serving JS and CSS
      - Added file concatenation support in a plugin
      - (it’s now core)
  49. Used to be this:
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Core.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/DOM.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/DOM/Proxy.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/JSON.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Timer.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Observer.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Cache.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Client.jsc"></script>
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Template.jsc"></script>
        ...
  50. And now it’s this!
        <script type="text/javascript" src="/.shared:v25.3:vox:en/js/Core.jsc?/js/DOM.jsc,/js/DOM/Proxy.jsc,/js/JSON.jsc,/js/Timer.jsc,/js/Observer.jsc,/js/Cache.jsc,/js/Client.jsc,/js/Template.jsc,/js/Autolayout.jsc,/js/Component.jsc,/js/Dialog.jsc,/js/App.jsc,/js/List.jsc,/js/ArcheType.jsc,/js/ArcheType/Client.jsc,/js/ArcheType/Controller.jsc,/js/ArcheType/Autocomplete.jsc"></script>
  51. Perlbal: Concatenation
      - Much faster
      - Much less latency
      - Perlbal handles Last-Modified/If-Modified-Since
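A page template could generate the single concatenated tag with a small helper. The sketch below follows the URL shape shown on slide 50 (first file as the path, remaining files after "?" separated by commas); `concat_src` is a hypothetical helper, and the exact syntax Perlbal expects should be checked against its own documentation.

```perl
use strict;
use warnings;

# Build one concatenated URL in the shape shown on slide 50:
# the first file is the path, the remaining files follow "?" separated by commas.
sub concat_src {
    my ( $prefix, @files ) = @_;
    die "need at least one file\n" unless @files;
    my $first = shift @files;
    my $url   = $prefix . $first;
    $url .= '?' . join( ',', @files ) if @files;
    return $url;
}

my $src = concat_src( '/.shared:v25.3:vox:en', '/js/Core.jsc', '/js/DOM.jsc', '/js/JSON.jsc' );
print qq{<script type="text/javascript" src="$src"></script>\n};
```

One tag means one HTTP request instead of one per file, which is where the latency saving on slide 51 comes from.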
  52. Lots of good stuff!
      - Tools to play with:
        - Perlbal
        - MogileFS
        - Memcached
        - TheSchwartz
        - yuidd
        - JavaScript libraries
        - etc.
  53. All available at…
      - http://code.sixapart.com/
