0
Flickr and PHP
   Cal Henderson
What’s Flickr

• Photo sharing
• Open APIs
Logical Architecture

     Photo Storage                 Database          Node Service


                     Application...
Physical Architecture

   Static Servers   Database Servers   Node Servers




                     Web Servers



       ...
Where is PHP?

     Photo Storage                Database           Node Service


                     Application Logic
...
Other than PHP?

•   Smarty for templating
•   PEAR for XML and Email parsing
•   Perl for controlling…
•   ImageMagick, f...
Big Application?

•   One programmer, one designer, etc.
•   ~60,000 lines of PHP code
•   ~60,000 lines of templates
•   ...
Thinking outside the web app

• Services
   – Atom/RSS/RDF Feeds
   – APIs
      •   SOAP
      •   XML-RPC
      •   REST...
More cool stuff

• Email interface
    – Postfix
    – PHP
    – PEAR::Mail::mimeDecode
•   FTP
•   Uploading API
•   Auth...
Even more stuff

• Real time application
• Cool flash apps
• Blogging
   –   Blogger API (1 & 2)
   –   Metaweblog API
   ...
APIs are simple!

•   Modeled on XML-RPC (Sort of)
•   Method calls with XML responses
•   SOAP, XML-RPC and REST are just...
XML isn’t simple :(

• PHP 4 doesn’t have good a XML parser
• Expat is cool though (PEAR::XML::Parser)
• Why doesn’t PEAR ...
I love XPath

if ($tree->root->name == 'methodResponse'){
      if (($tree->root->children[0]->name == 'params')
       &&...
Creating API methods

• Stateless method-call APIs are easy to extend
• Adding a method requires no knowledge of the trans...
Red Hot Unicode Action

• UTF-8 pages
• CJKV support
• It’s really cool
Unicode for all

• It’s really easy
   –   Don’t need PHP support
   –   Don’t need MySQL support
   –   Just need the rig...
Scaling the beast

•   Why PHP is great
•   MySQL scaling
•   Search scaling
•   Horizontal scaling
Why PHP is great

• Stateless
   –   We can bounce people around servers
   –   Everything is stored in the database
   – ...
MySQL Scaling

• Our database server started to slow
• Load of 200
• Replication!
MySQL Replication

• But it only gives you more SELECT’s
• Else you need to partition vertically
• Re-architecting sucks :(
Looking at usage

• Snapshot of db1.flickr.com
   –   SELECT’s 44,220,588
   –   INSERT’s 1,349,234
   –   UPDATE’s 1,755,...
Replication is really cool

• A bunch of slave servers handle all the SELECT’s
• A single master just handles I/U/D’s
• It...
Searching

•   A simple text search
•   We were using RLIKE
•   Then switched to LIKE
•   Then disabled it all together
FULLTEXT Indexes

•   MySQL saves the day!
•   But they’re only supported my MyISAM tables
•   We use InnoDb, because it’s...
But wait!

• Partial replication saves the day
• Replicate the portion of the database we want to search.
• But change the...
JOIN’s are slow

•   Normalised data is for sissies
•   Keep multiple copies of data around
•   Makes searching faster
•  ...
Our current setup


                         DB1
                                  I/U/D’s
                         Master...
Horizontal scaling

•   At the core of our design
•   Just add hardware!
•   Inexpensive
•   Not exponential
•   Avoid red...
Talking to the Node Service

• Everyone speaks XML (badly)
• Just TCP/IP - fsockopen()
• We’re issuing commands, not reque...
RSS / Atom / RDF

•   Different formats
•   All quite bad
•   We’re generating a lot of different feeds
•   Abstract the d...
Receiving email

•   Want users to be able to email photos to Flickr
•   Just get postfix to pipe each mail to a PHP scrip...
Upload via FTP

• PHP isn’t so great at being a daemon
• Leaks memory like a sieve
• No threads
• Java to the rescue
• Jav...
Blogs

• Why does everyone loves blogs so much?
• Only a few APIs really
   –   Blogger
   –   Metaweblog
   –   Blogger2
...
It’s all broken

•   Lots of blog software has broken interfaces
•   It’s a support nightmare
•   Manila is tricky
•   But...
Back to those APIs

• We opened up the Flickr APIs a few weeks ago
• Programmers mainly build tools for other programmers
...
Flickr Rainbow
Tag Wallpaper
So what next?

•   Much more scaling
•   PHP 5?
•   MySQL 5?
•   Taking over the world
Flickr and PHP
   Cal Henderson
Any Questions?
Flickr Architecture Presentation
Upcoming SlideShare
Loading in...5
×

Flickr Architecture Presentation

1,552

Published on

Published in: Technology
1 Comment
2 Likes
Statistics
Notes
  • Well presented and created, interesting presentation !
    http://www.trainingreiki.com/
    http://www.trainingreiki.com/reiki-training.php
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
1,552
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
27
Comments
1
Likes
2
Embeds 0
No embeds

No notes for slide

Transcript of "Flickr Architecture Presentation"

  1. 1. Flickr and PHP Cal Henderson
  2. 2. What’s Flickr • Photo sharing • Open APIs
  3. 3. Logical Architecture Photo Storage Database Node Service Application Logic Page Logic API Templates Endpoints 3rd Party Apps Flickr Apps Flickr.com Email Users
  4. 4. Physical Architecture Static Servers Database Servers Node Servers Web Servers Users
  5. 5. Where is PHP? Photo Storage Database Node Service Application Logic Page Logic API Templates Endpoints 3rd Party Apps Flickr Apps Flickr.com Email Users
  6. 6. Other than PHP? • Smarty for templating • PEAR for XML and Email parsing • Perl for controlling… • ImageMagick, for image processing • MySQL (4.0 / InnoDb) • Java, for the node service • Apache 2, Redhat, etc. etc.
  7. 7. Big Application? • One programmer, one designer, etc. • ~60,000 lines of PHP code • ~60,000 lines of templates • ~70 custom smarty functions/modifiers • ~25,000 DB transactions/second at peak • ~1000 pages per second at peak
  8. 8. Thinking outside the web app • Services – Atom/RSS/RDF Feeds – APIs • SOAP • XML-RPC • REST • PEAR::XML::Tree
  9. 9. More cool stuff • Email interface – Postfix – PHP – PEAR::Mail::mimeDecode • FTP • Uploading API • Authentication API • Unicode
  10. 10. Even more stuff • Real time application • Cool flash apps • Blogging – Blogger API (1 & 2) – Metaweblog API – Atom – LiveJournal
  11. 11. APIs are simple! • Modeled on XML-RPC (Sort of) • Method calls with XML responses • SOAP, XML-RPC and REST are just transports • PHP endpoints mean we can use the same application logic as the website
  12. 12. XML isn’t simple :( • PHP 4 doesn’t have good a XML parser • Expat is cool though (PEAR::XML::Parser) • Why doesn’t PEAR have XPath? – Because PEAR is stupid! – PHP 4 sucks!
  13. 13. I love XPath if ($tree->root->name == 'methodResponse'){ if (($tree->root->children[0]->name == 'params') && ($tree->root->children[0]->children[0]->name == 'param') && ($tree->root->children[0]->children[0]->children[0]->name == 'value') && ($tree->root->children[0]->children[0]->children[0]->children[0]->name == 'array') && ($tree->root->children[0]->children[0]->children[0]->children[0]->children[0]->name == 'data')){ $rsp = $tree->root->children[0]->children[0]->children[0]->children[0]->children[0]; } if ($tree->root->children[0]->name == 'fault'){ $fault = $tree->root->children[0]; return $fault; } } $nodes = $tree->select_nodes('/methodResponse/params/param[1]/value[1]/array[1]/data[1]/text()'); if (count($nodes)){ $rsp = array_pop($nodes); }else{ list($fault) = $tree->select_nodes('/methodResponse/fault'); return $fault; }
  14. 14. Creating API methods • Stateless method-call APIs are easy to extend • Adding a method requires no knowledge of the transport • Adding a method once makes it available to all the interfaces • Self documenting
  15. 15. Red Hot Unicode Action • UTF-8 pages • CJKV support • It’s really cool
  16. 16. Unicode for all • It’s really easy – Don’t need PHP support – Don’t need MySQL support – Just need the right headers – UTF-8 is 7-bit transparent – (Just don’t mess with high characters) • Don’t use HtmlEntities()! • But bear in mind… • JavaScript has patchy Unicode support • People using your APIs might be stupid
  17. 17. Scaling the beast • Why PHP is great • MySQL scaling • Search scaling • Horizontal scaling
  18. 18. Why PHP is great • Stateless – We can bounce people around servers – Everything is stored in the database – Even the smarty cache – “Shared nothing” – (so long as we avoid PHP sessions)
  19. 19. MySQL Scaling • Our database server started to slow • Load of 200 • Replication!
  20. 20. MySQL Replication • But it only gives you more SELECT’s • Else you need to partition vertically • Re-architecting sucks :(
  21. 21. Looking at usage • Snapshot of db1.flickr.com – SELECT’s 44,220,588 – INSERT’s 1,349,234 – UPDATE’s 1,755,503 – DELETE’s 318,439 – 13 SELECT’s per I/U/D
  22. 22. Replication is really cool • A bunch of slave servers handle all the SELECT’s • A single master just handles I/U/D’s • It can scale horizontally, at least for a while.
  23. 23. Searching • A simple text search • We were using RLIKE • Then switched to LIKE • Then disabled it all together
  24. 24. FULLTEXT Indexes • MySQL saves the day! • But they’re only supported my MyISAM tables • We use InnoDb, because it’s a lot faster • We’re doomed
  25. 25. But wait! • Partial replication saves the day • Replicate the portion of the database we want to search. • But change the table types on the slave to MyISAM • It can keep up because it’s only handling I/U/D’s on a couple of tables • And we can reduce the I/U/D’s with a little bit of vertical partitioning
  26. 26. JOIN’s are slow • Normalised data is for sissies • Keep multiple copies of data around • Makes searching faster • Have to ensure consistency in the application logic
  27. 27. Our current setup DB1 I/U/D’s Master SELECT’s DB2 Main Slave DB3 Main Search Slave Farm slave Search SELECT’s Search Slave Farm
  28. 28. Horizontal scaling • At the core of our design • Just add hardware! • Inexpensive • Not exponential • Avoid redesigns
  29. 29. Talking to the Node Service • Everyone speaks XML (badly) • Just TCP/IP - fsockopen() • We’re issuing commands, not requesting data, so we don’t bother to parse the response – Just substring search for state=“ok” • Don’t rely on it!
  30. 30. RSS / Atom / RDF • Different formats • All quite bad • We’re generating a lot of different feeds • Abstract the difference away using templates • No good way to do private feeds. Why is nobody working on this? (WSSE maybe?)
  31. 31. Receiving email • Want users to be able to email photos to Flickr • Just get postfix to pipe each mail to a PHP script • Parse the mail and find any photos • Cellular phone companies hate you • Lots of mailers are retarded – Photos as text/plain attachments :/
  32. 32. Upload via FTP • PHP isn’t so great at being a daemon • Leaks memory like a sieve • No threads • Java to the rescue • Java just acts as an FTPd and passes all uploaded files to PHP for processing • (This isn’t actually public) • Bricolage does this I think. Maybe Zope?
  33. 33. Blogs • Why does everyone loves blogs so much? • Only a few APIs really – Blogger – Metaweblog – Blogger2 – Movable Type – Atom – Live Journal
  34. 34. It’s all broken • Lots of blog software has broken interfaces • It’s a support nightmare • Manila is tricky • But it all works, more or less • Abstracted in the application logic • We just call blogs_post_message();
  35. 35. Back to those APIs • We opened up the Flickr APIs a few weeks ago • Programmers mainly build tools for other programmers • We have Perl, python, PHP, ActionScript, XMLHTTP and .NET interface libraries • But also a few actual applications
  36. 36. Flickr Rainbow
  37. 37. Tag Wallpaper
  38. 38. So what next? • Much more scaling • PHP 5? • MySQL 5? • Taking over the world
  39. 39. Flickr and PHP Cal Henderson
  40. 40. Any Questions?
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×