Building a Scalable Online Backup System in Python


An overview of the design and architecture of the PutPlace online backup system.


  1. Building a Scalable Online Backup System in Python
     Joe Drumgoole
     http://twitter/jdrumgoole
  2. Scaling
     • You probably shouldn't care
     • Throughput vs. response time
     • Scaling is a fractal problem
     • The database is what will get ya!
     • Amazing what a well-tuned DB will support
  3. PutPlace Architecture
     (architecture diagram)
  4. Online Backup
     • Not really a Web 2.0 play
     • More like client-server
     • Larger vision of PutPlace: a map of your digital world
  5. Online Backup: Client
     • Installation and support on Windows 20**
     • Mac support
     • Open file/locked file handling
     • Bandwidth throttling
     • CPU throttling
     • Upload restarts
     • Feedback
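Bandwidth throttling on the client can be done by pacing the reads that feed the uploader. The sketch below is illustrative, not PutPlace's actual client code: it yields 1 MB chunks (matching the upload chunk size mentioned later in the deck) and sleeps whenever the data sent so far is ahead of the configured rate.

```python
import time

CHUNK_SIZE = 1024 * 1024  # 1 MB, matching the deck's upload chunk size


def read_chunks_throttled(path, max_bytes_per_sec):
    """Yield 1 MB chunks of a file, sleeping as needed to cap bandwidth.

    A minimal sketch: real clients would also persist progress so an
    interrupted upload can restart at the last acknowledged chunk.
    """
    window_start = time.monotonic()
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            sent += len(chunk)
            elapsed = time.monotonic() - window_start
            expected = sent / max_bytes_per_sec  # seconds this data "should" take
            if expected > elapsed:
                time.sleep(expected - elapsed)
            yield chunk
```

Because throttling lives in the chunk generator, the same uploader code runs throttled or unthrottled by changing one parameter.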
  6. Online Backup: Server
     • Don't lose any files
     • De-duplication
     • Thumbnail generation for images
     • Flickr backup
     • Client feedback
     • Bulk download
     • File relationships
  7. Online Backup: Secrets
     • People don't back up
     • Compute dominates
     • Restores represent 0.01% of bandwidth and load
     • Writing web clones of Windows Explorer is hard
     • The browser sucks as a client-side app container (for now)
  8. Scaling
     • For online backup the challenge is to receive shed-loads of data from lots of clients
     • Clients upload in 1 MB chunks
     • Chunks must be stored, coalesced, and pushed to stable backup (S3)
     • Clients must get acknowledgement
     • The web page must update
     • Quota management
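The coalescing step above can be sketched as reassembling a file from its numbered chunk files once all chunks have arrived. The file-naming scheme here (`0.part`, `1.part`, ...) is an assumption for illustration, not PutPlace's actual on-disk layout; in their system the coalesced result was then pushed to S3.

```python
import os


def coalesce_chunks(chunk_dir: str, out_path: str, n_chunks: int) -> None:
    """Reassemble an upload from its numbered chunk files.

    Assumes chunks were written as 0.part, 1.part, ... by the upload
    handler (names are illustrative). After this, the coalesced file
    would be pushed to stable backup storage such as S3.
    """
    with open(out_path, "wb") as out:
        for i in range(n_chunks):
            with open(os.path.join(chunk_dir, f"{i}.part"), "rb") as part:
                out.write(part.read())
```

Streaming chunk-by-chunk keeps memory flat regardless of file size, which matters when the task servers coalesce many uploads concurrently.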
  9. Load Balancer
     • Perlbal
     • Can handle hundreds of millions of requests per day
     • Event-based
     • (Sshhh: don't tell anyone, but it's Perl!)
     • It does fall over occasionally
     • Otherwise works perfectly
  10. App Server
     Our app servers:
     • Handle login
     • Deliver web pages
     • Handle uploads from clients
     • Hand off heavy-duty processing to task servers:
       – Thumbnail generation
       – File coalescing
       – Checksum generation
     • Hand-off is via a database queue
  11. App Server
     • Just Django instances
     • Templates deliver web pages
     • Views handle chunks, login, etc.
     • Models update the database
     • Task servers do the heavy lifting
  12. Task Server
     • Runs off a database queue (table)
     • Four main task servers:
       – Assemble completed file uploads
       – Create thumbnails
       – Remove deleted files
       – Generate user statistics
     • Servers are multi-threaded
  13. Refactoring
     • Originally N Blacknight servers writing to NFS
     • Then N Blacknight servers writing to S3
     • Then N EC2 servers writing to S3
     • Then N EC2 servers writing to MogileFS/S3
     • Lots of upload optimisations along the way
  14. Results
     • The system has successfully uploaded over 100k files in a single day
     • Regularly does 50k files a day
     • About 2k registered users
     • Continues to get registrations
     • Runs in lights-out mode (no daily/weekly/monthly housekeeping)
  15. What Worked
     • Python proved extremely flexible
     • The standard library saved us lots of work
     • Django provided a lot of glue
     • Easy to migrate from a dedicated host on NFS to cloud hosting and S3 storage
     • Nagios/Monitis monitoring
  16. What Didn't Work
     • Would use MySQL rather than Postgres
       – Easier to cluster; more knowledge available
     • Native Windows client
       – Unnecessary; the Python client was good enough
     • Would use an off-the-shelf queueing system
       – RabbitMQ, ActiveMQ, SQS
     • Kludgey client-side API
     • Threading the client
  17. Tool Chain
     • Subversion and Trac
     • DynDNS: dynamic DNS
     • Python/Django: dev stack
     • Postgres: database
     • Hudson: build server
     • Perlbal: load balancing
     • MogileFS: distributed file system
     • Memcached: caching
     • Nagios, Monitis: monitoring
     • Hamachi: VPN through firewall
     • Google Apps: email, calendar, docs, wiki
     • AuthSMTP: validated SMTP
     • Zendesk: support desk
     • Amazon: storage, compute, bandwidth
     • PayPal: billing
  18. Costs
     • Capital expenditure:
       – One server: €5k
       – One laptop per developer: €2.5k (7 devs)
       – One Linksys WiFi/firewall (won at a raffle)
       – Two 24-port switches: €1.6k
       – Total: ~€24k
     • Running costs for grid and storage:
       – ~€1,800 a month (8 instances)
  19. If I Were Doing It Again
     • Stick with the native Python client
     • Look at eventing à la Node.js for the server
     • Use MySQL
     • Use Google App Engine as front end/load balancer
     • Use a commercial queueing package
  20. Thanks
     Q&A