Building a scalable online backup system in Python

An overview of the design and architecture of the PutPlace online backup system.

Transcript

  • 1. Building a Scalable Online Backup System in Python
    Joe Drumgoole
    http://twitter.com/jdrumgoole
  • 2. Scaling
    You probably shouldn’t care
    Throughput vs response time
    Scaling is a fractal problem
    The database is what will get ya!
    Amazing what a well tuned DB will support
  • 3. PutPlace Architecture
  • 4. Online Backup
    Not really a Web 2.0 play
    More like client server
    Larger vision of PutPlace
    Map of your Digital World
  • 5. Online Backup : Client
    Installation and support of Windows 20**
    Mac Support
    Open file/locked file handling
    Bandwidth throttling
    CPU Throttling
    Upload restarts
    Feedback
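Bandwidth throttling on the client can be illustrated with a small pacing helper. This is a minimal sketch, not the PutPlace client's code: the `Throttle` class, its parameters and the injectable `clock`/`sleep` hooks are all hypothetical names chosen for the example. It tracks how many bytes have been sent and pauses just long enough to keep the average rate under a cap.

```python
import time


class Throttle:
    """Caps average throughput at max_bytes_per_sec (hypothetical helper)."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(max_bytes_per_sec)
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.start = clock()
        self.sent = 0

    def delay_for(self, nbytes):
        """Record nbytes as sent and return the pause needed to stay under the cap."""
        self.sent += nbytes
        # Earliest moment at which this many bytes are allowed to have been sent.
        earliest = self.start + self.sent / self.rate
        return max(0.0, earliest - self.clock())

    def wait(self, nbytes):
        """Pause if the upload is running ahead of the cap; return the pause length."""
        pause = self.delay_for(nbytes)
        if pause > 0:
            self.sleep(pause)
        return pause
```

A client would call `wait(len(chunk))` after each send. If the network is already slower than the cap, the computed pause is zero and the helper adds no delay.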
  • 6. Online Backup : Server
    Don’t lose any files
    De-duplication
    Thumbnail generation for images
    Flickr Backup
    Client Feedback
    Bulk download
    File relationships
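De-duplication is commonly done by keying stored content on a hash of its bytes. The slides do not say how PutPlace implemented it, so the following is only a sketch under that assumption; `DedupStore` and its in-memory dictionaries are hypothetical stand-ins for a real blob store and file table.

```python
import hashlib


class DedupStore:
    """In-memory stand-in for a content-addressed blob store (hypothetical)."""

    def __init__(self):
        self.blobs = {}   # digest -> file content, stored only once
        self.files = {}   # (user, path) -> digest of that file's content

    def put(self, user, path, data):
        """Store a file; return (digest, is_new) where is_new is False for duplicates."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.files[(user, path)] = digest
        return digest, is_new

    def get(self, user, path):
        """Return the content recorded for this user's path."""
        return self.blobs[self.files[(user, path)]]
```

Two users uploading identical bytes end up with two file records pointing at one stored blob, which is where the storage saving comes from.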
  • 7. Online Backup - Secrets
    People don’t back up
    Compute dominates
    Restores represent 0.01% of bandwidth and load
    Writing web clones of Windows Explorer is hard
    The browser sucks as a client side app container (for now)
  • 8. Scaling
    For online backup the challenge is to receive shed loads of data from lots of clients
    Clients upload in 1MB chunks
    Chunks must be stored, coalesced and pushed to stable backup (S3)
    Clients must get acknowledgement
    Web page must update
    Quota management
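The chunked upload flow above (1MB chunks, acknowledgement, coalescing) can be sketched as follows. This is an illustration under stated assumptions rather than the PutPlace protocol: `split_chunks`, `UploadSession`, the shape of the acknowledgement and the whole-file SHA-256 check are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size named on the slide


def split_chunks(data, size=CHUNK_SIZE):
    """Client side: yield (index, chunk) pairs covering data in order."""
    for offset in range(0, len(data), size):
        yield offset // size, data[offset:offset + size]


class UploadSession:
    """Server-side state for one file upload (hypothetical)."""

    def __init__(self, total_chunks, expected_sha256):
        self.total_chunks = total_chunks
        self.expected_sha256 = expected_sha256
        self.chunks = {}  # index -> bytes; a resent chunk simply overwrites

    def receive(self, index, chunk):
        """Store a chunk and return an acknowledgement for the client."""
        self.chunks[index] = chunk
        return {"ack": index, "missing": self.missing()}

    def missing(self):
        """Indexes not yet received; a restarted client only needs to send these."""
        return [i for i in range(self.total_chunks) if i not in self.chunks]

    def coalesce(self):
        """Join the chunks in order and verify the whole-file checksum."""
        if self.missing():
            raise ValueError("upload incomplete")
        data = b"".join(self.chunks[i] for i in range(self.total_chunks))
        if hashlib.sha256(data).hexdigest() != self.expected_sha256:
            raise ValueError("checksum mismatch")
        return data
```

Returning the list of missing chunk indexes in each acknowledgement is one way to support upload restarts: the client can resume from what the server reports instead of starting over.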
  • 9. Load Balancer
    Load Balancer : Perlbal
    http://www.danga.com/perlbal/
    Can handle hundreds of millions of requests per day
    Event-based
    (sshhh : Don’t tell anyone, but it’s Perl!)
    It does fall over occasionally
    Otherwise works perfectly
  • 10. App Server
    Our app servers:
    Handle login
    Deliver web pages
    Handle uploads from clients
    Hand off heavy-duty processing to task servers:
        Thumbnail generation
        File coalescing
        Checksum generation
    Hand off is via a database queue
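A hand-off through a database queue can be sketched with a single table. The talk's system used Postgres; this example swaps in the standard-library `sqlite3` module so it runs anywhere, and the `task` table, its columns and the `enqueue`/`claim`/`finish` helpers are hypothetical names, not the PutPlace schema.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the database and create the queue table if it does not exist."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS task ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " kind TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return db


def enqueue(db, kind, payload):
    """App server side: record the work and return immediately."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO task (kind, payload) VALUES (?, ?)", (kind, payload)
        )
    return cur.lastrowid


def claim(db, kind):
    """Task server side: take the oldest pending task of this kind, or None."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM task"
            " WHERE kind = ? AND status = 'pending' ORDER BY id LIMIT 1",
            (kind,),
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE task SET status = 'running' WHERE id = ?", (row[0],))
    return row


def finish(db, task_id):
    """Mark a claimed task as completed."""
    with db:
        db.execute("UPDATE task SET status = 'done' WHERE id = ?", (task_id,))
```

The select-then-update in `claim` is only safe with a single consumer per connection. With several task servers sharing one database, the claim step would need row locking (for example `SELECT ... FOR UPDATE` in Postgres) so that two workers cannot take the same task.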
  • 11. App Server
    Just Django Instances
    Templates deliver web pages
    Views handle chunks/login etc.
    Models update the database
    Task Servers do the heavy lifting
  • 12. Task Server
    Run off a database queue (table)
    Four main task servers:
        Assemble completed file uploads
        Create thumbnails
        Remove deleted files
        Generate user statistics
    Servers are multi-threaded
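The multi-threaded task servers can be pictured as a pool of worker threads draining a shared list of tasks. This sketch makes several assumptions: a `queue.Queue` stands in for the database table, one process dispatches all four task kinds (the talk describes four separate servers), and the handler functions are hypothetical placeholders that only return a description of the work.

```python
import queue
import threading


# Hypothetical handlers, one per task kind named on the slide.
def assemble_upload(payload):
    return f"assembled {payload}"


def create_thumbnail(payload):
    return f"thumbnail for {payload}"


def remove_deleted(payload):
    return f"removed {payload}"


def generate_stats(payload):
    return f"stats for {payload}"


HANDLERS = {
    "assemble": assemble_upload,
    "thumbnail": create_thumbnail,
    "delete": remove_deleted,
    "stats": generate_stats,
}


def run_workers(tasks, num_threads=4):
    """Process (kind, payload) tasks on a pool of threads; return the results."""
    pending = queue.Queue()  # stand-in for the database queue table
    for task in tasks:
        pending.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                kind, payload = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do, so this thread exits
            outcome = HANDLERS[kind](payload)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Results arrive in whatever order the threads finish, so callers should not rely on them matching the input order.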
  • 13. Refactoring
    Originally N Blacknight servers writing to NFS
    Then N Blacknight servers writing to S3
    Then N EC2 servers writing to S3
    Then N EC2 servers writing to MogileFS/S3
    Lots of uploading optimisations along the way
  • 14. Results
    System has successfully uploaded over 100k files in a single day
    Regularly does 50k files a day
    Have about 2k registered users
    Continues to get registrations
    Runs in lights-out mode (no daily/weekly/monthly housekeeping)
  • 15. What worked
    Python proved extremely flexible
    Standard library saved us lots of work
    Django provided a lot of glue
    Easy to migrate from dedicated host on NFS to Cloud Hosting and S3 storage
    Nagios/Monitis monitoring
  • 16. What Didn’t Work
    Would use MySQL rather than Postgres
        Easier to cluster, more knowledge available
    Native Windows Client
        Unnecessary, Python client was good enough
    Would use an off-the-shelf queueing system
        RabbitMQ, ActiveMQ, SQS
    Kludgey client side API
    Threading The Client
  • 17. Tool Chain
    Wush.net : Subversion and Trac
    DynDNS: Dynamic DNS
    Python/Django: Dev Stack
    Postgres: Database
    Hudson : Build Server
    Perlbal: Load Balancing
    MogileFS : Distributed File System
    Memcached : Caching
    Nagios, Monitis: Monitoring
    Hamachi : VPN through Firewall
    Google Apps : Email, Calendar, Docs, Wiki
    AuthSMTP : Validated SMTP
    Zendesk: Support Desk
    Amazon : Storage, Compute, Bandwidth
    Paypal : Billing
  • 18. Costs
    Capital Expenditure
        One server 5k euro
        One laptop per developer 2.5k (7 devs)
        One Linksys WiFi/Firewall (won at raffle)
        Two 24-port switches 1.6k
        Total: ~24k
    Running Costs for Grid and Storage
        ~1800 euro a month (8 instances)
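The capital expenditure total can be checked with a few lines of arithmetic. The sketch assumes that 2.5k is the price per laptop and that 1.6k covers both switches together; under those readings the items sum to the ~24k on the slide. The yearly running figure is simply twelve times the quoted monthly cost.

```python
# Capital expenditure in euro, using the figures from the slide.
capex_eur = {
    "server": 5_000,
    "laptops": 7 * 2_500,    # assumption: 2.5k per laptop, one for each of 7 devs
    "wifi_firewall": 0,      # won at a raffle
    "switches": 1_600,       # assumption: 1.6k is the price for both switches
}
capex_total = sum(capex_eur.values())

# Running costs for grid and storage.
running_per_month = 1_800    # approximate, for 8 instances
running_per_year = 12 * running_per_month

print(capex_total, running_per_year)
```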
  • 19. If I Were Doing it Again
    Stick with native Python client
    Look at eventing à la Node.js for server
    Use MySQL
    Use Google App Engine as Front End/Load Balancer
    Use a commercial queueing package
  • 20. Thanks
    Q&A