SlideShare a Scribd company logo
1 of 20
Building a Scalable Online Backup System in Python Joe Drumgoole http://twitter/jdrumgoole
Scaling You probably shouldn’t care Throughput vs response time Scaling is a fractal problem The database is what will get ya! Amazing what a well tuned DB will support http://twitter.com/jdrumgoole 2
PutPlace Architecture http://twitter.com/jdrumgoole 3
Online Backup Not really a Web 2.0 play More like client server Larger vision of PutPlace Map of your Digital World http://twitter.com/jdrumgoole 4
Online Backup : Client Installation and support of Windows 20** Mac Support Open file/locked file handling Bandwidth throttling CPU Throttling Upload restarts Feedback http://twitter.com/jdrumgoole 5
Online Backup : Server Don’t loose any files De-duplication Thumbnail generation for images Flickr Backup Client Feedback Bulk download File relationships http://twitter.com/jdrumgoole 6
Online Backup - Secrets People Don’t backup Compute dominates Restores represent 0.01% of bandwidth and load Writing web clones of Windows Explorer is hard The browser sucks as a client side app container (for now) http://twitter.com/jdrumgoole 7
Scaling For online backup the challenge is to receive shed loads of data from lots of clients Clients upload in 1MB chunks Chunks must be stored coalesced and push to stable backup (S3) Clients must get acknowledgement Web page must update Quota management http://twitter.com/jdrumgoole 8
Load Balancer Load Balancer : Perlbal http://www.danga.com/perlbal/ Can handle 100 x millions of requests per day Event based (sshhh : Don’t tell anyone, but its Perl!) It does fall over occasionally Otherwise works perfectly http://twitter.com/jdrumgoole 9
App Server Our app servers: Handle login Deliver web pages Handle uploads from clients Hand off heavy duty processing to task servers Thumbnail generation File coalescing Checksum generation Hand off is via a database queue http://twitter.com/jdrumgoole 10
App Server Just Django Instances Templates deliver web pages Views handle chunks/login etc. Models update the database Task Servers do the heavy lifting http://twitter.com/jdrumgoole 11
Task Server Run off a database queue (table) Four main task servers: Assemble completed file uploads Create thumbnails Remove deleted files Generate user statistics Servers are multi-threaded http://twitter.com/jdrumgoole 12
Refactoring Originally N blacknight servers writing to NFS Then N blacknight servers writing to S3 Then N EC2 servers writing to S3 The N EC2 servers writing to MogileFS/S3 Lots of uploading optimisations along the way http://twitter.com/jdrumgoole 13
Results System has successfully uploaded over 100k files in a single day Regularily does 50k files a day Have about 2k registered users Continues to get registrations Runs in lights out mode (no daily/weekly/monthly housekeeping) http://twitter.com/jdrumgoole 14
What worked Python proved extremely flexible Standard library saved us lots of work Django provided a lot of glue Easy to  migrate from dedicated host on NFS to Cloud Hosting and S3 storage Nagios/Monitis monitoring http://twitter.com/jdrumgoole 15
What Didn’t Work Would use MySQL rather than Postgres Easier to cluster, more knowledge available Native Windows Client Unecessary, Python client was good enough Would use an off the shelf queueing system RabbitMQ, ActiveMQ, SQS Kludgey client side API Threading The Client http://twitter.com/jdrumgoole 16
Tool Chain Wush.net 		: Subversion and Trac DynDNS: Dynamic DNS Python/Django: Dev Stack Postgres: Database Hudson		: Build Server Perlbal: Load Balancing MogileFS		: Distributed File System Memcached 		: Caching Nagios, Monitis: Monitoring Hamachi		: VPN through Firewall Google Apps	: Email, Calendar, Docs, Wiki AuthSMTP		: Validated SMTP Zendesk: Support Desk Amazon		: Storage, Compute, Bandwidth Paypal		: Billing http://twitter.com/jdrumgoole 17
Costs Capital Expenditure One server 5k euro One laptop per developer 2.5k (7 devs) One Linksys WIFI/Firewall (won at Raffle) Two 24 port switches 1.6k Total: ~24k Running Costs for Grid and Storage ~1800 euro a month (8 instances) http://twitter.com/jdrumgoole 18
If I Were Doing it Again Stick with native python client Look at eventing ala Node.js for server Use MySQL Use Google App Engine as Front End/Load Balancer Use a commercial queueing package http://twitter.com/jdrumgoole 19
Thanks Q&A http://twitter.com/jdrumgoole 20

More Related Content

What's hot

Altitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeAltitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeFastly
 
4.2. Web analyst fiddler
4.2. Web analyst fiddler4.2. Web analyst fiddler
4.2. Web analyst fiddlerdefconmoscow
 
HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016Daniel Stenberg
 
I got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't oneI got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't oneAdrian Cole
 
Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!Matthew Wise
 
Efficient HTTP Apis
Efficient HTTP ApisEfficient HTTP Apis
Efficient HTTP ApisAdrian Cole
 
Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015Kok Hoor Chew
 
Debugging with Fiddler
Debugging with FiddlerDebugging with Fiddler
Debugging with FiddlerIdo Flatow
 
Websockets at tossug
Websockets at tossugWebsockets at tossug
Websockets at tossugclkao
 
HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?Alessandro Nadalin
 
HTTP/2 for Developers
HTTP/2 for DevelopersHTTP/2 for Developers
HTTP/2 for DevelopersSvetlin Nakov
 
Introduction to HTTP/2
Introduction to HTTP/2Introduction to HTTP/2
Introduction to HTTP/2Ido Flatow
 
HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know? HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know? Sigma Software
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017ElifTech
 
HTTP 2.0 Why, How and When
HTTP 2.0 Why, How and WhenHTTP 2.0 Why, How and When
HTTP 2.0 Why, How and WhenCodemotion
 
HTTP/2 Introduction
HTTP/2 IntroductionHTTP/2 Introduction
HTTP/2 IntroductionWalter Liu
 
Search in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize itSearch in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize itOtto Kekäläinen
 
Getting the most out of WebPageTest
Getting the most out of WebPageTestGetting the most out of WebPageTest
Getting the most out of WebPageTestPatrick Meenan
 
Scaling my sql_in_3d
Scaling my sql_in_3dScaling my sql_in_3d
Scaling my sql_in_3dsarahnovotny
 

What's hot (20)

Altitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeAltitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the Edge
 
4.2. Web analyst fiddler
4.2. Web analyst fiddler4.2. Web analyst fiddler
4.2. Web analyst fiddler
 
HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016
 
I got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't oneI got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't one
 
Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!
 
Efficient HTTP Apis
Efficient HTTP ApisEfficient HTTP Apis
Efficient HTTP Apis
 
Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015
 
Debugging with Fiddler
Debugging with FiddlerDebugging with Fiddler
Debugging with Fiddler
 
Websockets at tossug
Websockets at tossugWebsockets at tossug
Websockets at tossug
 
HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?
 
HTTP/2 for Developers
HTTP/2 for DevelopersHTTP/2 for Developers
HTTP/2 for Developers
 
Introduction to HTTP/2
Introduction to HTTP/2Introduction to HTTP/2
Introduction to HTTP/2
 
HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know? HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know?
 
Web sockets in Java
Web sockets in JavaWeb sockets in Java
Web sockets in Java
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017
 
HTTP 2.0 Why, How and When
HTTP 2.0 Why, How and WhenHTTP 2.0 Why, How and When
HTTP 2.0 Why, How and When
 
HTTP/2 Introduction
HTTP/2 IntroductionHTTP/2 Introduction
HTTP/2 Introduction
 
Search in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize itSearch in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize it
 
Getting the most out of WebPageTest
Getting the most out of WebPageTestGetting the most out of WebPageTest
Getting the most out of WebPageTest
 
Scaling my sql_in_3d
Scaling my sql_in_3dScaling my sql_in_3d
Scaling my sql_in_3d
 

Similar to Building a scalable online backup system in python

Getting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowGetting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowAaron Peters
 
Http/2 - What's it all about?
Http/2  - What's it all about?Http/2  - What's it all about?
Http/2 - What's it all about?Andy Davies
 
The Case for HTTP/2 - Internetdagarna 2015 - Stockholm
The Case for HTTP/2  - Internetdagarna 2015 - StockholmThe Case for HTTP/2  - Internetdagarna 2015 - Stockholm
The Case for HTTP/2 - Internetdagarna 2015 - StockholmAndy Davies
 
REST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzREST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzAlessandro Nadalin
 
Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"Binary Studio
 
The case for HTTP/2
The case for HTTP/2The case for HTTP/2
The case for HTTP/2GreeceJS
 
The Case for HTTP/2 - GreeceJS - June 2016
The Case for HTTP/2 -  GreeceJS - June 2016The Case for HTTP/2 -  GreeceJS - June 2016
The Case for HTTP/2 - GreeceJS - June 2016Andy Davies
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2jeperkins4
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Nati Shalom
 
Optimising Web Application Frontend
Optimising Web Application FrontendOptimising Web Application Frontend
Optimising Web Application Frontendtkramar
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedAndy Kucharski
 
On-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upOn-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upJonathan Lee
 
data.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowledata.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt DowleSri Ambati
 
Meetup Tech Talk on Web Performance
Meetup Tech Talk on Web PerformanceMeetup Tech Talk on Web Performance
Meetup Tech Talk on Web PerformanceJean Tunis
 
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...Holger Bartel
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedPromet Source
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
My other computer is a datacentre
My other computer is a datacentreMy other computer is a datacentre
My other computer is a datacentreSteve Loughran
 

Similar to Building a scalable online backup system in python (20)

Getting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowGetting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and How
 
Http/2 - What's it all about?
Http/2  - What's it all about?Http/2  - What's it all about?
Http/2 - What's it all about?
 
The Case for HTTP/2 - Internetdagarna 2015 - Stockholm
The Case for HTTP/2  - Internetdagarna 2015 - StockholmThe Case for HTTP/2  - Internetdagarna 2015 - Stockholm
The Case for HTTP/2 - Internetdagarna 2015 - Stockholm
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
REST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzREST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in Mainz
 
Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"
 
The case for HTTP/2
The case for HTTP/2The case for HTTP/2
The case for HTTP/2
 
The Case for HTTP/2 - GreeceJS - June 2016
The Case for HTTP/2 -  GreeceJS - June 2016The Case for HTTP/2 -  GreeceJS - June 2016
The Case for HTTP/2 - GreeceJS - June 2016
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Optimising Web Application Frontend
Optimising Web Application FrontendOptimising Web Application Frontend
Optimising Web Application Frontend
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speed
 
On-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upOn-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-up
 
data.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowledata.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowle
 
Meetup Tech Talk on Web Performance
Meetup Tech Talk on Web PerformanceMeetup Tech Talk on Web Performance
Meetup Tech Talk on Web Performance
 
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speed
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
My other computer is a datacentre
My other computer is a datacentreMy other computer is a datacentre
My other computer is a datacentre
 

More from Joe Drumgoole

MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema DesignJoe Drumgoole
 
The Rise of Microservices
The Rise of MicroservicesThe Rise of Microservices
The Rise of MicroservicesJoe Drumgoole
 
Back to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB ApplicationBack to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB ApplicationJoe Drumgoole
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLJoe Drumgoole
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopJoe Drumgoole
 
Server discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDBServer discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDBJoe Drumgoole
 
Event sourcing the best ubiquitous pattern you have never heard off
Event sourcing   the best ubiquitous pattern you have never heard offEvent sourcing   the best ubiquitous pattern you have never heard off
Event sourcing the best ubiquitous pattern you have never heard offJoe Drumgoole
 
EuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo DriverEuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo DriverJoe Drumgoole
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationJoe Drumgoole
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLJoe Drumgoole
 
Introduction to CQRS and Event Sourcing
Introduction to CQRS and Event SourcingIntroduction to CQRS and Event Sourcing
Introduction to CQRS and Event SourcingJoe Drumgoole
 
Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsJoe Drumgoole
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB ApplicationJoe Drumgoole
 
Back to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLBack to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLJoe Drumgoole
 
Cloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolutionCloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolutionJoe Drumgoole
 
Enterprise mobility for fun and profit
Enterprise mobility for fun and profitEnterprise mobility for fun and profit
Enterprise mobility for fun and profitJoe Drumgoole
 
How to run a company for 2k a year
How to run a company for 2k a yearHow to run a company for 2k a year
How to run a company for 2k a yearJoe Drumgoole
 
Mobile monday mhealth
Mobile monday mhealthMobile monday mhealth
Mobile monday mhealthJoe Drumgoole
 
Harness the web and grow your business
Harness the web and grow your businessHarness the web and grow your business
Harness the web and grow your businessJoe Drumgoole
 

More from Joe Drumgoole (20)

MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
The Rise of Microservices
The Rise of MicroservicesThe Rise of Microservices
The Rise of Microservices
 
Back to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB ApplicationBack to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB Application
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQL
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB Workshop
 
Server discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDBServer discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDB
 
Event sourcing the best ubiquitous pattern you have never heard off
Event sourcing   the best ubiquitous pattern you have never heard offEvent sourcing   the best ubiquitous pattern you have never heard off
Event sourcing the best ubiquitous pattern you have never heard off
 
EuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo DriverEuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo Driver
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Introduction to CQRS and Event Sourcing
Introduction to CQRS and Event SourcingIntroduction to CQRS and Event Sourcing
Introduction to CQRS and Event Sourcing
 
Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in Documents
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB Application
 
Back to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLBack to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQL
 
Cloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolutionCloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolution
 
Enterprise mobility for fun and profit
Enterprise mobility for fun and profitEnterprise mobility for fun and profit
Enterprise mobility for fun and profit
 
How to run a company for 2k a year
How to run a company for 2k a yearHow to run a company for 2k a year
How to run a company for 2k a year
 
Mobile monday mhealth
Mobile monday mhealthMobile monday mhealth
Mobile monday mhealth
 
Cloudsplit original
Cloudsplit originalCloudsplit original
Cloudsplit original
 
Harness the web and grow your business
Harness the web and grow your businessHarness the web and grow your business
Harness the web and grow your business
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Building a scalable online backup system in python

  • 1. Building a Scalable Online Backup System in Python Joe Drumgoole http://twitter/jdrumgoole
  • 2. Scaling You probably shouldn’t care Throughput vs response time Scaling is a fractal problem The database is what will get ya! Amazing what a well tuned DB will support http://twitter.com/jdrumgoole 2
  • 4. Online Backup Not really a Web 2.0 play More like client server Larger vision of PutPlace Map of your Digital World http://twitter.com/jdrumgoole 4
  • 5. Online Backup : Client Installation and support of Windows 20** Mac Support Open file/locked file handling Bandwidth throttling CPU Throttling Upload restarts Feedback http://twitter.com/jdrumgoole 5
  • 6. Online Backup : Server Don’t loose any files De-duplication Thumbnail generation for images Flickr Backup Client Feedback Bulk download File relationships http://twitter.com/jdrumgoole 6
  • 7. Online Backup - Secrets People Don’t backup Compute dominates Restores represent 0.01% of bandwidth and load Writing web clones of Windows Explorer is hard The browser sucks as a client side app container (for now) http://twitter.com/jdrumgoole 7
  • 8. Scaling For online backup the challenge is to receive shed loads of data from lots of clients Clients upload in 1MB chunks Chunks must be stored coalesced and push to stable backup (S3) Clients must get acknowledgement Web page must update Quota management http://twitter.com/jdrumgoole 8
  • 9. Load Balancer Load Balancer : Perlbal http://www.danga.com/perlbal/ Can handle 100 x millions of requests per day Event based (sshhh : Don’t tell anyone, but its Perl!) It does fall over occasionally Otherwise works perfectly http://twitter.com/jdrumgoole 9
  • 10. App Server Our app servers: Handle login Deliver web pages Handle uploads from clients Hand off heavy duty processing to task servers Thumbnail generation File coalescing Checksum generation Hand off is via a database queue http://twitter.com/jdrumgoole 10
  • 11. App Server Just Django Instances Templates deliver web pages Views handle chunks/login etc. Models update the database Task Servers do the heavy lifting http://twitter.com/jdrumgoole 11
  • 12. Task Server Run off a database queue (table) Four main task servers: Assemble completed file uploads Create thumbnails Remove deleted files Generate user statistics Servers are multi-threaded http://twitter.com/jdrumgoole 12
  • 13. Refactoring Originally N blacknight servers writing to NFS Then N blacknight servers writing to S3 Then N EC2 servers writing to S3 The N EC2 servers writing to MogileFS/S3 Lots of uploading optimisations along the way http://twitter.com/jdrumgoole 13
  • 14. Results System has successfully uploaded over 100k files in a single day Regularily does 50k files a day Have about 2k registered users Continues to get registrations Runs in lights out mode (no daily/weekly/monthly housekeeping) http://twitter.com/jdrumgoole 14
  • 15. What worked Python proved extremely flexible Standard library saved us lots of work Django provided a lot of glue Easy to migrate from dedicated host on NFS to Cloud Hosting and S3 storage Nagios/Monitis monitoring http://twitter.com/jdrumgoole 15
  • 16. What Didn’t Work Would use MySQL rather than Postgres Easier to cluster, more knowledge available Native Windows Client Unecessary, Python client was good enough Would use an off the shelf queueing system RabbitMQ, ActiveMQ, SQS Kludgey client side API Threading The Client http://twitter.com/jdrumgoole 16
  • 17. Tool Chain Wush.net : Subversion and Trac DynDNS: Dynamic DNS Python/Django: Dev Stack Postgres: Database Hudson : Build Server Perlbal: Load Balancing MogileFS : Distributed File System Memcached : Caching Nagios, Monitis: Monitoring Hamachi : VPN through Firewall Google Apps : Email, Calendar, Docs, Wiki AuthSMTP : Validated SMTP Zendesk: Support Desk Amazon : Storage, Compute, Bandwidth Paypal : Billing http://twitter.com/jdrumgoole 17
  • 18. Costs Capital Expenditure One server 5k euro One laptop per developer 2.5k (7 devs) One Linksys WIFI/Firewall (won at Raffle) Two 24 port switches 1.6k Total: ~24k Running Costs for Grid and Storage ~1800 euro a month (8 instances) http://twitter.com/jdrumgoole 18
  • 19. If I Were Doing it Again Stick with native python client Look at eventing ala Node.js for server Use MySQL Use Google App Engine as Front End/Load Balancer Use a commercial queueing package http://twitter.com/jdrumgoole 19