Super-sizing YouTube with Python


by Mike Solomon.

See more scalability tales at:


  1. Super-sizing YouTube with Python (Mike Solomon)
  2. Welcome: this is about scaling a web application; there are a lot of things left out, mostly mistakes and implementation details; this may generate more questions than it answers; my goal is to give you ideas for solving your own problems
  3. Architecture: this is the core of scalability; systems change over time, so will your architecture; impossible to predict the optimal approach; start simple; aim for local maxima; python enables flexibility
  4. YouTube's Early Days: web boxes do everything (servlets, images, thumbnails, search); shoehorn everything into Apache, MySQL; very simple; this survives longer than you'd think
  5. Early Web Stack, circa January '06 (diagram): hw load balancer; httpd with mod_python (servlets, templates, biz logic, db objects, search, thumbnails); db master; db replicas
  6. Early Key Factors in Engineering: really small team; we ♥ python; logical separation in code; discipline and honor - not linguistically enforced (don't waste time writing code to restrict people)*; grown by systematically removing bottlenecks; easy to know when something is a `win`
  7. Running Without Tripping: user demand can grow 50% in a day; removing one bottleneck can immediately reveal another (usually more heinous); replace and migrate components as they become problems; good (python) components make this easy; obviously, pick your battles
  8. Good Components (Hypothetical): minimize dependencies*; accept some latency; localize failures - don't let them spread; you are only down if it looks like you are; applies to both systems and software
  9. Balance Machine Resources: more efficient resource utilization via specialized deployment; balance based on CPU, RAM, network and disk usage patterns; overlay orthogonal loads (disjoint tasks running on the same physical hardware)
  10. Migratory Patterns of the Norwegian Blue: move from mod_python to mod_fastcgi; move thumbnails to their own machines; make search a remote service running on separate machines; run transcoder processes on video servers; do more with the same hardware
  11. Serenity Now: can you spot where we turned on transcoding processes?
  12. SQL Shenanigans: if you have a relational database, it will be abused; difficult to track the true source; series of object proxies for DB-API; enable logging; encode a portion of call stack as a query comment* (more about this later)
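
The query-comment idea on this slide could look roughly like the sketch below. It is not the code from the talk: `CommentingCursor`, its `depth` parameter, and the `videos` table are hypothetical, and `sqlite3` stands in for MySQL only so the example runs on its own.

```python
import sqlite3
import traceback


class CommentingCursor:
    """Hypothetical proxy for a DB-API cursor that tags each query with its callers."""

    def __init__(self, cursor, depth=2):
        self._cursor = cursor
        self._depth = depth    # number of caller frames to record (assumed knob)
        self.last_sql = None   # kept only so the example can inspect the result

    def execute(self, sql, params=()):
        # Drop the last frame (this method) and keep the nearest callers.
        frames = traceback.extract_stack()[:-1][-self._depth:]
        origin = " < ".join(
            f"{frame.name}:{frame.lineno}" for frame in reversed(frames)
        )
        # Remove "*/" so the origin text can never close the comment early.
        origin = origin.replace("*/", "")
        self.last_sql = f"/* {origin} */ {sql}"
        return self._cursor.execute(self.last_sql, params)

    def __getattr__(self, name):
        # Everything else (fetchone, fetchall, ...) goes to the real cursor.
        return getattr(self._cursor, name)


def load_video_title(cursor, video_id):
    cursor.execute("SELECT title FROM videos WHERE id = ?", (video_id,))
    return cursor.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO videos VALUES (1, 'demo')")
cursor = CommentingCursor(conn.cursor())
title = load_video_title(cursor, 1)
```

With the comment in place, a slow-query log or process list shows which function issued each statement, which is the "true source" the slide says is hard to track.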
  13. Object Caching: take pressure off of relational db; can save additional resources if your objects require significant computation to set up; memcached makes a good home for this; need good client to make this into a truly useful service‡ (pools and better failure handling)
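
A minimal read-through sketch of the object caching described on this slide. `DictCache` is an in-process stand-in for a memcached client (only `get` and `set`), and `cached_fetch` and `build_profile` are hypothetical names; the point is only that an expensive object is built once and that a cache failure degrades to a miss instead of an error.

```python
class DictCache:
    """In-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_fetch(cache, key, build):
    """Return the cached object for key, building and storing it on a miss."""
    try:
        value = cache.get(key)
    except Exception:
        value = None  # treat a cache failure as a miss
    if value is None:
        value = build()
        try:
            cache.set(key, value)
        except Exception:
            pass  # failing to store must not fail the request
    return value


build_calls = []


def build_profile():
    # Stands in for the expensive database work and object setup.
    build_calls.append(1)
    return {"username": "mike"}


cache = DictCache()
first = cached_fetch(cache, "user:1", build_profile)
second = cached_fetch(cache, "user:1", build_profile)  # served from the cache
```

Swallowing cache errors is one possible reading of "better failure handling"; a real client would at least log them.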
  14. Software Optimization: fast vs fast enough; strive for machine efficiency - don't obsess; be scientific - collect data and understand it; can yield some surprising results; don't assume code optimization techniques from another language are relevant; just like carpentry, measure twice cut once
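
One way to "collect data" before optimizing is the standard-library profiler. The sketch below is only an illustration: `slow_checksum` and `handle_request` are made-up functions standing in for real request handling, not code from the talk.

```python
import cProfile
import io
import pstats


def slow_checksum(data):
    # Deliberately naive byte-at-a-time loop, the kind of hot spot a profile reveals.
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 65521
    return total


def handle_request(payload):
    return slow_checksum(payload), len(payload)


profiler = cProfile.Profile()
profiler.enable()
for _ in range(200):
    handle_request(b"x" * 2000)
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

Printing `profile_text` shows where the time actually goes, which is the measurement the slide asks for before any code is rewritten.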
  15. Python Optimization: pure python HMAC was 40% of web cpu (write a few lines of C); threaded comments fiasco: overly complex algorithm to compute the display object tree; simplify query, simplify algorithm
  16. Python Optimization: psyco - specializing compiler for Python; 'hot' functions are psyco-ized; there is a 'context switch' penalty so you need to experiment to see if it helps; previous threaded comments algorithm: -closure +psyco = 400% boost
  17. Reasonable Efficiency: pruned all the obvious leaf services; dynamic web requests are one `service`; web service is easy to scale, so it stresses out other resources - probably a DB; DBs are hard(er) to scale; tricks of escalating cleverness‡; eventually, no cards left to play
  18. Scaling MySQL: pretty much have to go horizontal; choose your partition plan carefully; understand your data access patterns: what queries do you run most often? do you have joins? do you need transactional consistency? why? does an 'entity' emerge?
  19. Partition By Entity: entities are 'transactional'; allow joins across properties of an entity; entities are migratory; cross entity is more complicated: weaken guarantees to make it easier, minimize activity by design
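
Because entities are migratory, the entity-to-partition mapping has to be looked up rather than hard-coded. A minimal sketch, assuming a modulo default with per-entity overrides; the `EntityLookup` class and the shard names are hypothetical and not the lookup service from the talk.

```python
class EntityLookup:
    """Hypothetical lookup: maps an entity id to the shard that owns it."""

    def __init__(self, shards):
        self._shards = list(shards)
        self._moved = {}  # entity id -> shard, for entities that have migrated

    def shard_for(self, entity_id):
        # A migrated entity wins over the default placement.
        if entity_id in self._moved:
            return self._moved[entity_id]
        return self._shards[entity_id % len(self._shards)]

    def migrate(self, entity_id, shard):
        # Record the new home; copying the rows is a separate step.
        self._moved[entity_id] = shard


lookup = EntityLookup(["shard_a", "shard_b"])
before = lookup.shard_for(7)
lookup.migrate(7, "shard_c")
after = lookup.shard_for(7)
```

All rows belonging to one entity live on the shard returned here, which is what keeps joins and transactions within an entity local.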
  20. EMD, a TLA not an ORM!: connection and transaction management; lookup service; query factory; minimalist table abstraction; ORM can be (is?) evil; make common behaviors simple, while leaving some transparency to the actual database
  21. Seismic Retrofit: apply this fundamental change to a large and growing site; make it relatively painless with python: multiple inheritance, decorators, AST plugins for validation and testing
  22. Resulting API: all the scale-aware code nicely opaque to application developers; base use cases are painless: User.select_by_username(db_context, username), Video.select_by_id(db_context, video_id), Video.select_by_user_id(db_context, user_id)
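
A rough sketch of how a call like `User.select_by_username(db_context, username)` could hide the partitioning from application code. Only the call signature comes from the slide; the `DbContext` internals, the `users` table, and the use of in-memory `sqlite3` databases as shards are assumptions made so the example is runnable.

```python
import sqlite3


class DbContext:
    """Hypothetical context: owns the shard connections and the lookup."""

    def __init__(self, shards, lookup):
        self._shards = shards  # shard name -> DB-API connection
        self._lookup = lookup  # username -> shard name

    def connection_for_user(self, username):
        return self._shards[self._lookup[username]]


class User:
    def __init__(self, user_id, username):
        self.user_id = user_id
        self.username = username

    @classmethod
    def select_by_username(cls, db_context, username):
        # The caller never sees which shard answered the query.
        conn = db_context.connection_for_user(username)
        row = conn.execute(
            "SELECT id, username FROM users WHERE username = ?", (username,)
        ).fetchone()
        return cls(*row) if row else None


def make_shard(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


db_context = DbContext(
    shards={
        "shard_a": make_shard([(1, "alice")]),
        "shard_b": make_shard([(2, "bob")]),
    },
    lookup={"alice": "shard_a", "bob": "shard_b"},
)
user = User.select_by_username(db_context, "bob")
```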
  23. Bulk Entity Migration: hijack mysql replication to partition on the fly while the live site is running; all DML gets tagged with an entity id; read master binlog and selectively replay it into a set of new mini-masters; update lookup service to point to new resources
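
The selective replay step might be sketched as below. The slide does not say how statements are tagged, so the `/* entity_id=N */` comment format is an assumption, as are the function name and the mini-master names; a plain list of strings stands in for statements read from the master binlog.

```python
import re
from collections import defaultdict

# Assumed tag format: a SQL comment carrying the entity id.
ENTITY_TAG = re.compile(r"/\*\s*entity_id=(\d+)\s*\*/")


def split_by_entity(statements, assignment):
    """Group tagged statements by the mini-master that will own each entity."""
    replay = defaultdict(list)
    for statement in statements:
        match = ENTITY_TAG.search(statement)
        if match is None:
            continue  # untagged statements cannot be routed; skipped in this sketch
        target = assignment.get(int(match.group(1)))
        if target is not None:
            replay[target].append(statement)
    return dict(replay)


binlog = [
    "/* entity_id=7 */ UPDATE videos SET title = 'a' WHERE user_id = 7",
    "/* entity_id=8 */ INSERT INTO comments VALUES (8, 'hi')",
    "UPDATE site_stats SET hits = hits + 1",
    "/* entity_id=7 */ DELETE FROM favorites WHERE user_id = 7",
]
plan = split_by_entity(binlog, {7: "mini_master_1", 8: "mini_master_2"})
```

Statement order is preserved within each target, which matters because the replayed statements must apply in the same order as on the master.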
  24. Recurring Themes: the elegance of simplicity; take reliable open software and customize it; `pythonic veneer`; DIY - filing a ticket for a bugfix doesn't give me a warm feeling - take matters into your own hands*
  25. Questions?