Super-Sizing YouTube with Python

    Super-Sizing YouTube with Python - Presentation Transcript

    1. Super-sizing YouTube with Python. Mike Solomon, mike@youtube.com
    2. Welcome. This is about scaling a web application. A lot of things are left out, mostly mistakes and implementation details. This may generate more questions than it answers; my goal is to give you ideas for solving your own problems.
    3. Architecture. This is the core of scalability. Systems change over time, and so will your architecture; it is impossible to predict the optimal approach. Start simple and aim for local maxima. Python enables flexibility.
    4. YouTube's Early Days. Web boxes do everything: servlets, images, thumbnails, search. Shoehorn everything into Apache and MySQL. It is very simple, and it survives longer than you'd think.
    5. Early Web Stack (circa January '06). A hardware load balancer in front of httpd with mod_python, which runs the servlets, templates, business logic, db objects, search and thumbnails; behind it, a db master and db replicas.
    6. Early Key Factors in Engineering. A really small team. We ♥ Python. Logical separation in code. Discipline and honor, not linguistically enforced (don't waste time writing code to restrict people). Grown by systematically removing bottlenecks. It is easy to know when something is a 'win'.
    7. Running Without Tripping. User demand can grow 50% in a day. Removing one bottleneck can immediately reveal another (usually more heinous). Replace and migrate components as they become problems; good (Python) components make this easy. Obviously, pick your battles.
    8. Good Components (Hypothetical). Minimize dependencies. Accept some latency. Localize failures; don't let them spread. You are only down if it looks like you are. This applies to both systems and software.
    9. Balance Machine Resources. Get more efficient resource utilization through specialized deployment, balanced on CPU, RAM, network and disk usage patterns. Overlay orthogonal loads: disjoint tasks running on the same physical hardware.
    10. Migratory Patterns of the Norwegian Blue. Move from mod_python to mod_fastcgi. Move thumbnails to their own machines. Make search a remote service running on separate machines. Run transcoder processes on video servers. Do more with the same hardware.
    11. Serenity Now. Can you spot where we turned on transcoding processes?
    12. SQL Shenanigans. If you have a relational database, it will be abused, and the true source of a query is difficult to track. Use a series of object proxies for DB-API: enable logging, and encode a portion of the call stack as a query comment (more about this later).
    13. Object Caching. Take pressure off the relational DB. Caching can save additional resources if your objects require significant computation to set up. memcached makes a good home for this. You need a good client to turn it into a truly useful service, with pools and better failure handling.
    14. Software Optimization. Fast vs. fast enough: strive for machine efficiency, but don't obsess. Be scientific: collect data and understand it, because it can yield some surprising results. Don't assume that code optimization techniques from another language are relevant. Just like carpentry: measure twice, cut once.
    15. Python Optimization. Pure-Python HMAC was 40% of web CPU, so write a few lines of C. The threaded comments fiasco: an overly complex algorithm to compute the display object tree. Simplify the query, simplify the algorithm.
    16. Python Optimization. psyco is a specializing compiler for Python, and 'hot' functions are psyco-ized. There is a 'context switch' penalty, so you need to experiment to see if it helps. The previous threaded comments algorithm, with the closure removed and psyco added, got a 400% boost.
    17. Reasonable Efficiency. We pruned all the obvious leaf services, leaving dynamic web requests as one 'service'. The web service is easy to scale, so it stresses out other resources, probably a DB. DBs are hard(er) to scale: there are tricks of escalating cleverness, but eventually you have no cards left to play.
    18. Scaling MySQL. You pretty much have to go horizontal. Choose your partition plan carefully and understand your data access patterns. What queries do you run most often? Do you have joins? Do you need transactional consistency? Why? Does an 'entity' emerge?
    19. Partition By Entity. Entities are 'transactional'. Allow joins across properties of an entity. Entities are migratory. Cross-entity work is more complicated: weaken guarantees to make it easier, and minimize such activity by design.
    20. EMD, a TLA, not an ORM! It provides connection and transaction management, a lookup service, a query factory and a minimalist table abstraction. ORM can be (is?) evil. Make common behaviors simple while leaving some transparency to the actual database.
    21. Seismic Retrofit. Apply this fundamental change to a large and growing site, and make it relatively painless with Python: multiple inheritance, decorators, and AST plugins for validation and testing.
    22. Resulting API. All the scale-aware code is nicely opaque to application developers, and the base use cases are painless:
        User.select_by_username(db_context, username)
        Video.select_by_id(db_context, video_id)
        Video.select_by_user_id(db_context, user_id)
    23. Bulk Entity Migration. Hijack MySQL replication to partition on the fly while the live site is running. All DML gets tagged with an entity id. Read the master binlog and selectively replay it into a set of new mini-masters, then update the lookup service to point to the new resources.
    24. Recurring Themes. The elegance of simplicity. Take reliable open software and customize it with a 'pythonic veneer'. DIY: filing a ticket for a bugfix doesn't give me a warm feeling, so take matters into your own hands.
    25. Questions?
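Slide 12 mentions object proxies for DB-API that enable logging and encode part of the call stack as a query comment. The talk does not show that code. Below is a minimal sketch under assumptions. `TaggingCursor` and its `depth` and `last_sql` attributes are hypothetical names. sqlite3 stands in for MySQL so the example runs anywhere. The comment format (`file:line:function`, outermost caller first) is an illustrative choice.

```python
import logging
import os
import sqlite3
import traceback

log = logging.getLogger("sql")


class TaggingCursor:
    """Hypothetical DB-API cursor proxy: logs each query and prefixes it with
    a comment naming the Python frames that issued it."""

    def __init__(self, cursor, depth=2):
        self._cursor = cursor
        self._depth = depth      # how many caller frames to encode
        self.last_sql = None     # most recent tagged statement, for inspection

    def _caller_tag(self):
        # Drop the two innermost frames (_caller_tag and execute) and keep
        # the nearest `depth` callers, outermost first.
        frames = traceback.extract_stack()[:-2][-self._depth:]
        tag = " > ".join(
            "%s:%d:%s" % (os.path.basename(f.filename), f.lineno, f.name) for f in frames
        )
        return tag.replace("*/", "")  # never terminate the comment early

    def execute(self, sql, params=()):
        self.last_sql = "/* %s */ %s" % (self._caller_tag(), sql)
        log.debug(self.last_sql)
        self._cursor.execute(self.last_sql, params)
        return self

    def __getattr__(self, name):
        # Everything else (fetchone, fetchall, rowcount, ...) goes to the real cursor.
        return getattr(self._cursor, name)


# Usage, with sqlite3 standing in for MySQL.
cur = TaggingCursor(sqlite3.connect(":memory:").cursor())
cur.execute("CREATE TABLE t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (?)", (7,))
print(cur.execute("SELECT x FROM t").fetchone())  # (7,)
```

Because the tag travels inside the SQL text, it also shows up wherever the database server records statements (for example, a slow-query or process list). A DBA can then trace a bad query back to application code without access to the web tier's logs.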
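Slide 13 says an object cache needs a good client with better failure handling. One common way to get there is a cache-aside read path that treats cache errors as misses. The sketch below assumes a client exposing `get(key)` and `set(key, value, time=...)`, which is the shape of the python-memcached client. `DictCache` is a hypothetical in-process stand-in so the example runs without a memcached server, and `cached` is an illustrative helper name. Connection pooling is not shown.

```python
import logging

log = logging.getLogger("cache")


class DictCache:
    """Hypothetical stand-in for a memcached client, backed by a dict."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, time=0):
        self._data[key] = value  # expiry is ignored in this stand-in
        return True


def cached(cache, key, loader, ttl=300):
    """Cache-aside read: try the cache, fall back to loader() (e.g. a DB query),
    then populate the cache. Cache failures are logged and treated as misses,
    so a sick cache makes pages slower instead of making them fail."""
    try:
        value = cache.get(key)
    except Exception:
        log.warning("cache get failed for %s", key, exc_info=True)
        value = None
    if value is not None:
        return value
    value = loader()
    try:
        cache.set(key, value, time=ttl)
    except Exception:
        log.warning("cache set failed for %s", key, exc_info=True)
    return value
```

This design matches slide 8's "localize failures": if memcached goes away, load shifts back to the relational DB rather than surfacing as errors to users.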
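Slides 15 and 16 describe the threaded comments fix as "simplify the query, simplify the algorithm" but do not show the code. Below is one simple approach, sketched under assumptions. All comments for a video arrive from a single query as flat `(comment_id, parent_id, text)` rows in creation order, with `parent_id` set to `None` for top-level comments. The tree is built in one pass with a dict and then walked iteratively. `build_thread` is a hypothetical name.

```python
from collections import defaultdict


def build_thread(rows):
    """Turn flat (comment_id, parent_id, text) rows into display order.

    Returns [(depth, comment_id, text), ...] with each reply directly under its
    parent. Rows whose parent is missing from the input are never reached."""
    children = defaultdict(list)
    text = {}
    for cid, parent, body in rows:
        children[parent].append(cid)
        text[cid] = body
    out = []
    # Iterative depth-first walk; children are pushed in reverse so they pop
    # in their original (creation) order.
    stack = [(0, cid) for cid in reversed(children[None])]
    while stack:
        depth, cid = stack.pop()
        out.append((depth, cid, text[cid]))
        stack.extend((depth + 1, child) for child in reversed(children[cid]))
    return out


rows = [(1, None, "a"), (2, 1, "b"), (3, None, "c"), (4, 2, "d"), (5, 1, "e")]
print(build_thread(rows))
# [(0, 1, 'a'), (1, 2, 'b'), (2, 4, 'd'), (1, 5, 'e'), (0, 3, 'c')]
```

The work is linear in the number of comments, and there is no recursion or per-node closure.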
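Slide 15 reports that pure-Python HMAC was 40% of web CPU. The slide does not say which hash or which code was involved. The sketch below assumes SHA-1 and writes out HMAC as defined in RFC 2104, with the key padding done byte by byte in the interpreter. It then checks the result against the standard library's `hmac` module, which avoids that per-byte Python loop in CPython. Both function names are illustrative.

```python
import hashlib
import hmac


def pure_python_hmac(key, msg, hash_name="sha1"):
    """RFC 2104 HMAC with the inner/outer key padding computed in Python."""
    block_size = hashlib.new(hash_name).block_size
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")
    inner = hashlib.new(hash_name, bytes(b ^ 0x36 for b in key) + msg).digest()
    return hashlib.new(hash_name, bytes(b ^ 0x5C for b in key) + inner).hexdigest()


def stdlib_hmac(key, msg, hash_name="sha1"):
    """The same MAC via the standard library."""
    return hmac.new(key, msg, hash_name).hexdigest()


key, msg = b"Jefe", b"what do ya want for nothing?"
print(pure_python_hmac(key, msg) == stdlib_hmac(key, msg))  # True
```

In the spirit of slide 14, `timeit` on both functions will show how large the gap is on your own hardware, before you reach for C.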
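Slides 19 to 22 describe partitioning by entity, a lookup service, decorators for the retrofit, and a resulting API such as `Video.select_by_user_id(db_context, user_id)`. The internals of EMD are not shown in the talk. This is a minimal sketch of how those pieces could fit together, assuming the user is the partitioning entity, so all of a user's videos live on one shard. `LookupService`, `DbContext`, `connection_for` and `entity_query` are hypothetical names, and sqlite3 in-memory databases stand in for MySQL shards.

```python
import functools
import sqlite3


class LookupService:
    """Hypothetical lookup service: which shard holds each entity (a user)."""

    def __init__(self):
        self._shard_of = {}

    def assign(self, entity_id, shard):
        self._shard_of[entity_id] = shard

    def locate(self, entity_id):
        return self._shard_of[entity_id]


class DbContext:
    """Hypothetical per-request context holding one connection per shard."""

    def __init__(self, lookup, connections):
        self.lookup = lookup
        self.connections = connections  # shard name -> DB-API connection

    def connection_for(self, entity_id):
        return self.connections[self.lookup.locate(entity_id)]


def entity_query(sql):
    """Decorator: turn a stub into a classmethod that runs `sql` on the shard
    owning the entity passed right after db_context."""
    def wrap(stub):
        @functools.wraps(stub)
        def run(cls, db_context, entity_id):
            conn = db_context.connection_for(entity_id)
            rows = conn.execute(sql, (entity_id,)).fetchall()
            return [cls(*row) for row in rows]
        return classmethod(run)
    return wrap


class Video:
    def __init__(self, video_id, user_id, title):
        self.video_id, self.user_id, self.title = video_id, user_id, title

    @entity_query("SELECT video_id, user_id, title FROM video WHERE user_id = ?")
    def select_by_user_id(cls, db_context, user_id):
        """All videos owned by one user; the user is the partitioning entity."""


# Usage: two in-memory "shards", with users 1 and 2 placed on different ones.
shards = {name: sqlite3.connect(":memory:") for name in ("a", "b")}
for conn in shards.values():
    conn.execute("CREATE TABLE video (video_id INTEGER, user_id INTEGER, title TEXT)")
shards["a"].execute("INSERT INTO video VALUES (10, 1, 'cat')")
shards["b"].execute("INSERT INTO video VALUES (20, 2, 'dog')")
lookup = LookupService()
lookup.assign(1, "a")
lookup.assign(2, "b")
ctx = DbContext(lookup, shards)
print([v.title for v in Video.select_by_user_id(ctx, 2)])  # ['dog']
```

The application developer only sees the classmethod call. Routing and connection choice stay inside the decorator and context, which is the opacity slide 22 describes.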
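Slide 23 describes tagging all DML with an entity id and selectively replaying the master binlog into new mini-masters. Reading a real MySQL binlog is out of scope here. The sketch below assumes statement-based events are already available as SQL strings. It also assumes a hypothetical tag format, `/* entity:<id> */`, at the start of each statement. `replay` and `partition_of` are illustrative names, and sqlite3 in-memory databases stand in for the mini-masters.

```python
import re
import sqlite3

# Hypothetical tag format: every DML statement starts with /* entity:<id> */.
ENTITY_TAG = re.compile(r"^/\* entity:(\d+) \*/")


def replay(statements, partition_of, mini_masters):
    """Replay tagged statements onto the mini-master that will own each entity.

    partition_of(entity_id) returns a mini-master name, or None if the entity
    is not moving. Untagged statements are returned for manual review."""
    untagged = []
    for stmt in statements:
        match = ENTITY_TAG.match(stmt)
        if match is None:
            untagged.append(stmt)
            continue
        target = partition_of(int(match.group(1)))
        if target is not None:
            mini_masters[target].execute(stmt)
    return untagged


# Usage: a toy "binlog" as a list of statements, and two in-memory mini-masters.
masters = {name: sqlite3.connect(":memory:") for name in ("a", "b")}
for conn in masters.values():
    conn.execute("CREATE TABLE video (video_id INTEGER, user_id INTEGER)")
binlog = [
    "/* entity:1 */ INSERT INTO video VALUES (10, 1)",
    "/* entity:2 */ INSERT INTO video VALUES (20, 2)",
    "/* entity:3 */ INSERT INTO video VALUES (30, 3)",
    "SET @x = 1",
]
leftover = replay(binlog, {1: "a", 2: "b"}.get, masters)
print(leftover)  # ['SET @x = 1']
```

Once a mini-master has caught up for a set of entities, the final step on the slide is to repoint the lookup service at it. In the previous sketch, that is one `assign` call per moved entity.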

    didipdidip, 2 years ago

    by Mike Solomon.

    © All Rights Reserved