High Performance
Web Applications
with Python and TurboGears
Alessandro Molina - @__amol__ - amol@turbogears.org
About me
● Python Developer since 2001
● Work @ on Python and iOS
● development team member since 2.1
What's this talk about?
● Some general rules which apply to any framework
● Some quick-wins for TurboGears
● Some real cases reported
● My personal experiences and preferences, feel free
to disagree!
Raw Speed
● People seem obsessed with webservers
● The truth is that it doesn't matter so much
○ You are not going to serve "Hello World"s
(If you are asking, my personal stack is
nginx+mod_wsgi or nginx+Circus-Chaussette-gevent)
● Avoid the great idea of serving mostly empty pages
performing hundreds of ajax requests
○ Browsers have limited concurrency
○ HTTP has overhead
○ You will actually slow things down
● Learn your framework for real
About TurboGears
● Framework for rapid development encouraging
flexibility
● Created in 2005, 2.0 was a major rewrite in 2009 to
embrace the WSGI standard.
● Object Dispatch based. Regular expressions can get
messy, write them only when you must.
● By default an XML template engine with error detection
● Declarative Models with transactional unit of work
● Built in Validation, Authentication, Authorization,
Caching, Sessions, Migrations, MongoDB Support and
many more.
Looking at the code
class RootController(BaseController):
    @expose('myproj.templates.movie')
    @expose('json')
    @validate({'movie': SQLAEntityConverter(model.Movie)})
    def movie(self, movie, **kw):
        return dict(movie=movie,
                    user=request.identity and request.identity['user'])
Serving /movie/3 as a webpage and /movie/3.json as a
JSON-encoded response
What it looks like
Features vs Speed
● TurboGears is a full-stack framework. That makes it
quite slow by default!
● The team has invested effort in constantly speeding it up
since the 2.1 release.
● Still, keeping all the
features around has
its price
● To cope with this,
minimal mode was
introduced
Use only what you need
● Only use what you really need. Disabling some features
can make a big difference:
○ full featured -> ~900 req/sec
○ browser language detection -> ~1000 req/sec
○ widgets support -> ~1200 req/sec
○ sessions -> ~1300 req/sec
○ transaction manager -> ~1400 req/sec
○ minimal mode -> ~2100 req/sec
Measurements are on wsgiref; the purpose is only to show the delta
Avoid serving statics
Cascading file serving is a common pattern:
static_app = StaticURLParser('statics')
app = Cascade([static_app, app])
What is really happening is quite a lot:
○ the path gets parsed to avoid ../../../etc/passwd
○ the path gets checked on the file system
○ a 404 response is generated
○ the 404 response is caught by the Cascade
middleware, which forwards the request to your app
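One cheaper alternative, sketched below, is to route by path prefix up front instead of cascading on 404: only requests under the static prefix touch the filesystem, and everything else goes straight to the application without a throwaway 404 being built first. The function name and prefix are illustrative, not part of any library:

```python
# Hedged sketch: prefix routing instead of 404-driven cascading.
# static_app and app are plain WSGI callables.
def make_prefix_router(static_app, app, prefix='/statics'):
    def router(environ, start_response):
        # Dispatch on the request path once, before either app runs
        if environ.get('PATH_INFO', '').startswith(prefix):
            return static_app(environ, start_response)
        return app(environ, start_response)
    return router
```

In production, though, the simplest cure is to let the frontend (nginx or similar) serve the statics directory directly, so those requests never reach Python at all.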
Using Caching
● Caching means preorganizing your data the way you are
going to use it; if you already did that during the design
phase, you are already half done. Let the template drive
your data, not the opposite.
● Frameworks usually provide various types of caching.
TurboGears specifically provides:
○ @cached_property
○ tg.cache object for custom caching
○ @beaker_cache for controllers caching
○ Template Caching
○ Entity Caching
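The tg.cache object follows Beaker's get_value pattern: look up a namespaced cache and only invoke the creation function on a miss. Below is a minimal dict-based stand-in so the sketch is self-contained (the real object is Beaker-backed; SimpleCache and the movie lookup are hypothetical):

```python
import time

class SimpleCache:
    """Dict-backed stand-in for a Beaker cache namespace."""
    def __init__(self, expire):
        self.expire = expire
        self._store = {}  # key -> (value, stored_at)

    def get_value(self, key, createfunc):
        entry = self._store.get(key)
        if entry is not None and time.time() - entry[1] < self.expire:
            return entry[0]            # fresh hit: skip the expensive call
        value = createfunc()           # miss or expired: recompute
        self._store[key] = (value, time.time())
        return value

# Usage mirrors tg.cache.get_cache('movies', expire=300).get_value(...)
movies = SimpleCache(expire=300)
title = movies.get_value(3, createfunc=lambda: "expensive DB lookup")
```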
Use HTML5 & JS
● If only small portions of your page change, cache the
page and use JS to perform minor changes.
○ Invalidating your whole cache to say: "Welcome back
Mister X" is not a great idea.
● If you are using Varnish, nginx or any other frontend
cache, consider using JS+localStorage instead of cookies
for trivial customizations: cookies will make requests
bypass the frontend cache
Template Caching
● Template caching means prerendering your
template based on controller computation results.
● It's common for templates to access related entities;
those will be cached for you.
● If correctly organized, it's the caching strategy with the
best trade-off
○ Simple to implement
○ Guarantees correctly updated results
● An updated_at field on models is often all you need
WikiPage Caching
● WikiPage caching is the standard template caching
example in TurboGears documentation
@expose('wikir.templates.page')
@validate({'page': SQLAEntityConverter(model.WikiPage, slugified=True)},
          error_handler=fail_with(404))
def _default(self, page, *args, **kw):
    cache_key = '%s-%s' % (page.uid,
                           page.updated_at.strftime('%Y%m%d%H%M%S'))
    return dict(wikipage=page, tg_cache={'key': cache_key,
                                         'expire': 24*3600,
                                         'type': 'memory'})
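The key combines the entity id with the last-update timestamp, so an update naturally produces a new key and the stale entry is simply never requested again. A worked example with hypothetical values:

```python
from datetime import datetime

# Hypothetical page data, for illustration only
uid = 42
updated_at = datetime(2013, 5, 10, 14, 30, 0)

# Same scheme as the controller above: id + update timestamp
cache_key = '%s-%s' % (uid, updated_at.strftime('%Y%m%d%H%M%S'))
# cache_key is '42-20130510143000'
```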
Caching Partials
● Case study: Notifications
○ Page delivered in 2.16s
○ Query took only 2ms
○ Most of the work was
actually in rendering
each notification
● Caching was useless as
notifications happened
often, constantly changing
content.
Entity Caching
Map each object to a partial: the @entitycached decorator
makes it easy to cache each notification by itself.
from tgext.datahelpers.caching import entitycached
@entitycached('notification')
def render_post(notification):
return Markup(notification.as_html)
● Page with cached notifications is now delivered in 158ms
● A single notification can be cached for days; it will
never change.
Caching can be harmful
If your content changes
too often, caching on
first request can
actually be harmful.
If you have multiple
processes and a lot of
requests you can end
up having a race
condition on cache
update.
Cache Stampede
● During football matches there were thousands of users
constantly pressing "refresh" button to reload page.
● Content constantly changed as the match was being
reported in real time.
● After each update, all the running processes decided at
the same time that the cache was no longer valid, and
started to regenerate it.
● Sometimes the content changed even while processes were
still updating the cache for the previous update.
Proactive Update
● To solve the cache stampede, cache generation was
bound to a hook on article update, so it happened only
once.
● Easy to achieve using template caching together with
tg.template_render, with the article id as cache key
● SQLAlchemy's @event.listens_for even supports
notifications on relationships, so it's easy to update the
page cache when related comments, tags, and so on change.
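A minimal sketch of the hook-on-update approach, assuming SQLAlchemy 1.4+. The Article model, the page_cache dict, and regenerate_cache are hypothetical stand-ins for the real article model and template cache:

```python
from sqlalchemy import Column, Integer, String, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Article(Base):
    __tablename__ = 'articles'
    uid = Column(Integer, primary_key=True)
    body = Column(String)

page_cache = {}  # stand-in for the real rendered-page cache

def regenerate_cache(article):
    # In TurboGears this could render the template with the article
    # uid as cache key; here we just store a fake rendered string.
    page_cache[article.uid] = 'rendered:%s' % article.body

@event.listens_for(Article, 'after_update')
def _on_article_update(mapper, connection, target):
    # Fires once, in the process that handled the update, instead of
    # in every worker that happens to notice a stale cache entry.
    regenerate_cache(target)
```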
A real solution
● The source of the issue was users pressing the "reload"
button like there's no tomorrow.
● The solution has been to push updates to the users through
a box that updates in real time.
○ No more insane reloads
○ Users were actually more satisfied
○ It was a lot easier to maintain
○ It not only solved the match article issue but also
reduced the load on other parts of the website
Real-Time Box
Think Different
● If you are struggling too much to improve performance,
you are probably doing something your application is not
meant to do.
● Lesson learnt?
○ Soccer fans are eager for updates (no... for real!)
○ There is only one thing that gets more visits than a
football match: Rumors on football players trading
Offload Work
● The only person who knows that something changed is the
author of the change itself.
○ Only update the core cache, to give the author
immediate feedback
○ Don't be afraid of updating related caches
asynchronously. The author usually understands that it
might take some time before their changes propagate,
and other users don't know that a change happened
yet.
● You can often provide an answer to the user with little
instant computation; messages and notifications are a
typical example.
Master-Slave replication is easy
● SQLAlchemy's unit of work pattern makes it easy for
frameworks to do the right thing 90% of the time
○ Read from slaves unless we are flushing the session
● Won't require changes to your code for most common cases
● Exceptions are as easy as @with_engine('master')
● As easy as
sqlalchemy.master.url = mysql://masterhost/dbname
sqlalchemy.slaves.slave1.url = mysql://slavehost/dbname
sqlalchemy.slaves.slave2.url = mysql://slavehost/dbname
Fast enough
● Speed should not be your primary focus, but it makes
sense to care a lot about it; users tend to get frustrated
by slow responses.
● New Relic's App Speed Index reports an average of 5.0
seconds of response time for an acceptable experience.
● That is end-user time, not request time; to achieve 5
seconds you have to aim a lot lower
● Mean Opinion Score degrades quickly past 200ms.
Less than 200ms is perceived as "right now".
http://newrelic.com/ASI
Development Tools
● It's easy to introduce changes with a heavy impact on
performance without noticing. Development tools can
help keep the impact of changes under control
● The DebugBar provides core utilities to track your
application speed while developing:
○ Controller Profiling
○ Template & Partials Timing
○ Query Reporting
○ Query Logging for AJAX/JSON requests
Profiling
Keep an eye on your queries
Check even after release
● Users use your application more widely than you
might have expected
● Fast now doesn't mean fast forever. Just as unit tests
avoid breaking things, rely on speed feedback to
keep speed acceptable.
● Keep your Apdex T index updated, user
expectations evolve!
There is no silver bullet
● Sorry, there is no silver bullet.
● Every application is a separate case; general and
framework optimizations usually provide little
benefit compared to domain-specific
optimizations
● Understanding how users interact with your application
is the golden rule of optimization
● Don't underestimate how easy it is to do something really
slow without noticing: development tools can help
catch those cases.

PyGrunn2013 High Performance Web Applications with TurboGears
