Django Master Class
Jeremy Dunck • Jacob Kaplan-Moss • Simon Willison
Handouts for the tutorial given at OSCON, July 23rd, 2007. Available online at
So here’s what we’ve got on the plate:
1. Unit testing (Simon). First because it’s important, dammit!
2. Stupid middleware tricks (Jacob). Make middleware work for
you, not against you.
3. Signals (Jeremy). Get notified when important things happen.
4. Forms & AJAX (Simon). Django’s newforms library rocks,
and it goes great with this whole “AJAX” thing. Now you can
finally show your face in those cool “Web 2.0” cliques.
5. Template tag patterns (Jacob). Save time writing those repetitive
tags by factoring out common tasks.
6. Custom fields (Jeremy). Because not every piece of data is a
simple, primitive type.
7. OpenID (Simon). Learn the straight dope about OpenID, and see
how to integrate it into your Django site.
8. The “rest” of the stack (Jacob). Also known as “how to scale
your website by copying LiveJournal.”
9. GIS (Jeremy). Store data about our planet. Or another one.
First up: testing. If you pay attention to only one part of this tutorial,
make it this one.
Test Driven Development By Example (by Kent Beck) is the bible
If you have the discipline for it, this is a really rewarding way of
programming. It works particularly well if you are pair programming
with someone who can keep you on the straight and narrow.
The other end of the spectrum. Write tests only when you need them.
This is a really great way to tackle tricky bugs, where the hardest
problem is often replicating them. Replicate them with a test, then
solve them. The test will guarantee they don't come back to haunt
you again later.
I speak from experience here. I had a project with a beautiful test
suite. I let things lapse while dashing for a deadline. I still haven’t got
all the tests working again, which discourages me from running the
tests at all, which massively devalues the test suite.
Until I saw Ruby on Rails, I had basically resigned myself to the fact that
testing web apps was too hard to be worth doing, thanks to the
difficulties involved in testing something with external persistent
state (a database) and most interactions happening over HTTP.
Rails used fixtures to tackle the database testing problem, and
included a bunch of clever hooks for making everything else easy.
Django has since evolved a similar set of features, albeit with a
distinctly Pythonic flavour.
Doctests are used extensively by Django itself for unit testing the
ORM - they have the nice side-effect of doubling as documentation,
automatically generated for the website:
You are encouraged to use them for testing your own models as well;
Django's built in test runner will detect and execute them.
A naive approach.
It doesn't work for the edge cases. That's why tests should always
target the edge cases. You can try using 365.25 instead, but it
still won't pass every test.
This passes all the tests. I'm ashamed to admit how long it took me to
get here; the tests were invaluable.
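The slide code isn't reproduced in these notes. As a rough sketch only, assuming the function under test computes a person's age in whole years from a date of birth (the names naive_age and calculate_age are hypothetical), the naive and corrected versions might look something like this, with doctests aimed at the edge cases:

```python
import datetime


def naive_age(born, today):
    """Naive approach: divide the elapsed days by 365 (breaks near birthdays)."""
    return (today - born).days // 365


def calculate_age(born, today):
    """
    Compare (month, day) tuples instead of counting days.

    >>> calculate_age(datetime.date(1980, 7, 23), datetime.date(2007, 7, 23))
    27
    >>> calculate_age(datetime.date(1980, 7, 24), datetime.date(2007, 7, 23))
    26
    >>> calculate_age(datetime.date(2004, 2, 29), datetime.date(2007, 2, 28))
    2
    >>> calculate_age(datetime.date(2004, 2, 29), datetime.date(2007, 3, 1))
    3
    """
    years = today.year - born.year
    # Knock a year off if this year's birthday hasn't happened yet.
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years
```

Saved to a file, the doctests can be run with python -m doctest. The day before a birthday is exactly the sort of edge case where the naive version gives the wrong answer.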
http://www.kottke.org/04/10/normalized-data discusses the quote in
more detail. Cal is the lead engineer on Flickr, and knows exactly
what it takes to build a system that scales to millions of users.
Denormalisation is an excellent way to speed up your queries - at a
cost of added complexity in your application code. It's an ideal case
study for unit testing.
Many online forums have a view which shows the most recent 20 or
so threads along with a count of the number of replies to each. This
can be a pretty expensive SQL query, and can be dramatically sped
up by denormalising the data.
Here, num_replies is the denormalised field. It stores the number
of replies that are attached to that thread - information that already
exists in the database (and is now stored twice).
Fixtures provide a way of pre-populating a database with test data -
great for writing tests against. These fixtures are saved in a file called
forum/fixtures/threadcount.json. You can easily
generate your own fixtures using this command:
To pretty-print the JSON, use this:
And for XML instead, do this:
If you’ve got PyYAML installed, you can also use --format
This test case clears the database and loads our
threadcount.json fixtures before each test. It contains two
tests: one that adds a new reply to a thread, and another that deletes a
reply. The tests check that num_replies accurately reflects the
number of replies associated with a thread.
Our tests fail, because we don't have a mechanism for keeping the
counter in sync with the actual data yet.
By overriding the save and delete methods on Reply, we can
update the num_replies of the parent thread when a reply is
added or deleted.
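The model code from the slides isn't included here, so here is a framework-free sketch of the same idea. Thread and Reply below are plain Python stand-ins (not Django models), the list of replies plays the role of the database table, and the test case mirrors the two tests described above. Treat it as an illustration of the pattern rather than the code from the talk.

```python
import unittest


class Thread:
    """Plain-Python stand-in for the thread model (not a Django model)."""

    def __init__(self, title):
        self.title = title
        self.replies = []      # plays the role of the reply table
        self.num_replies = 0   # the denormalised counter


class Reply:
    """Stand-in for the reply model; save() and delete() keep the counter in sync."""

    def __init__(self, thread, body):
        self.thread = thread
        self.body = body

    def save(self):
        if self not in self.thread.replies:
            self.thread.replies.append(self)
        self._update_counter()

    def delete(self):
        self.thread.replies.remove(self)
        self._update_counter()

    def _update_counter(self):
        # Recount rather than increment, so the counter cannot drift.
        self.thread.num_replies = len(self.thread.replies)


class ThreadCountTest(unittest.TestCase):
    def setUp(self):
        # Stands in for loading the fixture before each test.
        self.thread = Thread("Hello")
        self.first = Reply(self.thread, "First!")
        self.first.save()

    def test_add_reply(self):
        Reply(self.thread, "Second").save()
        self.assertEqual(self.thread.num_replies, 2)

    def test_delete_reply(self):
        self.first.delete()
        self.assertEqual(self.thread.num_replies, 0)
```

Recounting on every save is the simplest thing that works; incrementing and decrementing is cheaper but easier to get subtly wrong, which is exactly what the tests are there to catch.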
The tests pass! It says 7 because I ran the test runner against a
project, which included a couple of other simple applications.
Bonus slide: here's an alternative way of solving the denormalised
counter problem, this time using signals instead of custom delete()
and save() methods.
If you’ve not yet learned about signals, don’t fret; Jeremy is going to
cover them in a little bit. Mark this slide and come back to it then...
Two tests here. The first simply checks that Django's trailing slash
adding middleware is configured correctly, and that /register/
returns a 200 (“OK”) status code.
The second checks that /register/ uses the register.html
template, demonstrating both the verbose way of doing this and the
A more complex example. This illustrates two useful concepts:
POSTing to a form using client.post(), and intercepting sent
e-mails using mail.outbox.
Test cases that inherit from django.test.TestCase
automatically hook in to Django's email framework and intercept
messages, so that instead of being sent out via SMTP they are stored
in a queue and made available for assertion testing.
A good rule for tests is that they should never interact with external
services, unless the services themselves are being tested.
More on testing with Django:
I've also used BeautifulSoup for running tests against the structure of
my HTML before, but I generally find this counter-productive as
HTML frequently changes during development without having much
of an impact on the functionality of the application.
Part the second: Middleware.
Most people understand a request/response cycle along these lines.
This is correct, of course, but it’s also overly simplistic; there are a
number of steps that this simple Request/View/Response
understanding leaves out.
In particular, it suggests that if we want certain behavior to happen
on each request, we’re forced to write it into a view (since the view is
the only part of this cycle that Django doesn’t control internally).
Often we want to perform tasks on each and every request -- think
about returning gzipped content, for example. If Django was really this
simplistic, something along those lines would be basically
So this is how a request “really” works (and actually even this outline
simplifies things somewhat).
I don’t have time to go over all the intricate details here, but notice
the pieces of “middleware” that let you hook in at various points in
the cycle and override the default behavior. For example, you can see
that the request middleware can return a response and “short-circuit”
the entire view phase. This is how the caching framework is able to
work so fast: if the page is in the cache, the view doesn’t even need
to be called.
A note on terminology: we call this feature “middleware”, though
this term can be a bit misleading to folks with an “enterprisy”
background. Ruby on Rails calls its similar feature “filters”, which
would work well for Django (if not for the conflict with the naming
of template filters).
If you’re confused, think of middleware as essentially callbacks at
particular moments in a single request cycle.
Here’s a simple piece of middleware (modified from a post to
DjangoSnippets by “Leonidas”). In particular, this is a piece of
There’s really nothing special about middleware; a piece of
middleware is just a Python class that defines a particular API. Here,
by defining process_request(), this object can be used as a
piece of request middleware.
A piece of middleware can define multiple handlers. It can also save
state as instance attributes (i.e. self.foo = 'whatever'), but
note that for performance a single middleware instance is reused for
“Installing” a piece of middleware is as simple as registering it in
MIDDLEWARE_CLASSES. This example shows some built-in
Django middleware along with the piece of middleware from the
The order of MIDDLEWARE_CLASSES is important; middleware is
processed “top-down” during the request phase, and “bottom-up”
during the response phase.
Here’s another way of looking at it. Middleware is the onion skin
around the view; you can think of each middleware class as a “layer”
that wraps the view and can intercept data on its way in or out.
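To make the onion picture concrete, here is a deliberately simplified model of the cycle in plain Python. It is not Django's actual handler: Recorder and handle are hypothetical names, and requests and responses are just a dict and a string. It only shows the ordering (request hooks top-down, response hooks bottom-up) and the short-circuit rule.

```python
class Recorder:
    """Hypothetical middleware that records when its hooks run."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def process_request(self, request):
        self.log.append("request:" + self.name)
        return None  # None means "carry on with the normal cycle"

    def process_response(self, request, response):
        self.log.append("response:" + self.name)
        return response


def handle(request, middleware, view):
    """Simplified cycle: request hooks top-down, response hooks bottom-up."""
    response = None
    for mw in middleware:
        response = mw.process_request(request)
        if response is not None:
            break  # short-circuit: the view is never called
    if response is None:
        response = view(request)
    for mw in reversed(middleware):
        response = mw.process_response(request, response)
    return response


log = []
stack = [Recorder("outer", log), Recorder("inner", log)]
handle({}, stack, lambda request: log.append("view") or "page body")
# log is now: request:outer, request:inner, view, response:inner, response:outer
```

The first entry in the list is the outermost layer of the onion: it sees the request first and the response last.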
The four types of middleware callbacks.
It’s nasty and slimy, but a great example of request middleware is a
three-click paywall like some news sites use. That is, you get free
access to the site, but after your third page you get redirected to a
Pretty straightforward, but note that request middleware may return
an HttpResponse or suitable subclass. If so, the rest of the request
is short-circuited and the view is never called. However, the
middleware may also return None, which signals that the normal
request cycle should be continued.
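The paywall slide itself is missing from these notes, so the following is only a sketch of the shape such middleware could take. Request and Redirect are plain stand-ins for Django's request object and a redirecting HttpResponse subclass, and the session key and the /subscribe/ URL are made up for the example.

```python
class Redirect:
    """Stand-in for an HttpResponse subclass that redirects."""

    def __init__(self, url):
        self.url = url


class Request:
    """Stand-in request carrying a dict-like session."""

    def __init__(self, session):
        self.session = session


class PaywallMiddleware:
    free_pages = 3
    paywall_url = "/subscribe/"  # hypothetical URL

    def process_request(self, request):
        views = request.session.get("page_views", 0) + 1
        request.session["page_views"] = views
        if views > self.free_pages:
            # Returning a response short-circuits the view.
            return Redirect(self.paywall_url)
        return None  # carry on with the normal request cycle
```

The first three requests in a session fall through to the view; the fourth gets the redirect instead.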
View middleware... isn’t really very useful, honestly. It’s mostly
there for a debugging hook -- it’s a nice place to hook in if you’d like
to wrap and profile a view, for example.
I’m going to skip showing an example, because you probably won’t
ever need to use it.
You’ll use response middleware any time you need to modify the
output before it gets sent to the browser.
Like view middleware, exception middleware isn’t all that useful in
end-user code; it’s mostly there as a hook for doing frameworky
So I’ve cheated and taken an example from Django itself: the built-in
TransactionMiddleware that handles keeping each request in
its own transaction. Here we can see the rollback step taken by the
exception hook. (There’s of course a similar commit step in the
response middleware, but that’s not shown here.)
Jacob’s discussion of middleware showed that it provides hooks for
additional processing of HTTP requests and responses. I’m going to
cover signals, which provide similar hooks in Django’s lifecycle and
You’ve probably used something like Django’s signaling tools before
in the form of Observer from the book “Design Patterns” (a.k.a. the
Gang of Four), or from Qt, Java, or .Net programming.
Something so popular has got to be useful, right?
When you’re first starting out with a toolset, it’s common to just
make things work.
But, when your codebase grows or you wish to start combining and
layering components, directly referencing other modules and
applications leads to circular dependencies, tight coupling, difficulties
in testing, and, yes, sadness.
You can use Django’s stock signals to hook into other apps and to
customize ORM behavior. You can also provide your own signals for
use in other applications.
The core idea is that signals provide a way to communicate and
coordinate without directly expressing dependencies.
Note that it’s possible to have multiple handlers per signal. They’ll
run sequentially, but their order is undefined. You shouldn’t write
signal handlers with the expectation that an earlier handler has
Here’s a simple example from the Django codebase.
We want Django’s ORM to be useful without the HTTP handler, and
vice versa. But we also want to make sure that when an HTTP
request is finished, the DB connection is closed.
The core.request_finished signal is used to notify the ORM that the
connection is no longer needed.
Using signals starts with choosing one. You can either use a stock
Django one or publish your own.
Defining your own signal is as simple as creating an object to
Once you’ve chosen your signal, you’ll write a handler based on the
arguments the signal’s sender provides.
Finally, you’ll connect your handler to the signal.
Django includes a number of signals which it uses internally.
django.core.signals is home to a couple more request-
request_started is sent when the request handler first begins
processing, and is used internally to reset
db.connection.queries, a list of all queries executed by
Django’s ORM which is kept when settings.DEBUG ==
got_request_exception is used to indicate an exception
occurred while processing a request. It’s used internally to roll back
any pending database transaction as well as for exception reporting in
Now we get to the good stuff.
The class_prepared signal indicates that a Model class has
been constructed. It’s used internally for some housekeeping such as
ensuring that every model has a Manager and resolving recursive
model relationships. This signal is very early in the life of a Model,
so some pretty radical features are possible.
The pre and post init signals allow signal handlers to munge data just
as a model instance is created. We’ll see an example in
GenericForeignKey a bit later.
The pre and post save signals allow a signal handler to do additional
processing in response to the model being saved. The pre and post
delete signals serve a similar purpose.
post_syncdb is sent by django.core.management just
after an app’s models have been added to the database. It’s used for
interactive prompting, as seen in auth’s initial superuser prompt.
One nice use of signals is to add additional functionality to existing
Suppose we want to get an email any time a model is saved with a
pub_date attribute set in the future.
Note that if you just want this type of handling on a single model
which you control, you’d probably be better off overriding the save
method in your model definition rather than using a signal.
But in this case, we want to handle multiple models. We’ll need to
listen to a save signal. We can use either pre- or post-save in this
case, since the signal will not be manipulating the data about to be
saved. We’ll use pre_save.
Django dispatches the pre_save signal with the keyword
arguments sender (the model class) and instance (the model
We’ll need to define a signal handler to use these parameters.
Connecting to a signal is pretty simple-- just call
dispatcher.connect, passing in the handler and the signal for
which it should be called.
Recall that pre_save offers both the model class and instance as
parameters. In this case, we care about the model instance, but not the
Django’s dispatching system will match up the published arguments
with the subscribed handlers. There’s no need to accept all
parameters explicitly in the handlers.
Since we’re trying to handle many different models, we’ll have to
assume some common interface.
Here, we check whether the model has the attributes we expect, and
if not, we stop processing the signal.
Now, whenever a model instance is saved, mail_on_future will be called.
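The handler code from the slides isn't reproduced here. The sketch below walks through the same three steps (pick a signal, write a handler, connect it) using a tiny home-made dispatcher, so connect and send are stand-ins rather than Django's dispatcher, and outbox stands in for actually sending mail. The argument matching with inspect is just one way to imitate the "only pass what the handler asks for" behaviour.

```python
import datetime
import inspect

pre_save = object()   # a signal is just a unique object to dispatch on
_handlers = {}
outbox = []           # stands in for actually sending e-mail


def connect(handler, signal):
    _handlers.setdefault(signal, []).append(handler)


def send(signal, **named):
    """Call each handler with only the named arguments it declares."""
    for handler in _handlers.get(signal, []):
        wanted = inspect.signature(handler).parameters
        handler(**{k: v for k, v in named.items() if k in wanted})


def mail_on_future(instance):
    # Assume a common interface: models we care about have a pub_date.
    pub_date = getattr(instance, "pub_date", None)
    if pub_date is None:
        return  # not a model we know how to handle
    if pub_date > datetime.datetime.now():
        outbox.append("Future-dated %s saved" % type(instance).__name__)


connect(mail_on_future, signal=pre_save)
```

Note that mail_on_future only declares instance; the sender argument is published with the signal but never reaches the handler.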
Another use of signals is to adapt from one form in an API call to
GenericForeignKey makes it possible to refer to any kind of
related instance using ForeignKey-like semantics. It does this by
storing the related instance’s content type and primary key value.
But there’s a hitch-- models with regular ForeignKey fields can
be constructed with references to the related model instance. In this
example, we’re assigning an author to a story.
GenericForeignKey, however, requires both a content type and
a foreign key. The API would be more consistent with
ForeignKey if we had a way to hide that complexity. In this
example, we’d like to assign a target object for a Comment.
To accomplish this, GenericForeignKey listens for the
pre_init signal and alters the model construction call from the
nice form to the ugly (but necessary) form.
In the pre_init handler, GenericForeignKey inspects the constructor
kwargs for the desired usage.
And then it replaces the given model instance with its related
content type and primary key.
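Stripped of the ORM, the kwargs rewrite amounts to something like the following. This is a sketch, not GenericForeignKey's real code: the field names content_object, content_type and object_id are assumptions for the example, Story is a hypothetical related model, and the class name stands in for a real ContentType row.

```python
class Story:
    """Hypothetical related model with a primary key."""

    def __init__(self, pk):
        self.pk = pk


def rewrite_kwargs(kwargs, name="content_object",
                   ct_field="content_type", fk_field="object_id"):
    """Swap a related instance for its content type and primary key."""
    if name in kwargs:
        related = kwargs.pop(name)
        kwargs[ct_field] = type(related).__name__  # stand-in for a ContentType
        kwargs[fk_field] = related.pk
    return kwargs
```

Constructor calls that don't use the nice form pass through untouched, which is why the handler can safely run for every model construction.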
This reduces the lines of code needed to use the GenericForeignKey,
and makes the API more like a standard ForeignKey.
You can find further information on signals as implemented in
Django with these links:
These projects, available on http://code.google.com/, all use signals.
django-multilingual, in particular, is very ambitious; it uses
signals to dynamically create models featuring parallel texts for
originally-specified models. Additionally, it substitutes its own
custom (oldforms) manipulators in to facilitate data entry of
Have a look and have fun.
This view has three return values: the empty string, if it was given an
empty username; the text 'Unavailable', if it was given a username
that is unavailable; and the text 'Available' for usernames that are
still free.
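The view itself isn't shown in these notes. Leaving out the request and response plumbing, its three-way logic boils down to something like this sketch; check_username and the TAKEN set are hypothetical, and a real view would query the database instead.

```python
# Hypothetical data; a real view would look usernames up in the database.
TAKEN = {"simon", "jacob", "jeremy"}


def check_username(username):
    """Return the text the Ajax call should inject into the page."""
    if not username:
        return ""
    if username in TAKEN:
        return "Unavailable"
    return "Available"
```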
The jQuery function takes a CSS selector as its first argument; here
we are passing a selector for the span element with id="msg", but
it supports all sorts of advanced selectors including ones from CSS 2
and 3, XPath and a few that are unique to jQuery.
The function returns a wrapper object around the collection of
elements matched by the selector. jQuery methods can then be called
on the wrapper; in this case we are calling the load method, which
uses Ajax to retrieve a fragment of HTML from a URL and then
injects it in to the element(s) on which it was called.
For convenience, jQuery sets up $() as an alias to itself. jQuery
and $ are the only two symbols it adds to your global namespace, and
you can revert $ back to what it was before if you want to (for
compatibility with Prototype, for example).
Here we're binding a function to the keyup event of the input field.
Every time a key is released it performs the Ajax request.
Finally, we set the whole thing to run when the page has finished
loading. This ensures that the input element has been loaded in to the
browser's DOM. $(document).ready() fires after the DOM has been
loaded but before all of the images have been loaded - this means it's
window.onload, which can take a lot longer to fire.
The $ function also acts as a shortcut for $(document).ready, if you
pass it a function instead of a selector string.
All Web applications need server-side validation, to ensure the
integrity (and security) of data submitted by the client. Application
but this often leads to duplicated validation logic - the same rules
With Ajax, we can reuse the server-side code for client-side
Django's newforms library allows us to define form validation logic
in a similar way to Django models - declaratively, using a subclass of
Here's the server-side code that goes with that form. If the form has
been POSTed, it checks if it is valid. If it is, it sends an e-mail (in
this case) and redirects the user. If the form is invalid or has not yet
been submitted, the contact page is displayed.
The template looks like this. form.as_p provides a simple default
layout for the form; the template can be extended to define exactly
how the form should look if a custom display is required.
Let's add client-side validation, reusing our ContactForm for
validation. This view expects to be POSTed either the whole form or
just one of the fields; if just one field is provided, the field= GET
variable is used to specify which one.
The view returns a Python dictionary rendered to JSON, a useful data
It makes use of a custom JsonResponse class, which knows how
to render a Python object as JSON.
Here's JsonResponse. I often include this utility class in my
applications when I'm working with JSON. Note that it sets the
correct Content-Type header, "application/json". This can make
debugging difficult as the browser will attempt to download the
content directly; an improved version could check for
settings.DEBUG and serve using "text/plain".
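Neither the view nor the response class appears in these notes, so the following is a stand-alone sketch of the two ideas. JsonResponse here is a plain class rather than an HttpResponse subclass, validate works on a dict of hypothetical validator functions rather than a newforms form, and the shape of the returned dictionary (valid plus errors) is an assumption, not necessarily what the slides used.

```python
import json


class JsonResponse:
    """Stand-in showing the idea; a real version would subclass HttpResponse."""

    def __init__(self, data, debug=False):
        self.content = json.dumps(data)
        # text/plain lets the browser display the body while debugging.
        self.content_type = "text/plain" if debug else "application/json"


def validate(data, validators, field=None):
    """Validate every field, or only `field` when one is named.

    `validators` maps a field name to a function that returns an error
    message, or None when the value is acceptable.
    """
    names = [field] if field else list(validators)
    errors = {}
    for name in names:
        message = validators[name](data.get(name, ""))
        if message:
            errors[name] = [message]
    return {"valid": not errors, "errors": errors}
```

A view would then end with something like JsonResponse(validate(post_data, validators, field)), and the client-side code only has to look at valid and errors.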
function is called for an input field, and performs an HTTP POST
(using jQuery's Ajax features) against the view we just defined.
It makes use of the jquery.form.js plugin, which adds the
formToArray() method to the jQuery object. jQuery plugins
provide a clever mechanism for extending jQuery's functionality
without needing to increase the size of the main jquery.js file.
The validateInput function is attached to every input field on
the page, using jQuery's handy custom input selector.
Here's the showErrors function, which displays any errors in the
errorlist associated with the form element.
relatedErrorList() uses jQuery's DOM traversal functions to
find the error list associated with the input element, and creates one if
there isn't one already.
Bonus slide: here’s that validate_contact method repackaged
as a generic view.
Custom template tags are supremely useful. Write ‘em for a while,
however, and you start to discover some patterns you use over and
over again. In this part, I’ll go over five common needs, and the
patterns I use to handle them.
The first use case: simple data (i.e. a list, text, etc.) in, simple data
When you’ve got one of these tasks, think “filter!”
An example filter to “piratize” text. Filters really are damn simple, so
there’s not much more to say about this.
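The filter from the slide isn't reproduced here, but the point stands: a filter is just a function from a value to a value. A minimal sketch, with a made-up vocabulary and with the template library registration step left out:

```python
import re

# Hypothetical vocabulary; extend to taste.
PIRATE_WORDS = {"hello": "ahoy", "my": "me", "you": "ye", "yes": "aye"}


def piratize(value):
    """Replace whole words with their pirate equivalents."""
    def swap(match):
        word = match.group(0)
        return PIRATE_WORDS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, value)
```

Simple data in, simple data out, which is why filters are so easy to unit test.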
Use case #2: you’ve got some programmatically-generated data (i.e.
from the results of a database lookup, or system call, or ...) that you’d
like to render into the template.
In this case, the @simple_tag decorator is your friend.
Here’s a pretty simple example: display a server’s uptime. Not a very
useful tag, but shows the basic pattern pretty well.
Use case #3: you’ve got something you want to display in a template
tag, but it’s expensive and you don’t want template authors killing
The solution is to cache the results of template tags.
I’ve written a set of node subclasses that illustrate one way you could
use caching with template tags. It’s a useful idea even if you don’t
use these specific bits.
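The node subclasses aren't included in these notes. To show just the caching idea, here is a sketch that uses a plain decorator instead of node subclasses, and an in-process dictionary instead of Django's cache framework. The name cached and the timeout handling are assumptions for the example.

```python
import time


def cached(timeout):
    """Cache a zero-argument render function's result for `timeout` seconds."""
    def decorator(render):
        state = {"value": None, "expires": 0.0}

        def wrapper():
            now = time.time()
            if now >= state["expires"]:
                # Cache is empty or stale: do the expensive work once.
                state["value"] = render()
                state["expires"] = now + timeout
            return state["value"]
        return wrapper
    return decorator
```

However many times a template author uses the tag, the expensive work happens at most once per timeout period.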
This is a use case that doesn’t come up very often, but sometimes
you need to do pretty complex stuff.
Here’s an example (also available at djangosnippets.org) of what I’m
talking about. These tags depend on each other, and you’ll need to
handle the child tokens “inside” the switch tag correctly.
The important parts to notice here are the three commented lines. First
we gather all the child nodes until the {% endswitch %} tag;
then we delete that {% endswitch %} tag; then we pull out just
{% case %} nodes. From there, it’s a matter of returning the node
The case handler is very similar; it just doesn’t have to do the
Here’s (the render method of) the switch node. Notice all that it does
is delegate rendering off to the case node after doing some checks.
Finally, this is the interesting part of the case node. Pretty simple:
check for equality, and (when requested) render all the child nodes
Again, the full code’s available online at
This is a common complaint: “I’ve got this cool tag, but I hate having
to {% load %} it everywhere!” The solution is to make it a
And here’s how. You can stick this code anywhere that’ll get loaded
on startup; I suggest installing it in a top-ish-level __init__.py.
» http://djangoproject.com/documentation/templates_python/ —
the official template documentation.
» http://code.google.com/p/django-template-utils/ — James’
template utils have some good examples.
» http://www.djangosnippets.org/ — There are lots of good
There are two different kinds of fields in Django:
newforms.Field (which Simon covered earlier), and
db.models.Field. Here I’ll cover model fields.
Model fields provide a way to customize the behavior of the ORM
and to provide a richer interface when dealing with model instances.
There are many model fields that come with Django. Here are a few
that run the spectrum of sophistication.
A CharField requires a maxlength argument, and otherwise
supports common validation parameters like blank, null, and default.
I’m sure you’ve used one before.
Note that each of those parameters could be implemented as a
validator given in validator_list. They’re included in the
CharField implementation because they are so commonly useful.
Next on the spectrum is URLField, which is a CharField with a
larger default maxlength and an additional option to validate that
the resource identified actually exists.
A FileField goes further by contributing helper functions, such as
get_FIELD_url, to the associated model.
As I covered earlier, GenericForeignKey provides an
abstraction layer over the ContentType package in order to make
model instances refer to any other model.
Developers using Django can tap into this power, too.
We’ll start with a validating ISBNField.
An ISBN is a unique identifier assigned to each edition (or
sometimes printing) of any book. They come in 10 and 13 digit
varieties; 13 digits is the new standard.
The last digit is a check digit and can be used to verify validity.
We need to subclass an existing Field class. The base Field class
provides hooks needed for Django to manage persistence.
We’ll usually want to override the Field.__init__ in order to set
constraints, and we need to map our Field into a database column.
Before we get to the actual field, a little warning about validation.
Form processing is in flux on trunk right now. Oldforms is being
replaced with Newforms. Oldforms used manipulators, which
validated, in part, using a field’s validator_list.
There’s some debate right now whether validation logic belongs in
models, forms, or both.
Rather than get sidelined with that debate and the many ways to
currently do it, I’m going to cheat and not use forms here. Instead, I’ll
rely on Model.validate, which, at least on trunk right now, calls
validate for each of the fields on the model.
Watch this space.
Let’s get started.
We’ll inherit from CharField to start with, since ISBNs are a
string of characters.
Here’s our custom validator. If you’re not familiar, validators must
raise a ValidationError exception to indicate failure.
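The validator from the slide isn't reproduced here. A sketch of what a check-digit validator could look like follows; ValidationError is a local stand-in for Django's exception, the helper names isbn10_ok and isbn13_ok are hypothetical, and the (field_data, all_data) signature follows the oldforms validator convention.

```python
DIGITS = "0123456789"


class ValidationError(Exception):
    """Stand-in for Django's ValidationError."""


def isbn10_ok(digits):
    """ISBN-10: weights 10 down to 1, total divisible by 11; final X means 10."""
    total = 0
    for position, char in enumerate(digits):
        if char in "Xx" and position == 9:
            value = 10
        elif char in DIGITS:
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn13_ok(digits):
    """ISBN-13: alternating weights 1 and 3, total divisible by 10."""
    if not all(char in DIGITS for char in digits):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))
    return total % 10 == 0


def isISBN(field_data, all_data=None):
    """Raise ValidationError unless field_data is a valid ISBN-10 or ISBN-13."""
    digits = field_data.replace("-", "").replace(" ", "")
    if len(digits) == 10 and isbn10_ok(digits):
        return
    if len(digits) == 13 and isbn13_ok(digits):
        return
    raise ValidationError("Enter a valid ISBN.")
```

Hyphens and spaces are stripped before checking, since ISBNs are usually printed with separators.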
In the ISBNField’s __init__, we’ll force maxlength to be
13, since all ISBNs are at most that many characters.
We also add the isISBN validator to validator_list, as an example of
how we could support oldforms.
Finally we add get_internal_type to tell Django to
map ISBNField to the CharField database column type.
Now we can use the ISBNField like any stock field.
We can give it a valid ISBN and have it pass, or a bad ISBN and
have it fail.
Given an ISBN, it’s common to want related information about a
book such as the title.
Let’s change ISBNField so that it contributes a CharField for
the title in addition to its own field.
So, I’ve written a method that, given an ISBN, returns the title of that
I’ve also tweaked the ISBNField.__init__ to take an optional
title_field argument. This is used to determine the name of the title
field on this model.
Every Field has a contribute_to_class method, which
Django uses to help define the Model class.
In the last example, we just let the standard
Field.contribute_to_class do its thing, but now we want
to alter the model class definition to include an extra Field for the
The tricky part here is incrementing the creation counter. The
creation counter is used to maintain field order when one Django
model inherits from another one. But it also affects the order of field
value assignment in the model’s constructor.
We want ISBN to be set after the title field so that we can fill in the
title based on the ISBN value. If the ISBN field occurred before the
title in the model definition, the title set by the ISBNField might
Finally, we contribute the new title field to the model we’re helping
Actually, there’s one more step to the contribution.
We’d like the title attribute to be derived from the given ISBN.
If you want to control what happens on an attribute access, you
typically use a property.
In Django, the Field instance is attached to the Model class. This
is important to realize, because a single Field instance can’t
manage the model instances. Instead, we need to use a “descriptor”.
Descriptors are objects that take a class or instance as a parameter,
and resolve attribute lookup using both that reference and internal
See Guido’s discussion here:
Since serving the attribute resolution is tightly related to the Field
itself, I’ve made the Field instance itself serve as the descriptor for
the Model class.
Here’s the descriptor “set” method for setting the value of the field on
We ensure that the call is for a model instance rather than the model
class. This prevents overriding the field on the class in outside code.
Then, if the ISBN is a string or None, the ISBN is stashed in the
model instance’s dictionary, and the title is set to correspond to the
Finally, when the ISBNField’s attribute is accessed, we return the
value from the model instance’s dictionary. This is the descriptors
There we have it: an ISBNField that manages a related title field.
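Outside of Django, the descriptor half of that story can be sketched in a few lines. Everything here is a stand-in: ISBNDescriptor is not the ISBNField from the slides, Book is a plain class rather than a Model, and lookup_title with its TITLES table is a hypothetical replacement for the real title lookup.

```python
# Hypothetical lookup table; the real field would fetch the title elsewhere.
TITLES = {"9780306406157": "Example Title"}


def lookup_title(isbn):
    """Stand-in for the title lookup described above."""
    return TITLES.get(isbn)


class ISBNDescriptor:
    """One descriptor lives on the class; values live in each instance's __dict__."""

    def __init__(self, name, title_field):
        self.name = name
        self.title_field = title_field

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is None or isinstance(value, str):
            instance.__dict__[self.name] = value
            # Keep the related title in step with the ISBN.
            title = lookup_title(value) if value else None
            setattr(instance, self.title_field, title)
        else:
            raise TypeError("ISBN must be a string or None")


class Book:
    isbn = ISBNDescriptor("isbn", title_field="title")
```

Because the descriptor stores values in the instance's own dictionary, one descriptor object on the class can serve any number of instances without mixing up their data.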
More resources on Django’s model creation lifecycle:
The TagField that’s part of django-tagging
(http://code.google.com/p/django-tagging/) is a good example.
And more information about the python magic that lets this work:
It solves the “too many passwords” problem - with OpenID, you
don’t have to come up with a brand new username and password on
every site that you need an account.
It’s decentralised, which means that there’s no central entity
controlling everyone’s identity - unlike Microsoft Passport or Six
It’s an open standard, supported by Open Source libraries. For a
much more detailed introduction, watch the video of my Google Tech
Talk (or read through the slides):
These are some of mine. It’s perfectly normal for people to have
more than one (people have maintained multiple online personas
since the early days of the Internet), but in practise most people will
pick one and use it on most sites.
If you have a LiveJournal or AOL account, you have an OpenID
already. If you don’t have one, there are plenty of places that you can
get one: http://openid.net/wiki/index.php/OpenIDServers
You can watch a screencast of OpenID in action here:
If you view the HTML source of a page that is an OpenID, you’ll
find this in the <head> section.
This tells the OpenID consumer (the site you are signing in to) where
your provider’s server is. This is the URL that you will be redirected
to to “prove” that you own that OpenID. Proof is often done by
signing in to that site with a username and password, but other forms
of authentication are possible as well.
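The markup from the slide isn't reproduced in these notes. Under OpenID 1.x the declaration is a link element in the page head with rel="openid.server" (and optionally rel="openid.delegate"). As an illustration only, a consumer's first discovery step could be sketched with the standard library like this; OpenIDLinkParser and find_openid_links are hypothetical names, and a real consumer should use a proper OpenID library instead.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collect <link rel="openid.*" href="..."> declarations from a page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if rel.startswith("openid.") and attrs.get("href"):
            self.links[rel] = attrs["href"]


def find_openid_links(html):
    """Return a dict such as {"openid.server": "http://..."} for a page."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    return parser.links
```

The consumer would then redirect the browser to the URL found under openid.server to start the sign-in.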
The consumer also establishes a shared secret with the provider, if
they haven’t communicated before. This lets them communicate
securely despite your browser handing the information back and forth
between the two of them.
This essentially acts as a way of helping you to pre-fill a registration
form. As part of the OpenID sign in process, the consumer can ask
your provider for this information. Your provider will explicitly ask
your permission before passing it back. There are no guarantees that
complete (or indeed any) information will be passed back at all, so
consumers can’t rely on this working.
More here: http://simonwillison.net/2007/Jun/30/sreg/
The reference implementation is the JanRain OpenID library:
http://www.openidenabled.com/openid/libraries/python/. It’s a great
library, and really isn’t that hard to use. But there is an easier way...
The models are used by the JanRain library for persistence; you don’t
have to worry about them at all.
Full instructions here: http://django-
The full middleware line is
, but that didn't fit on the slide. You need to add this somewhere after
the session middleware, which must be activated for the OpenID
functionality to work.
The first URL will be your sign-in page, where users are directed to
begin signing in with OpenID.
The second is the URL that the user will be redirected back to upon
successful sign in with their OpenID provider.
The third is the signout page, which users can use to sign out of your
It may not be instantly obvious why it is useful to have users sign in
with more than one OpenID at once. There are a number of reasons,
but the most interesting is that sites may well start to offer API
services around the OpenIDs they provide - for example, a last.fm
OpenID may be used to retrieve that user's last.fm music preferences,
while an Upcoming.org OpenID could provide access to their
calendar. Supporting multiple OpenIDs allows services to be
developed that can take advantage of these site-specific APIs.
By “coming soon”, I mean really soon. There's a small chance I'll
have released the first of these before giving this tutorial.
» http://openid.net/ — the official OpenID site; also home to the
OpenID mailing lists.
» http://www.openidenabled.com/ — a directory of OpenID-
» http://simonwillison.net/tags/openid/ — All of Simon’s writings
» http://code.google.com/p/django-openid/ — Home of the django-
So: diagrammed loosely, this is what a typical website looks like,
This is more like it.
This is LiveJournal’s current architecture, taken from slides that
Brad Fitzpatrick gave on the subject. Yes,
LiveJournal is a big site, but 90% of good scaling is foresight.
Planning ahead to an architecture like this is the only way we’ll
actually get there without too much trouble.
The thing is, this is the only part of that cluster that’s LiveJournal-
specific. In any big application, there’s a bunch of other code that
does infrastructure-related activities, and all that is reusable.
In fact, poke under the hood at most big web sites — MySpace,
Facebook, Slashdot, etc. — and you’ll find many tools crop up over
and over again. The wonder of the LAMP-ish stack these days is
that you can use the same tools the big boys use. The fact that
MySpace gets 6000 hits/second out of Memcached makes me not
worry at all about my 60.
I’m going to go over a few of these tools that’ll give you the most
“bang for your buck.”
The first tool I’ll look at is Perlbal. Perlbal is a “reverse proxy load
balancer and web server”, which is a fancy way of describing a tool
that mediates between web browsers and backend web servers.
Perlbal can do a whole bunch more, actually — including acting as a
part of MogileFS, which is awesome but which I can’t cover in this
tutorial — but I’ll just focus on its role as a reverse proxy.
There are, of course, other load balancers -- Apache’s mod_proxy
and nginx come to mind -- and much of the following applies to
them. I use Perlbal, so that’s what I’m gonna talk about.
So why use a reverse proxy at all?
Well, even if you’ve only got a single web server, Perlbal can still
save your butt. Although it takes only fractions of a second to
generate a page, a slow client can take a relatively long time to
download that content. In most situations even your faster clients
have far smaller pipes than your server; this leaves the server to
spend the majority of its time “spoonfeeding” rendered data down to
clients. Perlbal (and other reverse proxies) will cache a certain
amount of content and trickle it down to clients, leaving your
backend free to handle more requests.
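Some back-of-the-envelope arithmetic shows how lopsided this is. Every number below is an assumption picked for illustration (page size, render time, link speeds), not a measurement from any real site.

```python
def transfer_seconds(page_bytes, link_bits_per_second):
    """Time to push a response down a link, ignoring latency and overhead."""
    return page_bytes * 8 / link_bits_per_second


# All of these figures are made up for the sake of the example.
PAGE_BYTES = 100 * 1024          # a 100 KB rendered page
RENDER_SECONDS = 0.05            # time the backend spends generating it
CLIENT_LINK = 512_000            # a 512 kbit/s client connection
PROXY_LINK = 1_000_000_000       # gigabit LAN between proxy and backend

# Without a proxy the backend is tied up until the slow client has it all.
without_proxy = RENDER_SECONDS + transfer_seconds(PAGE_BYTES, CLIENT_LINK)

# With a proxy the backend hands the page over the LAN and moves on.
with_proxy = RENDER_SECONDS + transfer_seconds(PAGE_BYTES, PROXY_LINK)

print(f"backend busy without proxy: {without_proxy:.3f}s")
print(f"backend busy with proxy:    {with_proxy:.3f}s")
```

With these made-up figures the backend spends far longer waiting on the client than it does rendering the page, and that waiting is exactly the work the proxy takes over.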
Second, if all your requests go through a proxy, it’s amazingly easy
to swap out backend web servers, add more as traffic increases, or
otherwise move things around. Without a proxy, you’d spend a bunch
of time rebinding IP addresses, and possibly end up locked into a
server you don’t like.
Finally, if you’re lucky you’ll get to the point that a single server
won’t handle all the traffic you’re throwing at it. Perlbal makes it
incredibly easy to add more backend servers if and when that
Unfortunately, Perlbal isn’t documented all that well. The docs in
SVN are pretty good, and the mailing list is a great place to get help.
I’ll also show some example configs over the next few slides.
Here’s a stripped down version of the Perlbal config for ljworld.com.
We’re using the virtual host plugin to delegate based on domain
name. The domain name points to a “service”, which (since it’s a
proxy) points to a “pool” of servers.
We’re using a cute trick for the pool here; instead of listing the servers
in the config file, we point to a “nodefile” of backend web servers.
This is that node file; one IP:port per line. The clever thing is that
Perlbal notices if this file changes and automatically reconfigures the
pool; this means that changing the pool is as simple as changing this
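Since the pool is just a text file, updating it is easy to script. The helper below is a hypothetical sketch (write_nodefile is not part of Perlbal) that assumes one backend address per line and swaps the file in atomically, so Perlbal never reads a half-written list. The addresses are placeholders.

```python
import os
import tempfile


def write_nodefile(path, nodes):
    """Replace the nodefile at `path` with one backend address per line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nodefile-")
    try:
        with os.fdopen(fd, "w") as handle:
            for node in nodes:
                handle.write(node + "\n")
        os.chmod(tmp_path, 0o644)    # keep it readable by the proxy's user
        os.replace(tmp_path, path)   # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Demo against a throwaway directory; addresses are made up.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "nodelist.dat")
    write_nodefile(demo_path, ["10.0.0.1:8000", "10.0.0.2:8000"])
    with open(demo_path) as handle:
        print(handle.read(), end="")
```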
A couple of tricks we’ve learned over a few years of using Perlbal:
» Because you’re now behind a proxy, REMOTE_IP won’t be
correct (it’ll always be set to the IP of Perlbal itself). Django’s
included XForwardedForMiddleware will correctly set
REMOTE_IP for you.
» Perlbal has some neat tricks; check out X-Reproxy-File and
» It’s often useful to know which backend server actually handled
a request. We use a special X-header to keep track of that (X-
» If you’ve got a change you’re not sure about, you can always
deploy it to a single server and let Perlbal hand just a portion of
requests to that server.
The next tool on our little micro-tour is memcached. It’s an in-
memory object caching system, and it’s the secret to making your
sites run fast. Django’s caching framework will use memcached, and
for any serious production-quality site you should let it.
Really, there’s no reason not to use memcached, so I’m not going to
spend much time advocating it. If you choose a different cache
backend you deserve what you get.
This is how easy it is to start memcached.
And this is all you need to do to make Django use it (well, besides
installing the memcached client library, which is pure Python and
will run anywhere). Since it confuses some people, the second line
shows how to use multiple cache backends.
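For reference, here is roughly what that looks like. This is a sketch using the CACHE_BACKEND URI setting from Django releases of this era; the IP addresses, port, and memory size are placeholders, so adjust them to your own servers.

```python
# settings.py fragment (CACHE_BACKEND URI syntax from 2007-era Django).
#
# Starting memcached on each cache box looks something like:
#   memcached -d -m 1024 -p 11211
# (-d: run as a daemon, -m: memory in MB, -p: TCP port)

# A single memcached server on the local machine:
CACHE_BACKEND = "memcached://127.0.0.1:11211/"

# Several memcached servers: separate the addresses with semicolons.
# The addresses below are made up.
CACHE_BACKEND = "memcached://10.0.0.1:11211;10.0.0.2:11211/"
```

Only the last assignment takes effect, of course; a real settings file would keep just one of the two lines.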
» More memcached servers generally equals better performance
(i.e. four 1 GB servers will perform better than one 4 GB server).
That’s because the memcached protocol hashes twice: once on
the client to determine the server, and once on the server. This
leaves an equal distribution of keys across servers, and hence
better performance. You do want roughly equal cache sizes on
each server so that key expiration isn’t abnormal.
» You want to make sure to use unique keys if you’re running
multiple sites against the same cache. Otherwise
one.example.com/A/ could get the same key as
two.example.com/A/, and that’s bad. We use
DJANGO_SETTINGS_MODULE as the key prefix, and it works
» Memcached has no namespaces, so try to design keys that don’t
need ‘em. In a bind, you can use some external value that you
increment when you need a “new” namespace.
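The tricks above can be sketched in a few lines. Everything here is illustrative: pick_server uses a plain CRC32 rather than whatever hash your memcached client really uses, the key layout in make_key is just one reasonable choice, and plain dicts stand in for both the cache and the external version counter.

```python
import zlib


def pick_server(key, servers):
    """Client-side hashing: the key alone decides which server holds it."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def make_key(site_prefix, namespace, version, name):
    """Build a key that is unique per site and per namespace version."""
    return f"{site_prefix}:{namespace}.v{version}:{name}"


# The "external value": one counter per namespace, kept outside the keys.
versions = {}


def current_version(namespace):
    return versions.setdefault(namespace, 1)


def new_namespace(namespace):
    """Bump the counter; old keys are never asked for again and expire."""
    versions[namespace] = current_version(namespace) + 1


# Hypothetical servers and site prefixes, for illustration only.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
cache = {}  # stand-in for a memcached client

# Two sites caching the "same" object get different keys.
key_one = make_key("siteone.settings", "articles", current_version("articles"), "42")
key_two = make_key("sitetwo.settings", "articles", current_version("articles"), "42")
cache[key_one] = "cached article 42"
home_server = pick_server(key_one, SERVERS)

# Need to throw away every cached article at once? Start a new namespace.
new_namespace("articles")
fresh_key = make_key("siteone.settings", "articles", current_version("articles"), "42")
print(fresh_key in cache)
```

Nothing gets deleted when the namespace is bumped; the stale entries simply stop being read and fall out of the cache on their own.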
The final tool I’ll look at is Capistrano. Although it’s classified as a
deployment utility, you can really think of Capistrano as a tool to run
the same command on a bunch of servers at once. The most useful
command is svn update, but you can really run anything.
Once you end up with multiple web servers, keeping ‘em in sync is
hard, and NFS is failure-prone. Deployment tools keep you sane.
Yes, it’s Ruby :)
The Capistrano DSL, though, is pretty sweet; here I’m defining a
remote command I can easily run with cap upgrade_project.
I can’t really show many more code examples since each site will be
different, but I suggest just reading through the manual and playing
around; it’s really not very hard.
A couple of tricks we’ve learned:
» If you’ve got a “restart” task (to reload Apache or whatever),
make sure to stagger the restarts so you don’t have any
» Capistrano is great to combine with a build process. We use it
javascript combines the build process and the roll-out
» It’s also a good idea to bake cache-busting into your code
django-apps/ has a good introduction to using Capistrano with