Cowboy development
   with Django
     Simon Willison
    DjangoCon 2009
http://www.youtube.com/watch?v=nZx9sNXv9h0
Just one problem... we
didn’t have cowboys in
        England
The Napoleonic Wars
A Napoleonic Sea Fort




http://en.wikipedia.org/wiki/File:Alderney_-_Fort_Clonque_02.jpg
Super Evil Dev Fort
http://www.anotherurl.com/travel/fort_clonque/handbook.htm
Photos by Cindy Li




                     http://www.flickr.com/photos/cindyli/sets/72157610369683426/
WildLifeNearYou.com
    (Built in 1 week and 10 months)
DEMO
Search uses the geospatial branch of Xapian

Species database comes from Freebase

Photos can be imported from Flickr

“Su...
/dev/fort
Cohort 3: Winter 2009                                                             What is /dev/fort?

The trip  ...
Cowboy development at work
MP expenses
Heather Brooke
January 2005
 The FOI request
February 2008
The Information Tribunal
“Transparency will
damage democracy”
January 2009
The exemption law
March 2009
  The mole
“All of the receipts of 650-odd MPs,
redacted and unredacted, are for sale at a
price of £300,000, so I am told. The price...
8th May, 2009
The Daily Telegraph
At the Guardian...
April: “Expenses are due out in
 a couple of months, is there
    anything we can do?”
June: “Expenses have been
bumped forward, they’re out
        next week!”
Thursday 11th June
  The proof-of-concept
Monday 15th June
 The tentative go-ahead
Tuesday 16th June
Designer + client-side engineer
Wednesday 17th June
   Operations engineer
Thursday 18th June
    Launch day!
How we built it
$ convert Frank_Comm.pdf pages.png
Frictionless registration
Page filters
page_filters = (
    # Maps name of filter to dictionary of kwargs to doc.pages.filter()
    ('reviewed', {
        'votes__i...
pages = doc.pages.all()
if page_filter:
   kwargs = page_filters_lookup.get(page_filter)
   if kwargs is None:
      raise Ht...
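
The filter table on the slide above maps a display name to the keyword arguments passed to doc.pages.filter(). The following is a minimal sketch of the same idea without Django. It assumes pages are plain dicts and that only the "__isnull" lookup is needed; apply_filter, filter_counts and the sample PAGES data are hypothetical stand-ins for the ORM, not code from the talk.

```python
# Hypothetical stand-in for Django queryset filtering: pages are dicts and
# only "<field>__isnull" lookups are understood.
PAGE_FILTERS = (
    ("reviewed", {"votes__isnull": False}),
    ("unreviewed", {"votes__isnull": True}),
    ("with line items", {"line_items__isnull": False}),
)
PAGE_FILTERS_LOOKUP = dict(PAGE_FILTERS)


def apply_filter(pages, **kwargs):
    """Return the pages matching every "<field>__isnull" condition."""
    result = []
    for page in pages:
        matches = True
        for lookup, want_null in kwargs.items():
            field, _, op = lookup.partition("__")
            if op != "isnull":
                raise ValueError("Unsupported lookup: %s" % lookup)
            is_null = not page.get(field)  # an empty list counts as "null"
            if is_null != want_null:
                matches = False
                break
        if matches:
            result.append(page)
    return result


def filter_counts(pages):
    """Build the filter list: one entry per named filter, with its count."""
    return [
        {"name": name, "count": len(apply_filter(pages, **kwargs))}
        for name, kwargs in PAGE_FILTERS
    ]


# Made-up sample data for illustration only.
PAGES = [
    {"id": 1, "votes": ["interesting"], "line_items": []},
    {"id": 2, "votes": [], "line_items": []},
    {"id": 3, "votes": ["boring"], "line_items": ["hotel"]},
]
```

Keeping the filters as an ordered tuple of pairs preserves display order, while the dict built from it gives a quick lookup when validating a filter name taken from the URL.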
Matching names
http://github.com/simonw/datamatcher
On the day
def get_mp_pages():
  "Returns list of (mp-name, mp-page-url) tuples"
  soup = Soup(urllib.urlopen(INDEX_URL))
  mp_links ...
def get_pdfs(mp_url):
  "Returns list of (description, years, pdf-url, size) tuples"
  soup = Soup(urllib.urlopen(mp_url))...
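
The two scraping functions above rely on BeautifulSoup and the Python 2 urllib module. As a rough illustration of the link-matching step only, here is a sketch that uses Python 3's standard-library html.parser instead. AllowanceLinkParser, get_mp_links and SAMPLE_HTML are hypothetical names and data invented for this example; fetching the page is left out.

```python
from html.parser import HTMLParser

SUFFIX = "'s allowances"


class AllowanceLinkParser(HTMLParser):
    """Collect (mp-name, href) pairs from <a> tags whose title ends with SUFFIX."""

    def __init__(self):
        super().__init__()
        self.mp_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        title = attr_map.get("title") or ""
        href = attr_map.get("href")
        if title.endswith(SUFFIX) and href:
            # Strip the suffix to leave just the MP's name.
            self.mp_links.append((title[: -len(SUFFIX)], href))


def get_mp_links(html):
    """Return a list of (mp-name, mp-page-url) tuples found in the HTML."""
    parser = AllowanceLinkParser()
    parser.feed(html)
    parser.close()
    return parser.mp_links


# Made-up markup standing in for the real index page.
SAMPLE_HTML = """
<ul>
  <li><a href="/mps/a-example/" title="A. Example's allowances">A. Example</a></li>
  <li><a href="/about/" title="About this site">About</a></li>
  <li><a href="/mps/b-sample/" title="B. Sample's allowances">B. Sample</a></li>
</ul>
"""
```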
“Drop Everything”
Photoshop + AppleScript
           vs.
     Java + IntelliJ
Images on our docroot (S3
upload was taking too long)
Blitz QA
Launch! (on EC2)
Crash #1: more Apache
 children than MySQL
      connections
unreviewed_count = Page.objects.filter(
   votes__isnull = True
).distinct().count()
SELECT
  COUNT(DISTINCT `expenses_page`.`id`)
FROM
  `expenses_page` LEFT OUTER JOIN `expenses_vote` ON (
     `expenses_p...
unreviewed_count = cache.get('homepage:unreviewed_count')
if unreviewed_count is None:
   unreviewed_count = Page.objects....
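
The snippet above is a get-or-compute pattern: look the count up in the cache, and only run the expensive query when the cached value is missing or expired. A small self-contained sketch of that pattern follows. TinyCache is a hypothetical in-memory stand-in for a memcached client, and cached_count is an invented helper name; neither comes from the talk.

```python
import time


class TinyCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


def cached_count(cache, key, compute, timeout=60):
    """Return the cached value for key, recomputing it only on a cache miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value
```

Note that the key passed to get and set must be exactly the same string; a stray space in one of them means every request misses the cache and runs the query anyway.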
With 70,000 pages and a LOT of votes...

 DB takes up 135% of CPU

Cache the count in memcached...

 DB drops to 35% of CPU
unreviewed_count = Page.objects.filter(
   votes__isnull = True
).distinct().count()

reviewed_count = Page.objects.filter(
...
unreviewed_count = Page.objects.filter(
   is_reviewed = False
).count()
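
Replacing the votes__isnull join with an is_reviewed column is a denormalisation: the flag has to be kept up to date whenever a vote is recorded. The slides do not show how that was done, so the sketch below is only one possible approach, using the standard-library sqlite3 module rather than Django and MySQL. The table names follow the SQL shown earlier; add_vote and the two count functions are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expenses_page (
        id INTEGER PRIMARY KEY,
        is_reviewed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE expenses_vote (
        id INTEGER PRIMARY KEY,
        page_id INTEGER NOT NULL REFERENCES expenses_page(id)
    );
""")
conn.executemany(
    "INSERT INTO expenses_page (id) VALUES (?)", [(1,), (2,), (3,), (4,)]
)


def add_vote(conn, page_id):
    """Record a vote and update the denormalised flag in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO expenses_vote (page_id) VALUES (?)", (page_id,)
        )
        conn.execute(
            "UPDATE expenses_page SET is_reviewed = 1 WHERE id = ?", (page_id,)
        )


def unreviewed_count_join(conn):
    """The join-based query: count pages that have no votes at all."""
    return conn.execute("""
        SELECT COUNT(DISTINCT expenses_page.id)
        FROM expenses_page
        LEFT OUTER JOIN expenses_vote
            ON expenses_page.id = expenses_vote.page_id
        WHERE expenses_vote.id IS NULL
    """).fetchone()[0]


def unreviewed_count_flag(conn):
    """The denormalised query: a single-table count on the flag."""
    return conn.execute(
        "SELECT COUNT(*) FROM expenses_page WHERE is_reviewed = 0"
    ).fetchone()[0]


# Page 1 gets two votes, page 3 gets one; pages 2 and 4 stay unreviewed.
for page_id in (1, 1, 3):
    add_vote(conn, page_id)
```

Both queries should agree as long as every vote goes through add_vote; the flag-based one avoids the join and the DISTINCT.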
Migrating to InnoDB on a
     separate server
ssh mps-live "mysqldump mp_expenses" |
sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' |
  sed 's/CHARSET=latin1/CHARSET=utf8/g' |
 ...
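
The two sed commands in the pipeline above rewrite the dump text in flight, so every table is recreated as InnoDB with a utf8 charset declaration on the new server. The same two literal substitutions can be sketched in Python as follows; convert_dump and SAMPLE_DUMP are hypothetical, and the sample table definition is invented for illustration.

```python
def convert_dump(sql_dump):
    """Apply the same two literal substitutions as the sed commands."""
    return (
        sql_dump
        .replace("ENGINE=MyISAM", "ENGINE=InnoDB")
        .replace("CHARSET=latin1", "CHARSET=utf8")
    )


# Made-up fragment of a mysqldump file.
SAMPLE_DUMP = (
    "CREATE TABLE `expenses_page` (\n"
    "  `id` int(11) NOT NULL\n"
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
)
```

Like the sed version, this is a blind text substitution: it would also change those strings if they happened to appear inside row data.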
Reining in the cowboy
Reining in the cowboy


An RSS to JSON proxy service
Pair programming
Comprehensive unit tests, with mocks
Continuous int...
Points of embarrassment


Database required to run the test suite
Logging? What logging?
Tests get deployed alongside the ...
A final thought
Web development in 2005
       Relational
                       Cache
       Database



       Application   Admin tools...
Web development in 2009
 Relational                  Search       Datastructure         External web     Non-relational
  ...
Thank you
Cowboy development with Django

Keynote for DjangoCon 2009, presented on the 8th of September 2009. Covers two cowboy projects, WildLifeNearYou.com and MP expenses, and talks about ways of "reining in the cowboy" and developing in a more sustainable way.

Published in: Technology, Business

Transcript of "Cowboy development with Django"

  1. 1. Cowboy development with Django Simon Willison DjangoCon 2009
  2. 2. http://www.youtube.com/watch?v=nZx9sNXv9h0
  3. 3. Just one problem... we didn’t have cowboys in England
  4. 4. The Napoleonic Wars
  5. 5. A Napoleonic Sea Fort http://en.wikipedia.org/wiki/File:Alderney_-_Fort_Clonque_02.jpg
  6. 6. Super Evil Dev Fort
  7. 7. http://www.anotherurl.com/travel/fort_clonque/handbook.htm
  8. 8. Photos by Cindy Li http://www.flickr.com/photos/cindyli/sets/72157610369683426/
  9. 9. WildLifeNearYou.com (Built in 1 week and 10 months)
  10. 10. DEMO
  11. 11. Search uses the geospatial branch of Xapian
          Species database comes from Freebase
          Photos can be imported from Flickr
          “Suggest changes” to our Zoo information uses model objects representing proposed changes to other model objects
  12. 12. /dev/fort (http://devfort.com/)
          What is /dev/fort? Imagine a place of no distractions, no IM, no Twitter, in fact, no internet. Within, a group of a dozen or more developers, designers, thinkers and doers. And a lot of food. Now imagine that place is a fort. The idea behind /dev/fort is to throw a group of people together, cut them off from the rest of the world, and ...
          Cohort 3: Winter 2009. The trip: The third /dev/fort will run from 9th to 16th November on the Kintyre Peninsula in Scotland.
          Cohort 2: Summer 2009. The trip: The second /dev/fort ran from 30th May to 6th June 2009 at Knockbrex Castle in Scotland. As with the first cohort, we have a few remaining problems still to iron out (thorny issues inside Django we were hoping to avoid, that sort of thing). We hope to have the site in alpha by the end of the summer. Cohort members: Ryan Alexander, Steven Anderson, James Aylett, Hannah Donovan, Natalie Downe, Mark Norman Francis, Matthew Hasler, Steve Marshall, Richard Pope, Gareth Rushgrove, Simon Willison.
          Cohort 1: Winter 2008
  13. 13. Cowboy development at work
  14. 14. MP expenses
  15. 15. Heather Brooke
  16. 16. January 2005 The FOI request
  17. 17. February 2008 The Information Tribunal
  18. 18. “Transparency will damage democracy”
  19. 19. January 2009 The exemption law
  20. 20. March 2009 The mole
  21. 21. “All of the receipts of 650-odd MPs, redacted and unredacted, are for sale at a price of £300,000, so I am told. The price is going up because of the interest in the subject.” Sir Stuart Bell MP, Newsnight, 30th March
  22. 22. 8th May, 2009 The Daily Telegraph
  23. 23. At the Guardian...
  24. 24. April: “Expenses are due out in a couple of months, is there anything we can do?”
  25. 25. June: “Expenses have been bumped forward, they’re out next week!”
  26. 26. Thursday 11th June The proof-of-concept
  27. 27. Monday 15th June The tentative go-ahead
  28. 28. Tuesday 16th June Designer + client-side engineer
  29. 29. Wednesday 17th June Operations engineer
  30. 30. Thursday 18th June Launch day!
  31. 31. How we built it
  32. 32. $ convert Frank_Comm.pdf pages.png
  33. 33. Frictionless registration
  34. 34. Page filters
  35. 35. page_filters = (
              # Maps name of filter to dictionary of kwargs to doc.pages.filter()
              ('reviewed', {
                  'votes__isnull': False
              }),
              ('unreviewed', {
                  'votes__isnull': True
              }),
              ('with line items', {
                  'line_items__isnull': False
              }),
              ('interesting', {
                  'votes__interestingvote__status': 'yes'
              }),
              ('interesting but known', {
                  'votes__interestingvote__status': 'known'
              ...
          )
          page_filters_lookup = dict(page_filters)
  36. 36. pages = doc.pages.all()
          if page_filter:
              kwargs = page_filters_lookup.get(page_filter)
              if kwargs is None:
                  raise Http404, 'Invalid page filter: %s' % page_filter
              pages = pages.filter(**kwargs).distinct()
          # Build the filters
          filters = []
          for name, kwargs in page_filters:
              filters.append({
                  'name': name,
                  'count': doc.pages.filter(**kwargs).distinct().count(),
              })
  37. 37. Matching names
  38. 38. http://github.com/simonw/datamatcher
  39. 39. On the day
  40. 40. def get_mp_pages():
              "Returns list of (mp-name, mp-page-url) tuples"
              soup = Soup(urllib.urlopen(INDEX_URL))
              mp_links = []
              for link in soup.findAll('a'):
                  if link.get('title', '').endswith("'s allowances"):
                      mp_links.append(
                          (link['title'].replace("'s allowances", ''), link['href'])
                      )
              return mp_links
  41. 41. def get_pdfs(mp_url):
              "Returns list of (description, years, pdf-url, size) tuples"
              soup = Soup(urllib.urlopen(mp_url))
              pdfs = []
              trs = soup.findAll('tr')[1:]  # Skip the first, it's the table header
              for tr in trs:
                  name_td, year_td, pdf_td = tr.findAll('td')
                  name = name_td.string
                  year = year_td.string
                  pdf_url = pdf_td.find('a')['href']
                  size = pdf_td.find('a').contents[-1].replace('(', '').replace(')', '')
                  pdfs.append(
                      (name, year, pdf_url, size)
                  )
              return pdfs
  42. 42. “Drop Everything”
  43. 43. Photoshop + AppleScript vs. Java + IntelliJ
  44. 44. Images on our docroot (S3 upload was taking too long)
  45. 45. Blitz QA
  46. 46. Launch! (on EC2)
  47. 47. Crash #1: more Apache children than MySQL connections
  48. 48. unreviewed_count = Page.objects.filter(
              votes__isnull = True
          ).distinct().count()
  49. 49. SELECT
            COUNT(DISTINCT `expenses_page`.`id`)
          FROM
            `expenses_page` LEFT OUTER JOIN `expenses_vote` ON (
              `expenses_page`.`id` = `expenses_vote`.`page_id`
            )
          WHERE `expenses_vote`.`id` IS NULL
  50. 50. unreviewed_count = cache.get('homepage:unreviewed_count')
          if unreviewed_count is None:
              unreviewed_count = Page.objects.filter(
                  votes__isnull = True
              ).distinct().count()
              cache.set('homepage:unreviewed_count', unreviewed_count, 60)
  51. 51. With 70,000 pages and a LOT of votes... DB takes up 135% of CPU. Cache the count in memcached... DB drops to 35% of CPU.
  52. 52. unreviewed_count = Page.objects.filter(
              votes__isnull = True
          ).distinct().count()

          reviewed_count = Page.objects.filter(
              votes__isnull = False
          ).distinct().count()
  53. 53. unreviewed_count = Page.objects.filter(
              is_reviewed = False
          ).count()
  54. 54. Migrating to InnoDB on a separate server
  55. 55. ssh mps-live "mysqldump mp_expenses" |
            sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' |
            sed 's/CHARSET=latin1/CHARSET=utf8/g' |
            ssh mysql-big "mysql -u root mp_expenses"
  56. 56. Reining in the cowboy
  57. 57. Reining in the cowboy
          An RSS to JSON proxy service
          Pair programming
          Comprehensive unit tests, with mocks
          Continuous integration (Team City)
          Deployment scripts against CI build numbers
  58. 58. Points of embarrassment
          Database required to run the test suite
          Logging? What logging?
          Tests get deployed alongside the code (!)
          ... but generally pretty smooth sailing
  59. 59. A final thought
  60. 60. Web development in 2005: Relational database, Cache, Application, Admin tools, Templates, XML feeds
  61. 61. Web development in 2009: Relational database, Search index, Datastructure servers, External web services, Non-relational database, Cache, Admin tools, Application, Message queue, Offline workers, Monitoring and reporting, Templates, XML feeds, API, Webhooks
  62. 62. Thank you