
Cowboy development with Django



Keynote for DjangoCon 2009, presented on the 8th of September 2009. Covers two cowboy projects - a /dev/fort zoo site and the Guardian's MP expenses app - and talks about ways of "reining in the cowboy" and developing in a more sustainable way.



  1. Cowboy development with Django Simon Willison DjangoCon 2009
  3. Just one problem... we didn’t have cowboys in England
  4. The Napoleonic Wars
  5. A Napoleonic Sea Fort
  6. Super Evil Dev Fort
  8. Photos by Cindy Li
  9. (Built in 1 week and 10 months)
  10. DEMO
  11. • Search uses the geospatial branch of Xapian
      • Species database comes from Freebase
      • Photos can be imported from Flickr
      • “Suggest changes” to our Zoo information uses model objects
        representing proposed changes to other model objects
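The last bullet describes storing proposed edits as first-class objects rather than applying them directly. A minimal, framework-free sketch of that pattern (all names here are hypothetical; the real site expressed this with Django model objects):

```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    latin_name: str

@dataclass
class ProposedChange:
    # A suggested edit is itself a stored object pointing at its target,
    # so it can be listed, reviewed, and applied (or rejected) later.
    target: Animal
    field_name: str
    new_value: str
    approved: bool = False

    def apply(self):
        # Only approved suggestions are copied onto the real record.
        if self.approved:
            setattr(self.target, self.field_name, self.new_value)

llama = Animal(name="Llama", latin_name="Lama glama")
suggestion = ProposedChange(llama, "latin_name", "Lama glama (domesticated)")
suggestion.apply()           # not yet approved: no effect
suggestion.approved = True   # a moderator signs off
suggestion.apply()
print(llama.latin_name)
```

In Django the target reference would typically be a foreign key (or a generic foreign key, if suggestions can point at several model types), but the review-then-apply flow is the same.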
  12. /dev/fort
      What is /dev/fort? Imagine a place of no distractions, no IM, no
      Twitter — in fact, no internet. Within, a group of a dozen or more
      developers, designers, thinkers and doers. And a lot of food. Now
      imagine that place is a fort. The idea behind /dev/fort is to throw a
      group of people together, cut them off from the rest of the world, and
      Cohort 3: Winter 2009. The trip: the third /dev/fort will run from 9th
      to 16th November on the Kintyre Peninsula in Scotland.
      Cohort 2: Summer 2009. The trip: the second /dev/fort ran from 30th May
      to 6th June 2009 at Knockbrex Castle in Scotland. As with the first
      cohort, we have a few remaining problems still to iron out (thorny
      issues inside Django we were hoping to avoid, that sort of thing). We
      hope to have the site in alpha by the end of the summer.
      Cohort members: Ryan Alexander, Steven Anderson, James Aylett, Hannah
      Donovan, Natalie Downe, Mark Norman Francis, Matthew Hasler, Steve
      Marshall, Richard Pope, Gareth Rushgrove, Simon Willison.
      Cohort 1: Winter 2008
  13. Cowboy development at work
  14. MP expenses
  15. Heather Brooke
  16. January 2005 The FOI request
  17. February 2008 The Information Tribunal
  18. “Transparency will damage democracy”
  19. January 2009 The exemption law
  20. March 2009 The mole
  21. “All of the receipts of 650-odd MPs, redacted and unredacted, are for sale at a price of £300,000, so I am told. The price is going up because of the interest in the subject.” Sir Stuart Bell, MP Newsnight, 30th March
  22. 8th May, 2009 The Daily Telegraph
  23. At the Guardian...
  24. April: “Expenses are due out in a couple of months, is there anything we can do?”
  25. June: “Expenses have been bumped forward, they’re out next week!”
  26. Thursday 11th June The proof-of-concept
  27. Monday 15th June The tentative go-ahead
  28. Tuesday 16th June Designer + client-side engineer
  29. Wednesday 17th June Operations engineer
  30. Thursday 18th June Launch day!
  31. How we built it
  32. $ convert Frank_Comm.pdf pages.png
  33. Frictionless registration
  34. Page filters
  35. page_filters = (
          # Maps name of filter to dictionary of kwargs to doc.pages.filter()
          ('reviewed', {'votes__isnull': False}),
          ('unreviewed', {'votes__isnull': True}),
          ('with line items', {'line_items__isnull': False}),
          ('interesting', {'votes__interestingvote__status': 'yes'}),
          ('interesting but known', {'votes__interestingvote__status': 'known'}),
          ...
      )
      page_filters_lookup = dict(page_filters)
  36. pages = doc.pages.all()
      if page_filter:
          kwargs = page_filters_lookup.get(page_filter)
          if kwargs is None:
              raise Http404('Invalid page filter: %s' % page_filter)
          pages = pages.filter(**kwargs).distinct()
      # Build the filters
      filters = []
      for name, kwargs in page_filters:
          filters.append({
              'name': name,
              'count': doc.pages.filter(**kwargs).distinct().count(),
          })
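The two snippets above share one table-driven idea: a single ordered list of (name, kwargs) pairs drives both URL validation and the rendered list of filter links. A plain-Python sketch of the same pattern, with invented data standing in for the ORM:

```python
# One ordered table of (name, kwargs) pairs, mirroring the slide's
# page_filters: dict() over it gives an O(1) validation lookup, while
# the tuple preserves display order for rendering the filter links.
page_filters = (
    ('reviewed',   {'reviewed': True}),
    ('unreviewed', {'reviewed': False}),
)
page_filters_lookup = dict(page_filters)

pages = [
    {'id': 1, 'reviewed': True},
    {'id': 2, 'reviewed': False},
    {'id': 3, 'reviewed': False},
]

def filter_pages(pages, page_filter):
    kwargs = page_filters_lookup.get(page_filter)
    if kwargs is None:
        raise KeyError('Invalid page filter: %s' % page_filter)
    return [page for page in pages
            if all(page.get(k) == v for k, v in kwargs.items())]

# Counts for each filter link, in display order, as the view builds them:
counts = [(name, len(filter_pages(pages, name))) for name, _ in page_filters]
print(counts)  # [('reviewed', 1), ('unreviewed', 2)]
```

Adding a new filter is then a one-line change to the table, with no edits to the view logic.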
  37. Matching names
  39. On the day
  40. import urllib
      from BeautifulSoup import BeautifulSoup as Soup  # BeautifulSoup 3

      def get_mp_pages():
          "Returns list of (mp-name, mp-page-url) tuples"
          soup = Soup(urllib.urlopen(INDEX_URL))
          mp_links = []
          for link in soup.findAll('a'):
              if link.get('title', '').endswith("'s allowances"):
                  mp_links.append(
                      (link['title'].replace("'s allowances", ''), link['href'])
                  )
          return mp_links
  41. def get_pdfs(mp_url):
          "Returns list of (description, years, pdf-url, size) tuples"
          soup = Soup(urllib.urlopen(mp_url))
          pdfs = []
          trs = soup.findAll('tr')[1:]  # Skip the first, it's the table header
          for tr in trs:
              name_td, year_td, pdf_td = tr.findAll('td')
              name = name_td.string
              year = year_td.string
              pdf_url = pdf_td.find('a')['href']
              size = pdf_td.find('a').contents[-1].replace('(', '').replace(')', '')
              pdfs.append((name, year, pdf_url, size))
          return pdfs
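The size column comes back as a display string: the scraper just strips the parentheses. To sort or total the PDFs, a small helper could normalise such strings to bytes. The '1.2 MB' format assumed here is illustrative, not taken from the Parliament site:

```python
def parse_size(size):
    """Convert a display size like '1.2 MB' or '853 KB' into bytes.

    Assumes a '<number> <unit>' format; raises on anything else.
    """
    units = {'B': 1, 'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3}
    number, unit = size.strip().split()
    return int(float(number) * units[unit.upper()])

print(parse_size('1.2 MB'))  # 1258291
```

Normalising at scrape time keeps the messy string handling in one place instead of scattering it through templates and views.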
  42. “Drop Everything”
  43. Photoshop + AppleScript vs. Java + IntelliJ
  44. Images on our docroot (S3 upload was taking too long)
  45. Blitz QA
  46. Launch! (on EC2)
  47. Crash #1: more Apache children than MySQL connections
  48. unreviewed_count = Page.objects.filter(
          votes__isnull=True
      ).distinct().count()
  49. SELECT COUNT(DISTINCT `expenses_page`.`id`)
      FROM `expenses_page`
      LEFT OUTER JOIN `expenses_vote`
          ON (`expenses_page`.`id` = `expenses_vote`.`page_id`)
      WHERE `expenses_vote`.`id` IS NULL
  50. unreviewed_count = cache.get('homepage:unreviewed_count')
      if unreviewed_count is None:
          unreviewed_count = Page.objects.filter(
              votes__isnull=True
          ).distinct().count()
          cache.set('homepage:unreviewed_count', unreviewed_count, 60)
  51. With 70,000 pages and a LOT of votes... DB takes up 135% of CPU.
      Cache the count in memcached... DB drops to 35% of CPU.
  52. unreviewed_count = Page.objects.filter(
          votes__isnull=True
      ).distinct().count()
      reviewed_count = Page.objects.filter(
          votes__isnull=False
      ).distinct().count()
  53. unreviewed_count = Page.objects.filter(is_reviewed=False).count()
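That cheap count only stays correct if the denormalised is_reviewed flag is updated whenever a vote is recorded. A framework-free sketch of the invariant (hypothetical names; in Django this would live in the vote-saving code path or a signal handler):

```python
class Page:
    def __init__(self, id):
        self.id = id
        self.votes = []
        self.is_reviewed = False  # denormalised: True iff votes is non-empty

def record_vote(page, vote):
    # Update the flag at write time, so the homepage read becomes a
    # cheap indexed filter instead of a LEFT OUTER JOIN plus DISTINCT.
    page.votes.append(vote)
    page.is_reviewed = True

pages = [Page(1), Page(2), Page(3)]
record_vote(pages[0], 'interesting')

unreviewed_count = sum(1 for page in pages if not page.is_reviewed)
print(unreviewed_count)  # 2
```

The classic denormalisation trade: writes carry a little extra bookkeeping so that the hot read path gets dramatically cheaper.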
  54. Migrating to InnoDB on a separate server
  55. ssh mps-live "mysqldump mp_expenses" \
        | sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' \
        | sed 's/CHARSET=latin1/CHARSET=utf8/g' \
        | ssh mysql-big "mysql -u root mp_expenses"
  56. Reining in the cowboy
  57. Reining in the cowboy
      • An RSS to JSON proxy service
      • Pair programming
      • Comprehensive unit tests, with mocks
      • Continuous integration (TeamCity)
      • Deployment scripts against CI build numbers
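The first item, an RSS to JSON proxy, can be sketched with the standard library alone: parse the feed XML, emit JSON for browser-side consumption. A minimal illustration, not the Guardian's actual service:

```python
import json
import xml.etree.ElementTree as ET

def rss_to_json(rss_xml):
    """Convert a minimal RSS 2.0 document into a JSON string of items."""
    root = ET.fromstring(rss_xml)
    items = [
        {
            'title': item.findtext('title'),
            'link': item.findtext('link'),
        }
        for item in root.iter('item')
    ]
    return json.dumps({'items': items})

feed = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
</channel></rss>"""
print(rss_to_json(feed))
```

A real proxy would fetch the feed over HTTP, cache the result, and handle malformed XML, but the core transform is this small.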
  58. Points of embarrassment
      • Database required to run the test suite
      • Logging? What logging?
      • Tests get deployed alongside the code (!)
      ... but generally pretty smooth sailing
  59. A final thought
  60. Web development in 2005 (architecture diagram): relational database,
      cache, application, admin tools, templates, XML feeds
  61. Web development in 2009 (architecture diagram): relational database,
      non-relational database, search index, datastructure servers, external
      web services, cache, admin tools, application, message queue, offline
      workers, monitoring and reporting, templates, XML feeds, API, webhooks
  62. Thank you