Mashing Up The Guardian

1,435 views

Published on

Mashing up the guardian talk as given at UKUUG 2009.

Published in: Technology, News & Politics
  • Be the first to comment

Mashing Up The Guardian

  1. 1. Mashing up Michael Brunton-Spall michael.brunton-spall@guardian.co.uk @mibgames
  2. 2. About Us Online since 1995 250M+ pages per month 30M+ visitors per month
  3. 3. 1995 - Guardian Online
  4. 4. 1999 - Guardian Unlimited
  5. 5. 1999 - Removed the registration wall
  6. 6. 2007 - Rebuild and Redesign
  7. 7. 2007 - Rebuild and Redesign
  8. 8. 2007 - Rebuild and Redesign
  9. 9. 2009 - MPs Expenses
  10. 10. Developing The Guardian
  11. 11. The Hackable Guardian Url Hacking Keyword Combiners RSS Feeds
  12. 12. Url Hacking http://www.guardian.co.uk/ [section]/[keyword] technology/internet [section]/all environment/all [publication]/[date]/[newspapersection] theguardian/2009/jun/11/technologyguardian profile/[name] profile/bobbiejohnson
  13. 13. RSS Feeds RSS Everywhere! [section]/[keyword]/rss technology/internet/rss [section]/all /rss environment/all/rss [publication]/[date]/[newspapersection]/rss theguardian/2009/jun/11/technologyguardian/rss profile/[name]/rss profile/bobbiejohnson/rss
  14. 14. Full Fat! http://www.flickr.com/photos/snapperwolf/
  15. 15. Url Combiners A Logical AND Almost any combination from before technology/internet+profile/bobbiejohnson theguardian/technologyguardian+technology/internet Except things that aren't actual tags: Dated newspaper sections theguardian/2009/jun/11... + anything = 404 the all page .../all + anything = 404
  16. 16. Problems with this approach Fragile No Discovery Not well documented Copyright Issues
  17. 17. Build applications with the Guardian
  18. 18. Open Platform Opening up how we work with people both internally and externally A suite of services enabling partners to build applications with the Guardian Content API Data Store
  19. 19. Data Store A directory of useful data curated by Guardian editors
  20. 20. Data Store A directory of useful data curated by Guardian editors
  21. 21. Data Store
  22. 22. Data Store
  23. 23. Content API A service for selecting and collecting content from the Guardian for re-use
  24. 24. Content API A service for selecting and collecting content from the Guardian for re-use
  25. 25. Content API Search content
  26. 26. Content API Xml and Json
  27. 27. Content API Tag Metadata
  28. 28. Content API Full Article Body
  29. 29. Content API Filters
  30. 30. So why? guardian.co.uk has an amazing amount of quality content Incredible amounts of meta-data, curated by editors Aim to allow the guardian to become a rich source of facts and journalism for the web
  31. 31. How? Backed of our search platform Provides access to 10 years of article content and metadata Supports multiple output formats: XML, JSON, ATOM Supported free text search across content Search for keywords Guardian supported api projects in Java, PHP, Python and Ruby Community supported api projects in Perl, ActionScript and Coldfusion (and probably more?)
  32. 32. Pricing
  33. 33. Pricing FREE!
  34. 34. How free is free? You can publish full articles from the guardian on your website 5k queries per day limit 24 hour maximum cache lifetime Online support Partner with us on advertising
  35. 35. Beta trial Limited number of keys Collecting feedback Will open more widely at end of beta program
  36. 36. How do I use it? Home page http://www.guardian.co.uk/open-platform Sign up for an API key at http://guardian.mashery.com Use the API Explorer at http://api.guardianapis.com/docs/ Use the python library at http://code.google.com/p/openplatform-python/
  37. 37. What can I do with it? Search our tag hierarchy Find content by tag Find content by search terms Display on your website!
  38. 38. MP's Expenses
  39. 39. MP's Expenses Written in Django Hosted on EC2 Easy to modify Written in 2 weeks
  40. 40. MP's Expenses An MP's Page
  41. 41. View def mp(request, id): mp = get_object_or_404(MP, pk = id) ... return render(request, 'mp.html', { 'mp': mp, 'documents': mp.documents.all(), 'top_users': top_users[:5], })
  42. 42. View def mp(request, id): mp = get_object_or_404(MP, pk = id) ... return render(request, 'mp.html', { 'mp': mp, 'documents': mp.documents.all(), 'top_users': top_users[:5], })
  43. 43. View def mp(request, id): mp = get_object_or_404(MP, pk = id) ... return render(request, 'mp.html', { 'mp': mp, 'documents': mp.documents.all(), 'top_users': top_users[:5], })
  44. 44. Using the Guardian API Getting all articles about an MP results = client.search(q='"%S"' % (name)) returns a paginating iterator for x in results: print x['headline'] Shows only first 10 by default
  45. 45. Get the client def mp(request, id): mp = get_object_or_404(MP, pk = id) ... client = Client(settings.GUARDIAN_APIKEY) return render(request, 'mp.html', { 'mp': mp, 'documents': mp.documents.all(), 'top_users': top_users[:5], })
  46. 46. Make the request def mp(request, id): mp = get_object_or_404(MP, pk = id) ... client = Client(settings.GUARDIAN_APIKEY) return render(request, 'mp.html', { 'mp': mp, 'documents': mp.documents.all(), 'top_users': top_users[:5], 'articles': client.search(q='"%s"'%(mp.name)), })
  47. 47. Update the template <div id="about-mp"> <ul> {% for article in articles %} <li> <a href="{{article.webUrl}}">{{article.headline}}</a> </li> {% endfor %} </ul> </div>
  48. 48. And the result!
  49. 49. Additions? Use Memcached to cache the response from guardian API Don't create a new client each time Article bodies
  50. 50. Who else is using it?
  51. 51. Who else is using it?
  52. 52. Who else is using it?
  53. 53. Who else is using it?
  54. 54. Public API Key? jbynv3fwdp8ju5625mt2axw3 Only 5k queries a day Will be closed on Friday 14th August Will be closed if abused Strongly encourage you to sign up for your own key
  55. 55. Questions?
  56. 56. Mashing up Michael Brunton-Spall michael.brunton-spall@guardian.co.uk twitter: @mibgames

×