From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

Last year The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications leveraging The Guardian's rich content.

This talk will cover how The Guardian opened up their content, enriched it, and reached new markets with its platform strategy.

We cover the background platform strategy, technical architecture, implementation of Solr, and how the new release of the Guardian's Open Platform, launched May 20th, 2010, has embraced disruption in the media space, while at the same time accelerating revenue.

Transcript

  • 1. From publisher to platform: How the Guardian used content, search, and open source to build a powerful new business model. Stephen Dunn, Guardian News and Media. Apache Lucene EuroCon, 21 May 2010
  • 2. The publishing era
  • 3. We started a long time ago:
  • 4. “To secure the financial and editorial independence of the Guardian in perpetuity.” “To promote freedom in the press and liberal journalism globally.”
  • 5. 2010
  • 6. 2010: Keyword page, Live blogs, iPhone app, Mobile site, Twitter updates, Swine flu, Comment, Content partnerships, Newspapers, Audio, Video, Data, API
  • 7. 1996
  • 8. 1999
  • 10. 2001 to 2006
  • 11. 2009: 1.5M pages and counting; 250M+ pages/month; 30M visitors/month; 4x Webby award winner (best newspaper site)
  • 15. Part of the Web
  • 16. 1. Permanent ★ “A cool URI is one that does not change” (Tim Berners-Lee, 1998) ★ 1.5 million resources redirected to new scheme. Photo: http://www.flickr.com/photos/fstorr/
  • 17. 2. Addressable ★ Resources are “about” something, ready for the social web. ★ We live in “the age of point-at-things” (Coates 2005)
  • 18. 3. Discoverable ★ Multiple routes to content ★ Tagging drives discovery
  • 23. The hackable guardian.co.uk: http://www.guardian.co.uk/....
  • 24. The hackable guardian.co.uk: http://www.guardian.co.uk/.... /technology/internet /technology/all /environment/climatechange
  • 25. The hackable guardian.co.uk: http://www.guardian.co.uk/.... /technology/internet /technology/all /environment/climatechange +business/globaleconomy
  • 27. The hackable guardian.co.uk: http://www.guardian.co.uk/.... /technology/internet/rss /technology/all/rss /environment/climatechange +business/globaleconomy/rss
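The URL patterns on the "hackable guardian.co.uk" slides can be sketched as a small helper. This is only an illustration, assuming that the "+" on the slides joins two tag paths into one combined URL and that "/rss" is appended to the end of the path; the function name `tag_url` is hypothetical and not part of any Guardian library.

```python
BASE = "http://www.guardian.co.uk"  # base URL as shown on the slides


def tag_url(*tags: str, rss: bool = False) -> str:
    """Build a tag-page URL from one or more tag paths.

    Assumption: several tag paths are combined with '+', and an RSS
    feed for the same page is requested by appending '/rss'.
    """
    if not tags:
        raise ValueError("at least one tag path is required")
    # Normalise each path so stray leading/trailing slashes do not matter.
    path = "+".join(tag.strip("/") for tag in tags)
    url = f"{BASE}/{path}"
    return f"{url}/rss" if rss else url


single = tag_url("technology/internet")
combined_feed = tag_url(
    "environment/climatechange", "business/globaleconomy", rss=True
)
```

Here `single` addresses one tag page, while `combined_feed` addresses the feed for content carrying both tags.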
  • 28. Results...
  • 32. Site traffic growth (chart): Unique Users, Sep 2005 to Jan 2009, axis from 3,750,000 to 30,000,000. Markers: Pre-project, First release, Final release. Callout: 36M.
  • 33. However...
  • 34. 1 Billion+ Internet Users!
  • 38. “How I stopped worrying about my website and learned to love the whole Internet.” Matt McAlister
  • 39. The Open Strategy. OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Enable partners to build applications using Guardian content and services for other digital platforms.
  • 43. “Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”
  • 44. (BETA) The Open Platform
  • 45. (BETA) OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Allow partners to build applications using Guardian content and services for other digital platforms.
  • 47. (BETA) The suite of services enabling partners to build applications with the Guardian
  • 49. (BETA) CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day.
  • 50. (BETA) CONTENT API: A service for selecting and collecting content from the Guardian for re-use. Diagram labels: Your App Here!, REST API, Search engine, CMS, Guardian database.
  • 52. Stamen Design - APIMaps.org
  • 54. (BETA) DATA STORE: A directory of useful data curated by Guardian editors.
  • 55. (BETA) POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day.
  • 57. (BETA) Open for Business
  • 58. Open for Business
  • 59. (1) 3 Tiers of access, 3 Revenue models. BESPOKE: Take, reformat, augment our content. Same access as Guardian. Revenue model to be negotiated: combination of Media, Fees, Downloads. APPROVED: Take our full article content, with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue. KEYLESS: Take our headlines. You keep associated revenues.
  • 61. What this means. OPEN OUT: Developers can now access our full content APIs on demand, with keys post-approved. We are now positioning the platform as a place to do business with us. So rapid scalability, reliability, and performance are now core requirements.
  • 62. (2) Open In. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day.
  • 63. (2) Open In. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day. MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.
  • 64. OPEN OUT: Allow partners to build applications using Guardian content and services for other digital platforms. OPEN IN: Bring in data and apps from the Internet.
  • 67. App showcase
  • 68. What this means. Open In: Partners can now more easily integrate into our core. The Open Platform will become key to our commercial future.
  • 69. Evolving the architecture
  • 70. From Publisher to Platform ★ Seeking massive growth, but no longer only broadcasting content ★ User/partner engagement & contribution on journalism, data, software, applications, revenue and ads ★ Support developers and partners with data and APIs; need scalability, reliability, speed
  • 71. Diagram: Web servers, App servers, Memcached, Oracle, CMS
  • 72. Why RDBMS? 5 years ago, fewer alternatives. Understand operations procedures. Can easily recruit DBAs / devs. Developer/ops tools. Business critical system: a safe choice. Diagram: Web servers, App servers, Memcached, Oracle, CMS, Data feeds
  • 73. Scaling
  • 75. Unique Users (chart): Sep 2005 to Jan 2009, axis from 3,750,000 to 30,000,000
  • 77. Unique Users (chart): May 2008 to Jan 2009, axis from 12,250,000 to 28,000,000
  • 78. What's going on? ★ We tag our content (multifaceted) ★ Guardian.co.uk is a faceted browse through our tag-space, with editorial teams “spotlighting” key resources on selected nodes. ★ Can apply multiple facets in queries faster in a search-like architecture than in an RDBMS
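The point about applying multiple facets in a search-like architecture can be illustrated with a toy inverted index. This is a minimal in-memory sketch, not the Guardian's implementation: the class `TagIndex` and the article ids are hypothetical, and a real search engine such as Lucene uses far more compact posting lists. The idea is that each tag maps directly to the set of documents carrying it, so a multi-tag query becomes a set intersection rather than a chain of relational joins.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: tag -> set of document ids (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, tags):
        """Record that doc_id carries each of the given tags."""
        for tag in tags:
            self._postings[tag].add(doc_id)

    def query(self, *tags):
        """Return ids of documents that carry every requested tag."""
        if not tags:
            return set()
        # Start from the smallest posting set so intermediate results stay small.
        sets = sorted((self._postings.get(t, set()) for t in tags), key=len)
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return result

    def facet_counts(self, doc_ids):
        """Count, per tag, how many of the given documents carry it."""
        counts = {}
        for tag, postings in self._postings.items():
            n = len(postings & doc_ids)
            if n:
                counts[tag] = n
        return counts


index = TagIndex()
index.add("a1", ["technology/internet", "business/globaleconomy"])
index.add("a2", ["environment/climatechange", "business/globaleconomy"])
index.add("a3", ["environment/climatechange"])

both = index.query("environment/climatechange", "business/globaleconomy")
```

`facet_counts` shows the other half of faceted browsing: given a result set, it reports which further tags could narrow it and how many documents each would keep.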
  • 81. “Related content” from search engine
  • 83. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. Diagram labels: Your App Here!, REST API, Search engine, CMS, Guardian database.
  • 85. We used Solr/Lucene: Can perform complex queries, including full text search. We can change the schema with no downtime. On our dataset most queries are of a similar cost. Scales very well horizontally. Replication makes it easy to work in the cloud.
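As a rough illustration of the "complex queries, including full text search" point, the sketch below builds a Solr select URL that combines a full-text query with tag filters and facet counts. The parameters `q`, `fq`, `facet`, `facet.field`, `rows` and `wt` are standard Solr query parameters; everything else is an assumption made for the example: the host and handler path, the field name `tag`, and the helper `solr_select_url` do not come from the talk or from the Guardian's schema.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base_url, text, tags, tag_field="tag", rows=10):
    """Build a Solr query URL: full-text search narrowed by tag filters.

    Assumption: documents are indexed with a multi-valued field
    (here called 'tag') holding their tag paths.
    """
    params = [("q", text if text else "*:*")]
    # One filter query per tag: each fq narrows the result set further.
    for tag in tags:
        params.append(("fq", f'{tag_field}:"{tag}"'))
    params += [
        ("facet", "true"),           # ask Solr for facet counts
        ("facet.field", tag_field),  # ... over the tag field
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return f"{base_url}?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr/select",  # hypothetical local Solr instance
    "election",
    ["environment/climatechange", "business/globaleconomy"],
)
decoded = parse_qs(urlsplit(url).query)  # round-trip to inspect the parameters
```

Expressing each tag as a separate filter query keeps the full-text part of the request independent of the facet selection.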
  • 86. Diagram, Core: Web servers, App server, Memcached, RDBMS, CMS
  • 87. Diagram, Core: Web servers, App server, Memcached, RDBMS, CMS. Content API: Solr instances (Cloud, EC2)
  • 88. Open in? MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk. Simple REST/HTTP framework allows lightweight development. Applications proxied for performance. Apps generally hosted in the cloud, hot deployment into production.
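One way to read "applications proxied for performance" is a proxy that keeps a short-lived copy of each microapp response, so a slow external app does not slow down every page view. The sketch below works under that assumption; the slides do not say how the Guardian's proxy works, and the class `MicroappProxy`, its TTL policy, and the injected `fetch` callable are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class MicroappProxy:
    """Caching front for externally hosted microapps (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch    # performs the real HTTP request to the app
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        """Return the app's response, reusing a cached copy while it is fresh."""
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body
```

In use, `fetch` would wrap an HTTP client call to the externally hosted app, and the site would embed the returned fragment in its own page.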
  • 90. Diagram, Core: Web servers, App server, Memcached, RDBMS, CMS. Apps: Proxy, App instances (external hosting, app engine etc)
  • 91. Diagram, OPEN IN and OPEN OUT: Web servers, App servers, Memcached, CMS, RDBMS; Solr instances (Cloud, EC2); Proxy and App instances (external hosting, app engine etc)
  • 92. Diagram (labels only partly legible): CONTENT, ???????
  • 93. Thank you. http://www.guardian.co.uk/open-platform Twitter: @openplatform, @cuica (Stephen Dunn)