From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

Last year The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications leveraging The Guardian's rich content.

This talk will cover how The Guardian opened up their content, enriched it, and reached new markets with its platform strategy.

We cover the background platform strategy, technical architecture, implementation of Solr, and how the new release of the Guardian's Open Platform, launched May 20th, 2010, has embraced disruption in the media space, while at the same time accelerating revenue.

Published in: Technology

  1. From publisher to platform: How the Guardian used content, search, and open source to build a powerful new business model. Stephen Dunn, Guardian News and Media. Apache Lucene EuroCon, 21 May 2010
  2. The publishing era
  3. We started a long time ago:
  4. “To secure the financial and editorial independence of The Guardian in perpetuity.” “To promote freedom in the press and liberal journalism globally.”
  5. 2010
  6. 2010: Keyword page, Live blogs, iPhone app, Mobile site, Twitter updates, Swine flu, Comment, Content partnerships, Newspapers, Audio, Video, Data, API
  7. 1996
  8. 1999
  10. 01 -> 06
  11. 2009: 1.5M pages and counting; 250M+ pages/month; 30M visitors/month; 4x Webby award winner (best newspaper site)
  15. Part of the Web
  16. 1. Permanent • “A cool URI is one that does not change” (Tim Berners-Lee, 1998) • 1.5 million resources redirected to new scheme (photo: http://www.flickr.com/photos/fstorr/)
  17. 2. Addressable ★ Resources are “about” something - ready for the social web. ★ We live in “the age of point-at-things” (Coates 2005)
  18. 3. Discoverable ★ Multiple routes to content ★ Tagging drives discovery
  23. The hackable guardian.co.uk http://www.guardian.co.uk/....
  24. The hackable guardian.co.uk http://www.guardian.co.uk/.... /technology/internet /technology/all /environment/climatechange
  25. The hackable guardian.co.uk http://www.guardian.co.uk/.... /technology/internet /technology/all /environment/climatechange+business/globaleconomy
  27. The hackable guardian.co.uk http://www.guardian.co.uk/.... /technology/internet/rss /technology/all/rss /environment/climatechange+business/globaleconomy/rss
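The URL patterns on the "hackable" slides can be sketched in a few lines of Python. This is only an illustration of the pattern visible on the slides (a tag path, an optional `+`-joined second tag, an optional `/rss` suffix); the function name `tag_url` and its parameters are hypothetical, and the real site may apply rules the slides do not show.

```python
BASE = "http://www.guardian.co.uk"  # site root as shown on the slides


def tag_url(*tags, rss=False):
    """Build a URL for one tag, or for several tags joined with '+'.

    Hypothetical helper: it mirrors only the patterns printed on the
    slides, e.g. /technology/internet or
    /environment/climatechange+business/globaleconomy/rss.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    # Each tag is a 'section/keyword' path such as 'technology/internet'.
    path = "+".join(tag.strip("/") for tag in tags)
    url = f"{BASE}/{path}"
    # The slides show feeds as the same URL with '/rss' appended.
    return url + "/rss" if rss else url


print(tag_url("technology/internet"))
print(tag_url("environment/climatechange", "business/globaleconomy", rss=True))
```

The point of the design is that a reader can guess a working address by editing the path by hand, which is what makes the site "hackable".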
  28. Results...
  32. Site traffic growth [chart: unique users, Sep 2005 to Jan 2009, y-axis 3,750,000 to 30,000,000; markers: Pre-project, First release, Final Release; callout: 36M]
  33. However...
  34. 1 Billion+ Internet Users!
  38. “How I stopped worrying about my website and learned to love the whole Internet.” Matt McAlister
  39. The Open Strategy. OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Enable partners to build applications using Guardian content and services for other digital platforms.
  43. “Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”
  44. [BETA] The Open Platform
  45. [BETA] OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Allow partners to build applications using Guardian content and services for other digital platforms.
  47. [BETA] The suite of services enabling partners to build applications with the Guardian
  49. [BETA] CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day.
  50. [BETA] CONTENT API: A service for selecting and collecting content from the Guardian for re-use. [diagram labels: Your App Here!, REST API, Search engine, CMS, Guardian database]
  52. Stamen Design - APIMaps.org
  54. [BETA] DATA STORE: A directory of useful data curated by Guardian editors
  55. [BETA] POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
  57. [BETA] Open for Business
  58. Open for Business
  59. 1. 3 Tiers of access, 3 Revenue models. BESPOKE: Take, reformat, augment our content. Same access as Guardian. Revenue model to be negotiated. Combination of Media, Fees, Downloads. APPROVED: Take our full article content, with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue. KEYLESS: Take our headlines. You keep associated revenues.
  61. What this means. OPEN OUT: Developers can now access our full content APIs on demand, with keys post-approved. We are now positioning the platform as a place to do business with us. So rapid scalability, reliability, and performance are now core requirements.
  62. 2. Open In. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day.
  63. 2. Open In. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day. MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.
  64. OPEN OUT: Allow partners to build applications using Guardian content and services for other digital platforms. OPEN IN: Bring in data and apps from the Internet.
  67. App showcase
  68. What this means. Open In: Partners can now more easily integrate into our core. The Open Platform will become key to our commercial future.
  69. Evolving the architecture
  70. From Publisher to Platform ★ Seeking massive growth, but no longer only broadcasting content ★ User/partner engagement & contribution on: journalism, data, software, applications, revenue and ads ★ Support developers and partners with data and APIs; need scalability, reliability, speed
  71. [diagram: Web server x3, App server x3, Memcached, Oracle, CMS]
  72. Why RDBMS? 5 years ago, fewer alternatives. Understand operations procedures. Can easily recruit DBAs / devs. Developer/ops tools. Business critical system: a safe choice. [diagram: Web server x3, App server x3, Memcached, Oracle, CMS, Data feeds]
  73. Scaling
  74. Unique Users
  75. [chart: Unique Users, Sep 2005 to Jan 2009, y-axis 3,750,000 to 30,000,000]
  76. Unique Users
  77. [chart: Unique Users, May 2008 to Jan 2009, y-axis 12,250,000 to 28,000,000]
  78. What's going on? ★ We tag our content (multifaceted) ★ Guardian.co.uk is a faceted browse through our tag-space, with editorial teams “spotlighting” key resources on selected nodes. ★ Can apply multiple facets in queries faster in a search-like architecture than in an RDBMS
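One way to picture why multi-facet queries suit a search-like architecture is a toy inverted index: each tag maps to the set of documents carrying it, and a multi-tag query is a set intersection. The sketch below is an assumption-laden illustration, not the Guardian's implementation; the class `TagIndex` and the article ids are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: tag -> set of document ids (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, tags):
        """Record that doc_id carries every tag in tags."""
        for tag in tags:
            self._postings[tag].add(doc_id)

    def query(self, *tags):
        """Return the ids of documents that carry every requested tag."""
        if not tags:
            return set()
        # Start from the smallest posting set so intermediate results stay small.
        postings = sorted((self._postings.get(t, set()) for t in tags), key=len)
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
            if not result:
                break  # no document can match once the intersection is empty
        return result


# Hypothetical documents tagged with paths like those on the slides.
index = TagIndex()
index.add("article-1", ["environment/climatechange", "business/globaleconomy"])
index.add("article-2", ["environment/climatechange"])
index.add("article-3", ["technology/internet", "business/globaleconomy"])

print(sorted(index.query("environment/climatechange", "business/globaleconomy")))
# prints ['article-1']
```

In a relational schema the same question typically becomes one join or subquery per tag, which is the cost the slide is contrasting against.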
  81. “Related content” from search engine
  83. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. [diagram labels: Your App Here!, REST API, Search engine, CMS, Guardian database]
  85. We used Solr/Lucene. Can perform complex queries, including full text search. We can change the schema with no downtime. On our dataset most queries are of a similar cost. Scales very well horizontally. Replication makes it easy to work in the cloud.
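As a rough sketch of what a "complex query, including full text search" might look like against Solr, the snippet below builds a select URL that combines a text query with tag filters and facet counts. The parameters `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters; the host, the path, and the field name `tag` are assumptions made for illustration, since the slides do not show the actual schema or endpoint.

```python
from urllib.parse import urlencode

# Assumed endpoint: a local Solr instance, not the Guardian's deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query(text, tags=(), rows=10):
    """Return a Solr select URL for a text query filtered by tags.

    'tag' is a hypothetical field name; each tag becomes its own fq
    filter, so every filter must match (a logical AND).
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f'tag:"{t}"') for t in tags]
    # Ask Solr to return counts per tag value alongside the results.
    params += [("facet", "true"), ("facet.field", "tag")]
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = build_query("climate", tags=["environment/climatechange", "business/globaleconomy"])
print(url)
```

Keeping each tag in a separate `fq` lets Solr cache the filters independently, which fits the slide's remark that most queries have a similar cost.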
  86. Core [diagram: Web servers, App server, Memcached, rdbms, CMS]
  87. Core and Content API [diagram: Core: Web servers, App server, Memcached, rdbms, CMS; Content API: Solr x6 on Cloud, EC2]
  88. Open in? MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk. Simple REST/HTTP framework allows lightweight development. Applications proxied for performance. Apps generally hosted in the cloud, hot deployment into production.
  90. Core and Apps [diagram: Core: Web servers, App server, Memcached, rdbms, CMS; Proxy; Apps: App x6 on external hosting, app engine etc]
  91. OPEN IN, OPEN OUT [diagram labels: Web servers, App servers, Memcached, CMS, rdbms, Proxy, App x6 (external hosting, app engine etc), Solr x6 (Cloud, EC2)]
  92. [diagram] CONTENT ... ???????
  93. Thank you. http://www.guardian.co.uk/open-platform Twitter: @openplatform, @cuica (Stephen Dunn)
