From publisher to platform:
    How the Guardian embraced the internet
    using content, search, and Open Source
                           Stephen Dunn, Guardian News and Media
                        stephen.dunn@guardian.co.uk, 25th May, 2011
                               Twitter: @cuica, @openplatform




Thursday, 26 May 2011




The publishing era




We started a long time ago:
Keyword page, Live blogs, Apps, Mobile site, Twitter updates, Swine flu, Comment, Content partnerships, Newspapers, Audio, Video, Open platform API
To secure the financial and editorial independence of the Guardian in perpetuity.

To promote freedom in the press and liberal journalism globally.

To become the world's leading liberal voice.
Open Web Principles




2009




1. Permanent




                                                      http://www.flickr.com/photos/fstorr/




             • “A cool URI is one that does not change” (Tim Berners-Lee, 1998)
             • 1.5 million resources redirected to new scheme
2. Addressable
                        ★ Resources are “about” something - ready for the
                          social web.

                        ★ We live in “the age of point-at-things” (Coates 2005)




3. Discoverable


                 ★ Multiple routes to content

                 ★ Tagging drives discovery




4. Open




Example: The Hackable Guardian


    http://www.guardian.co.uk/....

        /technology/internet/rss

        /technology/all/rss

        /environment/climatechange+business/globaleconomy/rss
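
The URL patterns on this slide can be composed mechanically. The following is a minimal sketch, assuming that a feed address is the site host, one or more section/keyword paths joined with "+", and a trailing "/rss", as the three examples suggest. The function name `feed_url` is hypothetical and only for illustration.

```python
BASE_URL = "http://www.guardian.co.uk"  # host as shown on the slide


def feed_url(*tags: str) -> str:
    """Build an RSS URL for one tag path, or for several joined with '+'.

    Assumption: the pattern is <host>/<tag>[+<tag>...]/rss, inferred from
    the slide's examples rather than from any published specification.
    """
    if not tags:
        raise ValueError("at least one tag path is required")
    cleaned = [tag.strip("/") for tag in tags]  # tolerate stray slashes
    return f"{BASE_URL}/{'+'.join(cleaned)}/rss"


print(feed_url("technology/internet"))
print(feed_url("environment/climatechange", "business/globaleconomy"))
```

Because every resource is addressable by a readable path, a reader can guess a feed address without consulting any documentation, which is the sense in which the site is "hackable".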


Results...




Site traffic growth
[Chart: unique users, Sep 2005 to Dec 2008, on a scale of 0 to 30,000,000; markers for Pre-project, First release, and Final release; callout: 40M]
However...


1 Billion+ Internet Users!




...“How I stopped worrying about my website and learned to love the whole internet.”
       Matt McAlister
The Open Strategy

                  OPEN IN                  OPEN OUT

                  Bring in data and apps   Enable partners to
                  from the Internet        build applications
                                           using Guardian
                                           content and services
                                           for other platforms


"Our most interesting experiments lie in combining
    what we know with the experience, opinions and
    expertise of the people who want to participate
    rather than passively receive."
Jack Shenker
   “The Guardian alongside Al Jazeera was the one news source
   that everybody on the streets in Tahrir - not just in Cairo but in
   surrounding cities and major centers of revolutionary activity -
   that people were talking about.”
The Open Platform



The suite of services enabling partners to build applications with the Guardian
CONTENT API
    A service for selecting and collecting content from the Guardian for re-use

DATA STORE
    A directory of useful data curated by Guardian editors

POLITICS API
    Open database of candidates, voting records, constituencies, election results, live data on election day
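
To make "selecting and collecting content" concrete, here is a minimal sketch of how a client might compose a Content API search request. The endpoint `http://content.guardianapis.com/search` and the parameter names `q`, `tag`, and `api-key` are assumptions for illustration and are not taken from the slides; check the Open Platform documentation before relying on them. The helper name `content_search_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint, used here for illustration only.
CONTENT_API_SEARCH = "http://content.guardianapis.com/search"


def content_search_url(query: str,
                       tag: Optional[str] = None,
                       api_key: Optional[str] = None) -> str:
    """Compose a search URL; optional parameters are added only when given."""
    params = [("q", query)]
    if tag:
        params.append(("tag", tag))          # assumed filter parameter
    if api_key:
        params.append(("api-key", api_key))  # assumed key parameter
    return CONTENT_API_SEARCH + "?" + urlencode(params)


print(content_search_url("swine flu"))
print(content_search_url("swine flu", api_key="YOUR-KEY"))
```

The sketch only builds the URL and performs no network request, so it can be run without a key.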




Mutualised news!




DATA STORE
    A directory of useful data curated by Guardian editors
POLITICS API
    Open database of candidates, voting records, constituencies, election results, live data on election day
<OBLIGATORY DOGFOOD SLIDE >


Open for Business




3 Tiers of access
      3 Revenue models

      Keyless: Take our headlines. You keep associated revenues.

      Approved: Take our full article content, but with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue.

      Bespoke: Take, reformat, augment our content. Revenue model to be negotiated. Combination of Media, Fees, Downloads.


What this means
              Open Out: Developers can now access full content APIs on demand with keys post-approved

              Platform is positioned as a place to do business

              So rapid scalability, reliability and performance are now core requirements




OPEN IN
    Bring in data and apps from the internet

OPEN OUT
    Allow partners to build applications using Guardian content and services for other platforms
MICROAPPS
    A framework for integrating 3rd party applications into guardian.co.uk

    Simple REST/HTTP framework allows lightweight development

    Applications proxied for performance

    Apps generally hosted in the cloud, allows hot deployment into production
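
One way to read "applications proxied for performance" is that the site fetches a microapp's response over HTTP and caches it briefly, so that a slow or busy third-party host is not hit on every page view. The following is a minimal sketch of that idea, not the Guardian's implementation. The class name `CachingProxy`, the TTL policy, and the example URL are all assumptions, and the fetch function is a stand-in for a real HTTP call.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Serve cached microapp responses until they are older than the TTL."""

    def __init__(self,
                 fetch: Callable[[str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # performs the real request to the microapp
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so that expiry can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]   # still fresh: do not contact the microapp
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body


calls = []


def fake_microapp(url: str) -> str:
    """Stand-in for an externally hosted app returning an HTML fragment."""
    calls.append(url)
    return "<div>What could I cook?</div>"


proxy = CachingProxy(fake_microapp, ttl_seconds=30.0)
first = proxy.get("http://microapp.example/recipes")
second = proxy.get("http://microapp.example/recipes")
print(first == second, len(calls))
```

Because the proxy sees only a URL and a response body, the microapp behind it can be redeployed at any time without changes to the core site.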




• What could I cook?




Bringing it together




App showcase




From publisher to platform
                        Seeking massive growth, but no longer only broadcasting content on the website

                        User/partner engagement & contribution on:
                         • journalism
                         • data
                         • software
                         • applications
                         • revenue and ads

                        Supporting developers and partners with data and APIs needs scalability, reliability, and speed
Evolving the architecture
Web server     Web server     Web server
App server     App server     App server
          Memcached (added later)
                  Oracle
                   CMS
Why RDBMS?
                        5 years ago, fewer alternatives

                        Understand operations procedures

                        Can easily recruit DBAs / devs

                        Developer/ops tools

                        Business critical system: a safe choice
Scaling traffic
[Chart: unique users, Sep 2005 to Sep 2008, on a scale of 0 to 30,000,000]
We chose Solr/Lucene
                        Can perform complex queries, including full-text search

                        We can change the schema with no downtime

                        Most queries are of similar cost

                        Scales very well horizontally

                        “Just worked” in the cloud

                        No strange control processes/engines

                        Developers just loved working with it!
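
As an illustration of the query and scaling points above, here is a minimal sketch that builds a request for Solr's standard `/select` handler and spreads reads across replicas in round-robin order. The parameters `q`, `fq`, `rows`, and `wt` are standard Solr request parameters. The host names and the field names `body` and `tag` are hypothetical, and nothing here reflects the Guardian's actual schema or deployment.

```python
from itertools import cycle
from typing import Iterator, List, Optional
from urllib.parse import urlencode


def solr_select_url(base: str,
                    text: str,
                    filters: Optional[List[str]] = None,
                    rows: int = 10) -> str:
    """Build a /select URL with a main query and optional filter queries."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    for fq in filters or []:
        params.append(("fq", fq))  # each fq narrows the result set
    return f"{base}/select?{urlencode(params)}"


# Hypothetical read replicas; more can be added as traffic grows.
REPLICAS = ["http://solr1.example:8983/solr", "http://solr2.example:8983/solr"]
next_replica: Iterator[str] = cycle(REPLICAS)

url_a = solr_select_url(next(next_replica), "body:climate", ["tag:environment"])
url_b = solr_select_url(next(next_replica), "body:climate", ["tag:environment"])
print(url_a)
print(url_b)
```

Since any replica can answer any read, adding capacity amounts to adding an entry to the replica list, which is one reading of "scales very well horizontally".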
[Diagram: the core stack (web servers, app server, Memcached, RDBMS, CMS) next to an API served by multiple Solr instances hosted in the cloud (EC2)]
What about Open In?

[Diagram: externally hosted apps (App Engine etc.) connect through a proxy to the core stack (web servers, app server, Memcached, RDBMS, CMS)]
[Diagram: In, Core, Out. In: externally hosted apps (App Engine etc.) behind a proxy. Core: web servers, app server, Memcached, RDBMS, CMS. Out: multiple Solr instances in the cloud (EC2)]

More Related Content

Similar to Keynote: from publisher to platform, How The Guardian Embraced the Internet using Content, Search, and Open Source - By Stephen Dunn

Panasonic search
Panasonic searchPanasonic search
Panasonic search
AOE
 
Digital tools for professional learning
Digital tools for professional learningDigital tools for professional learning
Digital tools for professional learning
Ingrid Koehler
 
Relationships between Open Science, Science 2.0, and Social Media
Relationships between Open Science, Science 2.0, and Social MediaRelationships between Open Science, Science 2.0, and Social Media
Relationships between Open Science, Science 2.0, and Social Media
University of Michigan Taubman Health Sciences Library
 
Frontend Caching, PHPTek 2011, Chicago
Frontend Caching, PHPTek 2011, ChicagoFrontend Caching, PHPTek 2011, Chicago
Frontend Caching, PHPTek 2011, Chicago
Helgi Þormar Þorbjörnsson
 
Digital isn't everything, it's part of the pie
Digital isn't everything, it's part of the pieDigital isn't everything, it's part of the pie
Digital isn't everything, it's part of the pie
Dominique Hind
 
From Publisher To Platform: How The Guardian Used Content, Search, and Open S...
From Publisher To Platform: How The Guardian Used Content, Search, and Open S...From Publisher To Platform: How The Guardian Used Content, Search, and Open S...
From Publisher To Platform: How The Guardian Used Content, Search, and Open S...
The Guardian Open Platform
 
Cpython embedded in solr - By Roman Chyla
Cpython embedded in solr - By Roman Chyla Cpython embedded in solr - By Roman Chyla
Cpython embedded in solr - By Roman Chyla
lucenerevolution
 
Embedding CPython in Solr
Embedding CPython in SolrEmbedding CPython in Solr
Embedding CPython in Solr
Lucidworks (Archived)
 
Kasbank presentatie 205 jaar
Kasbank presentatie 205 jaar Kasbank presentatie 205 jaar
Kasbank presentatie 205 jaar
Vincent Everts
 
ENoLL FAO Workshop Alvaro Oliveira
ENoLL FAO Workshop Alvaro OliveiraENoLL FAO Workshop Alvaro Oliveira
ENoLL FAO Workshop Alvaro Oliveira
European Network of Living Labs (ENoLL)
 
Onde KH? (where to poop?) Pitch Keynote at SWRIO
Onde KH? (where to poop?) Pitch Keynote at SWRIOOnde KH? (where to poop?) Pitch Keynote at SWRIO
Onde KH? (where to poop?) Pitch Keynote at SWRIO
Bruno Marinho
 
Open Data
Open DataOpen Data
Open Data
SEA Tecnologia
 
1110 cpa bayside
1110 cpa bayside1110 cpa bayside
1110 cpa bayside
Mel Kettle
 
Can Media Queries Save Us All?
Can Media Queries Save Us All?Can Media Queries Save Us All?
Can Media Queries Save Us All?
Tim Kadlec
 
Beyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free KnowledgeBeyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free Knowledge
ErikMoeller
 
Sharath Bulusu, Guardian News & Media
Sharath Bulusu, Guardian News & MediaSharath Bulusu, Guardian News & Media
Sharath Bulusu, Guardian News & Media
Mashery
 
Andrew Nicklin, NYC DoITT
Andrew Nicklin, NYC DoITTAndrew Nicklin, NYC DoITT
Andrew Nicklin, NYC DoITT
Mashery
 
Networks and online journalism
Networks and online journalismNetworks and online journalism
Networks and online journalism
Paul Bradshaw
 
Bootcamp jan 26
Bootcamp   jan 26Bootcamp   jan 26
Bootcamp jan 26
GOSO
 
eLearning Consortium 2.0i jan 2011 london
eLearning Consortium 2.0i jan 2011 londoneLearning Consortium 2.0i jan 2011 london
eLearning Consortium 2.0i jan 2011 london
Erwin Huang
 

Similar to Keynote: from publisher to platform, How The Guardian Embraced the Internet using Content, Search, and Open Source - By Stephen Dunn (20)

Panasonic search
Panasonic searchPanasonic search
Panasonic search
 
Digital tools for professional learning
Digital tools for professional learningDigital tools for professional learning
Digital tools for professional learning
 
Relationships between Open Science, Science 2.0, and Social Media
Relationships between Open Science, Science 2.0, and Social MediaRelationships between Open Science, Science 2.0, and Social Media
Relationships between Open Science, Science 2.0, and Social Media
 
Frontend Caching, PHPTek 2011, Chicago
Frontend Caching, PHPTek 2011, ChicagoFrontend Caching, PHPTek 2011, Chicago
Frontend Caching, PHPTek 2011, Chicago
 
Digital isn't everything, it's part of the pie
Digital isn't everything, it's part of the pieDigital isn't everything, it's part of the pie
Digital isn't everything, it's part of the pie
 
From Publisher To Platform: How The Guardian Used Content, Search, and Open S...
From Publisher To Platform: How The Guardian Used Content, Search, and Open S...From Publisher To Platform: How The Guardian Used Content, Search, and Open S...
From Publisher To Platform: How The Guardian Used Content, Search, and Open S...
 
Cpython embedded in solr - By Roman Chyla
Cpython embedded in solr - By Roman Chyla Cpython embedded in solr - By Roman Chyla
Cpython embedded in solr - By Roman Chyla
 
Embedding CPython in Solr
Embedding CPython in SolrEmbedding CPython in Solr
Embedding CPython in Solr
 
Kasbank presentatie 205 jaar
Kasbank presentatie 205 jaar Kasbank presentatie 205 jaar
Kasbank presentatie 205 jaar
 
ENoLL FAO Workshop Alvaro Oliveira
ENoLL FAO Workshop Alvaro OliveiraENoLL FAO Workshop Alvaro Oliveira
ENoLL FAO Workshop Alvaro Oliveira
 
Onde KH? (where to poop?) Pitch Keynote at SWRIO
Onde KH? (where to poop?) Pitch Keynote at SWRIOOnde KH? (where to poop?) Pitch Keynote at SWRIO
Onde KH? (where to poop?) Pitch Keynote at SWRIO
 
Open Data
Open DataOpen Data
Open Data
 
1110 cpa bayside
1110 cpa bayside1110 cpa bayside
1110 cpa bayside
 
Can Media Queries Save Us All?
Can Media Queries Save Us All?Can Media Queries Save Us All?
Can Media Queries Save Us All?
 
Beyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free KnowledgeBeyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free Knowledge
 
Sharath Bulusu, Guardian News & Media
Sharath Bulusu, Guardian News & MediaSharath Bulusu, Guardian News & Media
Sharath Bulusu, Guardian News & Media
 
Andrew Nicklin, NYC DoITT
Andrew Nicklin, NYC DoITTAndrew Nicklin, NYC DoITT
Andrew Nicklin, NYC DoITT
 
Networks and online journalism
Networks and online journalismNetworks and online journalism
Networks and online journalism
 
Bootcamp jan 26
Bootcamp   jan 26Bootcamp   jan 26
Bootcamp jan 26
 
eLearning Consortium 2.0i jan 2011 london
eLearning Consortium 2.0i jan 2011 londoneLearning Consortium 2.0i jan 2011 london
eLearning Consortium 2.0i jan 2011 london
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
lucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
lucenerevolution
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
lucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
lucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
lucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
lucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
lucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
lucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
lucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
lucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
lucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
lucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
lucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
lucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
lucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
lucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search



Keynote: from publisher to platform, How The Guardian Embraced the Internet using Content, Search, and Open Source - By Stephen Dunn

  • 1. From publisher to platform: How the Guardian embraced the internet using content, search, and Open Source. Stephen Dunn, Guardian News and Media, stephen.dunn@guardian.co.uk, 25th May, 2011. Twitter: @cuica, @openplatform. Thursday, 26 May 2011
  • 2. From publisher to platform: How the Guardian embraced the Internet using content, search, and Open Source. Stephen Dunn, Guardian News and Media
  • 3. The publishing era
  • 4. We started a long time ago:
  • 5. Keyword page; Live blogs; Apps; Mobile site; Twitter updates; Swine flu; Comment; Content partnerships; Newspapers; Audio; Video; Open platform API
  • 6. To secure the financial and editorial independence of the Guardian in perpetuity. To promote freedom in the press and liberal journalism globally. To become the world's leading liberal voice.
  • 7. Open Web Principles
  • 8. 2009
  • 9. 1. Permanent (image: http://www.flickr.com/photos/fstorr/) • “A cool URI is one that does not change” (Tim Berners-Lee, 1998) • 1.5 million resources redirected to new scheme
  • 10. 2. Addressable ★ Resources are “about” something - ready for the social web. ★ We live in “the age of point-at-things” (Coates 2005)
  • 11. 3. Discoverable ★ Multiple routes to content ★ Tagging drives discovery
  • 12. 4. Open
  • 13. Example: The Hackable Guardian. http://www.guardian.co.uk/.... followed by: /technology/internet/rss; /technology/all/rss; /environment/climatechange+business/globaleconomy/rss
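The URL patterns on slide 13 suggest that a feed address is a tag path (or several tag paths joined with `+`) followed by `/rss`. The following is a minimal sketch of that reading in Python. The helper name `feed_url` is hypothetical, and whether these addresses still resolve today is not something the slides confirm.

```python
BASE = "http://www.guardian.co.uk"  # host as shown on the slide


def feed_url(*tag_paths, base=BASE):
    """Build an RSS URL from one or more tag paths.

    Multiple tag paths are joined with '+', mirroring the
    '/environment/climatechange+business/globaleconomy/rss' example.
    This is an illustration of the pattern, not a documented API.
    """
    if not tag_paths:
        raise ValueError("at least one tag path is required")
    cleaned = [path.strip("/") for path in tag_paths]
    return f"{base}/{'+'.join(cleaned)}/rss"


if __name__ == "__main__":
    print(feed_url("technology/internet"))
    print(feed_url("environment/climatechange", "business/globaleconomy"))
```

Because the address is assembled from readable tag paths, a reader can "hack" the URL by hand, which appears to be the point of the slide.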
  • 14. Results...
  • 15. Site traffic growth [chart: unique users, Sep 2005 to Dec 2008, axis from 3,750,000 to 30,000,000; markers for pre-project, first release, and final release; annotation “40M”]
  • 16. However...
  • 17. 1 Billion+ Internet Users!
  • 21. ...“How I stopped worrying about my website and learned to love the whole internet.” Matt McAlister
  • 22. The Open Strategy. OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Enable partners to build applications using Guardian content and services for other platforms.
  • 24. “Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”
  • 34. Jack Shenker: “The Guardian alongside Al Jazeera was the one news source that everybody on the streets in Tahrir - not just in Cairo but in surrounding cities and major centers of revolutionary activity - that people were talking about.”
  • 35. The Open Strategy. OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Enable partners to build applications using Guardian content and services for other platforms.
  • 36. The Open Platform
  • 37. The suite of services enabling partners to build applications with the Guardian
  • 38. OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Enable partners to build applications using Guardian content and services for other platforms.
  • 39. CONTENT API: A service for selecting and collecting content from the Guardian for re-use. DATA STORE: A directory of useful data curated by Guardian editors. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day.
  • 40. Mutualised news!
  • 41. Mutualised news!
  • 42. Mutualised news!
  • 47. DATA STORE: A directory of useful data curated by Guardian editors
  • 48. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
  • 49. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
  • 50. <OBLIGATORY DOGFOOD SLIDE>
  • 56. Open for Business
  • 57. 3 tiers of access, 3 revenue models. Keyless: Take our headlines. You keep associated revenues. Approved: Take our full article content, but with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue. Bespoke: Take, reformat, augment our content. Revenue model to be negotiated. Combination of Media, Fees, Downloads.
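One way to picture the three access tiers on slide 57 is as a filter over what a response may contain. The sketch below is purely illustrative: the field names (`headline`, `body`, `advert`), the tier identifiers, and the function `render_for_tier` are assumptions made for this example, not the Open Platform's actual response format.

```python
def render_for_tier(article: dict, tier: str) -> dict:
    """Return the parts of an article a partner may use at a given tier.

    Hypothetical mapping of the slide's tiers:
      keyless  -> headlines only
      approved -> full article content, with a Guardian advert attached
      bespoke  -> everything; commercial terms are negotiated separately
    """
    if tier == "keyless":
        return {"headline": article["headline"]}
    if tier == "approved":
        return {
            "headline": article["headline"],
            "body": article["body"],
            # Placeholder marker for the advert that travels with the content.
            "advert": article.get("advert", "guardian-advert"),
        }
    if tier == "bespoke":
        return dict(article)
    raise ValueError(f"unknown tier: {tier!r}")


if __name__ == "__main__":
    sample = {"headline": "Example headline", "body": "Example body text."}
    for name in ("keyless", "approved", "bespoke"):
        print(name, sorted(render_for_tier(sample, name)))
```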
  • 59. What this means. Open Out: Developers can now access full content APIs on demand, with keys post-approved. Platform is positioned as a place to do business. So rapid scalability, reliability and performance are now core requirements.
  • 60. OPEN IN: Bring in data and apps from the internet. OPEN OUT: Allow partners to build applications using Guardian content and services for other platforms.
  • 61. MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk. Simple REST/HTTP framework allows lightweight development. Applications proxied for performance. Apps generally hosted in the cloud, allows hot deployment into production.
  • 62. MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk
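The Microapps slides say that third-party applications are "proxied for performance" but do not describe the proxy itself. The sketch below shows one common shape such a proxy could take: cache each fragment for a short time and fall back to the last good copy if the external app fails. The class name `FragmentProxy`, the TTL, and the stale-on-error policy are all assumptions for illustration, not the Guardian's implementation.

```python
import time
from typing import Callable, Dict, Tuple


class FragmentProxy:
    """Cache fragments fetched from external apps (illustrative only)."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # function that retrieves a fragment by URL
        self._ttl = ttl_seconds      # how long a cached fragment counts as fresh
        self._clock = clock          # injectable clock, useful for testing
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # fresh copy: no call to the external app
        try:
            body = self._fetch(url)
        except Exception:
            # External app is unavailable: serve the stale copy if there is
            # one, otherwise an empty fragment so the host page still renders.
            return cached[1] if cached is not None else ""
        self._cache[url] = (now, body)
        return body


if __name__ == "__main__":
    def stub_fetch(url: str) -> str:
        # Stand-in for an HTTP call to a hypothetical microapp.
        return f"<div>fragment from {url}</div>"

    proxy = FragmentProxy(stub_fetch, ttl_seconds=30)
    print(proxy.get("http://example.invalid/recipes"))
```

Passing the fetch function in keeps the sketch independent of any HTTP library and lets the host page's availability be decoupled from that of the external app.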
  • 63. What could I cook?
  • 64. Bringing it together
  • 66. App showcase
  • 67. From publisher to platform. Seeking massive growth, but no longer only broadcasting content on the website. User/partner engagement & contribution on Journalism data software applications revenue and ads. Support developers and partners with data and APIs; need scalability, reliability, speed.
  • 68. Evolving the architecture
  • 69. [architecture diagram: three web servers, three app servers, Memcached (added later), Oracle, CMS]
  • 70. Why RDBMS? 5 years ago, fewer alternatives. Understand operations procedures. Can easily recruit DBAs / devs. Developer/ops tools. Business critical system: a safe choice. [architecture diagram: three web servers, three app servers, Memcached, Oracle, CMS]
  • 71. Scaling traffic [chart: unique users, Sep 2005 to Sep 2008, axis from 3,750,000 to 30,000,000]
  • 78. We chose Solr/Lucene. Can perform complex queries, including full-text search. We can change the schema with no downtime. Most queries are of similar cost. Scales very well horizontally. “Just worked” in the cloud. No strange control processes/engines. Developers just loved working with it!
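To make "complex queries, including full-text search" concrete, here is a minimal sketch of building a request to Solr's standard select handler, combining a full-text query (`q`) with filter queries (`fq`). The `q`, `fq`, `rows` and `wt` parameters are standard Solr query parameters; the host, the core path, and the field name `section` are assumptions for illustration and do not come from the talk.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, text: str, filters=(), rows: int = 10) -> str:
    """Build a Solr select URL for a full-text query plus filter queries.

    base    -- URL of a Solr core or instance (hypothetical in this example)
    text    -- full-text query, sent as the 'q' parameter
    filters -- iterable of filter queries, each sent as an 'fq' parameter
    rows    -- maximum number of results to return
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_select_url(
        "http://localhost:8983/solr",      # assumed local Solr instance
        "swine flu",
        filters=["section:technology"],    # 'section' is a hypothetical field
        rows=5,
    ))
```

Because every such request is a stateless HTTP GET, identical Solr replicas can sit behind a load balancer, which fits the slide's claim that the system scales well horizontally.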
  • 80. [architecture diagram: web servers, app server, Memcached, RDBMS and CMS, plus an “Api” tier of multiple Solr instances in the cloud (EC2)]
  • 81. What about Open In? OPEN IN: Bring in data and apps from the Internet. OPEN OUT: Enable partners to build applications using Guardian content and services for other platforms.
  • 82. [architecture diagram: web servers, app server, Memcached, RDBMS and CMS, plus an “Apps” tier in which a proxy fronts multiple apps on external hosting (app engine etc)]
  • 83. [architecture diagram. Core: web servers, app server, Memcached, CMS, RDBMS. Out: multiple Solr instances in the cloud (EC2). In: a proxy fronting multiple apps on external hosting (app engine etc)]