The Guardian Open Platform Content API: Implementation

Solr in the Wild: The Guardian’s Open Platform Content API
Graham Tackley
guardian.co.uk
Guardian journalism online: 1995
Guardian journalism online: 1999
Guardian journalism online: 2000
Guardian journalism online: 2010
• Content API
      • MicroApp Framework
      • Politics API
      • Data Store
http://www.guardian.co.uk/open-platform
http://content.guardianapis.com
http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all
http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all&api-key=eurocon2010
http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-refinements=all
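The query URLs above follow a simple pattern: a search term plus optional hyphenated parameters. As a minimal sketch of how a client might assemble them (the helper name `build_search_url` and its keyword-argument convention are assumptions for illustration, not part of the API), using only the endpoint and parameter names shown on the slides:

```python
from urllib.parse import quote, urlencode

# Endpoint as shown on the slides.
BASE_URL = "http://content.guardianapis.com/search.json"


def build_search_url(query, api_key=None, **options):
    """Build a search URL. Hypothetical helper: keyword names use
    underscores (order_by) in place of the API's hyphens (order-by)."""
    params = {"q": query}
    for name, value in options.items():
        params[name.replace("_", "-")] = value
    if api_key is not None:
        params["api-key"] = api_key
    # quote_via=quote encodes spaces as %20, matching the slide URLs.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url(
    "prague beer", order_by="relevance", show_fields="all", show_tags="all"
)
print(url)
```

Passing `show_refinements="all"` in the same way would produce the refinements query from the third URL.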
Implementation

• Traffic patterns much less predictable than
  a web site
• Need to easily scale on demand...
• ... and never take down guardian.co.uk due
  to API traffic
Core (architecture diagram): Web servers, App server, Memcached (20Gb), rdbms, CMS
Core (next build of the diagram): the same stack, with the Content API drawn alongside the rdbms
Why Solr?
• Database could not cope...
• ... and far too expensive to scale
• Solr ...
• ... was easy for developers to understand
• ... has a great replication model
• ... is simple to install
Core (architecture diagram): Web servers, App server, Memcached (20Gb), CMS
Next build of the diagram: adds an Indexer and a Solr Master alongside the CMS
Final build of the diagram: adds, in Cloud, EC2, an Api, a Solr and several Solr & Api hosts, linked to the Solr Master by Replication
Solr Schema


• 350+ tables in database schema
Content fields are just fields...
Tags, Factbox and Media around Content (diagram)
Tag types: Keyword, Contributor, Series, Publication, Tone
Content types: Article, Video, Audio, Gallery, Cartoon
... tags ...
record-type: content
id: world/picture/2010/may/14/formula-one-monaco
tag-ids: [ world/series/eyewitness, sport/formulaone, world/monaco ...]
tag-external-names: [ Eyewitness, Formula One, Monaco, ...]

record-type: tag
id: world/series/eyewitness
section-name: World news
web-title: Eyewitness
type: series
internal-name: Eyewitness (centespread photo series)
Callout on the tag record: Included in search, stored=false
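The two record types above can be read as flat documents in a single index: content records carry their tag ids and names directly, while each tag also exists as its own record. A minimal sketch of that shape, using a plain Python list as a stand-in for the Solr index (the `find` helper is hypothetical, and the tag lists are truncated exactly as on the slide):

```python
# Field names and values are taken from the slide; the list-based
# "index" and the find() helper are illustrative stand-ins for Solr.
content = {
    "record-type": "content",
    "id": "world/picture/2010/may/14/formula-one-monaco",
    "tag-ids": ["world/series/eyewitness", "sport/formulaone", "world/monaco"],
    "tag-external-names": ["Eyewitness", "Formula One", "Monaco"],
}
tag = {
    "record-type": "tag",
    "id": "world/series/eyewitness",
    "section-name": "World news",
    "web-title": "Eyewitness",
    "type": "series",
}
index = [content, tag]


def find(record_type, field, wanted):
    """Return records of one type whose field equals `wanted`,
    or contains it when the field is multi-valued."""
    results = []
    for record in index:
        if record["record-type"] != record_type:
            continue
        value = record.get(field)
        if value == wanted or (isinstance(value, list) and wanted in value):
            results.append(record)
    return results


# Content carrying a tag, and the tag's own record, come from the same index.
tagged = find("content", "tag-ids", "world/series/eyewitness")
series = find("tag", "id", "world/series/eyewitness")
print(tagged[0]["id"], "->", series[0]["web-title"])
```

Duplicating the tag names onto the content record means a content search needs no join, at the cost of re-indexing content when a tag changes.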
... factboxes ...
record-type: content
id: world/picture/2010/may/14/formula-one-monaco
factbox-data: [ 197544~|~~|~photography-tip~|~ ]
fact-data: [ 197544~|~pro-tip~|~The photographer has framed the cars between the boats and spectators and played with the scales of the components of the scene ]
fact-value: [ The photographer has framed the cars between the boats and spectators and played with the scales of the components of the scene ]
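The factbox values above appear to pack several pieces into one string using `~|~` as a separator. A minimal sketch of packing and unpacking such a value follows; the slides do not say what each position means, so the code only splits and joins, and the `pack` / `unpack` names are assumptions for illustration:

```python
# Separator as it appears in the slide's field values.
DELIMITER = "~|~"


def unpack(value):
    """Split a packed field value into its pieces (empty pieces are kept)."""
    return value.split(DELIMITER)


def pack(parts):
    """Join pieces back into a single packed field value."""
    return DELIMITER.join(parts)


# Values copied from the slide.
factbox_data = "197544~|~~|~photography-tip~|~"
fact_data = (
    "197544~|~pro-tip~|~The photographer has framed the cars between the "
    "boats and spectators and played with the scales of the components of "
    "the scene"
)

factbox_parts = unpack(factbox_data)
fact_parts = unpack(fact_data)
print(factbox_parts)
print(fact_parts[:2])
```

Both values start with the same piece (`197544`), which is presumably what ties a fact to its factbox, though the slides do not state this.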
... media ...
record-type: content
id: world/picture/2010/may/14/formula-one-monaco
media-asset-ids: [ PICTURE|362634152|IMAGE|362629791, ...]

record-type: media
id: PICTURE|362634152|IMAGE|362629791
credit: Mark Thompson/Getty Images
width: 1024
height: 768
path: /sys-images/Guardian/About/General/2010/5/14/1273823813621/66-lap-Monaco-grand-prix-002.jpg
The Code

• Written in Scala
• Uses SolrJ
• Plan to open source in the next few months
Creating the Index

• Existing search index takes 20 hours to
  build
• Solr index takes 1 hour
• Here’s how...
1.1 million+ items of content in the database

Split into Batches
-- Select every 10,000th id (in id order) as a batch boundary.
-- ROW_NUMBER() replaces ROWNUM here: Oracle assigns ROWNUM before ORDER BY
-- in the same query block, so ROWNUM would not follow the sorted order.
SELECT id FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) rownumber
  FROM content_live )
WHERE MOD(rownumber, 10000) = 0
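The query above picks every 10,000th id, in id order, as a batch boundary. A minimal sketch of the same idea in Python, on a small in-memory list of ids; how the boundaries are then turned into ranges is not shown on the slides, so `batch_ranges` and its (lower, upper] convention are assumptions:

```python
def batch_boundaries(sorted_ids, batch_size):
    """Return every batch_size-th id (counting from 1), mirroring
    MOD(rownumber, batch_size) = 0 over ids in sorted order."""
    return [
        id_
        for position, id_ in enumerate(sorted_ids, start=1)
        if position % batch_size == 0
    ]


def batch_ranges(boundaries):
    """Turn boundaries into (lower_exclusive, upper_inclusive) pairs.
    None marks an open end. This range convention is an assumption."""
    edges = [None] + list(boundaries) + [None]
    return list(zip(edges[:-1], edges[1:]))


ids = list(range(1, 26))  # 25 stand-in ids instead of 1.1 million
boundaries = batch_boundaries(ids, batch_size=10)
ranges = batch_ranges(boundaries)
print(boundaries)
print(ranges)
```

Each range can then be read independently with a simple `id > lower AND id <= upper` condition, which is what makes the batches suitable as units of parallel work.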
Batches are shared out between actors (Actor 1 to Actor 4 in the diagram).

Each actor:
1. reads data from database
2. builds solr input document
3. submits to solr
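The three steps each actor performs can be sketched as a small worker pool. This is not the Scala actor code from the talk: it swaps in a Python thread pool, and the in-memory `DATABASE`, the `SUBMITTED` list standing in for Solr, and the `headline` field are all assumptions made so the example is self-contained:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# Stand-ins: a tiny "database" and a list that plays the role of Solr.
DATABASE = {i: {"id": i, "headline": f"Item {i}"} for i in range(1, 26)}
SUBMITTED = []
SUBMIT_LOCK = Lock()


def read_batch(lower, upper):
    """Step 1: read rows with lower < id <= upper (None means open-ended)."""
    return [
        row
        for id_, row in sorted(DATABASE.items())
        if (lower is None or id_ > lower) and (upper is None or id_ <= upper)
    ]


def build_document(row):
    """Step 2: build a flat document for the index."""
    return {"record-type": "content", "id": str(row["id"]), "headline": row["headline"]}


def submit(documents):
    """Step 3: submit the documents (here: append to a shared list)."""
    with SUBMIT_LOCK:
        SUBMITTED.extend(documents)


def index_batch(batch):
    """Run all three steps for one batch and report how many docs it sent."""
    documents = [build_document(row) for row in read_batch(*batch)]
    submit(documents)
    return len(documents)


def run_indexer(batches, workers=4):
    """Feed the batches to a fixed pool of workers, like a work queue."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_batch, batches))


total = run_indexer([(None, 10), (10, 20), (20, None)])
print(total, "documents submitted")
```

Because every batch is independent, the number of workers can be tuned to the hardware without changing the batch logic.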
Summary

• Solr made free access to our content API
  possible
• Replication rocks for scaling
• Solr just works for us (thank you!)
• NoSQL really isn’t that scary
• http://guardian.co.uk/open-platform
   • http://content.guardianapis.com
graham.tackley@guardian.co.uk · @tackers

Editor's Notes

  1. As Stephen said: Very basic links to interesting content
  2. Note the registration paywall
  3. Broadcast, stories, basic community. Rebuild started in 2005.
  4. “Web 2.0”, community, (full fat) RSS, discoverability, tagging. Where do we go from here? Other newspaper sites are looking to restrict access to content via paywalls etc.; we’re looking to open up.
  5. We’ve spent the last 12 months experimenting around open distribution and open partnerships. 4 initiatives make up the open platform (right now), as Stephen said.
  6. This talk focuses on the Content API, which provides a way for others to re-present our content in their applications.
  7. http://content.guardianapis.com
  8. http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance (Most users want the most recent content, so the default ordering is newest.) This is just a dismax search.
  9. Can also retrieve extra metadata, including tags: http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all
  10. If you have an API key you can get full content. (You need to apply for this and agree to some T&Cs, mostly to ensure that we can take down content for legal reasons.) This example key is only valid for this conference and will be disabled afterwards :) http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all&api-key=eurocon2010
  11. Refinements give the ability to narrow down your result set (of course these are just Solr facets): http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-refinements=all
  12-13. Our current architecture. Perhaps we could feed the Content API off the database?
  14. Time to developer understanding: about 2 hours.
  15-16. Currently we rebuild every night, with incrementals during the day. [next] Expose the Solr master to EC2 and create hosts in EC2 that replicate using Solr replication: works fantastically. 6GB index size. Load-balancer config. We use solr.war from the 1.4 dist totally unchanged, and run the API webapp in the same Jetty container.
  17. Lots of talk nowadays on “no sql” solutions
  18. No. Designed a new logo that better reflects where we currently are
  19. Disclaimer: the next slides describe how *we* did it; not necessarily best practice! We took the opportunity to simplify our domain model...
  20. Content fields are just fields. But we also need to map tags, media, and factboxes.
  21. Here’s how we model tags & content
  22-23. Fact boxes associate arbitrary information with content. We need to search them, but they have a 1-to-1 relationship with content, so there is no separate record.
  24. show-media allows access to the non-text assets of an item of content
  25. Code mostly just takes input params, converts them to a Solr query, and transforms the result to JSON or XML. I’m not here to talk about Scala, but here’s a quick couple of snippets.
  26. RichSolrDocument makes SolrDocument more “Scala”-ish.
  27. Scala can make writing understandable code much easier
  28. Supporting auto scaling in EC2: our base images all have an empty index. (The EC2 load balancer is configured to check this URL and add the server to its list on a 200 response.)
  29. Thanks to Grant Ingersoll from Lucid Imagination for guiding us down this route (we were planning to do something much more complicated). Also thanks to Francis Rhys-Jones for actually implementing this. This is game changing: suddenly we’re prepared to change the index, and NoSQL solutions seem a whole lot less scary. We migrate our entire database every night!
  30-49. Effectively the batch divisions become a work queue fed to a set of actors. (Actually, we found that 8 worked best with our hardware.) Each actor reads the data from the database, creates a SolrInputDocument, and submits it.
  50-51. All we wanted was a search engine... but actually we got an easy to work with, fast, scalable NoSQL solution!