Dsp bbc-jem rayfield-semtech2011

BBC Dynamic Semantic Publishing.
The transformational technology strategy that the BBC Future Media & Technology department is using to evolve from a relational content model and static publishing framework to a fully dynamic semantic publishing (DSP) architecture. It supports BBC World Cup 2010, BBC Sport and BBC Olympics 2012 online.

http://www.bbc.co.uk/worldcup/
http://news.bbc.co.uk/sport/
http://www.bbc.co.uk/2012/

Published in: Technology
  • These relationships mean more interesting user journeys, greater link density, and more interesting queries on the data.
  • Demo: GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea Accept text/rdf+n3 GET https://api.live.bbc.co.uk/dsp/sport/football/facup
  • Transcript

    • 1. BBC Dynamic Semantic Publishing [DSP] Jem Rayfield : Senior Technical Architect BBC Future Media and Technology
    • 2. Outline: BBC News Online, BBC World Cup 2010, BBC Sport 2011, BBC Olympics 2012
    • 3. Radio since 1922, TV since 1930, Web since 1994
    • 4. http://bbc.co.uk/news online
    • 5. BBC News [Static Publishing]
    • 6. Static News Architecture
    • 7. BBC CPS/CMS Asset Authoring
    • 8. BBC CPS/CMS Index Authoring
    • 9. Static News: The Good
      • Simple
      • Scales cheaply
      • Difficult to break [bad rendering logic etc.]
      • Handles high load
    • 10. Static News: The Bad
      • Relational taxonomic meta model
      • Static! Inflexible! SSI!
      • Document publishing
      • Content not re-usable
      • Content not re-purposable
      • Difficult to personalize
      • Publication per output
    • 11. BBC World Cup 2010 http://bbc.co.uk/worldcup
    • 12. World Cup 2010
      • 32 teams, 8 groups, 736 players → 776 pages
      • Fixtures & Results, Groups & Teams pages
      • Too many web pages for too few journalists
      • Improve the publishing system to help achieve all of this
    • 13. Page Per Player http://news.bbc.co.uk/sport/football/world_cup_2010/groups_and_teams/team/england/wayne_rooney
    • 14. Page Per Team
    • 15. Page Per Group
    • 16. Semantic publishing USER EXPERIENCE ONTOLOGY TRIPLE STORE
    • 17. Rationale
      • Automated content publishing
      • Huge increase in content breadth (number of manageable pages)
      • Content re-use and re-purposing, increasing reach
      • Simplified content management
      • Journalist headcount reduction
      • Multi-dimensional entry points and semantic navigation
      • Improved user experience with high levels of user engagement
      • Dynamic, state (time|event) and semantic driven page layout
      • Personalized content
      • Open data and APIs
    • 18. Dynamic Semantic Architecture [DSP]
    • 19. API Stack
    • 20. Highly Scalable Clustered BigOWLIM
      • Horizontally scalable
      • No single point of failure
      • Fault tolerant
    • 21. Plenty of Caching
    • 22. Extendable Domain Driven Asset Tagging
    • 23. Open Ontology/Dataset reuse Event | Geonames | Foaf | Etc.
    • 24. World cup ontology
    • 25. Graffiti: Suggest -> Tag [Player]
    • 26. Graffiti: Suggest -> Tag [Location] (Geonames)
    • 27. Tag player → Infer team → Infer competition → Happy Journalist
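
The tag-then-infer step on slide 27 can be illustrated with a minimal sketch. It assumes a tiny in-memory set of triples and two hypothetical predicates (`playsFor`, `competesIn`); the slides do not show the actual World Cup ontology rules or how BigOWLIM applies them, so this is only an illustration of the idea that one explicit tag can yield several inferred ones.

```python
# Hypothetical facts: identifiers and the "playsFor" predicate are illustrative,
# not taken from the BBC ontology.
FACTS = {
    ("wayne_rooney", "playsFor", "england"),
    ("england", "competesIn", "world_cup_2010"),
}

# Predicates along which a tag is propagated (an assumption for this sketch).
FOLLOW = ("playsFor", "competesIn")


def infer_tags(explicit_tags, facts):
    """Return the explicit tags plus everything reachable via FOLLOW predicates."""
    inferred = set(explicit_tags)
    frontier = list(explicit_tags)
    while frontier:
        subject = frontier.pop()
        for s, p, o in facts:
            if s == subject and p in FOLLOW and o not in inferred:
                inferred.add(o)
                frontier.append(o)
    return inferred


# The journalist tags only the player; team and competition follow.
print(sorted(infer_tags({"wayne_rooney"}, FACTS)))
# → ['england', 'wayne_rooney', 'world_cup_2010']
```

In a real triple store the propagation would be expressed as ontology axioms or rules and evaluated by the reasoner rather than by application code.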
    • 28. World Cup statistics
      • 750+ dynamic aggregations/pages (Player, Squad, Group, etc.)
      • Average unique page requests a day: 2 million+
      • Average BigOWLIM SPARQL queries a day: 1 million
      • Hundreds of RDF statement updates/inserts per minute, including sports statistics, with full OWL reasoning and associated inference
      • Multi-data-center, fully resilient, clustered 6-node triple store
    • 29. BBC Sport Online Refresh http://bbc.co.uk/sport
    • 30. Sport Refresh : Stealth Infra upgrade [DSP] http://bbc.co.uk/sport1/hi/football/teams/c/chelsea
    • 31. REST API
      • Content negotiation: JSON RDF, XML RDF, Turtle
      • Publicly accessible (with SSL cert)
      • GET /sport/football/teams/<TEAM> Accept: application/rdf+json
      • GET /sport/football/<COMPETITION> Accept: application/rdf+xml
      • GET /assets/<ASSET> Accept: text/rdf+n3
      • Etc.
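
As a rough illustration of the content negotiation on slide 31, the sketch below builds (but does not send) a GET request whose `Accept` header selects the serialization. The base URL is taken from the demo URLs elsewhere in the deck; the `build_request` helper and the short format keys are assumptions made for this example, and the client SSL certificate the slide mentions is not handled here.

```python
from urllib.request import Request

# Host and path prefix as they appear in the deck's demo URLs.
BASE = "https://api.live.bbc.co.uk/dsp"

# Media types listed on the slide; the short keys are hypothetical.
FORMATS = {
    "json": "application/rdf+json",
    "xml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}


def build_request(path, fmt):
    """Build a GET request asking for `path` in the given RDF serialization."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return Request(BASE + path, headers={"Accept": FORMATS[fmt]}, method="GET")


req = build_request("/sport/football/teams/chelsea", "json")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

The same resource URL serves every format; only the header changes, which is what lets the render layer and external consumers share one API.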
    • 32. GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea Accept: text/rdf+n3
        <http://www.chelseafc.com/>
            domain:documentType <http://www.bbc.co.uk/things/document-types/homepage> ,
                <http://www.bbc.co.uk/things/document-types/external> .
        <http://www.bbc.co.uk/sport/football/teams/chelsea>
            domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> ,
                <http://www.bbc.co.uk/things/document-types/homepage> .
        <http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id>
            a sport:CompetitiveSportingOrganisation ;
            domain:canonicalName "Chelsea"^^<xsd:string> ;
            domain:document <http://www.chelseafc.com/> ,
                <http://www.bbc.co.uk/sport/football/teams/chelsea> ;
            domain:externalId <http://dbpedia.org/resource/Chelsea_F.C.> ,
                <urn:sports-stats:137316635> ;
            domain:name "Chelsea" ;
            domain:shortName "Chelsea"^^<xsd:string> ;
            sport:competesIn <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id> .
        <http://dbpedia.org/resource/Chelsea_F.C.>
            domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/dbpedia> .
        <urn:sports-stats:137316635>
            domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/bbc-sport-stats> .
        <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id>
            domain:canonicalName "Premier League"^^<xsd:string> ;
            domain:externalId <urn:sports-stats:118996114> ;
            sport:competitionType <http://www.bbc.co.uk/things/competition-types/domestic-league> .
    • 33. GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea Accept: application/rdf+json
        {
          "http://www.chelseafc.com/": {
            "http://www.bbc.co.uk/ontologies/domain/documentType": [
              { "value": "http://www.bbc.co.uk/things/document-types/homepage", "type": "uri" },
              { "value": "http://www.bbc.co.uk/things/document-types/external", "type": "uri" }
            ]
          },
          "http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
            "http://www.bbc.co.uk/ontologies/domain/externalId": [
              { "value": "http://dbpedia.org/resource/Chelsea_F.C.", "type": "uri" },
              { "value": "urn:sports-stats:137316635", "type": "uri" }
            ],
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
              { "value": "http://www.bbc.co.uk/ontologies/sport/CompetitiveSportingOrganisation", "type": "uri" }
            ],
            "http://www.bbc.co.uk/ontologies/domain/name": [
              { "value": "Chelsea", "type": "literal" }
            ],
            "http://www.bbc.co.uk/ontologies/sport/competesIn": [
              { "value": "http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id", "type": "uri" }
            ],
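
The RDF/JSON response on slide 33 is a nested map of subject → predicate → list of value objects. The following sketch shows one way a consumer might read it, assuming a small hand-copied excerpt of that response as input. The helper names `values` and `all_of_type` are hypothetical; `all_of_type` only loosely mirrors the `allofType` call made by the PHP render layer on slide 34.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOMAIN = "http://www.bbc.co.uk/ontologies/domain/"
SPORT = "http://www.bbc.co.uk/ontologies/sport/"
CHELSEA = "http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id"

# Excerpt of the slide's response, closed off so that it parses.
SAMPLE = json.dumps({
    CHELSEA: {
        RDF_TYPE: [{"value": SPORT + "CompetitiveSportingOrganisation", "type": "uri"}],
        DOMAIN + "name": [{"value": "Chelsea", "type": "literal"}],
        SPORT + "competesIn": [{
            "value": "http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id",
            "type": "uri",
        }],
    },
    "http://www.chelseafc.com/": {
        DOMAIN + "documentType": [
            {"value": "http://www.bbc.co.uk/things/document-types/homepage", "type": "uri"},
        ],
    },
})


def values(graph, subject, predicate):
    """Return all object values for (subject, predicate), or an empty list."""
    return [o["value"] for o in graph.get(subject, {}).get(predicate, [])]


def all_of_type(graph, type_uri):
    """Return every subject that has rdf:type `type_uri`."""
    return [s for s in graph if type_uri in values(graph, s, RDF_TYPE)]


graph = json.loads(SAMPLE)
teams = all_of_type(graph, SPORT + "CompetitiveSportingOrganisation")
print(teams == [CHELSEA])                      # → True
print(values(graph, CHELSEA, DOMAIN + "name"))  # → ['Chelsea']
```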
    • 34. PHP->EasyRDF->API
      • PHP render layer consumes RDF from the REST API via EasyRDF (http://www.aelius.com/njh/easyrdf/)
      • EasyRDF is an open PHP library (primary committer: Nicholas Humfrey, BBC)
        // Request options: client certificate plus content negotiation headers
        protected function getOptions() {
            return array(
                "config" => array("usecert" => true),
                "headers" => array(
                    "Accept" => "application/rdf+json",
                    "X-Expect" => "http://www.bbc.co.uk/things/platforms/hiweb"
                )
            );
        }
        // Fetch the team resource and load the response body into an EasyRdf graph
        $options = $this->getOptions();
        $response = $this->get("https://api.test.bbc.co.uk/dsp/sport/football/teams/chelsea", $options);
        $this->data = new EasyRdf_Graph("http://www.bbc.co.uk", $response->getBody());
        // Select every resource typed as a competitive sporting organisation
        $teams = $this->data->allOfType("sport:CompetitiveSportingOrganisation");
    • 35. But? “Our website is the API”
      • http://www.bbc.co.uk/programmes/
      • Programme: “The Carpenters’ Story”
      • HTML => http://www.bbc.co.uk/programmes/b011rf7f
      • RDF => http://www.bbc.co.uk/programmes/b007cllb.rdf
      • Sport .RDF coming soon…
    • 36. Augment architecture with a Content Store
      • Atomic content assets stored in MarkLogic XML store
      • XML content queryable via XQuery
      • Content assets searchable
      • Sports statistics searchable/queryable via XQuery
      • Ontological queries via SPARQL on BigOWLIM, asset queries via XQuery on MarkLogic
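
To make the SPARQL half of this split concrete, here is a minimal sketch that assembles a query for all teams competing in a given competition. The class and predicates (`sport:CompetitiveSportingOrganisation`, `sport:competesIn`, `domain:name`) and the namespace URIs come from the Chelsea responses earlier in the deck; the query shape and the `teams_in_competition_query` helper are assumptions, since the slides do not show the queries actually sent to BigOWLIM.

```python
PREFIXES = {
    "sport": "http://www.bbc.co.uk/ontologies/sport/",
    "domain": "http://www.bbc.co.uk/ontologies/domain/",
}


def teams_in_competition_query(competition_uri):
    """Build a SPARQL SELECT for teams that compete in `competition_uri`."""
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in competition_uri for ch in '<>"{}|^` \n\t'):
        raise ValueError("invalid IRI")
    prefix_lines = "\n".join(f"PREFIX {p}: <{u}>" for p, u in PREFIXES.items())
    return (
        f"{prefix_lines}\n"
        "SELECT ?team ?name WHERE {\n"
        "  ?team a sport:CompetitiveSportingOrganisation ;\n"
        f"        sport:competesIn <{competition_uri}> ;\n"
        "        domain:name ?name .\n"
        "}"
    )


# Competition identifier as it appears in the Chelsea response on slide 32.
print(teams_in_competition_query(
    "http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id"
))
```

The asset bodies themselves would then be fetched from the content store by identifier, keeping the triple store focused on relationships rather than documents.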
    • 37. API Stack
    • 38. Ontology aware NLP GATE + Ontotext
    • 39. Euro 2012: dynamic semantic aggregation pages for 8 venues, 4 groups, 16 teams, 336 players
    • 40. Olympics 2012 http://www.bbc.co.uk/2012/
    • 41. Olympics 2012 – The requirements
      • Page per athlete [10,000+], page per country [200+], page per discipline [400-500], page per venue → a lot of output…
      • Almost real time statistics and live event pages
      • Time coded, metadata annotated, on demand video, 58,000 hours of content
      • Far too many web pages for far too few journalists
      • DSP annotation architecture to automate content aggregation
    • 42. BBC Sport: http://www.bbc.co.uk/ontologies/sport Open Sport Ontology
    • 43. More…. BBC Open Ontologies Programmes : http://www.bbc.co.uk/ontologies/programmes Wildlife : http://www.bbc.co.uk/ontologies/wildlife/
    • 44. Platform future…
      • Entire BBC sport site re-engineered and domain modeled using RDF framework
      • Geospatial (GeoSPARQL) powered news aggregations. Stories about London or Berlin…
      • News event and time based asset aggregations
      • Additional domain modeling and extensions (business, wildlife, programmes, etc.)
      • Replicated triple store to facilitate a public facing BBC SPARQL endpoint and API
      • SportML and BBC Sport ontology mapping
    • 45. Questions? [email_address]
