Dsp bbc-jem rayfield-semtech2011

BBC Dynamic Semantic Publishing.
This presentation describes the transformational technology strategy that the BBC Future Media & Technology department is using to evolve from a relational content model and static publishing framework to a fully dynamic semantic publishing (DSP) architecture. The architecture supports BBC World Cup 2010, BBC Sport and BBC Olympics 2012 online.

http://www.bbc.co.uk/worldcup/
http://news.bbc.co.uk/sport/
http://www.bbc.co.uk/2012/

Published in: Technology

  • These relationships mean more interesting user journeys, greater link density, and more interesting queries on the data.
  • Demo: GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea Accept text/rdf+n3 GET https://api.live.bbc.co.uk/dsp/sport/football/facup
  • Dsp bbc-jem rayfield-semtech2011

    1. 1. BBC Dynamic Semantic Publishing [DSP] Jem Rayfield : Senior Technical Architect BBC Future Media and Technology
    2. 2. Outline: BBC News Online, BBC World Cup 2010, BBC Sport 2011, BBC Olympics 2012
    3. 3. Radio since 1922, TV since 1930, Web since 1994
    4. 4. http://bbc.co.uk/news online
    5. 5. BBC News [Static Publishing]
    6. 6. Static News Architecture
    7. 7. BBC CPS/CMS Asset Authoring
    8. 8. BBC CPS/CMS Index Authoring
    9. 9. Static News: The Good 1) Simple 2) Scales cheaply 3) Difficult to break [bad rendering logic etc.] 4) Handles high load
    10. 10. Static News: The Bad 1) Relational taxonomic meta model 2) Static! Inflexible! SSI! 3) Document publishing 4) Content not re-usable 5) Content not re-purposable 6) Difficult to personalize 7) Publication per output
    11. 11. BBC World Cup 2010 http://bbc.co.uk/worldcup
    12. 12. World Cup 2010
        • 32 teams, 8 groups, 736 players -> 776 pages
        • Fixtures & Results, Groups & Teams pages
        • Too many web pages for too few journalists
        • Improve the publishing system to help achieve all of this
    13. 13. Page Per Player http://news.bbc.co.uk/sport/football/world_cup_2010/groups_and_teams/team/england/wayne_rooney
    14. 14. Page Per Team
    15. 15. Page Per Group
    16. 16. Semantic publishing USER EXPERIENCE ONTOLOGY TRIPLE STORE
    17. 17. Rationale
        • Automated content publishing
        • Huge increase in content breadth (number of manageable pages)
        • Content re-use and re-purposing, increasing reach
        • Simplified content management
        • Journalist headcount reduction
        • Multi-dimensional entry points and semantic navigation
        • Improved user experience with high levels of user engagement
        • Dynamic, state (time|event) and semantic driven page layout
        • Personalized content
        • Open data and APIs
    18. 18. Dynamic Semantic Architecture [DSP]
    19. 19. API Stack
    20. 20. Highly Scalable Clustered BigOWLIM
        • Horizontally scalable
        • No single point of failure
        • Fault tolerant
    21. 21. Plenty of Caching
    22. 22. Extendable Domain Driven Asset Tagging
    23. 23. Open Ontology/Dataset reuse Event | Geonames | Foaf | Etc.
    24. 24. World cup ontology
    25. 25. Graffiti: Suggest -> Tag [Player]
    26. 26. Graffiti: Suggest -> Tag [Location] (Geonames)
    27. 27. Tag player -> Infer team -> Infer competition -> Happy journalist
    28. 28. World Cup statistics
        • 750+ dynamic aggregations/pages (Player, Squad, Group, etc.)
        • Average unique page requests a day: 2 million+
        • Average BigOWLIM SPARQL queries a day: 1 million
        • 100s of RDF statement updates/inserts per minute, with full OWL reasoning and associated inference, including sports statistics
        • Multi data center, fully resilient, clustered 6-node triple store
    29. 29. BBC Sport Online Refresh http://bbc.co.uk/sport
    30. 30. Sport Refresh : Stealth Infra upgrade [DSP] http://bbc.co.uk/sport1/hi/football/teams/c/chelsea
    31. 31. REST API
        • Content negotiation: JSON RDF, XML RDF, Turtle
        • Publicly accessible (with SSL cert)
        • GET /sport/football/teams/<TEAM> Accept: application/rdf+json
        • GET /sport/football/<COMPETITION> Accept: application/rdf+xml
        • GET /assets/<ASSET> Accept: text/rdf+n3
        • Etc.
    32. 32. GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea Accept: text/rdf+n3
        <http://www.chelseafc.com/>
            domain:documentType <http://www.bbc.co.uk/things/document-types/homepage> ,
                <http://www.bbc.co.uk/things/document-types/external> .
        <http://www.bbc.co.uk/sport/football/teams/chelsea>
            domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> ,
                <http://www.bbc.co.uk/things/document-types/homepage> .
        <http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id>
            a sport:CompetitiveSportingOrganisation ;
            domain:canonicalName "Chelsea"^^<xsd:string> ;
            domain:document <http://www.chelseafc.com/> ,
                <http://www.bbc.co.uk/sport/football/teams/chelsea> ;
            domain:externalId <http://dbpedia.org/resource/Chelsea_F.C.> ,
                <urn:sports-stats:137316635> ;
            domain:name "Chelsea" ;
            domain:shortName "Chelsea"^^<xsd:string> ;
            sport:competesIn <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id> .
        <http://dbpedia.org/resource/Chelsea_F.C.>
            domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/dbpedia> .
        <urn:sports-stats:137316635>
            domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/bbc-sport-stats> .
        <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id>
            domain:canonicalName "Premier League"^^<xsd:string> ;
            domain:externalId <urn:sports-stats:118996114> ;
            sport:competitionType <http://www.bbc.co.uk/things/competition-types/domestic-league> .
    33. 33. GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea Accept: application/rdf+json
        {
          "http://www.chelseafc.com/": {
            "http://www.bbc.co.uk/ontologies/domain/documentType": [
              { "value": "http://www.bbc.co.uk/things/document-types/homepage", "type": "uri" },
              { "value": "http://www.bbc.co.uk/things/document-types/external", "type": "uri" }
            ]
          },
          "http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
            "http://www.bbc.co.uk/ontologies/domain/externalId": [
              { "value": "http://dbpedia.org/resource/Chelsea_F.C.", "type": "uri" },
              { "value": "urn:sports-stats:137316635", "type": "uri" }
            ],
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
              { "value": "http://www.bbc.co.uk/ontologies/sport/CompetitiveSportingOrganisation", "type": "uri" }
            ],
            "http://www.bbc.co.uk/ontologies/domain/name": [
              { "value": "Chelsea", "type": "literal" }
            ],
            "http://www.bbc.co.uk/ontologies/sport/competesIn": [
              { "value": "http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id", "type": "uri" }
            ],
    34. 34. PHP->EasyRDF->API
        PHP render layer consumes RDF from the REST API via EasyRDF (http://www.aelius.com/njh/easyrdf/)
        EasyRDF: open PHP library (primary committer Nicholas Humfrey, BBC)
        // Request options: client certificate plus content-negotiation headers.
        protected function getOptions() {
            return array(
                "config" => array("usecert" => true),
                "headers" => array(
                    "Accept" => "application/rdf+json",
                    "X-Expect" => "http://www.bbc.co.uk/things/platforms/hiweb"
                )
            );
        }
        // Fetch the team resource and load the response body into an EasyRdf graph.
        $options = $this->getOptions();
        $response = $this->get("https://api.test.bbc.co.uk/dsp/sport/football/teams/chelsea", $options);
        $this->data = new EasyRdf_Graph("http://www.bbc.co.uk", $response->getBody());
        // Select every resource typed as a competitive sporting organisation.
        $teams = $this->data->allOfType("sport:CompetitiveSportingOrganisation");
    35. 35. But? “Our website is the API” http://www.bbc.co.uk/programmes/
        • Programme “The Carpenters’ Story”
        • HTML => http://www.bbc.co.uk/programmes/b011rf7f
        • RDF => http://www.bbc.co.uk/programmes/b007cllb.rdf
        • Sport .rdf coming soon…
    36. 36. Augment architecture with a Content Store
        • Atomic content assets stored in MarkLogic XML store
        • XML content queryable via XQuery
        • Content assets searchable
        • Sports statistics searchable/queryable via XQuery
        • Ontological SPARQL via BigOWLIM, asset XQuery via MarkLogic
    37. 37. API Stack
    38. 38. Ontology aware NLP GATE + Ontotext
    39. 39. Euro 2012: Dynamic semantic aggregation pages for 8 Venues, 4 Groups, 16 Teams, 336 Players
    40. 40. Olympics 2012 http://www.bbc.co.uk/2012/
    41. 41. Olympics 2012 – The requirements
        • Page per Athlete [10,000+], Page per country [200+], Page per Discipline [400-500], Page per venue -> A lot of output…
        • Almost real time statistics and live event pages
        • Time coded, metadata annotated, on demand video, 58,000 hours of content
        • Far too many web pages for far too few journalists
        • DSP annotation architecture to automate content aggregation
    42. 42. BBC Sport: http://www.bbc.co.uk/ontologies/sport Open Sport Ontology
    43. 43. More…. BBC Open Ontologies Programmes : http://www.bbc.co.uk/ontologies/programmes Wildlife : http://www.bbc.co.uk/ontologies/wildlife/
    44. 44. Platform future…
        • Entire BBC sport site re-engineered and domain modeled using RDF framework
        • Geospatial (GeoSPARQL) powered news aggregations. Stories about London or Berlin…
        • News Event and time based asset aggregations
        • Additional domain modeling and extensions (Business, wildlife, programmes, etc.)
        • Replicated triple store to facilitate a public facing BBC SPARQL endpoint and API
        • SportML and BBC Sport ontology mapping
    45. 45. Questions? [email_address]
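
Slides 31, 33 and 34 describe a client asking the REST API for a particular RDF serialisation and then picking resources out of the response by type. The flow might be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not the BBC render layer: `build_request`, `triples` and `all_of_type` are hypothetical helper names, the request is prepared but never sent, and the sample body is a trimmed excerpt of the RDF/JSON shown on slide 33.

```python
import json
import urllib.request

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPORT_ORG = "http://www.bbc.co.uk/ontologies/sport/CompetitiveSportingOrganisation"

# Trimmed excerpt of the RDF/JSON response shown on slide 33 (two properties only).
SAMPLE_BODY = """
{
  "http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"value": "http://www.bbc.co.uk/ontologies/sport/CompetitiveSportingOrganisation",
       "type": "uri"}
    ],
    "http://www.bbc.co.uk/ontologies/domain/name": [
      {"value": "Chelsea", "type": "literal"}
    ]
  }
}
"""


def build_request(url, media_type="application/rdf+json"):
    """Prepare (but do not send) a GET request asking for one RDF serialisation."""
    return urllib.request.Request(url, headers={"Accept": media_type}, method="GET")


def triples(rdf_json_text):
    """Flatten an RDF/JSON document into (subject, predicate, value, type) tuples."""
    doc = json.loads(rdf_json_text)
    return [
        (subject, predicate, obj["value"], obj["type"])
        for subject, predicates in doc.items()
        for predicate, objects in predicates.items()
        for obj in objects
    ]


def all_of_type(rdf_json_text, type_uri):
    """Return the sorted subjects whose rdf:type equals type_uri."""
    return sorted(
        {s for s, p, value, _ in triples(rdf_json_text) if p == RDF_TYPE and value == type_uri}
    )


request = build_request("https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea")
teams = all_of_type(SAMPLE_BODY, SPORT_ORG)
```

Swapping the `media_type` argument for `application/rdf+xml` or `text/rdf+n3` would request the other serialisations listed on slide 31; parsing those formats would need an RDF library, which this sketch deliberately avoids.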

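
Slide 27 summarises the tagging idea: a journalist tags a story with a player, and the team and competition follow by inference. The deck attributes the real inference to OWL reasoning in BigOWLIM; the snippet below is only a hand-rolled stand-in that follows two link types over a tiny in-memory set of triples. The `ex:` identifiers, the `ex:playsFor` and `ex:about` properties and the `infer_tags` helper are all assumptions made for illustration; only `sport:competesIn` appears in the deck (slide 32).

```python
# Hypothetical facts; in the real system these would live in the triple store.
FACTS = {
    ("ex:wayne_rooney", "ex:playsFor", "ex:england"),
    ("ex:england", "sport:competesIn", "ex:world_cup_2010"),
}

# Properties to follow outward from the tagged entity (assumed for this sketch).
FOLLOW = {"ex:playsFor", "sport:competesIn"}


def infer_tags(asset, tagged_entity, facts):
    """Return 'about' triples for the tagged entity and everything reachable from it."""
    tags = {tagged_entity}
    frontier = [tagged_entity]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in facts:
            if subject == current and predicate in FOLLOW and obj not in tags:
                tags.add(obj)
                frontier.append(obj)
    return {(asset, "ex:about", tag) for tag in tags}


result = infer_tags("ex:story_123", "ex:wayne_rooney", FACTS)
```

With these facts, a single player tag yields three `ex:about` triples (player, team, competition), which is the effect the slide describes: one manual tag, several aggregation pages updated.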