Mark logic user-group-2012

Sports Refresh + Olympics
Dynamic Semantic Publishing

MarkLogic User Group Presentation

Published in: Technology, Sports

  1. BBC Dynamic Semantic Publishing [DSP] • MarkLogic User Group, July 2012 • Jem Rayfield: Lead Technical Architect • BBC Future Media. Future Media © BBC MMXII
  2. Outline: BBC News Online • BBC World Cup 2010 • BBC Sport 2012 + Olympics • BBC News Mobile
  3. Radio since 1922 • TV since 1930 • Web since 1994
  4. online: http://bbc.co.uk/news
  5. BBC News [Static Publishing]
  6. Static News Architecture
  7. BBC CPS/CMS Asset Authoring
  8. BBC CPS/CMS Index Authoring
  9. Static News, The Good: 1) Simple 2) Scales cheaply 3) Difficult to break [bad rendering logic etc.] 4) Handles high load
  10. Static News, The BAD: 1) Relational taxonomic meta model 2) Static! Inflexible! SSI! 3) Document publishing 4) Content non re-usable 5) Content non repurpose-able 6) Difficult to personalize 7) Publication per output
  11. BBC World Cup 2010: http://bbc.co.uk/worldcup
  12. World Cup 2010 • 32 teams, 8 groups, 736 players -> 776 pages • Fixtures & Results, Groups & Teams pages • Too many web pages for too few journalists • Improve the publishing system to help achieve all of this
  13. Page Per Player: http://news.bbc.co.uk/sport/football/world_cup_2010/groups_and_teams/team/england/wayne_rooney
  14. Page Per Team
  15. Page Per Group
  16. Open Sport Ontology. BBC Sport: http://www.bbc.co.uk/ontologies/sport
  17. Semantic publishing: TRIPLE STORE • ONTOLOGY • USER EXPERIENCE
  18. Graffiti: Suggest -> Tag [Player]
  19. Graffiti: Suggest -> Tag [Location] (Geonames)
  20. Graffiti Demo… (maybe video, depending on wifi…)
  21. Rationale • Automated content publishing • Huge increase in content breadth (number of manageable pages) • Content re-use and re-purposing, increasing reach • Simplified content management • Journalist headcount reduction • Multi-dimensional entry points and semantic navigation • Improved user experience with high levels of user engagement • Dynamic, state (time|event) and semantic driven page layout • Personalized content aggregations • Open data and APIs
  22. World Cup DSP Architecture
  23. API Stack
  24. Highly Scalable Clustered BigOWLIM
  25. Extendable Domain Driven Asset Tagging
  26. Open Ontology/Dataset reuse: Event | Geonames | Foaf | Etc.
  27. Infer… player -> team -> competition
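The inference chain on this slide can be illustrated with a toy forward-chaining rule. This is only a sketch of the idea, not OWLIM and not the BBC ontology: the predicate names `playsFor` and `participatesIn`, the plain-string identifiers, and the sample facts are all hypothetical (only a team-to-competition `competesIn` link appears elsewhere in the deck).

```python
# Toy forward chaining over (subject, predicate, object) triples.
# Predicate names and data are illustrative assumptions, not the BBC ontology.

def infer_participation(triples):
    """Derive (player, 'participatesIn', competition) from
    (player, 'playsFor', team) and (team, 'competesIn', competition)."""
    # Index team -> competitions first so the second pass is a simple lookup.
    competes = {}
    for subject, predicate, obj in triples:
        if predicate == "competesIn":
            competes.setdefault(subject, set()).add(obj)

    inferred = set()
    for subject, predicate, obj in triples:
        if predicate == "playsFor":
            for competition in competes.get(obj, ()):
                inferred.add((subject, "participatesIn", competition))
    return inferred


facts = {
    ("wayne_rooney", "playsFor", "england"),
    ("england", "competesIn", "world_cup_2010"),
}
derived = infer_participation(facts)
```

A store that forward-chains materialises statements like the derived one at write time, which is why later slides list write latency as a cost of this approach.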
  28. REST API • Content negotiation: json rdf, xml rdf, turtle • Publicly accessible (with SSL cert) • GET /sport/football/teams/<TEAM> Accept: application/rdf+json • GET /sport/football/<COMPETITION> Accept: application/rdf+xml • GET /assets/<ASSET> Accept: text/rdf+n3 • Etc….
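Content negotiation here means the same resource path returns a different RDF serialization depending on the `Accept` header. A minimal sketch using only the Python standard library follows; it builds the request without sending it, since the slide says the API requires a client SSL certificate. The helper name `build_request` and the short serialization keys are assumptions for illustration, and the base URL is taken from the example request on the next slide.

```python
# Sketch: pick an Accept header per RDF serialization (request is built, not sent).
from urllib.request import Request

BASE_URL = "https://api.live.bbc.co.uk/dsp"

# Media types as listed on the slide; the dictionary keys are hypothetical.
RDF_MEDIA_TYPES = {
    "json": "application/rdf+json",
    "xml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}


def build_request(path, serialization):
    """Return a GET request asking for the given RDF serialization."""
    media_type = RDF_MEDIA_TYPES[serialization]
    return Request(BASE_URL + path, headers={"Accept": media_type}, method="GET")


request = build_request("/sport/football/teams/chelsea", "n3")
```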
  29. GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea with Accept: text/rdf+n3
      <http://www.chelseafc.com/>
          domain:documentType <http://www.bbc.co.uk/things/document-types/homepage> ,
              <http://www.bbc.co.uk/things/document-types/external> .
      <http://www.bbc.co.uk/sport/football/teams/chelsea>
          domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> ,
              <http://www.bbc.co.uk/things/document-types/homepage> .
      <http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id>
          a sport:CompetitiveSportingOrganisation ;
          domain:canonicalName "Chelsea"^^<xsd:string> ;
          domain:document <http://www.chelseafc.com/> ,
              <http://www.bbc.co.uk/sport/football/teams/chelsea> ;
          domain:externalId <http://dbpedia.org/resource/Chelsea_F.C.> ,
              <urn:sports-stats:137316635> ;
          domain:name "Chelsea" ;
          domain:shortName "Chelsea"^^<xsd:string> ;
          sport:competesIn <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id> .
      <http://dbpedia.org/resource/Chelsea_F.C.>
          domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/dbpedia> .
      <urn:sports-stats:137316635>
          domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/bbc-sport-stats> .
      <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id>
          domain:canonicalName "Premier League"^^<xsd:string> ;
          domain:externalId <urn:sports-stats:118996114> ;
          sport:competitionType <http://www.bbc.co.uk/things/competition-types/domestic-league> .
  30. PHP->EasyRDF->API • PHP render layer consumes RDF from the REST API via EasyRDF (http://www.aelius.com/njh/easyrdf/) • EasyRDF: open PHP library (primary committer Nicholas Humfrey, BBC)
      // Request options: client certificate plus content negotiation headers.
      protected function getOptions() {
          return array(
              "config" => array("usecert" => true),
              "headers" => array(
                  "Accept" => "application/rdf+json",
                  "X-Expect" => "http://www.bbc.co.uk/things/platforms/hiweb"
              )
          );
      }

      // Fetch the team graph, load it into EasyRDF, and select the sporting organisations.
      $options = $this->getOptions();
      $response = $this->get("https://api.test.bbc.co.uk/dsp/sport/football/teams/chelsea", $options);
      $this->data = new EasyRdf_Graph("http://www.bbc.co.uk", $response->getBody());
      $teams = $this->data->allOfType("sport:CompetitiveSportingOrganisation");
  31. World Cup statistics, the GOOD • 750+ dynamic aggregations/pages (Player, Squad, Group, etc.) • Average unique page requests a day: 2 million+ • Average OWLIM SPARQL queries a day: 1 million • 100s of RDF statement updates/inserts per minute with full OWL reasoning and associated inference • Multi data center, fully resilient, clustered 6 node triple store • RDF graph model ideally suited to model domain representations such as sport
  32. World Cup statistics, the BAD • Sports stories and indices static • Sport content not responsive or personalized • RDF store unable to handle thousands of statistic updates a second • RDF store forward-chained closures are expensive and increase write latency • RDF graph model and SPARQL not ideally suited to the BBC's News and Sport document publication model
  33. BBC Sport 2012; Online Refresh: http://bbc.co.uk/sport
  34. Sport Refresh 2012 • Page per Athlete [10,000+], Page per country [200+], Page per Discipline [400-500], Page per venue, Page per team -> A lot of output… • Almost real time statistics and live event pages • Time coded, metadata annotated, on demand video, 58,000 hours of content • Far too many web pages for far too few journalists • DSP annotation architecture to automate content aggregation
  35. Sport Refresh 2012; Red -> Bright Yellow
  36. 10000+ Dynamic Aggregations
  37. Lots of Dynamic (Live) sports stats
  38. Dynamic Navigation
  39. Static Stories + Dynamic includes
  40. Olympics - 27 Live Video Streams • Live Stats overlays • Stats -> Ontology driven aggregations
  41. XML powered video chapter points • Page URL: http://www.test.bbc.co.uk/sport/olympics/2012/live-video/p00g2lqp • XML API to get video log events: https://api.test.bbc.co.uk/olympicdata/bdf-log/pid/p00g2lqp
  42. Augment architecture with a Content Store • Atomic content assets stored in MarkLogic XML store • XML content queryable via XQuery • Content assets searchable • Sports statistics searchable/queryable via XQuery • Ontological SPARQL via BigOWLIM, assets XQuery via MarkLogic
  43. Static Sports stats (OLD)
  44. Dynamic Sports stats
  45. Olympics XML etc.
  46. Ontology Aware NLP • Information Workbench • OWLIM • (Spice) GATE + Ontotext
  47. [Diagram: ontology-aware text analysis pipeline. Labels: Extraction Highlights, Generic Analysis, CES APP, Update KB / Gazetteer, OWLIM, Disambiguation, Relevance Ranking, Curate, Retrain & Adapt. Example text: Ex-England boss Sven-Goran Eriksson says a "smear campaign" has been aimed at Roy Hodgson for omitting Rio Ferdinand. Disambiguation candidates: "Roy Hodgson: coach" vs "Roy Hodgson: hockey player". Relevance ranking: 1. Eriksson (78%) 2. Roy Hodgson (69%) 3. Rio Ferdinand (58%) 4. …]
  48. Disambiguation of Locations • Geospatial distance: a feature of OWLIM • Super region: GeoNames hierarchy and containment relations, e.g. parentFeature • RDF Rank • Human approval score (on the basis of curated documents) • Class/code based priority: a fine-grained ontology may allow rule-based or machine-learning prioritization of classes and entities, based on learning we already have • Asset geo association: some entities could be disambiguated by using the asset domain association; BBC UK local sport is more likely to talk about national entities
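As a rough illustration of the geospatial distance feature, the sketch below picks the candidate location closest to an anchor point using the haversine great-circle formula. It says nothing about how OWLIM computes distance internally; the function names, the anchor, the candidate list, and the approximate coordinates are all assumptions made up for this example. In practice distance would be one signal combined with the others on the slide.

```python
# Sketch: nearest-candidate disambiguation by great-circle distance.
# Coordinates are approximate and the candidate data is hypothetical.
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_candidate(anchor, candidates):
    """Return the candidate closest to the anchor (lat, lon) tuple."""
    return min(
        candidates,
        key=lambda c: haversine_km(anchor[0], anchor[1], c["lat"], c["lon"]),
    )


# Anchor: a location already associated with the asset (here, London).
LONDON = (51.507, -0.128)
CANDIDATES = [
    {"name": "Newcastle upon Tyne, UK", "lat": 54.978, "lon": -1.618},
    {"name": "Newcastle, NSW, Australia", "lat": -32.927, "lon": 151.777},
]
best = nearest_candidate(LONDON, CANDIDATES)
```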
  49. Entity Relevance: Objective • Rank entities by their relatedness to the article • Accuracy 75% • We consider various frequencies of entity mentions in the article and in the entire set of articles • Positions in the article fields or in the first paragraphs of the body boost the relevance
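The two signals on this slide, mention frequencies and position boosts, can be combined in many ways. The sketch below is one toy combination, assuming a TF-IDF-like rarity weight multiplied by fixed boosts for the title and first paragraph. The formula, the boost values, the article structure, and the corpus counts are illustrative assumptions, not the scoring used in the BBC pipeline.

```python
# Toy entity relevance: in-article frequency x corpus rarity x position boosts.
# All weights and sample data are hypothetical.
import math


def relevance(entity, article, doc_frequency, corpus_size,
              title_boost=2.0, lead_boost=1.5):
    """Score one entity against one article (higher means more relevant)."""
    paragraphs = article["paragraphs"]
    mentions = sum(p.count(entity) for p in paragraphs)
    in_title = entity in article["title"]
    in_lead = bool(paragraphs) and entity in paragraphs[0]
    if mentions == 0 and not in_title:
        return 0.0
    # Entities mentioned in fewer articles overall get a larger weight.
    rarity = math.log((1 + corpus_size) / (1 + doc_frequency.get(entity, 0))) + 1.0
    score = max(mentions, 1) * rarity
    if in_title:
        score *= title_boost
    if in_lead:
        score *= lead_boost
    return score


def rank_entities(entities, article, doc_frequency, corpus_size):
    """Return entities sorted from most to least relevant."""
    return sorted(
        entities,
        key=lambda e: relevance(e, article, doc_frequency, corpus_size),
        reverse=True,
    )


article = {
    "title": "Sven-Goran Eriksson defends Roy Hodgson",
    "paragraphs": [
        "Sven-Goran Eriksson says a smear campaign has been aimed at Roy Hodgson.",
        "Roy Hodgson omitted Rio Ferdinand.",
    ],
}
doc_frequency = {"Roy Hodgson": 50, "Sven-Goran Eriksson": 5, "Rio Ferdinand": 20}
ranked = rank_entities(
    ["Rio Ferdinand", "Roy Hodgson", "Sven-Goran Eriksson"],
    article, doc_frequency, corpus_size=100,
)
```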
  50. DSP Architecture
  51. API Stack: MarkLogic!!
  52. "You could run Nasa on that" (Lee Pollington)
  53. XQuery master<->master rep
  54. MarkLogic 5 & XA Tx (just… thanks John Snelson)
  55. Plenty of Caching
  56. Sport Stats REST API
  57. Sport Stats REST API
  58. Sport Stats REST API Demo….
  59. Olympics API (RDF) /tripod2012 /athletes /{uid} /countries /{country} /countries-iso /{iso} /sports /stories /{discipline} /{discipline}/events /{discipline}/events/stories /{discipline}/events/{event} /metadata /disciplines /onestowatch /countries/{countryUrlName} /london2012 /sports/{disciplineUrlName} /sports/{disciplineUrlName}/events/{eventUrlName} /podium/events /{rscCode} /record/events /venues /{urlName}
  60. Olympics API (RDF)
      @prefix domain: <http://www.bbc.co.uk/ontologies/domain/> .
      @prefix sesame: <http://www.openrdf.org/schema/sesame#> .
      @prefix owlim: <http://www.ontotext.com/> .
      @prefix oly: <http://www.bbc.co.uk/ontologies/2012olympics/> .
      @prefix par: <http://purl.org/vocab/participation/schema#> .
      @prefix dc: <http://purl.org/dc/elements/1.1/> .
      <http://www.bbc.co.uk/things/82f5db84-0591-49ee-b6f4-a1d26e9381fb#id>
          a sport:Person ;
          rdfs:label "Usain Bolt"^^xsd:string , "Bolt Usain-athletics-jam-1986-08-21"^^xsd:string ;
          foaf:name "Usain Bolt"^^xsd:string , "Bolt Usain-athletics-jam-1986-08-21"^^xsd:string ;
          domain:canonicalName "Bolt Usain-athletics-jam-1986-08-21"^^xsd:string ;
          foaf:givenName "Usain"^^xsd:string ;
          foaf:familyName "Bolt"^^xsd:string ;
          domain:name "Usain Bolt"^^xsd:string ;
          oly:dateOfBirth "1986-08-21"^^xsd:date ;
          oly:gender "M"^^xsd:string ;
          oly:height "195.0"^^xsd:float ;
          oly:weight "94.0"^^xsd:float ;
          oly:worldOlympicDream "true"^^xsd:boolean ;
          sport:discipline <http://www.bbc.co.uk/things/b3a086df-ab42-2b44-be8b-76b600bfcdce#id> ;
          sport:competesIn <http://www.bbc.co.uk/things/1b499a08-4f02-4196-aa6c-c43ea353138b#id> .
      <http://www.bbc.co.uk/things/b3a086df-ab42-2b44-be8b-76b600bfcdce#id>
          a sport:SportsDiscipline ;
          domain:name "Athletics"^^xsd:string ;
          domain:document <http://www.bbc.co.uk/sport/olympics/2012/sports/athletics> .
      <http://www.bbc.co.uk/things/1b499a08-4f02-4196-aa6c-c43ea353138b#id>
          a sport:MedalCompetition ;
          domain:name "Mens 100m"^^xsd:string ;
          domain:shortName "Mens 100m"^^xsd:string ;
          domain:document <http://www.bbc.co.uk/sport/olympics/2012/sports/athletics/events/mens-100m> ;
          oly:measurementType <http://www.bbc.co.uk/things/measurement-types/time> ;
          domain:externalId <urn:ioc2012:ATM001000> .
      <http://www.bbc.co.uk/things/903ef380-bdae-4a45-9a8b-5e5a270a7d6c#id>
          oly:oneToWatch <http://www.bbc.co.uk/things/82f5db84-0591-49ee-b6f4-a1d26e9381fb#id> .
      <http://news.bbc.co.uk/sport1/hi/athletics/16554814.stm#asset>
          tag:tag <http://www.bbc.co.uk/things/a50dc8ba-947e-4856-8eb0-1cdbbf208ef7#thing> ;
          dc:title "Event Guide: ATHLETICS"^^xsd:string ;
          asset:storyType <http://www.bbc.co.uk/things/story-types/profile> ;
          domain:document <http://news.bbc.co.uk/sport1/mobile/athletics/16554814.stm> .
      <http://www.bbc.co.uk/things/a50dc8ba-947e-4856-8eb0-1cdbbf208ef7#thing>
          tag:taggedWithTag <http://www.bbc.co.uk/things/b3a086df-ab42-2b44-be8b-76b600bfcdce#id> .
      <http://news.bbc.co.uk/sport1/mobile/athletics/16554814.stm>
          domain:platform <http://www.bbc.co.uk/things/platforms/mobile> .
  61. Olympics API (XML) /olympicdata/ /athletes /simulator/{scenarioName} /{guid} /bdf-log /pid/{pid} /pid/{pid}/chapter-points /{logId} /chapter-points /pid/{pid} /{logId} /days-to-go /live-text /assets /assets/{id} /medallists /athletes/{guid} /{medalGroup} /medals /athletes/{athleteGuid} /countries/{country} /medaltable /countries/{country} /disciplines/medals /disciplines/{rsc} /overall /sportcontent /obsvideosessions /full /update /podium /countries/{country} /disciplines/{rsccode} /events/{rsccode} /latest
  62. Olympics API (XML) /olympicdata/ /pulse /beat /beats /records /athletes/{guid} /events/{rsc} /results /athletes/{guid} /schedule /detail/days/{date} /detail/disciplines-code/{rsccode} /detail/disciplines-code/{rsccode}/events/{eventrsccode} /detail/disciplines-code/{rsc}/days/{date} /detail/disciplines/{rsc}/days/{date} /detail/disciplines/{urlname} /detail/disciplines/{urlname}/days/{date} /detail/disciplines/{urlname}/events/{eventrsccode} /overview/days /overview/disciplines /overview/disciplines/{rsc} /stats /count/{directory:.+} /sessionCode/{sessionCode} /simulator/{scenarioName} /{documentSerial} /team /{odfid} /unit-status /{sessionCode} /video /days/{date} /videosessionid/{videoSessionId} /{pid}
  63. Olympics API (XML - Medallist): https://api.int.bbc.co.uk/olympicdata/public/medallists/men
      <?xml version="1.0" encoding="UTF-8"?>
      <!--Generated on: 30/03/2012 17:42:13 | Transform Time: 111-->
      <document>
        <header>
          <documentGroup>general</documentGroup>
          <documentType>medalistCompact</documentType>
          <rsc code="GEM000000"/>
          <documentSerial>b1df2feb-1bcd-43b4-aa61-8ae9c82c3bb9</documentSerial>
          <timeStamp>2012-03-30T15:42:13.843Z</timeStamp>
        </header>
        <medalist>
          <m o="1" r="1" i="82f5db84-0591-49ee-b6f4-a1d26e9381fb" g="2" s="1" b="0" t="2" c="JAM">Usain Bolt</m>
          <m o="2" r="1" i="56bdea69-dce5-4cab-91ff-802be5077350" g="1" s="0" b="0" t="1" c="NED">VOGELS Guus</m>
          <m o="3" r="1" i="1f199d68-3a7d-44fa-adae-a5cde084395b" g="1" s="0" b="0" t="1" c="NED">DERIKX Rob</m>
          <!-- blah blah and blah -->
        </medalist>
      </document>
      XML and RDF data inter-linked via GUIDs
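The GUID link between the XML and RDF views can be sketched as follows: parse each `<m>` element and turn its `i` attribute into the matching RDF thing URI. The attribute meanings used here (`i` = GUID, `g` = gold count, `c` = country code) are inferred from the sample values rather than documented on the slide, and the `Medallist` class and `parse_medallists` helper are hypothetical names. The URI pattern follows the `http://www.bbc.co.uk/things/<guid>#id` form seen in the RDF slides.

```python
# Sketch: join the compact medallist XML to RDF thing URIs via the GUID.
# Attribute meanings are inferred from the sample, not from a published schema.
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SAMPLE = """<document>
  <medalist>
    <m o="1" r="1" i="82f5db84-0591-49ee-b6f4-a1d26e9381fb" g="2" s="1" b="0" t="2" c="JAM">Usain Bolt</m>
    <m o="2" r="1" i="56bdea69-dce5-4cab-91ff-802be5077350" g="1" s="0" b="0" t="1" c="NED">VOGELS Guus</m>
  </medalist>
</document>"""


@dataclass
class Medallist:
    name: str
    guid: str
    country: str
    gold: int
    thing_uri: str


def parse_medallists(xml_text):
    """Return one Medallist per <m> element, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for m in root.iter("m"):
        guid = m.get("i")
        result.append(Medallist(
            name=(m.text or "").strip(),
            guid=guid,
            country=m.get("c"),
            gold=int(m.get("g")),
            thing_uri=f"http://www.bbc.co.uk/things/{guid}#id",
        ))
    return result


medallists = parse_medallists(SAMPLE)
```

With the URI in hand, a client could request the athlete's RDF description from the RDF API and merge it with the statistics held in the XML store.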
  64. Olympics API (XML - Video catch-up example): https://api.stage.live.co.uk/olympicdata/public/videos/catchup
      <?xml version="1.0" encoding="UTF-8"?>
      <catchup poll-interval-in-seconds="60">
        <slot index="1" type="session">
          <pid>b017t8b4</pid>
          <videoSessionId>OBS-1284552321</videoSessionId>
          <sessionCode>FB001</sessionCode>
          <discipline RSC="FB0000000" name="Football" url="http://www.bbc.co.uk/sport/olympics/2012/sports/football" urlName="football" icon="http://static.live.bbci. /ivp2012/images/icons/sports/football.png"/>
          <scheduleStart>2001-12-17T09:30:47Z</scheduleStart>
          <scheduleEnd>2001-12-17T09:30:47Z</scheduleEnd>
          <actualStart>2001-12-17T09:30:47Z</actualStart>
          <actualEnd>2001-12-17T09:30:47Z</actualEnd>
          <available>true</available>
          <editorsPick>true</editorsPick>
          <live>false</live>
          <title>28/11/2011</title>
          <shortSynopsis>A desperate Roxy makes the unwise decision to dig for dirt on Derek.</shortSynopsis>
          <myImageBaseUrl>http://node2.bbcimg.co.uk/iplayer/images/episode/</myImageBaseUrl>
        </slot>
        <slot index="2" type="session">
          <pid>hbtest1</pid>
          <videoSessionId>OBS-64835275893</videoSessionId>
          <sessionCode>HB001</sessionCode>
          <discipline RSC="HB0000001"/>
          <scheduleStart>2001-12-17T09:30:47Z</scheduleStart>
          <scheduleEnd>2001-12-17T09:30:47Z</scheduleEnd>
          <actualStart>2001-12-17T09:30:47Z</actualStart>
          <actualEnd>2001-12-17T09:30:47Z</actualEnd>
          <available>true</available>
          <editorsPick>true</editorsPick>
          <live>false</live>
        </slot>
      </catchup>
  65. Olympics API (Video log/chapter points): GET https://api.test.bbc.co.uk/olympicdata/bdf-log/pid/p00g2lqp
      <document>
        <logEvent>
          <header>
            <documentGroup>general</documentGroup>
            <documentType>videoLogging</documentType>
            <rsc code="FE0000000"/>
            <documentSerial>fdf38714-45d3-4d89-5100-8883be340700</documentSerial>
            <timeStamp>2012-07-10T10:42:38+01:00</timeStamp>
            <video>
              <pid>p00g2lqp</pid>
              <timeCode>2012-07-10T08:56:38Z</timeCode>
            </video>
            <taggingtool>Version 0.1.10</taggingtool>
          </header>
          <Log>
            <LogId>BBC-TT-1341913358.6783</LogId>
            <Action>UPDATE</Action>
            <Date>2012-07-10</Date>
            <TimeCode>09:56:38</TimeCode>
            <RSC>FE0000000</RSC>
            <Keywords>
              <Keyword>BBC free text</Keyword>
            </Keywords>
            <bbcLabel>Test One</bbcLabel>
          </Log>
        </logEvent>
        <logEvent>
          <header>
            <documentGroup>general</documentGroup>
            <documentType>videoLogging</documentType>
            <rsc code=""/>
            <documentSerial>20dc4019-d3be-4bb1-6a75-256ba7622b5d</documentSerial>
            <timeStamp>2012-07-10T10:42:51+01:00</timeStamp>
            <video>
              <pid>p00g2lqp</pid>
              <timeCode>2012-07-10T09:16:38Z</timeCode>
            </video>
            <taggingtool>Version 0.1.10</taggingtool>
          </header>
          <Log>
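A client could turn these log events into an ordered list of chapter points. The sketch below assumes a trimmed version of the document above, keeps only the video time code and the `bbcLabel`, and returns the label as `None` when an event carries no `Log` element. The `chapter_points` helper and the decision to sort by time code are assumptions for illustration, not behaviour described in the deck.

```python
# Sketch: extract (time code, label) chapter points from a video log document.
# The sample is a trimmed copy of the slide's XML; the helper name is hypothetical.
import xml.etree.ElementTree as ET
from datetime import datetime

SAMPLE_LOG = """<document>
  <logEvent>
    <header>
      <video><pid>p00g2lqp</pid><timeCode>2012-07-10T09:16:38Z</timeCode></video>
    </header>
  </logEvent>
  <logEvent>
    <header>
      <video><pid>p00g2lqp</pid><timeCode>2012-07-10T08:56:38Z</timeCode></video>
    </header>
    <Log><bbcLabel>Test One</bbcLabel></Log>
  </logEvent>
</document>"""


def chapter_points(xml_text):
    """Return (datetime, label) tuples sorted by time code."""
    points = []
    for event in ET.fromstring(xml_text).iter("logEvent"):
        time_code = event.findtext("header/video/timeCode")
        label = event.findtext("Log/bbcLabel")  # None when the Log element is absent
        # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
        when = datetime.fromisoformat(time_code.replace("Z", "+00:00"))
        points.append((when, label))
    # Sort on the timestamp only, so None labels are never compared.
    return sorted(points, key=lambda point: point[0])


points = chapter_points(SAMPLE_LOG)
```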
  66. Olympics Architecture: large A1 printout…. Way too big for a slide.
  67. online 2012: http://m.bbc.co.uk/news
  68. Dynamic News mobile • Multi device capability • Responsive Web design • Built on a dynamic service API • New re-usable content model • Dynamic assets
  69. Responsive Dynamic News mobile (iPad)
  70. Responsive Dynamic News mobile (iPhone)
  71. Recap: Static News Architecture
  72. Dynamic News Architecture
  73. New Content Model (Re-usable XML/RDF)
  74. MarkLogic's handy XInclude resolution: including story data on news index XML
      <item>
        <xi:include href="http://www.bbc.co.uk/asset/13447877" xpointer="xmlns(bbc=http://www.bbc.co.uk/content/asset)xpointer(/bbc:story/bbc:itemMeta)">
          <xi:fallback>
            <!-- Unable to find href="http://www.bbc.co.uk/asset/13447877" xpointer="xmlns(bbc=http://www.bbc.co.uk/content/asset) xpointer(/bbc:story/bbc:itemMeta)" -->
          </xi:fallback>
        </xi:include>
        ...
  75. News Index API: including story data on news index XML • HTTP GET https://api.live.bbc.co.uk/content/asset/news/technology/ • HTTP Headers: X-Candy-Audience: Domestic, X-Candy-Platform: EnhancedMobile, Accept: application/json • Or HTTP Headers: X-Candy-Audience: Domestic, X-Candy-Platform: EnhancedMobile, Accept: application/xml • Contextualised output: Audience, Platform, Response type
  76. News Story API: including story data on news index XML • HTTP GET https://api.live.bbc.co.uk/content/asset/news/uk-17829360 • HTTP Headers: X-Candy-Audience: Domestic, X-Candy-Platform: EnhancedMobile, Accept: application/json • Or HTTP Headers: X-Candy-Audience: Domestic, X-Candy-Platform: EnhancedMobile, Accept: application/xml
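The two API slides describe output contextualised by three inputs: audience, platform, and response type. A minimal sketch of assembling such a request is shown below. It only builds the URL and header dictionary and sends nothing; the `contextual_request` helper and its parameter names are hypothetical, while the header names, sample values, and base URL come from the slides.

```python
# Sketch: build the URL and headers for a contextualised content request.
# Nothing is sent; the helper name and parameters are assumptions.

ASSET_BASE = "https://api.live.bbc.co.uk/content/asset"


def contextual_request(asset_path, audience, platform, response_type):
    """Return (url, headers) for one asset in a given audience/platform context."""
    if response_type not in ("json", "xml"):
        raise ValueError("response_type must be 'json' or 'xml'")
    headers = {
        "X-Candy-Audience": audience,
        "X-Candy-Platform": platform,
        "Accept": f"application/{response_type}",
    }
    return f"{ASSET_BASE}/{asset_path}", headers


json_url, json_headers = contextual_request(
    "news/uk-17829360", "Domestic", "EnhancedMobile", "json")
xml_url, xml_headers = contextual_request(
    "news/uk-17829360", "Domestic", "EnhancedMobile", "xml")
```

Keeping the context in headers rather than in the path means the same asset URL serves every audience and platform, which fits the re-usable content model described earlier in the deck.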
  77. Platform future… • BBC sport site re-engineered to use fully dynamic approach (News Mobile style) • BBC news high web site re-engineered to use fully dynamic approach (News Mobile style) • MarkLogic as CMS repository (iSite) • MarkLogic binary storage R&D • Etc.
  78. Questions? Slides: jem.rayfield@bbc.co.uk Twitter: @jemrayfield
