BBC Linked Data Platform (SemTechBiz San Fran 2013)


An introduction to the BBC's Linked Data Platform, with occasional dips into the detail of the code, ontologies and queries that make it possible.

Published in: Education
Transcript of "BBC Linked Data Platform (SemTechBiz San Fran 2013)"

  1. BBC Linked Data Platform
     5 June 2013
     Using semantic technologies to make our content more connected and more discoverable
  2. A (very) short history
     ✤ Dynamic Semantic Publishing
     ✤ BBC Sport - Transition from ‘static’ to ‘dynamic’
     ✤ Introduction of Semantic Technologies for World Cup 2010
     ✤ Raising the bar for Olympics 2012
     ✤ Linked Data Platform & The Creative Work
  3. Olympics 2012
     Athletes & Medals: from trackside to our audience
  4. BBC Linked Data Platform
     (our logo)
  5. LDP: The CreativeWork
     [Diagram labels: Minimal Metadata; Semantically Aggregated Metadata; Triple Store; Website; Mobile; Apps; IPTV; Open API]
  6. CreativeWorks
     ✤ Minimal metadata
       ✤ Enough non-semantic metadata to support ‘rich links’ in a wide range of applications
       ✤ Enough semantic metadata (tags) to support discovery through semantic queries
     ✤ Full metadata requires a content-type-specific metadata API
     ✤ Access to content requires a content API
  7. Some use-cases
     ✤ Automated index pages/feeds
     ✤ Semantic navigation
     ✤ Semantic search
     ✤ A typical query:
       ✤ Top 10, most recent, BBC News Items about Politicians who are members of The Labour Party
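
To make the shape of the typical query on slide 7 concrete, here is a minimal in-memory sketch in Scala. It is not how the platform answers the query (that is done with SPARQL against the triple store); the `Work` and `Person` types, their fields and the party identifier are all hypothetical names introduced for illustration.

```scala
import java.time.Instant

// Hypothetical, simplified model: none of these names come from the LDP codebase.
final case class Work(title: String, modified: Instant, about: Set[String])
final case class Person(uri: String, isPolitician: Boolean, memberOf: Set[String])

object TypicalQuery {
  /** The `limit` most recently modified works about politicians who are members of `party`. */
  def topRecent(works: Seq[Work], people: Seq[Person], party: String, limit: Int = 10): Seq[Work] = {
    // Step 1: resolve the semantic part of the query to a set of matching person URIs.
    val matching = people
      .filter(p => p.isPolitician && p.memberOf.contains(party))
      .map(_.uri)
      .toSet
    // Step 2: keep works tagged as being about any of them, newest first, capped at `limit`.
    works
      .filter(_.about.exists(matching.contains))
      .sortBy(_.modified.toEpochMilli)(Ordering[Long].reverse)
      .take(limit)
  }
}
```

The two steps mirror the split the slides describe: reference data decides which concepts match, and the creative works only need enough tags to be joined against them.
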
  8. Powered by LDP
     BBC Sport
     BBC Music
     BBC Olympics 2012
     BBC Knowledge & Learning Beta
     BBC News Local Beta
     BBC Sport Mobile App
  9. CreativeWork Ontology
  10. CreativeWorks in Code

      case class CreativeWork(
        locators: Set[Locator],
        title: String,
        modified: DateTime,
        format: Option[FormatType.FormatType] = None,
        created: Option[DateTime] = None,
        uri: Option[String] = None,
        primaryContentOf: List[PrimaryContentOf] = List(),
        about: List[String] = List(),
        mentions: List[String] = List(),
        `type`: CreativeWorkType = CreativeWorkType.CreativeWork,
        provenance: Option[CreativeWorkProvenance] = None,
        thumbnails: List[Thumbnail] = List(),
        audience: Option[AudienceType] = None,
        category: Option[CreativeWorkCategory] = None
      ) {
        private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1)
        private val allLocatorsDistinct = locators.map(_.uri).size == locators.size

        require(title.trim.isEmpty == false, "Creative Work has an empty title")
        require(title.length <= CreativeWork.MaxTitleLength,
          "Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength)
        require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")
        require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs")

        def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", ""))
      }

      object CreativeWork {
        val Locator = "http://www.bbc.co.uk/ontologies/cms/locator"
        val MaxTitleLength = 300
      }
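
The `require` calls on slide 10 mean an invalid Creative Work can never be constructed: `require` throws `IllegalArgumentException` from the constructor. The sketch below shows how a caller might handle that. `MiniWork` is a hypothetical, trimmed-down stand-in that keeps only the title rules, so that the example compiles without the platform's other types.

```scala
import scala.util.Try

// Hypothetical stand-in for CreativeWork: only the title invariants are reproduced.
final case class MiniWork(title: String) {
  require(title.trim.nonEmpty, "Creative Work has an empty title")
  require(title.length <= MiniWork.MaxTitleLength,
    "Creative Work title exceeded the maximum length allowed of " + MiniWork.MaxTitleLength)
}

object MiniWork {
  val MaxTitleLength = 300

  /** Wraps construction so that a failed `require` becomes a Failure instead of a thrown exception. */
  def validated(title: String): Try[MiniWork] = Try(MiniWork(title))
}
```

Wrapping construction in `Try` is one option among several; the slides do not say how the platform reports validation failures to its clients.
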
  11. Creative Work Query*

      CONSTRUCT {
        ?creativeWork a cwork:CreativeWork ;
          a ?type ;
          cwork:title ?title ;
          cwork:about ?about ;
          cwork:mentions ?mentions ;
          cwork:dateModified ?modified .
        ?about bbc:preferredLabel ?aboutPreferredLabel .
        ?mentions bbc:preferredLabel ?mentionsPrefLabel .
      }
      WHERE {
        {
          SELECT DISTINCT ?creativeWork
          WHERE {
            {{#about}}
              FILTER (?about = <{{about}}>) .
              ?creativeWork cwork:about ?about .
            {{/about}}
            {{#mentions}}
              FILTER (?mentions = <{{mentions}}>) .
              ?creativeWork cwork:mentions ?mentions .
            {{/mentions}}
            ?creativeWork a cwork:CreativeWork ;
              a ?type ;
              cwork:title ?title ;
              cwork:dateModified ?modified .
          }
          ORDER BY DESC(?modified)
          LIMIT 10
          {{#offset}}OFFSET {{offset}}{{/offset}}
        }
        ?creativeWork a cwork:CreativeWork .
        {
          ?creativeWork a cwork:CreativeWork ;
            a ?type ;
            cwork:title ?title ;
            cwork:dateModified ?modified .
          { ?type rdfs:subClassOf cwork:CreativeWork . }
          UNION
          {
            OPTIONAL {
              ?creativeWork cwork:about ?about .
              OPTIONAL { ?about rdfs:label ?aboutLabel . }
              OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . }
            }
            OPTIONAL {
              ?creativeWork cwork:mentions ?mentions .
              OPTIONAL { ?mentions rdfs:label ?mentionsLabel . }
              OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . }
            }
          }
        }
      }

      *Simplified
      [Slide callouts: SPARQL CONSTRUCT; Inner SELECT; Parameterisation; Pagination; Mustache-templated]
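
The query on slide 11 is Mustache-templated: a `{{#about}}...{{/about}}` section is emitted only when an `about` parameter is supplied, and `{{about}}` is replaced by its value. The sketch below illustrates that behaviour with a deliberately tiny renderer. It is not the Mustache library and not the platform's code: `MiniMustache` is a hypothetical helper that handles only flat string parameters and non-nested sections.

```scala
import scala.util.matching.Regex

// Hypothetical, minimal subset of Mustache: flat string parameters, no nesting, no escaping.
object MiniMustache {
  // A whole section, {{#key}}body{{/key}}; (?s) lets '.' match across newlines.
  private val Section: Regex = """(?s)\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}""".r
  // A plain variable, {{key}}.
  private val Variable: Regex = """\{\{(\w+)\}\}""".r

  def render(template: String, params: Map[String, String]): String = {
    // Pass 1: keep a section's body only if its key was supplied, otherwise drop it.
    val withSections = Section.replaceAllIn(template, m =>
      Regex.quoteReplacement(if (params.contains(m.group(1))) m.group(2) else ""))
    // Pass 2: substitute the remaining variables; unknown keys render as empty strings.
    Variable.replaceAllIn(withSections, m =>
      Regex.quoteReplacement(params.getOrElse(m.group(1), "")))
  }
}
```

Rendering with `Map("offset" -> "20")` would keep the `OFFSET 20` clause and drop both filter sections, which is how one template can serve filtering and pagination. Because values are spliced straight into `<...>`, a real implementation would need to check that they are well-formed IRIs before rendering.
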
  12. Our principal challenge: Data Management
  13. 4 Kinds of Data
      ✤ Creative Works
      ✤ Reference Data, managed in sets (Datasets)
      ✤ Reference Data, managed individually (Resources)
      ✤ Ontologies
  14. 99.99% Availability
  15. Our own URIs
      ✤ Everything has a ‘Thing URI’:
        ✤ http://www.bbc.co.uk/things/{GUID}#ID
      ✤ Opaque ID, dereferenceable*
      ✤ BBC controls identity, therefore quality & consistency
      ✤ bbc:sameAs to DBPedia, Wikidata, Freebase etc

      *coming soon
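
As an illustration of the Thing URI pattern on slide 15, the sketch below mints and parses such URIs. Two assumptions are made here that the slides do not confirm: that the GUID is a standard UUID, and that the fragment is the lowercase `#id` used by the `guid` method on slide 10 rather than the `#ID` shown on slide 15. `ThingUri` is a hypothetical helper name.

```scala
import java.util.UUID
import scala.util.Try

// Hypothetical helper; assumes UUID-shaped GUIDs and a lowercase "#id" fragment.
object ThingUri {
  val Prefix = "http://www.bbc.co.uk/things/"
  val Fragment = "#id"

  /** Builds a Thing URI around the given GUID (a fresh random one by default). */
  def mint(guid: UUID = UUID.randomUUID()): String = s"$Prefix$guid$Fragment"

  /** Recovers the GUID, or None if the string is not a well-formed Thing URI. */
  def parse(uri: String): Option[UUID] =
    if (uri.startsWith(Prefix) && uri.endsWith(Fragment))
      Try(UUID.fromString(uri.substring(Prefix.length, uri.length - Fragment.length))).toOption
    else None
}
```

Since the identifier is opaque, nothing about the thing itself can be read from the URI; anything a client needs to know has to come from dereferencing it or querying the store.
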
  16. Our own ontologies
      ✤ Core set of ontologies that are BBC owned
        ✤ Creative Work, BBC, (Organisational) Provenance, etc
      ✤ Ability to change regularly and unilaterally
      ✤ Provide ‘mappings’ to more widely used ontologies (e.g. Schema.org)
      ✤ Domain ontologies can be shared or reused
        ✤ Sport, Politics, GeoLocation, etc
  17. Open data
      ✤ Provided through Mashery
      ✤ ‘Connected Studio’ events will validate our API
      ✤ Public beta to follow
      ✤ JSON-LD & Turtle
      ✤ Future
        ✤ Self-provisioned, cloud-based triple stores
        ✤ Data Dumps
  18. The Hard Problems...
  19. Managing concepts across BBC
      ✤ Which domain ‘owns’ Arnold Schwarzenegger?
        ✤ News? Entertainment? History? Politics?
      ✤ Can domains ‘own’ predicates?
      ✤ Layering information over shared concepts
      ✤ High quality sub-sets vs. lower quality ‘long-tail’
      ✤ Synchronisation with external datasets
      ✤ Tools for creating and managing concepts
      ✤ Emerging, splitting & combining concepts
      ✤ Linked Data gives us a language to solve these problems
  20. Metadata: often subjective, never complete
      ✤ What is this TV programme about?
      ✤ Manual tag curation
        ✤ Subjective
        ✤ Long-term expense
        ✤ Inconsistent
      ✤ Automated tag generation
        ✤ Short-term expense
        ✤ Value in data or algorithm?
        ✤ Complex
        ✤ Relies on assumptions
      ✤ Our approach? Invest in both. Validate learnings.
  21. When to reason?
      ✤ Our options...
        ✤ Before writing to the triple store
        ✤ Materialised in the triple store (Forward-chaining inference)
        ✤ Inferred by the SPARQL engine (Backward-chaining inference)
        ✤ After SPARQL results have returned
        ✤ None/some/all of the above
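
The difference between the two inference options on slide 21 can be sketched with a single rule: if a resource has type C and C is a subclass of D, then the resource also has type D. The code below is a toy model over an in-memory set of triples, not a description of any triple store's reasoner; the `Reasoning` object is hypothetical.

```scala
import scala.annotation.tailrec

// Hypothetical toy model: triples are plain (subject, predicate, object) strings.
object Reasoning {
  type Triple = (String, String, String)
  val SubClassOf = "rdfs:subClassOf"
  val Type = "rdf:type"

  /** Every class reachable from `cls` by following rdfs:subClassOf one or more times. */
  def superClasses(cls: String, triples: Set[Triple]): Set[String] = {
    @tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val next = triples.collect { case (s, SubClassOf, o) if frontier(s) => o } -- seen
        loop(next, seen ++ next)
      }
    loop(Set(cls), Set.empty)
  }

  /** Forward chaining: write the inferred rdf:type triples into the store up front. */
  def materialise(triples: Set[Triple]): Set[Triple] =
    triples ++ triples.flatMap {
      case (s, Type, c) => superClasses(c, triples).map(sup => (s, Type, sup))
      case _            => Set.empty[Triple]
    }

  /** Backward chaining: leave the store untouched and infer while answering the question. */
  def isA(subject: String, cls: String, triples: Set[Triple]): Boolean =
    triples.exists {
      case (`subject`, Type, c) => c == cls || superClasses(c, triples).contains(cls)
      case _                    => false
    }
}
```

Materialising costs storage and write-time work but keeps reads simple; inferring at query time keeps the store small but repeats the class-hierarchy walk on every read. That trade-off is why "none/some/all of the above" is a reasonable answer.
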
  22. Maturity of Semantic Tech
      ✤ From a Software Industry perspective, Semantic (RDF) Technology is not mainstream and is therefore hard to sell
      ✤ Library/application immaturity can be a hindrance to innovation
      ✤ I believe the Sem Tech industry needs to focus on simplicity and abstraction
      ✤ Semantic Technology is complex, but using it need not be
  23. Find out more
      ✤ Video from QCon London 2013:
        ✤ http://www.infoq.com/presentations/bbc-data-platform-api
      ✤ BBC Internet Blog:
        ✤ http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
      ✤ david.rogers@bbc.co.uk
      ✤ @daverog