Building coherence at bbc.co.uk
Tom Scott
8 UK TV Channels
10 UK Radio Stations
5 National TV and radio
40 local radio stations
Plus the World Service (in 32 langua...
...and a website... since 1994
... that all makes for a big archive!
Historically the BBC has created a series of
microsites – each coherent in their own right but
not across the breadth of B...
Which means I can’t find everything about “CERN”
...Paul Weller...

                    Paul Weller http://www.flickr.com/photos/johnbullas/3410330728/
...Lion...
...or even Jeremy Clarkson
I can’t follow my nose, I can’t browse by meaning,
from one page to the next following a semantic
thread
                 ...
But things are changing
Linked Data has helped us build a coherent,
scalable, sane service. One that we hope is a bit
more human literate.
     Li...
Use URIs to identify things not only documents

                      How it works: The Web http://flickr.com/photos/danbri...
Use HTTP URIs - globally unique names that
anyone can dereference

                   Colon Slash Slash http://www.flickr.c...
Provide useful information [in RDF] when someone
looks up a URI

                     Information Desk http://www.flickr.co...
Include links to other URIs to let people discover
related information

                             Links http://www.flick...
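A minimal sketch of what "looking up a URI" and asking for RDF might look like from the client side. It only builds the request and does not send it. The `http://www.` prefix and the assumption that the server honours the `Accept` header are illustrative, not something the slides state.

```python
from urllib.request import Request

# Assumption: the scheme and host prefix are added for illustration; the slides
# give the identifier only as bbc.co.uk/nature/species/tiger.
TIGER_URI = "http://www.bbc.co.uk/nature/species/tiger"


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that prefers RDF/XML over HTML."""
    # The q-value marks HTML as an acceptable but less preferred fallback.
    headers = {"Accept": "application/rdf+xml, text/html;q=0.5"}
    return Request(uri, headers=headers)


req = rdf_request(TIGER_URI)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

The same URI serves both people and machines; only the requested representation differs.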
One implication of this is that I think there’s only
URIs and metadata... nothing else

                   Self-portraitur...
URIs are used as identifiers for real world things
...like Polar Bears and Jeremy Clarkson
Just as my passport is an identifier for me
...which in turn makes assertions about me
Thomas Scott, 16th May 1972, United Kingdom
bbc.co.uk/nature/species/tiger
is an identifier for the tiger species with resources
which make assertions about it
Linked Data at the BBC

                     Test Card X http://www.flickr.com/photos/marksmanuk/3098983708/
A page (URI) per programme
bbc.co.uk/programmes/:pid
...and programme segments...
In the music domain we have a page for every
artist the BBC plays
bbc.co.uk/music/artist/:musicbrainzID
And in the natural history domain we have URIs of
animals...
bbc.co.uk/nature/:rank/:dbpediaID
...adaptations and behaviours...
bbc.co.uk/nature/adaptaion/:dbpediaID
...and habitats...
bbc.co.uk/nature/habitats/:dbpediaID
And because the web is about URIs not pages
there are separate URIs for each resource
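The URI patterns above can be read as templates with `:name` placeholders. The following is a rough sketch of how such a template might be expanded. The `expand` helper and the `PATTERNS` table are hypothetical names for illustration, not the BBC's implementation.

```python
import re

# URI patterns copied from the slides; the dictionary keys are illustrative.
PATTERNS = {
    "programme": "bbc.co.uk/programmes/:pid",
    "artist": "bbc.co.uk/music/artist/:musicbrainzID",
    "nature": "bbc.co.uk/nature/:rank/:dbpediaID",
    "habitat": "bbc.co.uk/nature/habitats/:dbpediaID",
}


def expand(pattern: str, **values: str) -> str:
    """Replace each :name placeholder with the keyword argument of that name.

    Raises KeyError if a placeholder has no matching value.
    """
    return re.sub(r":(\w+)", lambda match: values[match.group(1)], pattern)


# Reproduces the tiger identifier shown earlier in the deck.
tiger = expand(PATTERNS["nature"], rank="species", dbpediaID="tiger")
print(tiger)
```

One identifier per thing means a template plus an external ID (a MusicBrainz or DBpedia identifier) is enough to name the resource.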
These are our building blocks

                                Silos http://www.flickr.com/photos/bottleleaf/2218990208/
But context lies in the links between these domains
Programmes featuring a species
Clips from programmes about a species
Clips live at /programmes but are transcluded onto
other pages
                              Silos http://www.flickr.com/ph...
Tracks played in an episode
Programmes that have played an artist
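One way to picture these cross-domain links is as subject, predicate, object statements that can be read in either direction. This is a toy sketch only: the identifiers and predicate names below are placeholders invented for illustration, not real BBC data or vocabulary.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Placeholder statements linking the programmes, music and nature domains.
TRIPLES = [
    Triple("/programmes/episode-1", "played", "/music/artist/artist-a"),
    Triple("/programmes/episode-1", "played", "/music/artist/artist-b"),
    Triple("/programmes/episode-2", "played", "/music/artist/artist-a"),
    Triple("/programmes/episode-2", "features", "/nature/species/tiger"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Follow a link forwards, e.g. the tracks' artists played in an episode."""
    return [t.obj for t in TRIPLES if t.subject == subject and t.predicate == predicate]


def subjects(predicate: str, obj: str) -> List[str]:
    """Follow a link backwards, e.g. the programmes that have played an artist."""
    return [t.subject for t in TRIPLES if t.predicate == predicate and t.obj == obj]


print(objects("/programmes/episode-1", "played"))
print(subjects("played", "/music/artist/artist-a"))
print(subjects("features", "/nature/species/tiger"))
```

The same statement answers both "what did this episode play?" and "who has played this artist?", which is why the link only needs to be recorded once.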
How have we put the blocks together?
DBpedia as a controlled vocabulary

                             Silos http://www.flickr.com/photos/bottleleaf/2218990208/
Different teams model their domain
[Diagram: programmes domain model showing Brands, Series, Programme, Episodes, Content ...]
Link models together
Linked Data allows loosely coupled, distributed
teams to share data, share models and build on
each other’s work
Thank you
Programmes ontology
	 http://www.bbc.co.uk/ontologies/programmes
Understanding the big BBC graph
	 http://blogs....
Online Information Conference

  • Although I’m speaking in the semantic web strand of this conference I’m not going to talk about RDF/XML.

    That’s not because I don’t think it’s important, I do, but rather because RDF is often conflated with RDF/XML and I would rather consider the model for a bit: what it means and how we’ve used it. So what I’m going to be talking about is RDF the model, not RDF the data format. If, however, the format is something you are interested in, then perhaps grab me after my talk, because we are publishing lots and lots of RDF/XML.
  • The BBC is the largest broadcasting corporation in the world.

    Its mission is to enrich people's lives with programmes that inform, educate and entertain. It is a public service broadcaster, established by a Royal Charter and funded by the licence fee that is paid by UK households.

    The BBC uses the income from the licence fee to provide services, including...

    8 national TV channels + regional variations and programming
    National TV and radio for Scotland, Wales and Northern Ireland plus 40 local radio stations
  • and that’s before you get to the World Service which broadcasts to the world in 32 languages.

    We’ve had a web presence since 1994

    What all this means is that the BBC produces an incredible range, diversity and volume of content.

    This volume of content is a challenge in its own right, even before you consider the size of the existing archive
  • This size presents a number of challenges - how to organise, how to build
  • For starters, traditional 'left hand nav' style navigation doesn't work, neither from a UX point of view nor from a coordination and governance point of view.

    As a result the BBC has historically created a series of microsites, each coherent in its own right but not across the breadth of BBC content.

    Consider for example I can navigate around a Radio 4 site about the opening of the LHC... but...
  • I can’t find everything the BBC knows about CERN... but equally I can’t find everything
  • Paul Weller, or any other artist, nor can I find everything
  • But things are changing...

    We are starting with the data and how people think about it, rather than starting from the web page and working down. And when I say data I really mean understanding what things people care about, giving each of those things a URI and returning appropriate representations...
  • Of course what I’m talking about is Linked Data... even if we didn’t quite realise that when we started.

    But the idea that we should care about our URIs, care about having one per concept, care about having machine representations for those resources instead of a separate API has helped us build a coherent, scalable, sane service.

    Linking Open Data is a grassroots project to use web technologies to expose data on the web. It is for many people synonymous with the semantic web, or worse web 3.0, a term I personally can’t stand (especially when you consider that TimBL’s original memo described a web of things).

    It does, as far as I’m concerned, represent a very large subset of the semantic web project.

    But what is it?

    Well it can be described with 4 simple rules.
  • The web was designed to be a web of things, not just a web of documents.

    Those documents make assertions about things in the real world but that doesn’t mean the identifiers can only be used to identify web documents.

    Minting URIs for things rather than pages helps make the web more human literate because it means we are identifying those things that people care about.
  • The beauty of the web is its ubiquitous nature: the fact that it is decentralised and able to function on any platform. This is because of TimBL’s key invention, the HTTP URI.

    URIs are globally unique, open to all and decentralised.

    Don’t go using DOI or any other identifier - on the web all you need is an HTTP URI.
  • And obviously you need to provide some information at that URI. When people dereference it you need to give them some data - ideally as RDF as well as HTML.

    Providing the data as RDF means that machines can process that information for people to use, making it more useful.
  • And of course you also need to provide links to other resources so people can continue their journey.

    And that means contextual links to other resources elsewhere on the web, not just your site.

    And that’s it. Pretty simple.

    And I would argue that, other than the RDF bit, these principles should be followed for any website - they just make sense.
  • Including that I look like this
    Was born here
    That my name is this

    (diff slide - my driving license is another identifier which also makes assertions about me)
  • Tigers look like this
    Sound like this
    Do these things
    This has happened to them
    They live here
    Have this sort of way of life (adaptations)
  • People care about our programme brands: they search for them, love watching them and expect the BBC to provide footage and clips of them.
  • And we have separate pages for every artist the BBC plays on the new music site.
  • And you can do the same thing for sounds, news stories, links, Wikipedia etc
  • If you build things correctly then, like Lego, we can stick things together to build more stuff
  • Information about a thing is important and it is interesting, but its interest is somewhat limited. What’s really interesting is the join, the link between things.
  • What programmes or clips do we have about a given species?
  • Clips live at /programmes but are transcluded onto other pages
  • Which tracks were played on a particular show, linking through to the artist pages.

    Again the information about the artist ‘lives’ at /music but it’s pulled into the programme domain because
  • Which in turn tells you which programmes and radio stations play that artist, with links through to the programme or station.
  • What probably isn’t completely obvious is that we have modeled and structured the site around those things.

    So we have classes of object and relationships between them, and resources within each class. For example - a Lion is a Species and species have defined relationships to habitats, location, conservation status and adaptation.

    What this means is that when we create a new species it appears on its habitat and adaptation pages, etc.
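
    As a rough illustration of that last point, here is a small sketch in which habitat and adaptation pages are derived from the relationships declared on each species, so adding a species updates those pages without anyone editing them. The class names and the example values are assumptions made for illustration; this is not the BBC's data model or code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Species:
    name: str
    habitats: List[str] = field(default_factory=list)
    adaptations: List[str] = field(default_factory=list)


class Catalogue:
    """Indexes species by the habitats and adaptations they link to."""

    def __init__(self) -> None:
        self._by_habitat: Dict[str, List[str]] = defaultdict(list)
        self._by_adaptation: Dict[str, List[str]] = defaultdict(list)

    def add(self, species: Species) -> None:
        # Recording the species once is enough; every related page picks it up.
        for habitat in species.habitats:
            self._by_habitat[habitat].append(species.name)
        for adaptation in species.adaptations:
            self._by_adaptation[adaptation].append(species.name)

    def habitat_page(self, habitat: str) -> List[str]:
        return sorted(self._by_habitat.get(habitat, []))

    def adaptation_page(self, adaptation: str) -> List[str]:
        return sorted(self._by_adaptation.get(adaptation, []))


catalogue = Catalogue()
# Illustrative values only.
catalogue.add(Species("Lion", habitats=["Grassland"], adaptations=["Carnivore"]))
catalogue.add(Species("Tiger", habitats=["Forest", "Grassland"], adaptations=["Carnivore"]))

print(catalogue.habitat_page("Grassland"))
print(catalogue.adaptation_page("Carnivore"))
```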
  • Transcript of "Online Information Conference"

    1. Building coherence at bbc.co.uk Tom Scott
    2. 8 UK TV Channels 10 UK Radio Stations 5 National TV and radio 40 local radio stations Plus the World Service (in 32 languages)
    3. ...and a website... since 1994
    4. ... that all makes for a big archive!
    5. Historically the BBC has created a series of microsites – each coherent in their own right but not across the breadth of BBC content Radio 4 Big Bang http://www.bbc.co.uk/radio4/bigbang/
    6. Which means I can’t find everything about “CERN”
    7. Which means I can’t find everything about “CERN”
    8. ...Paul Weller... Paul Weller http://www.flickr.com/photos/johnbullas/3410330728/
    9. ...Lion...
    10. ...or even Jeremy Clarkson
    11. I can’t follow my nose, I can’t browse by meaning, from one page to the next following a semantic thread Snickers http://www.flickr.com/photos/homer4k/386980596/
    12. But things are changing
    13. Linked Data has helped us build a coherent, scalable, sane service. One that we hope is a bit more human literate. Linked Data cloud diagram http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05_colored.png
    14. Use URIs to identify things not only documents How it works: The Web http://flickr.com/photos/danbri/2415237566/
    15. Use HTTP URIs - globally unique names that anyone can dereference Colon Slash Slash http://www.flickr.com/photos/jeffsmallwood/299208539/
    16. Provide useful information [in RDF] when someone looks up a URI Information Desk http://www.flickr.com/photos/metropol2/149294506/
    17. Include links to other URIs to let people discover related information Links http://www.flickr.com/photos/ravages/2831688538/
    18. One implication of this is that I think there’s only URIs and metadata... nothing else Self-portraiture + metadata http://www.flickr.com/photos/saltatempo/323462998/
    19. URIs are used as identifiers for real world things ...like Polar Bears and Jeremy Clarkson
    20. Just as my passport is an identifier for me
    21. ...which in turn makes assertions about me
    22. Thomas Scott 16th May 1972 United Kingdom ...which in turn makes assertions about me
    23. bbc.co.uk/nature/species/tiger is an identifier for the tiger species with resources which make assertions about it
    24. bbc.co.uk/nature/species/tiger is an identifier for the tiger species with resources which make assertions about it
    25. Linked Data at the BBC Test Card X http://www.flickr.com/photos/marksmanuk/3098983708/
    26. A page (URI) per programme bbc.co.uk/programmes/:pid
    27. ...and programme segments...
    28. In the music domain we have a page for every artist the BBC plays bbc.co.uk/music/artist/:musicbrainzID
    29. And in the natural history domain we have URIs of animals... bbc.co.uk/nature/:rank/:dbpediaID
    30. ...adaptations and behaviours... bbc.co.uk/nature/adaptaion/:dbpediaID
    31. ...and habitats... bbc.co.uk/nature/habitats/:dbpediaID
    32. And because the web is about URIs not pages there are separate URIs for each resource
    33. These are our building blocks Silos http://www.flickr.com/photos/bottleleaf/2218990208/
    34. But context lies in the links between these domains
    35. Programmes featuring a species
    36. Clips from programmes about a species
    37. Clips live at /programmes but are transcluded onto other pages Silos http://www.flickr.com/photos/bottleleaf/2218990208/
    38. Tracks played in an episode
    39. Programmes that have played an artist
    40. How have we put the blocks together?
    41. DBpedia as a controlled vocabulary Silos http://www.flickr.com/photos/bottleleaf/2218990208/
    42. Different teams model their domain
    43. Brands Series Programme Episodes Content Service Publishing Version Event Broadcast Different teams model their domain
    44. Link models together
    45. Linked Data allows loosely coupled, distributed teams to share data, share models and build on each other’s work
    46. Thank you Programmes ontology http://www.bbc.co.uk/ontologies/programmes Understanding the big BBC graph http://blogs.talis.com/n2/archives/569 Music ontology http://musicontology.com