• The BBC is working to integrate data and link documents across BBC domains
• Collaboration with Freie Universität Berlin, Rattle Research (and Ontotext)
• Semantic Web context: usage of Linked Data from MusicBrainz and DBpedia
www.sti-innsbruck.at
Problem
• The BBC publishes large amounts of online content (text, video, audio)
• Mostly data for broadcast brands and domain-specific microsites
• Services are divided by domain, e.g. food, music, news
• No interlinking between these domain-specific sites, so the full potential of the available data is not used
Objectives
• DBpedia to provide a common "controlled" vocabulary and equivalency service, which in turn is used to add "topic badges" to existing, legacy web pages
• Soft transition from the old to the new system
– Developing a new service that supports the branding of the BBC's radio stations, TV channels and programmes (bbc.co.uk/programmes)
– Developing a new music offering (bbc.co.uk/music/beta) that builds on existing open web standards and is fully integrated with the programme support service
– Simple navigational elements (i.e. topic badges and term extraction) to support contextual, semantic navigation
– A common set of web-scale identifiers to help classify all BBC online content (and external URLs) and to help create equivalency between multiple vocabularies
Cross-Linking Legacy Content with Legacy Systems
• Desire to link to further BBC domains (apart from programmes and music)
– Through an about-relationship between programmes, people, places and subjects
• Data was created with a legacy auto-categorization system called CIS
• CIS holds a hierarchy of terms in five main top-level classes:
– Proper names
– Subjects
– Brands
– Time periods
– Places
• Objects identified within /programmes and /music are also found within other domains:
– A mechanism is needed to map between equivalent terms
– Linking CIS concepts to DBpedia
Linking BBC Domains
• DBpedia weighted label lookup, using Wikipedia inter-article links as the weight indicator
– links(redirect) * log2(weight(article))
• Context-based disambiguation
– Disambiguate possible concept matches: identify similar contexts of CIS terms by clustering the matches and finding the corresponding contexts in DBpedia
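The two steps above can be sketched in a few lines of Python. This is a minimal illustration and not the BBC implementation: the slide does not define its terms, so the sketch assumes that links(redirect) is the number of inter-article links pointing at the matching label or redirect and that weight(article) is the number of inbound links to the target article. The names Candidate, score, lookup and disambiguate, the category sets, and all numbers are hypothetical. The disambiguation step is reduced to a plain category-overlap count, which is a simplification of the clustering described on the slide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    uri: str             # DBpedia resource URI
    redirect_links: int  # assumed meaning of links(redirect)
    article_weight: int  # assumed meaning of weight(article)


def score(c: Candidate) -> float:
    """links(redirect) * log2(weight(article)), as given on the slide."""
    if c.article_weight < 1:
        return 0.0  # log2 is undefined for zero inbound links
    return c.redirect_links * math.log2(c.article_weight)


def lookup(label: str, index: dict) -> list:
    """Return all candidates for a label, best score first."""
    return sorted(index.get(label.lower(), []), key=score, reverse=True)


def disambiguate(candidates: list, context: set, categories: dict):
    """Prefer the candidate whose categories overlap most with the
    context of the CIS term; fall back to the label score on ties."""
    if not candidates:
        return None

    def key(c: Candidate):
        overlap = len(categories.get(c.uri, set()) & context)
        return (overlap, score(c))

    return max(candidates, key=key)


# Toy data, invented for illustration only.
index = {
    "jaguar": [
        Candidate("http://dbpedia.org/resource/Jaguar", 10, 1024),
        Candidate("http://dbpedia.org/resource/Jaguar_Cars", 20, 16),
    ]
}
categories = {
    "http://dbpedia.org/resource/Jaguar": {"Felines"},
    "http://dbpedia.org/resource/Jaguar_Cars": {"Car manufacturers"},
}

ranked = lookup("Jaguar", index)
best_in_context = disambiguate(ranked, {"Car manufacturers"}, categories)
```

With these toy numbers the plain lookup ranks the animal first, while a CIS term whose context mentions car manufacturers resolves to the car maker, which is the reason for running the second step at all.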
Linking Documents to Concepts
• Named entity extraction system Muddy Boots
– Chosen over solutions from OpenCalais, Twine and Zemanta because it reuses existing web identifiers, i.e. Wikipedia/DBpedia URIs
• Recognizes entities in BBC News articles
• Uses DBpedia identifiers for those entities
• Content Link Tool to add or remove DBpedia identifiers from any given BBC URL
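The add/remove behaviour of the Content Link Tool can be pictured as a small mapping from URLs to sets of DBpedia identifiers. The sketch below is an assumption about the data shape only. It does not reproduce the real tool or the Muddy Boots API, and the class name ContentLinkStore and its methods are hypothetical.

```python
from collections import defaultdict


class ContentLinkStore:
    """In-memory map from a content URL to its DBpedia identifiers."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, url: str, dbpedia_uri: str) -> None:
        """Attach a DBpedia identifier to a URL (idempotent)."""
        self._links[url].add(dbpedia_uri)

    def remove(self, url: str, dbpedia_uri: str) -> None:
        """Detach an identifier; unknown URLs or identifiers are ignored."""
        ids = self._links.get(url)
        if ids is not None:
            ids.discard(dbpedia_uri)

    def identifiers(self, url: str) -> frozenset:
        """All identifiers currently attached to a URL."""
        return frozenset(self._links.get(url, ()))

    def urls_about(self, dbpedia_uri: str) -> list:
        """All URLs tagged with a given identifier, sorted."""
        return sorted(u for u, ids in self._links.items() if dbpedia_uri in ids)
```

Storing identifiers as sets keeps repeated additions harmless, and the reverse lookup in urls_about is what a topic page would need in order to collect every document about one concept.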
Create User Journeys: Topic Pages and Navigation Badges
• Topic pages
– Creation of aggregation pages of unstructured and structured content
– Pull together the modeled world of BBC programmes (CIS identifiers mapped to DBpedia) and the unstructured world of BBC News articles
• Navigational badges
– Once a user has entered an area of BBC content, there are few links through to other related content
– Providing this link is the role of the navigation badge
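A topic page as described above is essentially a join on the DBpedia identifier: programmes reach it through their CIS term, news articles through extracted entities. The following sketch shows that join under stated assumptions. The function names, the dictionary shapes, the "/topics/" path and the placeholder identifiers are all hypothetical and are not taken from the BBC systems.

```python
def build_topic_page(topic_uri, cis_to_dbpedia, programmes_by_cis, articles_by_topic):
    """Aggregate structured and unstructured content for one topic.

    cis_to_dbpedia:    CIS term id -> DBpedia URI (the term mapping)
    programmes_by_cis: CIS term id -> list of programme ids
    articles_by_topic: DBpedia URI -> list of news article ids
    """
    programmes = sorted(
        programme
        for cis_id, uri in cis_to_dbpedia.items()
        if uri == topic_uri
        for programme in programmes_by_cis.get(cis_id, [])
    )
    articles = sorted(articles_by_topic.get(topic_uri, []))
    return {"topic": topic_uri, "programmes": programmes, "articles": articles}


def navigation_badges(page_topics):
    """Turn the topics attached to a page into (label, link) badges.

    The '/topics/<name>' path is a made-up placeholder.
    """
    badges = []
    for uri in sorted(page_topics):
        name = uri.rsplit("/", 1)[-1]
        badges.append((name.replace("_", " "), "/topics/" + name))
    return badges


# Toy data, invented for illustration only.
polar_bear = "http://dbpedia.org/resource/Polar_bear"
page = build_topic_page(
    polar_bear,
    cis_to_dbpedia={"cis-17": polar_bear, "cis-42": "http://dbpedia.org/resource/Arctic"},
    programmes_by_cis={"cis-17": ["programme-b", "programme-a"], "cis-42": ["programme-c"]},
    articles_by_topic={polar_bear: ["article-1"]},
)
badges = navigation_badges({polar_bear})
```

The badge is the reverse direction of the same data: a page that carries a DBpedia identifier gets a link out to the topic page built for that identifier.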
Conclusions
• User experience is at the center of the BBC's efforts
• Semantics as enabler
• What we can learn from the BBC
– The user should be at the center of efforts
– Pages are not strictly structured according to the domain model
– Semantics primarily enable smart interlinking to additional content
– Well-hidden magic
– Simplicity of domain models is beauty
• For more information, refer to the "Beyond the polar bear" presentation
– http://www.slideshare.net/reduxd/beyond-the-polar-bear