2. www.sti-innsbruck.at
• The BBC is working to integrate data and link documents across BBC
domains
• Collaboration with Freie Universität Berlin, Rattle Research (and
Ontotext)
• Semantic Web context: usage of Linked Data from MusicBrainz and
DBpedia
Problem
• BBC publishes large amounts of online content (text, video, audio)
• Mostly data for broadcast brands and domain-specific microsites
• Division of its services by domain, e.g. food, music, news etc.
No interlinking between these domain-specific sites – the full
potential of the available data is not being used
Objectives
• DBpedia to provide a common “controlled” vocabulary and
equivalency service, which in turn is used to add “topic badges” to
existing, legacy web pages
• Soft transition of the old to the new system
– Developing a new service that supports the branding of our Radio stations, TV
channels and programmes (bbc.co.uk/programmes)
– Developing a new music offering (bbc.co.uk/music/beta) that builds on existing
open web standards and is fully integrated with the programme support service
– Simple navigational elements (i.e. Topic Badges and term extraction) to support
contextual, semantic navigation
– Common set of web scale identifiers to help classify all BBC online content (and
external URLs) and to help create equivalency between multiple vocabularies
Cross-Linking Legacy Content with Legacy Systems
• Desire to link to further BBC domains (apart from programmes and music)
– Through an about-relationship between programmes, people, places and subjects
• Data was created with a legacy auto-categorization system called CIS.
• CIS holds a hierarchy of terms in five main top-level classes:
– Proper names
– Subjects
– Brands
– Time periods
– Places
Objects identified in /programmes and /music also appear in other
domains, so a mechanism is needed to map between equivalent terms
Linking CIS Concepts to DBpedia
Linking BBC Domains
• DBpedia weighted label lookup, using Wikipedia inter-article links as the weight
indicator
– links(redirect)*log2(weight(article))
• Context-Based Disambiguation
– Disambiguate candidate concept matches by clustering them to identify similar contexts
for CIS terms, then finding the corresponding contexts in DBpedia
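The weighted lookup can be sketched as follows. This is only one possible reading of the formula links(redirect)*log2(weight(article)) as written on the slide: the function names, the candidate tuples and all link counts are hypothetical and chosen purely for illustration, not taken from the BBC system.

```python
import math

def score(redirect_links: int, article_weight: int) -> float:
    """links(redirect) * log2(weight(article)), read literally from the slide.

    Both arguments are assumed to be link counts; weights below 1 score 0
    so that log2 is never called on a non-positive number.
    """
    if article_weight < 1:
        return 0.0
    return redirect_links * math.log2(article_weight)

def rank_candidates(candidates):
    """Sort (uri, redirect_links, article_weight) tuples by descending score."""
    return sorted(candidates, key=lambda c: score(c[1], c[2]), reverse=True)

# Hypothetical candidates for the label "Queen"; the counts are invented.
candidates = [
    ("http://dbpedia.org/resource/Queen_regnant", 10, 256),
    ("http://dbpedia.org/resource/Queen_(band)", 40, 1024),
]
ranked = rank_candidates(candidates)
```

Under these invented counts the band ranks first; a context-based disambiguation step would then be needed to confirm or override that choice for a given CIS term.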
Linking Documents to Concepts
• Named entity extraction system Muddy Boots
– Chosen over solutions from OpenCalais, Twine and Zemanta because it reuses existing
web identifiers, i.e. Wikipedia/DBpedia URIs
• Recognize entities in BBC News articles
• Use DBpedia identifier for those entities
• Content Link Tool to add or remove DBpedia identifiers from any given
BBC URL
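The data behind such a Content Link Tool might look like the minimal sketch below: a mapping from a BBC URL to the set of DBpedia identifiers attached to it. The class name, its methods and the example URL are hypothetical; the slides do not describe the actual implementation.

```python
from collections import defaultdict

class ContentLinks:
    """Hypothetical store of DBpedia identifiers attached to BBC URLs."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, bbc_url: str, dbpedia_uri: str) -> None:
        """Attach a DBpedia identifier to a BBC URL (idempotent)."""
        self._links[bbc_url].add(dbpedia_uri)

    def remove(self, bbc_url: str, dbpedia_uri: str) -> None:
        """Detach an identifier; removing an absent one is a no-op."""
        if bbc_url in self._links:
            self._links[bbc_url].discard(dbpedia_uri)

    def identifiers(self, bbc_url: str) -> list:
        """Return the identifiers for a URL in sorted order."""
        return sorted(self._links.get(bbc_url, ()))

# Example with a made-up article URL: an editor corrects an extracted entity.
links = ContentLinks()
article = "http://www.bbc.co.uk/news/example-article"
links.add(article, "http://dbpedia.org/resource/Innsbruck")
links.add(article, "http://dbpedia.org/resource/Berlin")
links.remove(article, "http://dbpedia.org/resource/Berlin")
```

Using a set per URL keeps manual corrections simple: adding the same identifier twice or removing one that was never extracted leaves the store unchanged.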
Create User Journeys:
Topic Pages and Navigation Badges
• Topic pages
– Creation of aggregation pages of unstructured and structured content
– Pull together the modeled world of BBC programmes (CIS identifiers mapped to
DBpedia) and unstructured world of BBC News articles
• Navigational Badges
– Once a user has entered an area of BBC content there are few links through to other
related content
– Providing this link is the role of the navigation badge
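Aggregating a topic page can be sketched as a join on the shared DBpedia identifier. The function and the sample data below are hypothetical; the sketch only assumes that programmes (through their CIS-to-DBpedia mappings) and news articles (through extracted entities) both carry DBpedia URIs.

```python
def build_topic_page(topic_uri, programme_topics, article_topics):
    """Collect every programme and article tagged with the given DBpedia URI.

    programme_topics and article_topics map a BBC URL to a set of DBpedia URIs.
    """
    programmes = sorted(u for u, uris in programme_topics.items() if topic_uri in uris)
    articles = sorted(u for u, uris in article_topics.items() if topic_uri in uris)
    return {"topic": topic_uri, "programmes": programmes, "articles": articles}

# Made-up example data; the URLs are placeholders, not real BBC pages.
topic = "http://dbpedia.org/resource/Polar_bear"
programme_topics = {
    "http://www.bbc.co.uk/programmes/example-1": {topic},
    "http://www.bbc.co.uk/programmes/example-2": {"http://dbpedia.org/resource/Penguin"},
}
article_topics = {
    "http://www.bbc.co.uk/news/example-article": {topic},
}
page = build_topic_page(topic, programme_topics, article_topics)
```

A navigation badge could then be as simple as a link from any page tagged with `topic` to the aggregated topic page.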
Conclusions
• User experience in the center of BBC efforts
• Semantics as enabler
• What we can learn from the BBC
– User should be in the center of efforts
– Pages not strictly structured according to domain model
– Semantics primarily enable smart interlinking to additional content
– Well hidden magic
– Simplicity of domain models is beauty
• For more information, refer to the “Beyond the polar bear” presentation
– http://www.slideshare.net/reduxd/beyond-the-polar-bear