Linked data context strategy

Presentation to open data week

  • The BBC began in radio in the 1920s. It now runs 10 national radio channels and more than 40 in the nations and regions.
  • TV broadcast since the 1930s.
  • On the Web since 1994. That's a lot of web history too; we've been doing this for a while.
  • The BBC Music Website has a content-rich offering. That is not surprising when you have 10 major national radio stations, many more local stations, and a lot of music programmes in your TV schedule. But it doesn't mean you have to manage everything, from bios to discographies, from scratch.
  • Data is a first-class citizen
  • Working on the World Service audio archive: three years of continuous audio.
  • Speech recognition -> automated transcripts + topic identification (at scale). Kiwi is a framework for automatically identifying topics in speech radio programmes, with topic identifiers drawn from Linked Open Data sources such as DBpedia. To generate such topics in a reasonable time for large programme archives, we built a processing infrastructure that distributes computations across cloud resources (e.g. Amazon EC2). We used this infrastructure to automatically tag the entire BBC World Service archive (70,000 programmes) in around two weeks.
  • Linked data context strategy

    1. Research and Development ♥ BBC MMXIII. A Linked Data Context Strategy for the BBC. Michael Smethurst, BBC Internet Research and Future Services. With thanks to Yves Raimond, Tristan Ferne, Olivier Thereaux, Paul Rissen.
    2. Why Linked Data?
        1. On the web, content needs context to be useful
        2. The BBC has data on its output but not on the subjects of its output
        3. Commercial data is usually modelled at the wrong level (saleable items)
        4. Commercial data doesn’t give you the freedom to make your own APIs on top
        5. Using inference minimises workload
    3. “Inform, Educate and Entertain”
    4. George Orwell, 1940s
    5. On the Web since 1994 / 1995
    6. Linked data
    7. Agenda
        1. Consuming linked data
        2. Managing linked data
        3. Publishing linked data
    8. 1000 to 1500 programmes and about 750 news articles every day
    9. Loose coupling via shared identifiers
    10. Linked data as experience prism
    11. Principle #1: The web is our CMS
    12. Agenda
        1. Consuming linked data
        2. Managing linked data
        3. Publishing linked data
    13. Annotate once, infer and re-use
    14. Agenda
        1. Consuming linked data
        2. Managing linked data
        3. Publishing linked data
    15. Principle #2: Our web site is our API
    16. Generating data from content
    17. Automated tagging + speaker recognition of a very large audio archive
    18. How do we answer… Which radio programmes interviewed Nelson Mandela in 1990? How can I find a picture of a relative in a library’s photo archive? Was my music used in the background of that TV programme?
    19. Thank you. Questions to michael.smethurst@bbc.co.uk @fantasticlife
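
The speaker notes on Kiwi describe tagging speech transcripts with Linked Open Data identifiers and spreading the work over many machines. The Python sketch below illustrates that general idea only. It is not Kiwi's algorithm: the `TOPIC_INDEX` lookup, the substring matching, and the function names `tag_transcript` and `tag_archive` are all assumptions made for illustration, and a local thread pool stands in for the cloud workers mentioned in the notes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical label -> DBpedia URI lookup (assumption for illustration).
# A real system would draw on a far larger index of Linked Open Data.
TOPIC_INDEX = {
    "nelson mandela": "http://dbpedia.org/resource/Nelson_Mandela",
    "george orwell": "http://dbpedia.org/resource/George_Orwell",
    "south africa": "http://dbpedia.org/resource/South_Africa",
}


def tag_transcript(transcript: str) -> list[str]:
    """Return the sorted DBpedia URIs whose label occurs in the transcript."""
    text = transcript.lower()
    return sorted(uri for label, uri in TOPIC_INDEX.items() if label in text)


def tag_archive(transcripts: dict[str, str], workers: int = 4) -> dict[str, list[str]]:
    """Tag every programme, spreading the work over a pool of workers.

    The thread pool is a stand-in (assumption) for distributed cloud workers.
    """
    programme_ids = list(transcripts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(tag_transcript, (transcripts[p] for p in programme_ids))
        return dict(zip(programme_ids, results))
```

Because each programme is tagged with a shared identifier (a URI) rather than a free-text keyword, a question such as "which programmes mention Nelson Mandela?" becomes a lookup on that URI across the returned dictionary.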
