BBC News Labs at ISKO Conference, UCL, London - July 2013


BBC News Labs presentation from ISKO 2013 in London at UCL on Monday 8th July 2013.

  • UK's most popular news website: 6 million unique browsers every day (3rd biggest site in the UK after Google and Facebook); publish around 500 articles every day (local, national, global); publish in 27 languages as World Service (+ 2 UK languages alongside English); hundreds of journalists, many working cross-media (TV/radio/online)
  • articles created in a home-grown Content Management System; flat page publishing via FTP: good for high-load events but limits our UX and data potential; migrating to a dynamic publishing platform; typical three-tier architecture: presentation – service – data; data layer is a content store (MarkLogic) + a triple store (BigOWLIM) that holds annotations made by journalists about content in the content store
  • need to minimize impact on journalists; integration with existing tools and workflow as much as possible; tagging rather than semantic annotation; suggest concepts rather than free-hand annotation; Sheffield University’s GATE framework for Natural Language Processing identifies the ‘things’ in an article; use the concepts in the triple store as a data dictionary; journalists should mostly just have to accept or reject tags
  • pilot: can we automate the production of the 58 local news region sub-index pages (old transmitter locations)? currently an entirely manual task to maintain these pages; GET articles about or mentioning places that fall within the BBC News region
  • generally worked well: journalists' tagging did not cause too much disruption, and we were able to generate aggregations of topic by concept; BUT we saw some problems: duplication where multiple articles were written about large events; journalists wanted the ability to set the running order (defaults to most recent first); quality of concept extraction was poor (may improve over time?); journalists gaming the system: adding tags to get on specific indexes, republishing to effect pinning
  • - a simple ontology for people, organisations, places and intangibles (themes) and their intersection with events - based on rNews, the Event ontology and PA’s SNaP Stuff ontology - annotate articles with events, where the event:place is Birmingham etc.
  • - IPTC rNews terms in RDFa - basic publishing metadata in the <head> for rich snippets - linked open data in the body
  • - immediate results - rich snippets for articles - apparently better ranking by topic (anecdotal)
  • - we introduced the change in the first week of May - by the end of May we were seeing some positive press coverage; people were noticing
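
The local index pilot in the notes above amounts to selecting articles tagged with places inside a region, ordering them most recent first, and (as journalists requested) letting an editor pin items to the top. A minimal Python sketch of that logic follows. It is an illustration only, not the BBC implementation: the `Article` class, the `build_region_index` function, the `pinned_uris` parameter and the place-to-region grouping in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Article:
    uri: str
    published: datetime
    place_tags: set = field(default_factory=set)  # places tagged by journalists


def build_region_index(articles, region_places, pinned_uris=()):
    """Return the articles for one region index, pinned items first.

    An article qualifies when at least one of its place tags falls within
    the region. Unpinned articles default to most-recent-first order.
    """
    region_places = set(region_places)
    matching = [a for a in articles if a.place_tags & region_places]
    by_uri = {a.uri: a for a in matching}

    # Pinned articles keep the editor's order, if they belong to this region.
    pinned = [by_uri[u] for u in pinned_uris if u in by_uri]
    pinned_set = {a.uri for a in pinned}

    rest = sorted(
        (a for a in matching if a.uri not in pinned_set),
        key=lambda a: a.published,
        reverse=True,
    )
    return pinned + rest


# Made-up sample data; the grouping of places into a region is illustrative.
articles = [
    Article("a1", datetime(2013, 5, 1), {"Birmingham"}),
    Article("a2", datetime(2013, 5, 3), {"Coventry"}),
    Article("a3", datetime(2013, 5, 2), {"Cheshire"}),
]
index = build_region_index(articles, {"Birmingham", "Coventry"}, pinned_uris=["a1"])
print([a.uri for a in index])  # ['a1', 'a2']
```

An explicit pin list of this kind would remove the incentive to republish an article just to push it to the top of a chronological index.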

    1. 1. Unlocking the Data in BBC News ISKO Conference July 8th 2013
    2. 2.
    3. 3. moving to linked data • moving from static HTML to a dynamic, responsive site • introducing linked data to power content aggregations around related topics • starting to embed linked open data in every page as RDFa • using the IPTC rNews vocabulary to describe content in a machine-readable way
    4. 4. impact on journalists • annotating (“tagging”) content with topics • tool embedded into existing CMS • concept extraction/NLP for topic suggestion • journalists accept/reject suggested topics for annotation
    5. 5. pilot - local indexes
    6. 6. learning from the pilot • generally - it works • but duplication for big events • also need pinning • concept extraction poor • journalists gaming the system
    7. 7. corenews model
    8. 8. pilot - publishing RDFa • using RDFa + rNews to embed machine-readable metadata in article source code • discoverability: rich snippets + better ranking • publish Linked Open Data: <articleURI> rdf:type rnews:Article <articleURI> rnews:about <thingURI> etc...
    9. 9. learning from the pilot
    10. 10. learning from the pilot
    11. 11. next steps • rolling out tagging to journalists throughout BBC News • making better use of rNews/RDFa - full mark-up integration • piloting the use of organising content by storylines
    12. 12. more info • • -05-01.shtml • • twitter: @jeremytarling
    13. 13. BBC News Labs At ISKO
    14. 14. BBC News Labs • Explore opportunities for BBC News • Using real data • Prototype quickly • …which is normally hard in big Orgs…
    15. 15. Unlocking the Data in BBC News • All we have is a bunch of articles... • What does a “tagged” world look like? • The Juicer does [badly] what Journalists will do • The News Juicer: 1 Grab BBC News & Sport Articles 2 Extract Concepts 3 Match to DBpedia 4 Annotate Article 5 Push to Triplestore 6 Expose via API
    16. 16. Demo • Juicer : • Person : q=Andy_Murray • Place : q=Cheshire • News Near Me :
    17. 17. Next • “Juice” more of BBC Archive • Build prototypes • See what works • Storyline : News Org Partnerships
    18. 18. More info • BBC-News-Lab • • twitter: @completedespair • @BBC_News_Labs
    19. 19. In case network blows up
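
The six News Juicer steps on slide 15, combined with the triple shapes on slide 8, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Juicer's actual code: concept extraction is replaced by naive substring matching against a hypothetical `KNOWN_CONCEPTS` table, the triple store is an in-memory list, the article URI and text are made up, and the rNews namespace URI is assumed rather than taken from the slides.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RNEWS = "http://iptc.org/std/rNews/2011-10-07#"  # assumed rNews namespace

# Hypothetical lookup table standing in for concept extraction + DBpedia matching.
KNOWN_CONCEPTS = {
    "Andy Murray": "http://dbpedia.org/resource/Andy_Murray",
    "Cheshire": "http://dbpedia.org/resource/Cheshire",
}


def extract_concepts(text):
    """Step 2: naive substring matching, a stand-in for real NLP."""
    return [label for label in KNOWN_CONCEPTS if label in text]


def match_to_dbpedia(labels):
    """Step 3: map each extracted label to a DBpedia resource URI."""
    return [KNOWN_CONCEPTS[label] for label in labels]


def annotate(article_uri, thing_uris):
    """Step 4: build (subject, predicate, object) triples for one article."""
    triples = [(article_uri, RDF_TYPE, RNEWS + "Article")]
    triples += [(article_uri, RNEWS + "about", uri) for uri in thing_uris]
    return triples


class InMemoryTripleStore:
    """Steps 5 and 6: a list-backed stand-in for a triple store and its API."""

    def __init__(self):
        self.triples = []

    def push(self, triples):
        self.triples.extend(triples)

    def articles_about(self, thing_uri):
        about = RNEWS + "about"
        return [s for s, p, o in self.triples if p == about and o == thing_uri]


def juice(article_uri, text, store):
    """Run steps 2 to 5 for an article that step 1 has already fetched."""
    things = match_to_dbpedia(extract_concepts(text))
    store.push(annotate(article_uri, things))


# Made-up article URI and text, for illustration only.
store = InMemoryTripleStore()
juice("http://example.org/articles/1", "Andy Murray visits Cheshire", store)
print(store.articles_about("http://dbpedia.org/resource/Cheshire"))
```

Keeping each step as its own function mirrors the numbered pipeline and makes it straightforward to swap the substring matcher for a real concept extractor, or the list for a real triple store, without touching the other steps.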