BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs presentation from ISKO 2013 in London at UCL on Monday 8th July 2013.

Usage Rights

© All Rights Reserved

  • UK's most popular news website: 6 million unique browsers every day (3rd biggest site in the UK after Google and Facebook); publish around 500 articles every day (local, national, global); publish in 27 languages as World Service (+ 2 UK languages alongside English); hundreds of journalists, many working cross-media (TV/radio/online)
  • articles created in a home-grown Content Management System; flat page publishing via FTP - good for high-load events but limits our UX and data potential; migrating to a dynamic publishing platform; typical three-tier architecture: presentation – service – data; data layer is a content store (MarkLogic) + a triple store (BigOWLIM) that holds annotations made by journalists about content in the content store
  • need to minimize impact on journalists; integration with existing tools and workflow as much as possible; tagging rather than semantic annotation; suggest concepts rather than free-hand annotation; Sheffield University’s GATE framework for Natural Language Processing to identify the ‘things’ in an article; use the concepts in the triple store as a data dictionary; journalists should mostly just have to accept or reject tags
  • pilot - can we automate the production of the 58 local news region sub-index pages? (old transmitter locations); currently an entirely manual task to maintain these pages; GET articles about or mentioning places that fall within the BBC News region
  • generally worked well – journalists' tagging did not cause too much disruption, and we were able to generate aggregations of topic by concept; BUT we saw some problems: duplication where multiple articles were written about large events; journalists wanted the ability to set the running order (defaults to chronologically most recent); quality of concept extraction was poor (may improve over time?); journalists gaming the system – adding tags to get on specific indexes, republishing to effect pinning
  • a simple ontology for people, organisations, places and intangibles (themes) and their intersection with events; based on rNews, the Event ontology and PA’s SNaP Stuff ontology; annotate articles with events, where the event:place is Birmingham etc.
  • IPTC rNews terms in RDFa; basic publishing metadata in the head for rich snippets; linked open data in the body
  • immediate results: rich snippets for articles; apparently better ranking by topic (anecdotal)
  • we introduced the change in the first week of May; by the end of May we were seeing some positive press coverage, people were noticing
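
The notes on rNews and RDFa can be made concrete with a small sketch. This is not BBC code: it assumes the rNews namespace URI shown in the comment, uses placeholder example.org URIs, and the helper names `article_triples` and `article_rdfa` are invented for illustration. It produces the two kinds of statement named in the deck (an article typed as `rnews:Article` and linked to things with `rnews:about`) and renders them as an RDFa fragment.

```python
from html import escape

# Assumed namespace URI for the IPTC rNews vocabulary (not given in the slides).
RNEWS = "http://iptc.org/std/rNews/2011-10-07#"


def article_triples(article_uri, thing_uris):
    """Return (subject, predicate, object) triples for one article."""
    triples = [(article_uri, "rdf:type", "rnews:Article")]
    triples += [(article_uri, "rnews:about", thing) for thing in thing_uris]
    return triples


def article_rdfa(article_uri, headline, thing_uris):
    """Render a minimal RDFa fragment carrying the same statements."""
    about_spans = "\n".join(
        f'  <span property="rnews:about" resource="{escape(t, quote=True)}"></span>'
        for t in thing_uris
    )
    return (
        f'<div prefix="rnews: {RNEWS}" '
        f'about="{escape(article_uri, quote=True)}" typeof="rnews:Article">\n'
        f'  <h1 property="rnews:headline">{escape(headline)}</h1>\n'
        f"{about_spans}\n"
        "</div>"
    )


if __name__ == "__main__":
    # Hypothetical URIs; real article and thing URIs would come from the CMS
    # and the triple store.
    article = "http://example.org/news/article-1"
    things = ["http://example.org/things/birmingham"]
    for triple in article_triples(article, things):
        print(triple)
    print(article_rdfa(article, "Example headline", things))
```

Keeping the triples and the markup in separate functions reflects the split described in the notes: the same annotations can feed the triple store for aggregations and the page source for search engines.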

Presentation Transcript

  • Unlocking the Data in BBC News ISKO Conference July 8th 2013
  • www.bbc.co.uk/news
  • moving to linked data • moving from static HTML to dynamic, responsive site • introducing linked data to power content aggregations around related topics • starting to embed linked open data in every page as RDFa • using the IPTC rNews vocabulary to describe content in a machine-readable way
  • impact on journalists • annotating (“tagging”) content with topics • tool embedded into existing CMS • concept extraction/NLP for topic suggestion • journalists accept/reject suggested topics for annotation
  • pilot - local indexes
  • learning from the pilot • generally - it works • but duplication for big events • also need pinning • concept extraction poor • journalists gaming the system
  • corenews model
  • pilot - publishing RDFa • using RDFa + rNews to embed machine-readable metadata in article source code • discoverability: rich snippets + better ranking • publish Linked Open Data: <articleURI> rdf:type rnews:Article; <articleURI> rnews:about <thingURI>; etc...
  • learning from the pilot
  • learning from the pilot
  • next steps • rolling out tagging to journalists throughout BBC News • making better use of rNews/RDFa - full mark-up integration • piloting the use of organising content by storylines
  • more info • http://www.bbc.co.uk/blogs/internet/posts/News-L • http://www.bbc.co.uk/ontologies/news/2013-05-01.shtml • jeremy.tarling@bbc.co.uk • twitter: @jeremytarling
  • BBC News Labs At ISKO
  • BBC News Labs • Explore opportunities for BBC News • Using real data • Prototype quickly • …which is normally hard in big Orgs…
  • Unlocking the Data in BBC News • All we have is a bunch of articles... • What does a “tagged” world look like? • The Juicer does [badly] what journalists will do • The News Juicer: 1. Grab BBC News & Sport Articles 2. Extract Concepts 3. Match to DBpedia 4. Annotate Article 5. Push to Triplestore 6. Expose via API
  • Demo • Juicer : http://staging.juicer.bbcnewslabs.co.uk/ • Person : http://staging.juicer.bbcnewslabs.co.uk/demo/person?q=Andy_Murray • Place : http://staging.juicer.bbcnewslabs.co.uk/demo/place?q=Cheshire • News Near Me : http://newsnearme2.herokuapp.com/
  • Next • “Juice” more of BBC Archive • Build prototypes • See what works • Storyline : News Org Partnerships
  • More info • http://www.bbc.co.uk/blogs/internet/posts/BBC-News-Lab • Matt.shearer@bbc.co.uk • twitter: @completedespair • @BBC_News_Labs
  • In case network blows up
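
The six Juicer steps listed on the "Unlocking the Data in BBC News" slide can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the slides do not describe how the real Juicer extracts concepts, so a plain dictionary lookup stands in for it; the articles and their example.org URIs are invented; the `rnews:about` predicate is borrowed from the earlier RDFa slide; and a Python set stands in for the triple store.

```python
DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical concept dictionary: surface form -> DBpedia resource name.
CONCEPTS = {"Andy Murray": "Andy_Murray", "Cheshire": "Cheshire"}


def grab_articles():
    """Step 1: stand-in for fetching BBC News & Sport articles."""
    return [
        {"uri": "http://example.org/articles/1",
         "text": "Andy Murray wins at Wimbledon."},
        {"uri": "http://example.org/articles/2",
         "text": "Flooding closes roads in Cheshire."},
    ]


def extract_concepts(text):
    """Step 2: naive dictionary lookup (a placeholder for real NLP)."""
    return [name for name in CONCEPTS if name in text]


def match_to_dbpedia(name):
    """Step 3: map a known concept name to its DBpedia resource URI."""
    return DBPEDIA + CONCEPTS[name]


def annotate(article):
    """Step 4: build one 'about' triple per concept found in the article."""
    return {
        (article["uri"], "rnews:about", match_to_dbpedia(name))
        for name in extract_concepts(article["text"])
    }


def juice(articles, store):
    """Step 5: push every article's annotations into the (in-memory) store."""
    for article in articles:
        store.update(annotate(article))
    return store


def articles_about(store, resource_name):
    """Step 6: the query an API endpoint might expose, as a plain function."""
    target = DBPEDIA + resource_name
    return sorted(s for (s, p, o) in store if p == "rnews:about" and o == target)


if __name__ == "__main__":
    store = juice(grab_articles(), set())
    print(articles_about(store, "Andy_Murray"))
```

The `articles_about` lookup mirrors the shape of the demo URLs, where a person or place is requested by its DBpedia-style name (for example `q=Andy_Murray`).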