NoTube: Models & Semantics

  • Monday, March 26, 2012
  • WP1 Overview
    • “Backend” shared datasets and services
    • Mappings, integration and common vocabulary
    • Extra datasets to support use-case scenarios
  • WP1: Year 3 Direction & Achievements
    • Moving from single ‘warehouse’ to distributed set of databases, datasets and services
    • Planning for sustainable life-after-project
    • Integrating feedback from end-to-end demos
  • Why WP1? Two roles
    • NoTube internal: a hub for data sharing
    • NoTube external: show how shared datasets and vocabularies help with user-facing “Web and TV” problems
    • “Show” (critically) includes “thinking out loud” as we explore, via blog, email, Twitter etc.
      – scholarly articles rarely reach our target audiences
  • Outreach message
    • Let metadata flow widely, advertising content rather than being a hidden asset
    • Identify and link content with useful URLs(*)
    • Open APIs to control TV and link devices [WP7c]
    ...from W3C TV & Web position paper (with Project Baird), Berlin 9 Feb 2011
    WP1 is concerned primarily with the first two: getting metadata into the Web from source, rather than scraping, guessing, approximating.
  • Aside: RDFa went mainstream
    • Try ‘View source’ on IMDB, Rotten Tomatoes, BBC, sites to find RDF descriptions of TV content.
    • NoTube’s approach was to lead by example, to engage with industry and to plan from the beginning for the ‘afterlife’.
    • This strategy worked.
  • Facebook OGP
    The Wire page
    ...simple, extensible standards are being adopted
    OGP since 2010; since 2011...
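The ‘View source’ exercise can be automated. This Python sketch pulls `property`/`content` pairs out of `<meta>` tags, which is where Open Graph Protocol markup such as `og:title` lives. It uses only the standard-library `html.parser`. It is not a full RDFa processor (it ignores prefix declarations and RDFa on elements other than `<meta>`), and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class MetaPropertyExtractor(HTMLParser):
    """Collect (property, content) pairs from <meta property=... content=...> tags.

    Sketch only: OGP uses the RDFa-style `property` attribute on <meta>,
    but prefixes, nesting and RDFa on other elements are ignored here.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property")
        content = attr_map.get("content")
        if prop and content is not None:
            # Keep every value: OGP allows repeated properties (e.g. several og:image).
            self.properties.setdefault(prop, []).append(content)


def extract_meta_properties(html: str) -> dict:
    """Return {property: [content, ...]} for all <meta property=...> tags in html."""
    parser = MetaPropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties


if __name__ == "__main__":
    # Hypothetical page markup, for illustration only.
    sample = """<html prefix="og: http://ogp.me/ns#"><head>
    <meta property="og:title" content="The Wire" />
    <meta property="og:type" content="video.tv_show" />
    </head><body></body></html>"""
    print(extract_meta_properties(sample))
    # → {'og:title': ['The Wire'], 'og:type': ['video.tv_show']}
```

Self-closing `<meta ... />` tags are handled because `HTMLParser`'s default `handle_startendtag` delegates to `handle_starttag`.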
  • TV Data Warehouse
    • We still host several crawls of TV EPG data
    • Trend is for data to be more cleanly available from source, without scraping
    • Crawling, aggregation and integration still useful, but less scraping required
    • Crawled data warehouse also used as a research testbed collection
  • WP1: Example Datasets
    • WP7c/WP3 use DBpedia/Wikipedia URLs for topics; covers all mainstream areas.
    • BBC also using Lonclass/UDC topic codes (we’re helping prepare this for sharing)
    • For Music, we adopt MusicBrainz IDs
    • Mapping diverse representations of ‘genre’
    • “Organic” item/topic similarity measures derived from user data from WP3
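The slide does not say which similarity measure WP3 used. As one plausible illustration of an “organic” measure derived purely from user behaviour, this sketch represents each item by the set of users who watched or rated it, and compares two items by set cosine similarity, |U_a ∩ U_b| / sqrt(|U_a|·|U_b|). The function name and toy data are hypothetical; in practice the item IDs could be DBpedia or MusicBrainz identifiers.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(user_items: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between items, where each item is the set of users who engaged with it.

    Hypothetical helper; only pairs that share at least one user are returned.
    """
    # Invert user -> items into item -> users.
    item_users: dict[str, set[str]] = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = {}
    for a, b in combinations(sorted(item_users), 2):
        shared = len(item_users[a] & item_users[b])
        if shared:
            sims[(a, b)] = shared / sqrt(len(item_users[a]) * len(item_users[b]))
    return sims


if __name__ == "__main__":
    # Toy viewing data (hypothetical).
    data = {"u1": {"A", "B"}, "u2": {"A", "B", "C"}, "u3": {"C"}}
    print(item_similarities(data))
    # → {('A', 'B'): 1.0, ('A', 'C'): 0.5, ('B', 'C'): 0.5}
```

Because the measure needs no content metadata, it can relate items whose topic annotations never overlap, which is what makes such similarities worth feeding back to external datasets.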
  • WP1: Data Services
    • Data services exposed as static files:
      – Show how to embed RDFa in HTML
      – Publish as RDF/XML Linked Data
    • Interactive data services:
      – Using W3C SPARQL, SQL or SOLR/Lucene, over HTTP and/or XMPP.
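For the SPARQL-over-HTTP case, the W3C SPARQL 1.1 Protocol lets a client send the query as a `query` parameter of a GET request and ask for the standard SPARQL JSON results format via the `Accept` header. This sketch only builds the request and flattens a results document; the DBpedia endpoint and example query are illustrative assumptions rather than a NoTube service, and the actual network call is left commented out.

```python
import json
import urllib.parse
import urllib.request

SPARQL_JSON = "application/sparql-results+json"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": SPARQL_JSON})


def bindings_to_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into [{variable: value}, ...]."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
      <http://dbpedia.org/resource/The_Wire> rdfs:label ?label .
      FILTER (lang(?label) = "en")
    }"""
    req = build_sparql_request("https://dbpedia.org/sparql", query)
    print(req.full_url)
    # Sending it requires network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(bindings_to_rows(resp.read().decode("utf-8")))
```

Keeping request construction separate from parsing makes it easy to point the same client at a different endpoint, e.g. a locally hosted dataset.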
  • WP1: Exploitation and Sustainability
    • WP1’s approach designed to outlive NoTube
    • Use, augment and contribute to external data
      – e.g. DBpedia,, W3C & wider Web of data trend (e.g. RDFa adoption)
      – we also demonstrate how we did it (e.g. on the blog), so others can replicate it
      – WP4 enrichments can be fed back to externals, e.g. similarity metrics & clusters
  • WP1: Sustainability 2
    • NoTube’s 2010 W3C “Web & TV” position paper lobbied for unique IDs & public metadata for video content; this is now going mainstream.
    • VUA will continue hosting some data, using so can pass e.g. to W3C later.
    • Collab with Facebook OGP (helped with their RDFa adoption) and now search engines (RDFa and extending TV vocab).
  • Workpackage Links
    • Background data for all workpackages
    • Collaborated with WP2 on BMF RDF models
    • Closer ties throughout WP3/7 developments
    • WP4 entity and topic URIs point to WP1
    • Outreach work around RDFa, position paper
  • 2nd review comments
    • Not clear though how this work has built upon the results of year 1, and how the current progress is in line with the case studies.
      – Worked more closely and pragmatically with case studies in WP7, especially 7c and related WP3 work. Moved towards a more decentralised model, instead of a warehouse.
      – 7c collaboration with KMi’s Watch and Buy scenario, and with WP4 timed ad insertion work; used EU p2pnext limo work; also egtaMETA from EBU from 7c
      – WP1 work became more “hands-on”; we helped WP7 extract datasets such as and which we expect will shortly be replaceable by cleaner information from official sources.
  • 2nd review comments
    • No relevant state of the art is documented and no details or citations on automated algorithms are given. Evaluation is restricted to examples and no quantitative data are given.
      – We accept the weakness in the report (lack of scholarly/scientific detail); we chose to focus on more informal communication with the outside world in the final phase. A 2nd version of the doc was produced, but the main changes were around life-after-project themes rather than adding more scientific and scholarly detail.
  • 2nd review comments
    • A close collaboration with WP7 is recommended in order to ensure that work meets the requirements of the use cases.
      – this very well describes our emphasis in the final phase
  • Lessons Learned
    • It’s hard to simulate an evolving global data ecosystem, but we’ve played a small part in some huge changes.
    • Publishers will adopt simple Semantic Web standards when they are given an incentive.
    • It’s hard for a 4-year-old plan to stay relevant in such an environment; the ability to be agile was critically important.
  • WP1 Summary
    • Used open standards (RDF) and largely open data (e.g. Wikipedia/DBpedia)
    • Integrated, mapped and data-mined
    • Contributing our additions back to the community / commons (highlight: BBC sims)
    • Documenting what we learned for external developers and subsequent projects
    Questions?
  • WP1: End-to-End issues
    • In the final year, our end-to-end scenarios have more mature implementations
    • Feedback from WP3/7c: key issue is sparsity of large vocabularies when used for record matching. No single solution here.
    • Integrating techniques from WP4 (e.g. clustering, data-mining) is critical for applying large and chaotic vocabularies to practical recommendations.
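To make the sparsity problem concrete: two programmes annotated with different fine-grained topic URIs share no terms, so a set-overlap matcher such as Jaccard scores them 0 even when they are thematically close. One mitigation in the spirit of the WP4 clustering mentioned on the slide is to map each topic to a coarser cluster before matching. In this sketch the annotations and the `cluster_of` mapping are hypothetical; the slides do not describe how WP4’s clusters were actually applied.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lift_to_clusters(topics: set[str], cluster_of: dict[str, str]) -> set[str]:
    """Replace each fine-grained topic with its cluster ID; unmapped topics are kept as-is."""
    return {cluster_of.get(t, t) for t in topics}


if __name__ == "__main__":
    # Hypothetical topic annotations for two programmes.
    show_a = {"dbpedia:Baltimore", "dbpedia:Drug_trafficking"}
    show_b = {"dbpedia:Organized_crime", "dbpedia:New_Jersey"}
    # Hypothetical output of a clustering step over the topic vocabulary.
    cluster_of = {
        "dbpedia:Drug_trafficking": "cluster:crime",
        "dbpedia:Organized_crime": "cluster:crime",
        "dbpedia:Baltimore": "cluster:us_places",
        "dbpedia:New_Jersey": "cluster:us_places",
    }
    print(jaccard(show_a, show_b))  # 0.0: no shared fine-grained topics
    print(jaccard(lift_to_clusters(show_a, cluster_of),
                  lift_to_clusters(show_b, cluster_of)))  # 1.0: identical clusters
```

The trade-off is precision: coarser clusters produce more matches, including spurious ones, which fits the slide’s point that there is no single solution.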