NoTube: Models & Semantics

Transcript

  • 1. Monday, March 26, 2012
  • 2. WP1 Overview
      • “Backend” shared datasets and services
      • Mappings, integration and common vocabulary
      • Extra datasets to support use-case scenarios
  • 3. WP1: Year 3 Direction & Achievements
      • Moving from a single ‘warehouse’ to a distributed set of databases, datasets and services
      • Planning for sustainable life-after-project
      • Integrating feedback from end-to-end demos
  • 5. Why WP1? Two roles
      • NoTube internal: a hub for data sharing
      • NoTube external: show how shared datasets and vocabularies help with user-facing “Web and TV” problems
      • “show”, critically, includes “thinking out loud” as we explore, via blog, email, Twitter etc.
          – scholarly articles rarely reach our target audiences
  • 6. Outreach message
      • Let metadata flow widely - advertising content, rather than be a hidden asset
      • Identify and link content with useful URLs(*)
      • Open APIs to control TV and link devices [WP7c]
      ...from W3C TV & Web position paper (with Project Baird), Berlin 9 Feb 2011
      WP1 is concerned primarily with the first two: getting metadata into the Web from source, rather than scraping, guessing, approximating.
  • 7. Aside: RDFa went mainstream
      • Try ‘View source’ on IMDB, Rotten Tomatoes, BBC, tv.com sites to find RDF descriptions of TV content.
      • NoTube’s approach was to lead by example, to engage with industry and to plan from the beginning for the ‘afterlife’.
      • This strategy worked.
  • 8. Facebook OGP: tv.com page for The Wire
      ...simple, extensible standards are being adopted: OGP since 2010; schema.org since 2011...
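To make the OGP slide concrete, here is a minimal sketch of reading Open Graph Protocol properties out of a page using only the Python standard library. The HTML snippet and its example.org URL are invented for illustration and are not the actual tv.com markup; `OGPExtractor` and `extract_ogp` are hypothetical names.

```python
from html.parser import HTMLParser


class OGPExtractor(HTMLParser):
    """Collect Open Graph Protocol properties from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # OGP metadata lives in <meta property="og:..." content="..."> tags.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def extract_ogp(html):
    """Return a dict of og:* property names to their content values."""
    parser = OGPExtractor()
    parser.feed(html)
    return parser.properties


# Illustrative page only (assumption): not copied from any real site.
SAMPLE_PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Wire</title>
<meta property="og:title" content="The Wire" />
<meta property="og:type" content="video.tv_show" />
<meta property="og:url" content="http://example.org/shows/the-wire" />
</head>
<body></body>
</html>
"""

if __name__ == "__main__":
    for name, value in sorted(extract_ogp(SAMPLE_PAGE).items()):
        print(name, "=", value)
```

Because the metadata is published at source in a few `<meta>` tags, a consumer needs no site-specific scraping rules to pick up the title, type and canonical URL.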
  • 9. TV Data Warehouse
      • We still host several crawls of TV EPG data
      • Trend is for data to be more cleanly available from source, without scraping
      • Crawling, aggregation and integration still useful, but less scraping required
      • Crawled data warehouse also used as a research testbed collection
  • 10. WP1: Example Datasets
      • WP7c/WP3 use DBpedia/Wikipedia URLs for topics; covers all mainstream areas.
      • BBC also using Lonclass/UDC topic codes (we’re helping prepare this for sharing)
      • For Music, we adopt MusicBrainz IDs
      • Mapping diverse representations of ‘genre’
      • “Organic” item/topic similarity measures derived from user data from WP3
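As a rough illustration of “mapping diverse representations of ‘genre’” onto shared topic URLs, the sketch below normalises free-text genre labels and looks them up in a small table of DBpedia-style resource URLs. The table entries, the function names and the normalisation rule are assumptions made for this example; the slides do not describe the project’s actual mapping method.

```python
import re

DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical mapping table: different source labels for a genre are
# pointed at one shared DBpedia-style URL. Entries are illustrative only.
GENRE_MAP = {
    "documentary": DBPEDIA + "Documentary_film",
    "documentaries": DBPEDIA + "Documentary_film",
    "comedy": DBPEDIA + "Comedy",
    "sitcom": DBPEDIA + "Sitcom",
    "sit com": DBPEDIA + "Sitcom",
}


def normalise_label(label):
    """Lower-case a label and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def map_genre(label):
    """Return the shared URL for a source genre label, or None if unmapped."""
    return GENRE_MAP.get(normalise_label(label))


if __name__ == "__main__":
    for source_label in ["Documentaries", "Sit-com", "COMEDY", "Gardening"]:
        print(repr(source_label), "->", map_genre(source_label))
```

Returning `None` for unmapped labels keeps the gaps visible, so they can be reviewed by hand rather than silently guessed.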
  • 11. WP1: Data Services
      • Data Services exposed as static files:
          – Show how to embed RDFa in HTML
          – Publish as RDF/XML Linked Data
      • Interactive Data Services:
          – Using W3C SPARQL, SQL or SOLR/Lucene, over HTTP and/or XMPP.
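For the “SPARQL over HTTP” case, a minimal sketch of preparing a query request with the Python standard library follows. It relies on the SPARQL Protocol convention of passing the query text in a `query` parameter on a GET request. The endpoint URL (DBpedia’s public endpoint) and the query itself are assumptions for illustration, not NoTube’s own services, and nothing is sent over the network unless the caller chooses to.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: DBpedia's public SPARQL endpoint, used purely as an example.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: English labels for one DBpedia resource.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/The_Wire> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
""".strip()


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual query; that step is left out so the sketch runs offline.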
  • 12. WP1: Exploitation and Sustainability
      • WP1’s approach designed to outlive NoTube
      • Use, augment and contribute to external data
          – e.g. DBpedia, Archive.org, W3C & wider Web of data trend (e.g. RDFa adoption)
          – we also demonstrate how we did it, e.g. on the blog, so others can replicate it
          – WP4 enrichments can be fed back to externals, e.g. similarity metrics & clusters
  • 13. WP1: Sustainability 2
      • NoTube’s 2010 W3C “Web & TV” position paper lobbied for unique IDs & public metadata for video content; this is now going mainstream.
      • VUA will continue hosting some data, using PURL.org so that hosting can be passed on later, e.g. to W3C.
      • Collaboration with Facebook OGP (helped with their RDFa adoption) and now with the search engines’ Schema.org (RDFa and extending TV vocab).
  • 14. schema.org
  • 15. Workpackage Links
      • Background data for all Workpackages
      • Collaborated with WP2 on BMF RDF models
      • Closer ties throughout WP3/7 developments
      • WP4 entity and topic URIs point to WP1
      • Outreach work around RDFa, Position Paper
  • 16. 2nd review comments
      • Not clear though how this work has built upon the results of year 1, and how the current progress is in line with the case studies.
          – Worked more closely and pragmatically with case studies in WP7, especially 7c and related WP3 work. Moved towards more decentralised model, instead of warehouse.
          – 7c collaboration with KMi’s Watch and Buy scenario, and with WP4 timed ad insertion work, used EU p2pnext limo work; also egtaMETA from EBU from 7c
          – WP1 work became more "hands-on"; we helped WP7 extract datasets such as TED.com and Archive.org which we expect will shortly be replaceable by cleaner information from official sources.
  • 17. 2nd review comments
      • No relevant state of the art is documented and no details or citations on automated algorithms are given. Evaluation is restricted to examples and no quantitative data are given.
          – We accept weakness in report (lack of scholarly/scientific detail); chose to focus on more informal communication with outside world in final phase. A 2nd version of the doc was produced, but main changes were around life-after-project themes rather than adding more scientific and scholarly detail.
  • 18. 2nd review comments
      • A close collaboration with WP7 is recommended in order to ensure that work meets the requirements of the use cases.
          – this very well describes our emphasis in final phase
  • 19. Lessons Learned
      • It’s hard to simulate an evolving global data ecosystem; but we’ve played a small part in some huge changes.
      • Publishers will adopt simple Semantic Web standards when they are given an incentive.
      • It’s hard for a 4-year-old plan to stay relevant in such an environment; ability to be agile was critically important.
  • 20. WP1 Summary
      • Used open standards (RDF) and largely open data (e.g. Wikipedia/DBpedia)
      • Integrated, mapped and data-mined
      • Contributing our additions back to the community / commons (highlight: BBC sims)
      • Documenting what we learned for external developers and subsequent projects
      Questions?
  • 23. WP1: End-to-End issues
      • In final year, our End-to-End scenarios have more mature implementations
      • Feedback from WP3/7c: key issue is sparsity of large vocabularies when used for record matching. No single solution here.
      • Integrating techniques from WP4 (e.g. clustering, data-mining) critical for applying large and chaotic vocabularies for practical recommendations.
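The sparsity problem can be seen in a toy example: two programmes tagged from a very large topic vocabulary may share no exact topic even when they are clearly related, so a set-overlap score is zero until topics are grouped into coarser clusters. The sketch below uses Jaccard similarity and a hand-written cluster table. Both are assumptions chosen for illustration, not the WP4 techniques themselves, and the topic and cluster identifiers are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical cluster assignments, standing in for the output of a
# clustering step run over a large topic vocabulary.
TOPIC_CLUSTER = {
    "dbpedia:Baltimore": "cluster:us-cities",
    "dbpedia:Chicago": "cluster:us-cities",
    "dbpedia:Police_procedural": "cluster:crime-drama",
    "dbpedia:Crime_fiction": "cluster:crime-drama",
}


def to_clusters(topics):
    """Replace each topic by its cluster; unclustered topics are kept as-is."""
    return {TOPIC_CLUSTER.get(topic, topic) for topic in topics}


programme_a = {"dbpedia:Baltimore", "dbpedia:Police_procedural"}
programme_b = {"dbpedia:Chicago", "dbpedia:Crime_fiction"}

raw_score = jaccard(programme_a, programme_b)
clustered_score = jaccard(to_clusters(programme_a), to_clusters(programme_b))

if __name__ == "__main__":
    print("raw topics:", raw_score)            # 0.0: no exact topic in common
    print("clustered topics:", clustered_score)  # 1.0: same two clusters
```

The trade-off is precision: coarser clusters produce more matches but also more false ones, which fits the slide’s remark that there is no single solution.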