Vanessa Lopez: Linked Data and Search

Dublinked Technical Workshop - Linked Data & Search by Vanessa Lopez

Published in: Technology, Business

  1. IBM Research – Ireland
     Linked Data and Search
     Vanessa Lopez
     Smarter Cities Technology Centre, IBM Research Ireland
     © 2012 IBM Corporation
  2. Background: Why Linked Data
     • Provides explicit semantics
     • Extensible
     • Interoperability-focused: enables automatic discovery and ingestion
     • Large existing corpora
     • Fundamentally incremental (like the Web)
     • W3C standard representation and common format
     • Government push (e.g. data.gov, data.gov.uk, Linked Government Data)
  3. Yes, yes... richer structured queries, but... limited usability for both data publishers and consumers
  4. How can we help users in querying and exploring Semantic Web content?
  5. State of the art
     • Semantic search over messy, heterogeneous data and mash-ups
     • Exploratory and faceted systems
     • Query builders and relationship finders
     • Question answering over Linked Data sources
     • Google Knowledge Graph
     http://technologies.kmi.open.ac.uk/poweraqua
  6. State of the art
  7. Linked Data and Search - Problem domain:
     What makes city data so special?
     How can we make it more accessible?
  8. Semantic processing of urban data: why is it different?
     • How can we go from raw data to insight into the operation of a city with minimal effort?
     Return on investment (because data integration is expensive)
     Fit for all (citizen engagement)
  9. Challenges: Big city data
     Volume
     • Lots of relevant information
     • Not linked to authoritative sources
     Velocity
     • Streams
     • Frequent updates
     Variety
     • Different models and file formats
     • Open domain, unknown schema
     Veracity
     • Diverse sources
     • Difficult to assess quality
  10. Business case: open data as a means to an end
  11. Business case
      • Why are ambulances late?
      Sources of information
      • 100s of datasets from four municipal authorities in Dublin
      • Most static, some dynamic
      • Social media: Twitter, LiveDrive, Eventful, Eventbrite, …
      • Linked Data: DBpedia, …
      • Vocabularies: IPSV, FOAF, VOID, PROV, DCAT, WGS84
      Domain of information
      • Locations of health services
      • Ambulance call-outs and response times
      • Tweets about traffic congestion
      • Geo-located tweets about people movement
      • Road network
      • Event Web services
      • …
  12. Issues
      • Linked Data to enrich data and give contextual insight for publishers and consumers:
        – Publish (vocabularies, annotation)
        – Discovery and search (metadata / cataloguing, full-text indexing, semantic entities)
        – Link (schema alignment, linked data, social media)
        – Extract interesting views
        – Reason (diagnose traffic problems)
      Ubiquitous aspects: provenance, governance, performance, security, privacy
  13. Approach: Data model
      Stages: Documents + Metadata, Structure, Entities, Links, Views, Insight
      Tabular Graph (Structure)
        C1 a Cell
        C1 inRow r1
        C1 value "name"
        …
      Entity Graph (Entities)
        e1 a Entity
        e1 inRow r1
        e1 inCol c2
        …
      Annotation Graph
        e1 a Entity
        e1 rdfs:label "name"
        e1 addr "X st"
        e1 lat "53.23"
        …
      Mapping Graph
        e1 a Entity
        e1 sameAs e2
        …
      Pay-as-you-go, gain-as-you-go
      • Structured metadata -> queries over the metadata
      • Files into a standard representation -> queries over the data
      • Partially integrate schemata -> queries across datasets
      • Integrate globally -> queries across Web data
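
The first two layers of the data model on this slide can be illustrated with a small Python sketch. It is not the authors' implementation: the identifier scheme (`C<row>_<col>`, `e<n>`), the `inCol` triple for cells, the assumption that row 1 is a header, and the sample table are all made up for illustration. Only the predicate names `a`, `inRow`, `inCol` and `value` come from the slide.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def tabular_graph(table: List[List[str]]) -> Iterator[Triple]:
    """Yield four triples per cell: type, row, column and literal value."""
    for r, row in enumerate(table, start=1):
        for c, value in enumerate(row, start=1):
            cell = f"C{r}_{c}"  # hypothetical cell identifier scheme
            yield (cell, "a", "Cell")
            yield (cell, "inRow", f"r{r}")
            yield (cell, "inCol", f"c{c}")  # assumed; the slide elides this
            yield (cell, "value", value)


def entity_graph(table: List[List[str]], entity_col: int) -> Iterator[Triple]:
    """Promote each data row's cell in one column to an entity."""
    for r in range(2, len(table) + 1):  # row 1 is assumed to be the header
        entity = f"e{r - 1}"  # hypothetical entity identifier scheme
        yield (entity, "a", "Entity")
        yield (entity, "inRow", f"r{r}")
        yield (entity, "inCol", f"c{entity_col}")


# Made-up sample table: a header row and one data row.
table = [["id", "name"], ["1", "Portmarnock Beach"]]
cells = list(tabular_graph(table))
entities = list(entity_graph(table, entity_col=2))
```

Keeping the entity triples separate from the cell triples mirrors the pay-as-you-go idea: the tabular layer can be produced for every file without interpretation, and entities are added only for columns someone has chosen to lift.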
  14. Discovery: Publishing and Cataloguing
      • METADATA
        – Many data publishers and disconnected datasets
        – Link metadata using domain vocabularies: IPSV
        – Convert to simple RDF format
      (Diagram: vocabulary matching against IPSV)
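
One way to picture the vocabulary-matching step is fuzzy string matching of catalogue keywords against vocabulary labels, emitting a subject triple for each hit. The sketch below is an assumption, not the matching method used in the talk: the three labels and their `example.org` URIs are hypothetical stand-ins for IPSV terms, and the 0.8 cutoff is arbitrary. Only `difflib` from the standard library and the Dublin Core `subject` property are real.

```python
import difflib
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-ins for IPSV labels; the URIs are made up.
IPSV_TERMS: Dict[str, str] = {
    "beaches": "http://example.org/ipsv/beaches",
    "road traffic": "http://example.org/ipsv/road-traffic",
    "ambulance services": "http://example.org/ipsv/ambulance-services",
}

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def match_keyword(keyword: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the URI of the closest vocabulary label, or None if no label is close enough."""
    hits = difflib.get_close_matches(
        keyword.lower().strip(), list(IPSV_TERMS), n=1, cutoff=cutoff
    )
    return IPSV_TERMS[hits[0]] if hits else None


def metadata_to_triples(dataset_uri: str, keywords: List[str]) -> List[Triple]:
    """Link a dataset to every vocabulary term that one of its keywords matches."""
    triples = []
    for keyword in keywords:
        term = match_keyword(keyword)
        if term is not None:
            triples.append((dataset_uri, DCTERMS_SUBJECT, term))
    return triples


triples = metadata_to_triples("http://example.org/dataset/1", ["Beaches", "zzz"])
```

Keywords with no sufficiently close label are simply dropped, so datasets from different publishers end up connected only through terms they genuinely share.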
  16. Search and linking
      • Full-text indexing for search over metadata and content
      • Entity linking and navigation (keywords, categories, publishing agencies, regions, ...)
      • Open metadata and vocabularies (VOID, PROV, etc.) for data discovery and linking
      • Mining descriptions (DBpedia Spotlight)
      (Diagram labels: open metadata, full-text indexing, entity linking, mining descriptions)
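
Full-text indexing over dataset descriptions can be sketched as a minimal inverted index. This is a generic illustration in plain Python, not the indexing technology behind the system presented here. The document ids and descriptions are made up, and the AND semantics of `search` is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every token to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up dataset descriptions.
docs = {
    "d1": "Beaches in Fingal with water quality readings",
    "d2": "Ambulance call outs and response times",
    "d3": "Fingal road network",
}
index = build_index(docs)
hits = search(index, "fingal beaches")
```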
  17. Faceted search: "beaches in Fingal"
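
A query such as "beaches in Fingal" can be read as two facet selections, a keyword and a region, with counts shown for the remaining choices. The sketch below illustrates that reading. The catalogue records and facet names (`keyword`, `region`) are made up, and nothing here reflects how the demonstrated interface is actually built.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

# Made-up catalogue records.
DATASETS: List[Record] = [
    {"title": "Beach locations", "keyword": "beaches", "region": "Fingal"},
    {"title": "Beach locations", "keyword": "beaches", "region": "Dun Laoghaire-Rathdown"},
    {"title": "Playgrounds", "keyword": "parks", "region": "Fingal"},
]


def filter_by(records: List[Record], **facets: str) -> List[Record]:
    """Keep the records whose fields equal every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


def facet_counts(records: List[Record], facet: str) -> Counter:
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


beaches = filter_by(DATASETS, keyword="beaches")
regions_for_beaches = facet_counts(beaches, "region")
beaches_in_fingal = filter_by(DATASETS, keyword="beaches", region="Fingal")
```

Computing the counts on the already-filtered list is what lets a faceted interface show only the refinements that still lead to results.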
  19. Content integration
      • Incrementally lift data content (beyond search to querying across dataset content)
        – Extract entities represented in RDF (PAYGO)
        – Label extraction and annotation
        – Link when we have higher confidence (lat, long)
        – Geo-coding and taxonomy of tweets (traffic)
      (Diagram labels: geocoding, label extraction, minimal entry cost, provenance-based dataset ranking)
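
Linking on coordinates, where confidence is higher than for names alone, can be sketched as a distance test between entities from two datasets. The haversine formula is standard. The 50-metre threshold, the entity identifiers, the coordinates, and the choice to emit `owl:sameAs` for every pair under the threshold are illustrative assumptions rather than details from the talk.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees
Triple = Tuple[str, str, str]

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def link_entities(left: Dict[str, Coord], right: Dict[str, Coord],
                  max_m: float = 50.0) -> List[Triple]:
    """Emit a sameAs triple for each cross-dataset pair closer than max_m metres."""
    links = []
    for lid, (llat, llon) in left.items():
        for rid, (rlat, rlon) in right.items():
            if haversine_m(llat, llon, rlat, rlon) <= max_m:
                links.append((lid, "owl:sameAs", rid))
    return links


# Made-up entities from two hypothetical datasets.
left = {"a:hospital1": (53.3498, -6.2603)}
right = {"b:h9": (53.3499, -6.2603), "b:h10": (53.4000, -6.3000)}
links = link_entities(left, right)
```

The nested loop compares every pair, which is fine for a sketch but would need a spatial index for city-scale data.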
  20. Views
      • Beyond search to guiding the user to create meaningful views:
        – Guide users to annotate data, recommend related datasets and create data views on the fly
        – Ranking and context-based recommendations
        – Allow semantic-based analysis on multiple views
      (Diagram labels: hidden information discovery, cross-domain queries, multiple endpoints, multiple interpretations)
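
The slides do not say how related datasets are ranked. As one plausible illustration, the sketch below scores datasets by the Jaccard overlap of their annotation sets and returns the best matches. The similarity measure, the dataset names and the annotation tags are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target: str, annotations: Dict[str, Set[str]],
              top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank the other datasets by annotation overlap with the target, best first."""
    scores = [
        (other, jaccard(annotations[target], tags))
        for other, tags in annotations.items()
        if other != target
    ]
    scores = [s for s in scores if s[1] > 0]  # drop datasets with nothing in common
    return sorted(scores, key=lambda s: (-s[1], s[0]))[:top_k]


# Made-up annotation sets for four hypothetical datasets.
annotations = {
    "ambulance_callouts": {"health", "location", "time"},
    "hospitals": {"health", "location"},
    "traffic_tweets": {"traffic", "location", "time"},
    "beaches": {"leisure"},
}
related = recommend("ambulance_callouts", annotations)
```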
  21. Demo
      • Currently: Web services and technology demonstrator
      • Next: Open RDF-based data management deployed in Dublin City (read/write). Deployment of traffic diagnoser.
      • SPUD: Semantic Processing of Urban Data (2nd prize at the Semantic Web Challenge, ISWC)
      • Live demo: www.dublinked.ie/sandbox/SemanticWebChall
      Spyros Kotoulas, Vanessa Lopez, Raymond Lloyd, Marco Luca Sbodio, Freddy Lecue, Martin Stephenson, Elizabeth Daly, Veli Bicer, Aris Gkoulalas-Divanis, Giusy Di Lorenzo, Anika Schumann, Denis Patterson, and Pol Mac Aonghusa
  22. Thank you!
      Reference publication:
      • QuerioCity: A Linked Data Platform for Urban Information Management. V. Lopez, S. Kotoulas, M. L. Sbodio, M. Stephenson, A. Gkoulalas-Divanis, P. Mac Aonghusa. In-Use track at the 11th International Semantic Web Conference (ISWC).
      City Fabric Team:
