
Linked Data Generation Process


Lesson from Raúl García-Castro on "Linked Data Generation Process" in the 1st Summer School on Smart Cities and Linked Open Data (LD4SC 2015).

Published in: Science

  1. LD4SC Summer School, 7th - 12th June, Cercedilla, Spain
     1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)
     Linked Data Generation Process
     Raúl García-Castro, Filip Radulovic, Oscar Corcho, María Poveda, Víctor Rodríguez-Doncel, Asunción Gómez-Pérez, Daniel Vila-Suero
     Presenter: Raúl García-Castro
  2. Index
     • Linked Open Data in Smart Cities
     • Guidelines for the Generation of Linked Data
     • Discussion
     • Hands-on Description
  3. Data in smart cities
     Image source: http://br.fiberhomegroup.com/pt/Enterprise/324/2282.aspx
  4. Open data… for what?
     • For example, (re)using open transport data
       – Provide travel information to people
       – Allow better multimodal route planning
       – Facilitate public transport management
       – …
       – Accessibility
         • Which metro entrances are accessible to wheelchair users?
         • At which bus stops is it safer and more convenient for a wheelchair user to wait?
         • Is there an accessible parking space near a bus stop?
         • etc.
  5. Legal framework and open data initiatives
     • Aarhus Convention (1998)
       – Right to participation and access; 41 countries and the EU
     • Open Access Initiative (2001)
       – Scientific information on the Web; > 510 organisations
     • PSI Directive
       – PSI Reuse (2003/98/EC)
     • Convention for the access to official documents (2009)
       – Signed by 12 countries
       – Belgium, Finland, Norway, Sweden, Hungary, Estonia, Lithuania, Slovenia, Georgia, Montenegro, Serbia and Macedonia
     • Law 37/2007. PSI Reuse
     • Law 11/2007. Citizen access to public services and right to the quality of services
     • RD 4/2010 National Interoperability Scheme
       – Open standards
       – Technology neutral
       – Open source software
     • RD 1495/2011, which develops Law 37/2007
     • Norma Técnica de Interoperabilidad (Technical Interoperability Standard; 19/02/2013, BOE 4/3/2013)
     Adapted from Antonio Rodríguez Pascual (IGN)
  6. The problem: lack of interoperability
     • Goal: "I want to publish data in an interoperable structure and format"
     • Each publisher publishes differently: "I use GTFS"; "I use my own CSV structure"; "I provide a web service"
     • Each source then needs its own extraction before anyone can "build an app that is available all over the world"
  7. Scenario: open transport data
     Is there any open transport data already? We are surrounded by it.
  8. Open data and how they are published
     1) On notice boards
       – For those who have a lot of free time
       – Or those who are there at the right moment
     Adapted from Antonio Rodríguez Pascual (IGN)
  9. Open data and how they are published
     2) In web pages and mobile apps
       – For people
     Rating so far: on the Web, open license
     Adapted from Antonio Rodríguez Pascual (IGN)
  10. Open data and how they are published
      2) In web pages and mobile apps
        – For people
      Rating levels shown: on the Web, open license; machine-readable; non-proprietary format
      Adapted from Antonio Rodríguez Pascual (IGN)
  11. Open data and how they are published
      3) As web files
        – So that people can load them into their information systems (XML, HTML, CSV, etc.)
        – Hopefully not a scanned PDF
      Rating levels shown: on the Web, open license; machine-readable; non-proprietary format
      Adapted from Antonio Rodríguez Pascual (IGN)
  12. Open data and how they are published
      4) Via web services
        – For humans and machines
        – Allows generating added-value services
        – Can be integrated into the application business logic
      Rating levels shown: on the Web, open license; machine-readable; non-proprietary format
      Adapted from Antonio Rodríguez Pascual (IGN)
  13. What is open data?
      • Open data are data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and sharealike.
      • The most important aspects to consider:
        – Availability and Access: data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. Data must also be available in a convenient and modifiable form.
        – Reuse and Redistribution: data must be provided under terms that permit reuse and redistribution, including intermixing with other datasets.
        – Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, "non-commercial" or "only in education" restrictions are not allowed.
      Source: Open Data Handbook
  14. Scenario: open transport data
      Is there any open transport data already? Can we do it better?
  15. Going into 4 and 5 stars → Linked Data
      • 1 star: make your stuff available on the Web (whatever format) under an open license
      • 2 stars: make it available as structured data (e.g., Excel instead of an image scan of a table)
      • 3 stars: use non-proprietary formats (e.g., CSV instead of Excel)
      • 4 stars: use URIs to identify things, so that people can point at your stuff
      • 5 stars: link your data to other data to provide context
  16. Use URIs + RDF (RDF standards)
      Source data:
      • API: José | Mobility impairment | Boardgames
      • CSV: Mirasierra | Ventisquero de la Condesa | Yes
      • CSV: Mega Games | Ventisquero de la Condesa | Yes
      • HTML: Mega Games | Conquer & Smash! | 29,95
      The same data as RDF:
      • API → RDF: José hasImpairment Mobility Impairment; Mobility Impairment requires Wheelchair Accessibility; José likes Boardgame
      • CSV → RDF: Mirasierra address Ventisquero de la Condesa; Mirasierra hasAccessibility Wheelchair Accessibility
      • CSV → RDF: Mega Games address Ventisquero de la Condesa; Mega Games hasAccessibility Wheelchair Accessibility
      • HTML → RDF: Mega Games sells Conquer & Smash!; Conquer & Smash! is a Boardgame
  17. Link your data (Linked RDF)
      [Diagram: the API, CSV, CSV and HTML sources of the previous slide next to their RDF graphs, whose nodes are about to be linked.]
  18. Link your data (Linked RDF)
      [Diagram: the four RDF graphs of the previous slide joined into one graph through their shared nodes: Wheelchair Accessibility, Ventisquero de la Condesa and Boardgame.]
  19. Make complex queries
      • "Where can I buy the Conquer & Smash! game?"
      • "Which are the most accessible routes for Christmas shopping?"
      • Mega Games: "Expansion pack for Conquer & Smash! Take metro line 9 and in 35 minutes we can demo it to you!"
      • "Or better take bus 231, because it is sunny and you can take a glance at the outdoor art exhibition in Plaza de Castilla"
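The benefit of linking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain strings stand in for URIs, the predicate names are taken from the slide diagram, and `match` and `accessible_sellers` are hypothetical helpers, not part of any RDF library. A real deployment would run SPARQL over an RDF store instead.

```python
# Triples (subject, predicate, object) read off the slide's example graph.
# Plain strings stand in for URIs; this is an illustrative assumption.
API = {
    ("José", "hasImpairment", "MobilityImpairment"),
    ("MobilityImpairment", "requires", "WheelchairAccessibility"),
    ("José", "likes", "Boardgame"),
}
CSV_A = {
    ("Mirasierra", "address", "Ventisquero de la Condesa"),
    ("Mirasierra", "hasAccessibility", "WheelchairAccessibility"),
}
CSV_B = {
    ("Mega Games", "address", "Ventisquero de la Condesa"),
    ("Mega Games", "hasAccessibility", "WheelchairAccessibility"),
}
HTML = {
    ("Mega Games", "sells", "Conquer & Smash!"),
    ("Conquer & Smash!", "isA", "Boardgame"),
}

# Linking: identical names denote the same node, so a set union merges the graphs.
GRAPH = API | CSV_A | CSV_B | HTML


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def accessible_sellers(graph, person, product):
    """Shops selling `product` that meet every accessibility need of `person`."""
    impairments = [o for _, _, o in match(graph, s=person, p="hasImpairment")]
    needs = {o for i in impairments for _, _, o in match(graph, s=i, p="requires")}
    sellers = [s for s, _, _ in match(graph, p="sells", o=product)]
    return sorted(
        shop for shop in sellers
        if all((shop, "hasAccessibility", need) in graph for need in needs)
    )
```

The query crosses three of the four sources (API, CSV and HTML), which is only possible once they share node identifiers.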
  20. Using Linked Open Transport Data
      • Calculate accessible routes
        – Combined with geographical data (IGN)
        – Which stop should I use if I have mobility problems?
      • Commercial routes by bus
        – Combined with Madrid's shop census (from Ayto. Madrid)
      • Geomarketing decisions for entrepreneurs
        – Where should I open my shop? Based on the combination of the number of travellers per stop, demographic data, data about other businesses and shops around, etc.
      • Personalised offers to travellers
        – With real-time data and data about consumption patterns (e.g., credit card transactions)
      • …
  21. Index
      • Linked Open Data in Smart Cities
      • Guidelines for the Generation of Linked Data
      • Discussion
      • Hands-on Description
  22. Linked Data life cycle
      [Cycle diagram with the stages: Specification, Modelling, Generation, Publication, Exploitation, Linking]
  23. Requirements (smart cities domain)
      1. Tabular formats (i.e., SQL, XLS or CSV)
         – Other data structures (e.g., XML) are less important in practice, or are unstructured and would require much more work
      2. Changing data (dynamic or streaming data), versioning, (automatic) data quality assurance and reliability
      3. Data access through web services, proprietary APIs and data files
      4. Legal aspects (e.g., licensing, data ownership)
      5. Access rights management or mechanisms for extracting public data (plenty of confidential data)
  24. Linked Data generation process
      Step → output:
      • Select data source → Data source
      • Obtain access to data source → Access, data
      • Analyse licensing of the data source → License
      • Analyse data source → Schema, data
      • Define resource naming strategy → Resource naming strategy
      • Develop ontology → Ontology
      • Transform data source → RDF data
      • Link with other datasets → Linked dataset
      F. Radulovic, M. Poveda-Villalón, D. Vila-Suero, V. Rodríguez-Doncel, R. García-Castro and A. Gómez-Pérez, Guidelines for Linked Data generation and publication: An example in building energy consumption, Automation in Construction, Special Issue on Linked Data in Architecture and Construction. Available online April 2015.
  25. Linked Data generation process (process diagram repeated). Highlighted stage: DATA PREPARATION
  26. Select data source
      • Select the data source that will be transformed into Linked Data
      • Steps:
        – Define the requirements for selection
        – Select one or several data sources
      • The data set may be:
        – Owned by your organization…
        – … or not (external data sources)
  27. Select data source – LCC example
      • Requirements
        – Real-world scenario in the smart city domain
        – Available for use
        – Available in machine-processable format (the more structured the data are, the better)
        – Can be linked with generic entities (e.g., location)
      • Leeds City Council – energy consumption
        – http://data.gov.uk/dataset/council-energy-consumption
  28. Obtain access to data source
      • Data access means:
        – Technical means to retrieve the data
        – Legal rights to use the data
      • If the data are not accessible:
        – Identify the person to contact
        – Request access
        – Obtain access and retrieve the data
      • Access alternatives:
        – file,
        – programming interface,
        – database,
        – data stream,
        – etc.
  29. Obtain access to data source – LCC example
      • Data set already available as a CSV file
  30. Analysing licensing of the data source
      • Licenses specify the legal terms under which a data set can be used and exploited
      • There are neither legal prescriptions on how to declare licenses nor common standard practices for doing so
      • Steps (not automatable):
        – Identify the rightsholder and the authoritative publisher
          • Rightsholder vs. authorized distributor
        – Find the applicable license
          • Web page, data set metadata, the data themselves
          • Contact the publisher
        – Read the license and analyse its legal terms
      • Tips
        – Analyse all copies and formats of the data
        – Ensure license compatibility when integrating several data sources
  31. Linked Data resources can be protected
      • Ontologies are intellectual works; they can be protected by copyright
      • RDF datasets can be considered databases, which are also legally protected in the EU
  32. Always license your data
      Create, consume, aggregate, derive and publish Linked Data in a lawful environment
      Actors shown: data shops, government, individuals, …
  33. Licensed Linked Data
      • Non-licensed Linked Data: unless a license allows it, the resource cannot be copied, modified or published. In practice, non-licensed resources are useless in industrial settings.
      • Non-licensed Linked Data + license = licensed Linked Data, which can be used.
  34. Licensed Linked Data in practice
      • Linked Open Data: published, open license
      • (Published) Linked Data: published, no open license
      • Linked Data: not published, no open license
  35. Guidelines for licensing linked data
      1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
      2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
      3. Is a standard license available?
         3a. Yes: use the URI of the standard license, e.g., CC0
         3b. No: use a rights declaration language, e.g., ODRL
      ODRL: Open Digital Rights Language. DCAT: Data Catalog Vocabulary.
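For the simple case where a standard license applies, the rights metadata boils down to one extra triple in the dataset description. The following Python sketch prints it as N-Triples; the dataset URI is a placeholder and `license_triples` is a hypothetical helper, while the VoID, Dublin Core and CC0 URIs are the published ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_DATASET = "http://rdfs.org/ns/void#Dataset"
DCT_LICENSE = "http://purl.org/dc/terms/license"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def license_triples(dataset_uri, license_uri=CC0):
    """N-Triples lines declaring a VoID dataset and its license (dct:license)."""
    return [
        f"<{dataset_uri}> <{RDF_TYPE}> <{VOID_DATASET}> .",
        f"<{dataset_uri}> <{DCT_LICENSE}> <{license_uri}> .",
    ]


if __name__ == "__main__":
    # Placeholder dataset URI, used only for illustration.
    for line in license_triples("http://example.org/dataset/ds01"):
        print(line)
```

Pointing at the license by URI, rather than pasting its text into a literal, is what lets a consumer check license compatibility mechanically.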
  36. Licensing Linked Data is simple…
      Example: the British National Bibliography (BNB) lists the books and new journal titles published or distributed in the United Kingdom and Ireland since 1950.
  37. … or complex, depending on your needs
      Policies can be expressed with ODRL 2.0 to govern access to Linked Data.
      Example of access to Linked Data for a price (15 EUR for the dataset or 0.01 EUR for a triple thereof):

          @prefix odrl: <http://www.w3.org/ns/odrl/2/> .
          @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
          @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
          @prefix gr:   <http://purl.org/goodrelations/> .
          @prefix dcat: <http://www.w3.org/ns/dcat#> .

          <http://salonica.dia.fi.upm.es/ldr/policy/cdaddba4-fc2e-4ee0-a784-e62f1db259bf>
              a odrl:Set ;
              rdfs:label "License Offering Paid Linked Data" ;
              odrl:permission [
                  a odrl:Permission ;
                  odrl:target <http://example.org/dataset/ds01> ;
                  odrl:action odrl:reproduce ;
                  odrl:duty [
                      a odrl:Duty ;
                      rdfs:label "Pay" ;
                      gr:UnitOfMeasurement dcat:Dataset ;
                      gr:amountOfThisGood "1" ;
                      odrl:action odrl:pay ;
                      odrl:target "15,00 EUR"
                  ]
              ] , [
                  a odrl:Permission ;
                  odrl:action odrl:reproduce ;
                  odrl:target <http://example.org/dataset/ds01> ;
                  odrl:duty [
                      a odrl:Duty ;
                      rdfs:label "Pay" ;
                      gr:UnitOfMeasurement rdf:Statement ;
                      gr:amountOfThisGood "1" ;
                      odrl:action odrl:pay ;
                      odrl:target "0,01 EUR"
                  ]
              ] .

      The target can be an ontology, a dataset, a SPARQL endpoint… or a SPARQL query itself or a triple pattern: {mysubject, ?p, ?o}
  38. And you have support for that
      • Conditional access to Linked Data
        – http://conditional.linkeddata.es
      • Dataset of licenses in RDF
        – http://rdflicense.appspot.com
      • ODRL Profile for Linked Data
        – http://purl.oclc.org/NET/ldr/ns#
        – https://www.w3.org/community/odrl/profile/linkeddata/
  39. Analyse licensing – LCC example
  40. Analyse data source
      • Get insight into the data structure and organization
      • Steps:
        – Analyse the characteristics of the data
          • Data values, data ranges, etc.
        – Obtain the schema of the data
          • Concepts and their relationships
      • Data can be available as:
        – Structured data
        – Unstructured data
      • If the schema does not exist:
        – Use a standard modelling language to describe the data schema (e.g., UML)
  41. Analyse data source – LCC example
      • Metadata not very descriptive:
        – Different types of council sites (mostly buildings)
        – Electricity, gas and oil consumption
        – 1-year intervals: 2010/11, 2011/12, 2012/13
      • The analysis required contacting people from LCC open data
  42. Analyse data source – LCC example
      http://localhost:3333/
  43. Analyse data source – LCC example
  44. Analyse data source – LCC example
      • Analyse the characteristics of the data using facets
      • Obtain the schema of the data
  45. Data characteristics and schema – LCC
      Column | Type | Comments / range (rounded) | Problems
      uprn | String | Not unique, empty values |
      Site Name | String | Unique? Site types + name | 4 repeated sites
      Address 2 | String | Not unique, empty values |
      Address 3 | String | Not unique, empty values; village? civil parish? |
      Address 4 | String | Not unique, empty values; city? metropolitan district? | "leeds" vs "Leeds"
      PostCode | String | Not unique, empty values |
      Electricity 10/11 | Decimal | 0 to 2,700,000 |
      Electricity 11/12 | Decimal | 0 to 2,300,000 |
      Electricity 12/13 | Decimal | 0 to 2,400,000 |
      Gas 10/11 | Decimal | -100,000 to 6,100,000 | Negative values
      Gas 11/12 | Decimal | -100,000 to 7,800,000 | Negative values
      Gas 12/13 | Decimal | -100,000 to 8,300,000 | Negative values
      Oil 12/13 | Decimal | -1,000,000 to 13,000,000 | Negative values
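The kinds of problems listed in this table (empty values, repeated values, case variants, negative readings) can also be detected with a short script before any transformation. The sketch below uses only the Python standard library; the sample rows are invented for illustration and `profile` and `negatives` are hypothetical helpers, not the procedure the authors followed (they used OpenRefine facets).

```python
import csv
import io
from collections import Counter

# Invented sample rows that imitate the column names of the LCC file.
SAMPLE = """uprn,Site Name,Address 4,Gas 12/13
1001,Library A,Leeds,1200.5
,Depot B,leeds,-30
1001,Library A,,0
"""


def profile(rows, column):
    """Count empty cells and list duplicated values and case variants in a column."""
    values = [row[column].strip() for row in rows]
    non_empty = [v for v in values if v]
    counts = Counter(non_empty)
    return {
        "empty": len(values) - len(non_empty),
        "duplicated": sorted(v for v, n in counts.items() if n > 1),
        "case_variants": sorted(
            v for v in counts
            if sum(1 for w in counts if w.lower() == v.lower()) > 1
        ),
    }


def negatives(rows, column):
    """Return the negative numeric values found in a column, skipping empty cells."""
    return [
        float(row[column])
        for row in rows
        if row[column].strip() and float(row[column]) < 0
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
```

Running `profile` per text column and `negatives` per consumption column gives a quick first draft of the "Problems" column.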
  46. Linked Data generation process (process diagram repeated). Highlighted stage: DEFINE RESOURCE NAMING STRATEGY
  47. Hash and slash URIs
      • Hash URIs (#)
        – http://www.energycompany.com/about#energyCompany
        – The fragment part has to be stripped off when the URI is requested from the server (i.e., the resource cannot be retrieved directly)
        – Hash URIs can be used to identify non-document resources
      • Slash URIs (/)
        – http://www.energycompany.com/about/energyCompany
        – Imply a 303 redirection to the location of a document that represents the resource (+ content negotiation)
          • E.g., http://www.energycompany.com/about/energyCompany.rdf
        – Drawbacks: HTTP round-trip, redirects, web server configuration
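The fragment-stripping behaviour of hash URIs can be checked with Python's standard `urllib.parse.urldefrag`. This is a small sketch; `document_uri` and `is_hash_uri` are hypothetical helper names.

```python
from urllib.parse import urldefrag


def document_uri(uri):
    """URI that is actually sent to the server: the fragment is dropped."""
    url, _fragment = urldefrag(uri)
    return url


def is_hash_uri(uri):
    """True when the URI identifies its resource through a fragment."""
    return bool(urldefrag(uri).fragment)


if __name__ == "__main__":
    hash_uri = "http://www.energycompany.com/about#energyCompany"
    slash_uri = "http://www.energycompany.com/about/energyCompany"
    print(document_uri(hash_uri))   # the document that holds the description
    print(document_uri(slash_uri))  # unchanged: the server sees the full URI
```

With a hash URI the server only ever sees the document URI, so no redirect is needed; with a slash URI the server sees the resource URI itself and has to answer with a 303 redirect.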
  48. Hash or slash?
      • Depends on the data and on their expected use
      • Small data:
        – Hash namespace
        – Access all the data as a whole
        – HTTP GET would return a single information resource with everything
      • Large / frequently-updated / modular data:
        – Slash namespace
        – Access resources individually or in groups
        – Resource descriptions may be divided among many information resources or may be managed via a query service (e.g., SPARQL)
        – Progressively greater detail about resources may be retrieved through multiple accesses
  49. Define resource naming strategy
      • Steps:
        – Choose a URI form (hash or slash)
        – Choose a domain for the URIs
        – Choose a path for the URIs
        – Choose a pattern for the classes and properties of the ontology, as well as for individuals
      • Tips:
        – One URI must identify only one item (e.g., do not use the same URI for a web page and for the real-world object it describes)
        – URIs should be persistent and should not change over time (e.g., avoid encoding state information); PURL may support this
        – Use a domain that is under your control (or a service such as PURL)
        – Separate the ontology model from its instances
        – Define meaningful URIs
  50. Resource naming strategy – LCC
      • Hash URIs for ontological terms, slash URIs for individuals
      • Domain: http://smartcity.linkeddata.es/
      • Ontological terms path:
        – http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#
      • Individuals path:
        – http://smartcity.linkeddata.es/lcc/resource/
      • Ontological terms pattern:
        – http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
        – Ex.: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitativeValue
      • Individuals pattern:
        – http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>
        – Ex.: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/WetJohnCharlesCentreforSport
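A naming strategy like this one is easy to capture as a pair of functions, so that every URI in the dataset is minted the same way. The sketch below follows the two patterns above; the rule for turning a name into a local name (dropping every non-alphanumeric character) and the raw site name used in the usage lines are assumptions made for illustration, not something the slides specify.

```python
import re

BASE = "http://smartcity.linkeddata.es/lcc/"
ONTOLOGY_NS = BASE + "ontology/EnergyConsumption#"  # hash URIs for terms
RESOURCE_NS = BASE + "resource/"                    # slash URIs for individuals


def local_name(text):
    """Assumed cleaning rule: keep only ASCII letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def term_uri(term_name):
    """URI of an ontology class or property."""
    return ONTOLOGY_NS + term_name


def individual_uri(resource_type, resource_name):
    """URI of an individual: <resource_type>/<resource_name> under the resource path."""
    return f"{RESOURCE_NS}{local_name(resource_type)}/{local_name(resource_name)}"


if __name__ == "__main__":
    print(term_uri("hasQuantitativeValue"))
    # The raw name below is a guess at how the site might appear in the CSV.
    print(individual_uri("Leisure Centre", "Wet - John Charles Centre for Sport"))
```

Keeping the minting logic in one place also makes it simpler to honour the persistence tip: if the cleaning rule changes, existing URIs can be regenerated and compared before publication.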
  51. Linked Data generation process (process diagram repeated). Highlighted stage: DEVELOP ONTOLOGY
  52. Ontology development (you did this yesterday)
      1. Requirements definition
      2. Terms extraction
      3. Ontology conceptualization
         3.1 Initial model drafting
         3.2 Detailed model definition
      4. Ontology search
      5. Ontology selection
      6. Ontology implementation
         6.1 Ontology integration
         6.2 Ontology completion
      7. Ontology evaluation
      Checkpoint in the diagram: can you represent all your data?
  53. Ontology development – LCC
  54. Linked Data generation process (process diagram repeated). Highlighted stage: TRANSFORM DATA
  55. Data transformation
      • Steps:
        – Select the RDF serialization
          • RDF/XML, Turtle, N-Triples, JSON-LD
        – Select a tool. The choice depends on:
          • The format of the data (database, spreadsheets, etc.)
          • Concrete needs of the transformation process (e.g., dynamicity)
        – Transform the data into RDF
          • Usually requires a mapping between the data and the ontology
          • The mapping implements the resource naming strategy
        – Evaluate the obtained RDF data:
          • Syntax, Completeness, Accuracy, Conciseness, Modelling, Understandability, Versatility, Usage, Licensing, …
  56. Data transformation tools
      • Database to RDF: morph-RDB, D2R Server, TopBraid Composer
      • Data streams to RDF: morph-streams, D2R Server
      • Spreadsheets to RDF: TopBraid Composer, Excel2RDF, RDF123, XLWrap, OpenRefine/LODRefine
      • XML to RDF: XML2RDF, TopBraid Composer, OpenRefine/LODRefine
  57. Data transformation tools (tool list repeated), introducing the overview of OpenRefine
  58. OpenRefine basic operations
      • Installing
      • Creating a new project
      • Data analysis
        – Exploring data
        – Sorting data
        – Faceting data
        – Filtering data
      • Basic data transformation (cleaning/preparing)
        – Columns:
          • Move
          • Rename
          • Remove columns
          • Collapse and expand
          • Common transformations
        – Rows:
          • Remove rows
      • Export whole project
  59. Adding derived columns
      Edit column → Add column based on this column...
  60. Splitting data across columns
      Edit column → Split into several columns...
  61. Handling multi-valued cells
      Edit cells → Split multi-valued cells...
      Edit cells → Join multi-valued cells...
  62. Rows and records
      Show as: rows / records
  63. Clustering similar cells
      Edit cells → Cluster and edit...
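To give an idea of what clustering does, here is a simplified Python sketch in the spirit of a key-collision ("fingerprint") method: values that reduce to the same normalized key are proposed as one cluster. It is an illustration written for this transcript, not OpenRefine's own code, and the exact normalization steps are assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalized key: trim, lowercase, strip accents and punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def clusters(values):
    """Group distinct values sharing a fingerprint; keep groups with more than one member."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(clusters(["Leeds", "leeds", "Leeds ", "Wakefield"]))
```

This is the kind of check that surfaces the "leeds" vs "Leeds" problem noted in the data analysis, leaving the choice of the canonical spelling to the person reviewing the cluster.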
  64. Transposing rows and columns
      Transpose → Transpose cells across columns into rows...
      Transpose → Columnize by key/value columns...
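Transposing matters for the LCC file because each consumption column ("Electricity 10/11", "Gas 12/13", ...) mixes an energy type and a period in its header. A minimal Python sketch of the wide-to-long step follows; the output keys `energy`, `period` and `value` and the helper `to_long` are hypothetical names chosen for illustration.

```python
import re

# Header pattern of the consumption columns, e.g. "Electricity 10/11".
CONSUMPTION = re.compile(r"^(Electricity|Gas|Oil) (\d{2}/\d{2})$")


def to_long(row, key="Site Name"):
    """Turn one wide row into one record per non-empty consumption cell."""
    records = []
    for column, value in row.items():
        found = CONSUMPTION.match(column)
        if found and value != "":
            records.append({
                key: row[key],
                "energy": found.group(1),
                "period": found.group(2),
                "value": float(value),
            })
    return records


if __name__ == "__main__":
    wide = {"Site Name": "Belgrave House", "Electricity 10/11": "100.0",
            "Gas 10/11": "", "Oil 12/13": "-5"}
    for record in to_long(wide):
        print(record)
```

Each resulting record corresponds to one observation, which is a convenient shape for mapping to an observation-style ontology later on.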
  65. Other useful utilities
      • Regular expressions
        – Java regular expressions
      • Custom transformations
        – General Refine Expression Language (GREL)
        – Jython (Python implemented in Java)
        – Clojure (functional language that resembles Lisp)
  66. Using the project history
      • Project history:
        – Access operation history
        – Undo operations
        – Extract operations (in JSON)
        – Apply operations
      • Caution:
        – Transformations are registered in the history; filters and facets are not
  67. Solving memory problems
      https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory
  68. OpenRefine RDF extension - RDF skeleton
      • Resource naming strategy
        – Ontological terms pattern: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
        – Individuals pattern: http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>
      • Add base URI
      • Add prefixes
  69. Creating individuals
      lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure
  70. Previewing results
  71. Adding property values
      lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure
      lccRes:CouncilOfficesBelgraveHouse rdfs:label "Belgrave House" (xsd:string)
  72. Exporting RDF
      Export → RDF as Turtle

          @prefix schema: <http://schema.org/> .
          @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
          @prefix lcc: <http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#> .
          @prefix owl: <http://www.w3.org/2002/07/owl#> .
          @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

          <http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CouncilOfficesBelgraveHouse>
              a schema:CivicStructure ;
              rdfs:label "Belgrave House" .

          <http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CommunityCentreTunstallRoad>
              a schema:CivicStructure ;
              rdfs:label "Tunstall Road" .
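For readers without OpenRefine at hand, the same two statements per site can be produced by a few lines of Python. The sketch below writes N-Triples instead of Turtle to stay dependency-free; it assumes each row provides a site type and a site name (for example "Council Offices" and "Belgrave House"), which is a guess at how the CSV splits the information, and `row_to_ntriples` is a hypothetical helper.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CIVIC_STRUCTURE = "http://schema.org/CivicStructure"
RESOURCE_NS = "http://smartcity.linkeddata.es/lcc/resource/CivicStructure/"


def escape(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(site_type, name):
    """Map one (site type, name) pair to a typed, labelled individual."""
    local = re.sub(r"[^A-Za-z0-9]", "", site_type + name)  # assumed cleaning rule
    subject = f"<{RESOURCE_NS}{local}>"
    return [
        f"{subject} <{RDF_TYPE}> <{CIVIC_STRUCTURE}> .",
        f'{subject} <{RDFS_LABEL}> "{escape(name)}" .',
    ]


if __name__ == "__main__":
    for site_type, name in [("Council Offices", "Belgrave House"),
                            ("Community Centre", "Tunstall Road")]:
        print("\n".join(row_to_ntriples(site_type, name)))
```

The mapping and the naming strategy live in the same function here, which mirrors the point made earlier that the mapping implements the resource naming strategy.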
  73. 73. Evaluating the exported data
    • Manual inspection
    • Syntax evaluation (with a syntax validator)
    • Consistency with the ontologies (with a reasoner)
    • Usage evaluation (e.g., by running SPARQL queries)
      – Show all electricity consumptions and the related time periods for all council sites related to culture
      – Show all energy consumptions and the related time periods of council sites from the Wakefield district
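Usage evaluation boils down to asking whether the data can answer the questions it was generated for. The sketch below is a plain-Python stand-in for a SPARQL query, not a SPARQL engine: it matches triple patterns over a tiny in-memory list of triples. The two resources come from the exported Turtle; the helpers `match` and `labels_of_type` are hypothetical.

```python
# Triples as (subject, predicate, object) tuples, using prefixed names for brevity.
TRIPLES = [
    ("lccRes:CouncilOfficesBelgraveHouse", "rdf:type", "schema:CivicStructure"),
    ("lccRes:CouncilOfficesBelgraveHouse", "rdfs:label", "Belgrave House"),
    ("lccRes:CommunityCentreTunstallRoad", "rdf:type", "schema:CivicStructure"),
    ("lccRes:CommunityCentreTunstallRoad", "rdfs:label", "Tunstall Road"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def labels_of_type(triples, rdf_class):
    """Labels of all resources typed with rdf_class, sorted alphabetically."""
    subjects = {s for s, _, _ in match(triples, p="rdf:type", o=rdf_class)}
    return sorted(o for s, _, o in match(triples, p="rdfs:label") if s in subjects)


if __name__ == "__main__":
    print(labels_of_type(TRIPLES, "schema:CivicStructure"))
```

An empty or unexpected result for a question the dataset is supposed to answer usually points to a modelling or transformation problem rather than a query problem.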
  74. 74. Index
    • Linked Open Data in Smart Cities
    • Guidelines for the Generation of Linked Data
    • Discussion
    • Hands-on Description
  75. 75. Richer schema (and data)
    [Ontology diagram. Legend: class; datatype property :: datatype; object property; subclass-of relation]
    • Classes: ssn:Observation, ssn:SensorOutput, ssn:ObservationValue, ssn:FeatureOfInterest, ssn:Property, ero:FinalEnergy, ero:EnergyConsumerFacility, om:Unit_of_measure, time:Interval, time:Instant, schema:CivicStructure, schema:PostalAddress, schema:Place, schema:City, schema:AdministrativeArea, admingeo:District
    • Site classes: SupplyOrStorageSite, OpenAirSite, AccomodationSite, AdministrativeSite, OfficeSite, EducationalSite, SocialSite, CulturalSite, LeisureSite, OtherSite
    • Object properties: ssn:observationSamplingTime, ssn:observationResult, ssn:observedProperty, ssn:featureOfInterest, ssn:hasValue, ero:consumesEnergyType, lcc:hasQuantityUnitOfMeasurement, time:hasBeginning, time:hasEnd, schema:address, schema:containedIn, admingeo:district
    • Datatype properties: lcc:hasQuantityValue :: xsd:decimal, lcc:uprn :: xsd:string, dc:title :: xsd:string, schema:addressLocality :: xsd:string, schema:addressRegion :: xsd:string, schema:streetAddress :: xsd:string, schema:postalCode :: xsd:string, time:inXSDDateTime :: xsd:dateTime
  76. 76. Linked Data are just data
    [Plots: electrical consumption per building (electric1011, electric1112, electric1213); electricity vs gas consumption 12/13; electricity vs oil consumption 12/13]
  77. 77. Benefits of linking data
    • Original data + geolocation: total electric consumption [plot]
    • Original data + geolocation + population: total electric consumption in locations with population > 20,000 [plot]
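The second question can only be answered once the consumption data are linked to a dataset that carries population figures. The sketch below mimics that join with plain Python structures. All site names, town names, consumption values and population figures are hypothetical placeholders, as is the function name.

```python
# Hypothetical consumption records: (site, location, electricity consumption).
CONSUMPTION = [
    ("SiteA", "TownA", 1000.0),
    ("SiteB", "TownB", 500.0),
    ("SiteC", "TownA", 250.0),
]

# Hypothetical population figures, standing in for a linked external dataset.
POPULATION = {"TownA": 50000, "TownB": 12000}


def total_in_populous_locations(consumption, population, min_population):
    """Sum consumption over sites whose location exceeds min_population.

    Sites whose location has no population figure are left out.
    """
    return sum(
        value
        for _, location, value in consumption
        if population.get(location, 0) > min_population
    )


if __name__ == "__main__":
    print(total_in_populous_locations(CONSUMPTION, POPULATION, 20000))
```

Without the link between a site's location and the external population data, the filter on 20,000 inhabitants has nothing to work with, which is the point the slide makes.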
  78. 78. Benefits of reasoning
    • Total electric consumption in cultural buildings [plot]
    • Classes shown in the hierarchy: schema:CivicStructure, CulturalSite, Museum, Library
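A minimal sketch of the kind of inference involved, assuming that Museum and Library are subclasses of CulturalSite and that CulturalSite is a subclass of schema:CivicStructure (one reading of the hierarchy on the slide). The buildings, their values and the function names are hypothetical, and a real setup would use an OWL or RDFS reasoner rather than this hand-written walk.

```python
# Assumed subclass links, child -> parent (single parent per class in this sketch).
SUBCLASS_OF = {
    "Museum": "CulturalSite",
    "Library": "CulturalSite",
    "CulturalSite": "schema:CivicStructure",
}

# Hypothetical buildings: (name, asserted class, electricity consumption).
BUILDINGS = [
    ("MuseumA", "Museum", 120.0),
    ("LibraryB", "Library", 80.0),
    ("OfficeC", "OfficeSite", 300.0),
]


def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def total_consumption(buildings, target_class):
    """Sum consumption of buildings whose class is target_class or a subclass."""
    return sum(value for _, cls, value in buildings if is_a(cls, target_class))


if __name__ == "__main__":
    print(total_consumption(BUILDINGS, "CulturalSite"))
```

No building here is asserted to be a CulturalSite directly, so a query without inference would return nothing; following the subclass links is what brings the museum and the library into the total.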
  79. 79. Index
    • Linked Open Data in Smart Cities
    • Guidelines for the Generation of Linked Data
    • Discussion
    • Hands-on Description
  80. 80. What are we going to do?
    [Process diagram with the activities: Specification, Modelling, Generation, Publication, Exploitation, Linking]
  81. 81. What are we going to do?
    [Workflow diagram: activity → output]
    • Select data source → Data source
    • Obtain access to data source → Access, data
    • Analyse data source → Schema, data
    • Analyse licensing of the data source → License
    • Define resource naming strategy → Resource naming strategy
    • Develop ontology → Ontology
    • Transform data source → RDF data
    • Link with other datasets → Linked dataset
  82. 82. Hands-on task 1
    • Goal: to get familiar with the first steps in the Linked Data generation process
    • The students will take their selected dataset(s) and perform the following tasks:
      – Analyse the data set
        • Both the data (quantities, value ranges, etc.) and the schema
      – Analyse the licensing of the data source
        • Who is the publisher and who is the rightsholder?
        • What is the license?
        • Which license will be used for the generated dataset?
      – Define the resource naming strategy
        • For the ontology and the data (URI form, content negotiation, URI domain, path, patterns, etc.)
      – Finish the ontology development
        • Lightweight ontology (i.e., classes, properties, domains and ranges)
  83. 83. Hands-on task 1 - Deliverables
    • A document that includes:
      – The analyses performed on the data source
      – The licensing of the data source and the potential license for the generated dataset
      – The resource naming strategy defined
    • An OWL file with the ontology developed according to the resource naming strategy defined
  84. 84. Hands-on task 2
    • Goal: to get familiar with the transformation of CSV data into RDF using LODRefine
    • The students will take their selected dataset(s) and perform the following tasks:
      – Import the data into LODRefine
      – Analyse and fix the data
        • The analysis was performed in the previous class but can be updated with new findings
        • Fix the data to remove errors
        • Transform the data to facilitate RDF generation
      – Export the data to RDF
        • Define an RDF skeleton for the data
        • Export the data to RDF (Turtle syntax)
  85. 85. Hands-on task 2 - Deliverables
    For each dataset:
    • An RDF file in the Turtle syntax with the data transformed into RDF
  86. 86. 1st Summer School on Smart Cities and Linked Open Data (LD4SC-15): Thank you for your attention!
