20/11/2014	
  
The LIDER Reference Architecture

Philipp Cimiano (representing the LIDER Project)

LD4LT Teleconference

March 5th, 2015
  
16/01/2015	
Philipp Cimiano
  
Goal	
  
•  Goal: Develop a Reference model that supports an ecosystem of linguistic
linked data and the development of content analytics services on top of
this ecosystem.
•  Key features:
–  Linked Data: connected ecosystem of data and services,
interoperability, supporting access by both humans and machines
–  Semantic Technologies: open web standards (OWL, RDF) for data
description, SPARQL and HTTP as Web APIs
–  De-centralization: Web architecture, no central point of failure, no
vendor lock-in, open standards
  
Reference Architecture

[Figure: LIDER reference architecture diagram with the components Multilingual Data; Metadata, Licensing, Provenance; Vocabularies, Hosting; LLD Publishing; Scalability, Streaming, Interoperability; LLD-aware Services; LLD Linking; Service Composition; Discovery; Benchmarking & Validation; Certification; Guidelines and Standardization]
  
Reference Architecture

[Figure: architecture diagram, build-up step showing Multilingual Data]

Multilingual Data:
•  Terminologies
•  (Multimodal) Corpora
•  Bilingual Dictionaries
•  Parallel Data
•  Translation Memories
•  Ontologies
•  Glossaries, Classification Schemas
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding Metadata, Licensing, Provenance]

•  Metadata: providing basic information about the dataset (author, language, structure, etc.)
•  Licensing: specifying the terms and conditions of use
•  Provenance: describing the origin and processing history of data
  	
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding LLD Publishing, Vocabularies, Hosting]

Best practices, standards and tools for the publication and hosting of LLD, and vocabularies for describing different types of resources (lexica, corpora, terminologies, lexico-semantic resources) and transforming them into RDF as Linguistic Linked Data (LLD)
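As a minimal illustration of transforming a simple resource into RDF, the sketch below turns a tiny bilingual term list into N-Triples using the OntoLex-Lemon core vocabulary (the lexicon model developed in the W3C Ontolex community group). The base IRI `http://example.org/lexicon/`, the input format and the helper names are hypothetical; a real conversion would also add senses, translation links and dataset metadata.

```python
from urllib.parse import quote

# OntoLex-Lemon core namespace and rdf:type (real W3C IRIs)
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base IRI for the converted resource
BASE = "http://example.org/lexicon/"


def literal(text: str, lang: str) -> str:
    """Serialize a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entry_to_ntriples(term: str, lang: str) -> list[str]:
    """Describe one term as an ontolex:LexicalEntry with a canonical form."""
    slug = quote(term.replace(" ", "_"))  # percent-encode non-ASCII characters
    entry = f"<{BASE}{lang}/{slug}>"
    form = f"<{BASE}{lang}/{slug}#form>"
    return [
        f"{entry} <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .",
        f"{entry} <{ONTOLEX}canonicalForm> {form} .",
        f"{form} <{RDF_TYPE}> <{ONTOLEX}Form> .",
        f"{form} <{ONTOLEX}writtenRep> {literal(term, lang)} .",
    ]


# Hypothetical input: (term, language tag) pairs from a small terminology
terms = [("machine translation", "en"), ("traducción automática", "es")]
triples = [t for term, lang in terms for t in entry_to_ntriples(term, lang)]
print("\n".join(triples))
```

Each term yields four triples; a library such as rdflib could load the output for validation or conversion to other serializations.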
  	
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding LLD-aware Services, Scalability, Streaming, Interoperability]

•  Scalability: caching and non-centralized processing
•  Streaming: processing data in a streaming fashion, thus reducing the overhead of creating and closing connections
•  Interoperability: a common vocabulary to describe the inputs and outputs of services
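One concrete candidate for such a common input/output vocabulary is NIF (NLP Interchange Format), which the services slide names as the grounding of service inputs and outputs. The sketch below, assuming NIF 2.0 core and RFC 5147-style `#char=begin,end` fragment IRIs, describes a text and one annotated substring as N-Triples. The document IRI and the helper names are hypothetical, and offsets follow Python slicing (begin inclusive, end exclusive).

```python
# NIF 2.0 core namespace, XSD namespace and rdf:type (real IRIs)
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def esc(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def typed_int(n: int) -> str:
    """Serialize an offset as an xsd:nonNegativeInteger literal."""
    return f'"{n}"^^<{XSD}nonNegativeInteger>'


def nif_annotate(doc_iri: str, text: str, begin: int, end: int) -> list[str]:
    """Return N-Triples for a nif:Context holding the text and one substring."""
    ctx = f"<{doc_iri}#char=0,{len(text)}>"
    span = f"<{doc_iri}#char={begin},{end}>"
    return [
        f"{ctx} <{RDF_TYPE}> <{NIF}Context> .",
        f'{ctx} <{NIF}isString> "{esc(text)}" .',
        f"{span} <{RDF_TYPE}> <{NIF}Phrase> .",
        f"{span} <{NIF}referenceContext> {ctx} .",
        f'{span} <{NIF}anchorOf> "{esc(text[begin:end])}" .',
        f"{span} <{NIF}beginIndex> {typed_int(begin)} .",
        f"{span} <{NIF}endIndex> {typed_int(end)} .",
    ]


# Hypothetical document IRI; annotate the phrase "Linked Data"
text = "Linked Data for language technology"
triples = nif_annotate("http://example.org/doc1", text, 0, 11)
print("\n".join(triples))
```

Because every service addresses the same character offsets of the same context, the annotations of different services can later be combined without format conversion.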
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding LLD Linking, Service Composition]

•  Best practices supporting the linking of resources and the combination of data with different terms and conditions of use, in particular open and closed data
•  Support for composing services into complex workflows
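As a deliberately naive illustration of linking resources, the sketch below links entries of two lexicons whose normalized written forms and language tags match, emitting `skos:exactMatch` triples. Exact label matching is a stand-in chosen for this example, not the LIDER linking method, and all IRIs are hypothetical. Real linking would also weigh senses, context and confidence, and would check that the terms and conditions of both resources allow combining them.

```python
import unicodedata

SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"


def normalize(label: str) -> str:
    """Unicode-normalize and case-fold a label so trivial variants compare equal."""
    return unicodedata.normalize("NFC", label).casefold().strip()


def link_by_label(source: dict[str, tuple[str, str]],
                  target: dict[str, tuple[str, str]]) -> list[str]:
    """Emit an N-Triples skos:exactMatch link for each source entry whose
    (normalized label, language) pair also occurs in the target."""
    index: dict[tuple[str, str], list[str]] = {}
    for iri, (label, lang) in target.items():
        index.setdefault((normalize(label), lang), []).append(iri)
    links = []
    for iri, (label, lang) in source.items():
        for match in index.get((normalize(label), lang), []):
            links.append(f"<{iri}> <{SKOS_EXACT}> <{match}> .")
    return links


# Hypothetical entries: IRI -> (written form, language tag)
lexicon_a = {"http://example.org/a/bank": ("Bank", "de"),
             "http://example.org/a/river": ("Fluss", "de")}
lexicon_b = {"http://example.org/b/42": ("bank", "de"),
             "http://example.org/b/43": ("bank", "en")}
links = link_by_label(lexicon_a, lexicon_b)
for triple in links:
    print(triple)
```

Only the German entries are linked; the English "bank" is kept apart by the language tag. Note that German "Bank" is itself polysemous, which is exactly the kind of case where label matching alone is insufficient.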
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding Discovery]

Discovery layer implemented by a number of independent indexing and aggregation services that support querying (SPARQL) and browsing (Linked Data) of data
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding Benchmarking & Validation]

Tools supporting the comparison of datasets and services
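Comparing services usually means scoring each one against the same gold-standard data. The sketch below computes exact-match precision, recall and F1 for two hypothetical annotation services, representing annotations as sets of (begin, end, label) spans. The service names, spans and labels are invented for illustration and do not describe any real tool.

```python
from typing import NamedTuple

Span = tuple[int, int, str]  # (begin offset, end offset, label)


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score(predicted: set[Span], gold: set[Span]) -> Scores:
    """Exact-match precision, recall and F1 of predicted spans against gold spans."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Scores(precision, recall, f1)


# Hypothetical gold standard and outputs of two services on the same text
gold = {(0, 6, "ORG"), (15, 21, "LOC"), (30, 35, "PER")}
outputs = {
    "service_a": {(0, 6, "ORG"), (15, 21, "LOC")},
    "service_b": {(0, 6, "ORG"), (15, 21, "PER"), (30, 35, "PER"), (40, 44, "LOC")},
}
results = {name: score(pred, gold) for name, pred in outputs.items()}
# Rank services by F1, best first
for name, s in sorted(results.items(), key=lambda kv: kv[1].f1, reverse=True):
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
```

Here service_a finds fewer spans but makes no errors (F1 0.80), while service_b has the same recall with lower precision (F1 0.57). Exact span matching is a design choice; partial-overlap scoring would rank the services differently.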
  
  
Reference Architecture

[Figure: architecture diagram, build-up step adding Certification]
  
Reference Architecture

[Figure: complete architecture diagram, build-up step adding Guidelines and Standardization]
  
Metadata

•  Metadata: DataID for the description of datasets (see the Reference Card for DataID), as well as Dublin Core, DCAT and a METASHARE ontology currently in development (see other threads)
•  Licensing: The LIDER project recommends using ODRL to describe terms and conditions of use
•  Provenance: The LIDER project recommends using the PROV-O vocabulary to describe the provenance of linguistic data resources
•  Data Publishing: The LIDER project recommends using DataHub for publishing metadata
•  Data Linking: The LIDER project has implemented services that link data across sources as a proof-of-concept implementation.
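To show how several of the recommended vocabularies can sit together in one description, the sketch below builds a JSON-LD document for a hypothetical lexicon: DCAT and Dublin Core terms for metadata, an ODRL policy for terms and conditions, and PROV-O for provenance. DataID and the METASHARE ontology are omitted, every IRI and value is invented, and the overall shape is an assumption rather than a LIDER-prescribed profile.

```python
import json

# Namespace prefixes of the vocabularies (real W3C / DCMI namespace IRIs)
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "prov": "http://www.w3.org/ns/prov#",
}

DATASET_IRI = "http://example.org/dataset/my-lexicon"  # hypothetical

# Hypothetical dataset description; all IRIs and literal values are illustrative
dataset = {
    "@context": CONTEXT,
    "@id": DATASET_IRI,
    "@type": ["dcat:Dataset", "prov:Entity"],
    # Basic metadata with Dublin Core terms
    "dct:title": {"@value": "My Lexicon", "@language": "en"},
    "dct:creator": {"@id": "http://example.org/people/jdoe"},
    "dct:language": {"@id": "http://lexvo.org/id/iso639-3/deu"},
    # Terms and conditions of use as an ODRL policy permitting reproduction
    "odrl:hasPolicy": {
        "@id": "http://example.org/policy/1",
        "@type": "odrl:Set",
        "odrl:permission": {
            "odrl:target": {"@id": DATASET_IRI},
            "odrl:action": {"@id": "odrl:reproduce"},
        },
    },
    # Provenance: the source it was derived from and the producing activity
    "prov:wasDerivedFrom": {"@id": "http://example.org/source/print-dictionary"},
    "prov:wasGeneratedBy": {
        "@id": "http://example.org/activity/conversion-1",
        "@type": "prov:Activity",
    },
}

print(json.dumps(dataset, indent=2))
```

A JSON-LD processor would expand the prefixed names into full IRIs, so the same description can be loaded as RDF and indexed alongside other metadata.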
  
  
Discovery Layer

•  The reference implementation is LingHub: http://linghub.lider-project.eu/
•  Indexes metadata from METASHARE, CLARIN, LRE Map and DataHub
•  Integrates and harmonizes data by mapping it to DCAT and Dublin Core
•  Exposes DataID metadata descriptions
•  Provides a SPARQL endpoint
•  Browsable by humans and machines (Linked Data)
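A client can query such an endpoint over HTTP following the SPARQL 1.1 Protocol, which allows a GET request carrying the query in a `query` parameter. The sketch below only builds the request; the endpoint URL is a hypothetical placeholder (not LingHub's documented endpoint), and the assumption that datasets carry `dct:language` with Lexvo IRIs should be checked against the actual data before use. No request is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the discovery service's actual SPARQL endpoint
ENDPOINT = "http://example.org/sparql"

# Up to 10 DCAT datasets whose dct:language is German (Lexvo IRI is an assumption)
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:language <http://lexvo.org/id/iso639-3/deu> .
  OPTIONAL { ?dataset dct:title ?title }
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request that asks
    for results in the SPARQL JSON results format."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.full_url)
# To execute: pass req to urllib.request.urlopen and json.load the response body
```

Sending the query as a GET parameter keeps requests cacheable, which fits the caching point made for scalability; very long queries may instead need a POST request.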
  
  
Services

Reference implementation of NLP services that:

•  use WebSockets to process data in a streaming fashion
•  use NIF-grounded RDF/JSON-LD as input and output
•  can be composed by merging their outputs (RDF merge)
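Composing services by merging their outputs relies on RDF merge: the union of the input graphs after standardizing their blank nodes apart, so that blank nodes from different graphs are never accidentally identified. The sketch below merges two N-Triples outputs with simplified line-based handling, assuming one triple per line and `_:label` blank nodes that never appear inside literals. The service outputs and the `http://example.org/ns#` properties are hypothetical; a production pipeline would use a proper RDF library and parse the NIF JSON-LD directly.

```python
import re

# Blank node labels such as _:b0 (simplified pattern; ignores look-alikes inside literals)
BNODE = re.compile(r"_:([A-Za-z0-9_-]+)")


def rdf_merge(*graphs: str) -> set[str]:
    """Merge N-Triples documents: rename blank nodes per input graph
    (standardizing them apart), then take the set union of triples."""
    merged: set[str] = set()
    for i, graph in enumerate(graphs):
        for line in graph.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            merged.add(BNODE.sub(lambda m: f"_:g{i}_{m.group(1)}", line))
    return merged


# Hypothetical outputs of two services annotating the same text span
ner_output = """
<http://example.org/doc1#char=0,5> <http://example.org/ns#anchorOf> "LIDER" .
<http://example.org/doc1#char=0,5> <http://example.org/ns#entityType> "ORG" .
<http://example.org/doc1#char=0,5> <http://example.org/ns#evidence> _:b0 .
"""
pos_output = """
<http://example.org/doc1#char=0,5> <http://example.org/ns#anchorOf> "LIDER" .
<http://example.org/doc1#char=0,5> <http://example.org/ns#pos> "NNP" .
<http://example.org/doc1#char=0,5> <http://example.org/ns#evidence> _:b0 .
"""
merged = rdf_merge(ner_output, pos_output)
for triple in sorted(merged):
    print(triple)
```

The shared `anchorOf` triple appears once in the result, while the two `_:b0` blank nodes stay distinct (`_:g0_b0` and `_:g1_b0`), giving five triples in total.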
  
  
Standardization

Involvement in Community Groups:

•  Ontolex (Ontology-Lexicon Models, CG)
•  BPMLOD (Best Practices for Multilingual Linked Open Data, CG)
•  LD4LT (Linked Data and Language Technologies, CG)
  
  
Use Cases

•  An IT company is active in the brand reputation market and offers a product based on sentiment analysis for three languages (English, Spanish, Portuguese). It needs to find sentiment-annotated data for German.
•  A terminology management company wants to exploit LLD to support the process of creating a corporate terminology. They want to provide seed terms and exploit LLOD to obtain further term candidates.
•  A machine translation company wants to exploit LLOD to train a machine translation system and to ease its adaptation to a new domain. It searches for parallel data for a certain language pair.
•  An IT company develops information extraction techniques for competitor analysis. It needs to develop an application that works on Twitter data, and therefore needs to find POS-annotated Twitter data to adapt its POS tagger to the Twitter domain.
•  A researcher wants to publish a dataset on the Web as Linguistic Linked Data and needs support in doing so. Part of the dataset will be offered for free and part in exchange for money.
  	
  
  
Discussion

Thanks for your attention!

Any comments, questions, …?
  

Lider Reference Model ld4lt session March, 3rd, 2015
