Linked data and language technologies


This slide set presents the concept of Linguistic Linked Licensed Data (known as 3LD) and the LIDER project. The project's mission is to provide the basis for a Linguistic Linked Data cloud that can support content analytics tasks over unstructured, multilingual, cross-media content. By achieving this goal, LIDER aims to improve the ease and efficiency with which Linguistic Linked Data can be exploited in content analytics processes.

Published in: Technology

Linked data and language technologies

  1. 20/03/2014 • Linked Data and Language Technologies: The LIDER project • A. Gómez-Pérez (UPM), Project Coordinator • CSA • Budget: 1.482.000€ • Starting date: 1 Nov. 2013 • Duration: 2 years
  2. 20/03/2014 Asun Gómez-Pérez • Motivation • Linked Data for Language Technologies • What is LIDER about
  3. Heterogeneity of Linguistic Resources • Ecosystem of – Open and closed resources – Complementary resources • Lexicon • Corpora • Dictionaries • … – Heterogeneous formats • E.g., for lexicons: Lexinfo, LMF, LIR, Lemon, … – Language resources available on the web • Meta-share, ELDA, ELRA, Clarin, FLaReNet, MultiJEDI,
  4. Limitations when exploiting LRs • The process of finding and integrating LRs in third-party applications is manual and time-consuming • LR metadata – cannot be queried using a common language (e.g. SPARQL) • LR content – is available in heterogeneous formats – is not linked with other linguistic content • Language resources and the technologies supported are still far from being Free, Open and Interoperable
  5. An example: “Red” (computer network) • ndex.php/Terminesp:red • http://www.wordreference.com/sinonimos/
  6. Complex queries using data from heterogeneous sources
  7. *Picture attribution:
  8. *Picture attribution: ndex.php/Terminesp:red
  9. *Picture attribution: ndex.php/Terminesp:red • http://www.wordreference.com/sinonimos/
  10. *Picture attribution: ndex.php/Terminesp:red • http://www.wordreference.com/sinonimos/
  11. *Picture attribution: • “Red”: Etymology: Del latin “rete”; Gender: “f”; Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….” • “Red”: Synonyms: “sistema”, “malla”, “distribución” • “Red”: Norm: UNE 21302-131; English: network; German: Netzwerk • “Red”: Pronunciation: [red]; Grammar category: sustantivo femenino; Singular: “red”; Plural: “redes” • “Red_de_computadores”: Category: redes informáticas; Image • Complementary but not connected
  12. LD allows linguistic data integration [Diagram: a linked-data graph for the entry “Red” connecting its forms (singular [RED], plural [REDES], gender feminine, phonetic form), its senses with written forms “red” and “malla” marked as equivalent, an es-en sense translation (“red”, “network”), and an image]
  13. LD as a possible solution • Agree on 21st-century vocabularies for describing resource metadata and content • Unified and standardized language for describing resources (RDF(S)) • Unified and standardized query language (SPARQL) • Standardized non-proprietary APIs • Links to other resources
  14. Linked Data for Language Technologies
  15. Linked Open Data and Language • 1. LOD is increasingly multilingual 2. LOD interconnects resources – in many domains – in many languages • How many Linguistic Resources are exposed in RDF?
  16. Linked Data and Language Resources • Linguistic LOD (LLOD) – Subset of LOD – Linguistic domain – Open license – Resources in RDF – Interconnected with other LD resources • Long-term experience • Huge amount of resources • Maturity • Curation • Legal liability
  17. The LIDER project
  18. The LIDER consortium • Universidad Politécnica de Madrid (UPM, Spain) [COORDINATOR] • Trinity College Dublin (Ireland) • DFKI (Germany) • National University of Ireland, Galway (Ireland) • Institut für Angewandte Informatik EV (INFAI, Germany) • University of Bielefeld (Germany) • Universita degli Studi di Roma La Sapienza (Italy) • GEIE ERCIM (France)
  19. What is 3LD? • 3LD: Linguistic Linked Licensed Data • Language resources such as: lexica, corpora, dictionaries, … • NIF (NLP Interchange Format) • Using RDF and standard data models (vocabularies): lexica, corpora • ODRL (Open Digital Rights Language) • Published along with a machine-readable license
  20. Challenge • Which extensions to the LOD are needed to support a new generation of large-scale content analytics applications that will overcome language barriers? – Expose Linguistic Resources in LD format with license information • Metadata • Content – Guidelines for Linguistic Linked Licensed Data (3LD) – Specification of a new generation of 3LD-aware NLP services • Requirements: – Keep track of the license information – Keep track of the provenance of the resource – Keep track of the use of the resource
  21. LOD as large background knowledge for NLP [Diagram labels: Producers • Multimedia and multilingual content • Metadata generation • Consumers • Content analytics • Metadata as LD • Language resources (lexicon, corpora, ...), some of them FOI, others private • Linguistic LOD generation (metadata and content) • Language resources as LD • LOD-aware NLP services]
  22. Industry use cases • 1. Roadmap on 3LD for Content Analytics 2. Guidelines for 3LD 3. 3LD Reference Architecture • Community building and networking: LD4LT, BP-MLOD W3C-CG, OntoLex W3C-CG • Surveys • Requirements
  23. Community Building • Industrial Board • Open community • Events tailored to the different audiences – Roadmapping Workshops 2013 • 21 March, EDF (Athens) • 7-8 May, Multilingual Web WS (Madrid) • 26-27 May, WS on Emotions (LREC, Reykjavik) • 27 May, WS on LD and Linguistics (LREC, Reykjavik) • 4-6 June, WS at Localization World (Dublin) • 2 September, WS at the Semantics Conference (Leipzig) – Publication of best-practices material via W3C community groups • LD4LT • BP-MLOD W3C-CG • OntoLex W3C-CG – Hackathon in September at the Semantics Conference (Leipzig) – Surveys of the localization industry and general Web companies
  24. Expected Contributions from the Community • Use case definitions from industry will be input to the roadmap • Linguistic resources LLOD • Validation of guidelines and reference architecture • Participation in surveys • Participation in events: – Roadmapping WS, hackathons, etc. • LIDER will help with travel grants for participants in Roadmapping WS
  25. Web channels • Hashtag: #LiderEU • Join the community
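
The integration argument of slides 11 to 13 and the license and provenance requirements of slide 20 can be illustrated with a minimal Python sketch. It is not RDF or SPARQL: it only mimics a graph merge and a triple-pattern lookup with plain tuples. All `ex:` identifiers, dataset names, and license labels are hypothetical placeholders, not taken from Terminesp, WordReference, or any published vocabulary.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Set, Tuple


class Quad(NamedTuple):
    """One statement plus the dataset it came from (its provenance)."""
    subject: str
    predicate: str
    obj: str
    source: str


def load(source: str, rows: Iterable[Tuple[str, str, str]]) -> Set[Quad]:
    """Tag every (subject, predicate, object) row with its source dataset."""
    return {Quad(s, p, o, source) for s, p, o in rows}


def match(graph: Iterable[Quad], subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> Iterator[Quad]:
    """Yield quads that fit the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    for quad in graph:
        # zip stops after three items, so the source field is never compared.
        if all(want is None or want == have for want, have in zip(pattern, quad)):
            yield quad


# Hypothetical datasets, loosely modelled on the "Red" example of slide 11.
terminology = load("ex:terminology", [
    ("ex:red", "ex:gender", "f"),
    ("ex:red", "ex:translation_en", "network"),
    ("ex:red", "ex:translation_de", "Netzwerk"),
])
thesaurus = load("ex:thesaurus", [
    ("ex:red", "ex:synonym", "malla"),
    ("ex:red", "ex:synonym", "sistema"),
])
morphology = load("ex:morphology", [
    ("ex:red", "ex:plural", "redes"),
])

# Hypothetical license labels, one per source dataset.
LICENSES = {
    "ex:terminology": "ex:license-a",
    "ex:thesaurus": "ex:license-b",
    "ex:morphology": "ex:license-a",
}

# Merging is a plain set union because every source uses the same
# identifier ("ex:red") for the entry; that shared identifier is the link.
graph = terminology | thesaurus | morphology


def describe(graph: Iterable[Quad], subject: str) -> List[Tuple[str, str, str, str]]:
    """Return sorted (predicate, object, source, license) rows for one entry."""
    return sorted(
        (q.predicate, q.obj, q.source, LICENSES.get(q.source, "unknown"))
        for q in match(graph, subject=subject)
    )


if __name__ == "__main__":
    for row in describe(graph, "ex:red"):
        print(row)
```

Calling `describe(graph, "ex:red")` lists every statement about the entry together with its source dataset and license label, which is the kind of cross-source answer that slide 11 describes as out of reach while resources are complementary but not connected.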