Linked data and language technologies

This slide set presents the concept of Linguistic Linked Licensed Data (3LD) and the LIDER project (http://www.lider-project.eu/). The project's mission is to provide the basis for creating a Linguistic Linked Data cloud that can support content analytics over unstructured, multilingual, cross-media content. By achieving this goal, LIDER will improve the ease and efficiency with which Linguistic Linked Data can be exploited in content analytics processes.

Presentation Transcript

  • 20/03/2014. Linked Data and Language Technologies: The LIDER project
    – A. Gómez-Pérez (UPM), asun@fi.upm.es, Project Coordinator
    – Coordination and Support Action (CSA); budget: €1,482,000; start date: 1 Nov. 2013; duration: 2 years
  • Outline: motivation; Linked Data for language technologies; what LIDER is about
  • Heterogeneity of linguistic resources: an ecosystem of
    – open and closed resources
    – complementary resources: lexicons, corpora, dictionaries, …
    – heterogeneous formats, e.g., for lexicons: LexInfo, LMF, LIR, lemon, …
    – language resources available on the web: META-SHARE, ELDA, ELRA, CLARIN, FLaReNet, MultiJEDI, …
  • Limitations when exploiting language resources (LRs)
    – Finding LRs and integrating them into third-party applications is a manual, time-consuming process.
    – LR metadata cannot be queried using a common language (e.g., SPARQL).
    – LR content is available in heterogeneous formats and is not linked with other linguistic content.
    Language resources and the technologies that support them are still far from being free, open and interoperable.
  • An example: “red” (in the sense of computer network), as described by http://es.wiktionary.org, http://rae.es, http://www.wikilengua.org/index.php/Terminesp:red, http://es.wikipedia.org and http://www.wordreference.com/sinonimos/
  • Complex queries using data from heterogeneous sources (figure built up over several slides, adding in turn http://rae.es, http://es.wiktionary.org, http://www.wikilengua.org/index.php/Terminesp:red, http://www.wordreference.com/sinonimos/ and http://es.wikipedia.org). *Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
  • What the sources say about “red”: complementary but not connected
    – Etymology: from Latin “rete”; gender: “f”; definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí…” (“set of computers or computing equipment connected to each other…”)
    – Synonyms: “sistema”, “malla”, “distribución”
    – Norm: UNE 21302-131; English: network; German: Netzwerk
    – Pronunciation: [red]; grammatical category: feminine noun; singular: “red”; plural: “redes”
    – “Red_de_computadores”: category: computer networks (“redes informáticas”); image
    *Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
  • LD allows linguistic data integration (figure): the data about “Red” is linked in one graph, connecting
    – forms: singular, written form “red”, phonetic form [RED]; plural, phonetic form [REDES]
    – gender: feminine
    – senses with written forms “red” and “malla”, linked as equivalent
    – an image
    – an es–en translation between the senses written “red” and “network”
  • LD as a possible solution
    – Agree on 21st-century vocabularies for describing resource metadata and content
    – A unified, standardized language for describing resources (RDF(S))
    – A unified, standardized query language (SPARQL)
    – Standardized, non-proprietary APIs
    – Links to other resources
  • Linked Data for Language Technologies
  • Linked Open Data and language
    1. LOD is increasingly multilingual.
    2. LOD interconnects resources in many domains and in many languages.
    How many linguistic resources are exposed in RDF?
  • Linked Data and language resources
    – Linguistic LOD (LLOD): a subset of LOD covering the linguistic domain, with open licenses, resources in RDF, interconnected with other LD resources
    – Long-term experience; huge amount of resources; maturity; curation; legal liability
  • The LIDER project
  • The LIDER consortium
    – Universidad Politécnica de Madrid (UPM, Spain) [coordinator]
    – Trinity College Dublin (Ireland)
    – DFKI (Germany)
    – National University of Ireland, Galway (Ireland)
    – Institut für Angewandte Informatik e.V. (INFAI, Germany)
    – University of Bielefeld (Germany)
    – Università degli Studi di Roma La Sapienza (Italy)
    – GEIE ERCIM (France)
  • What is 3LD? Linguistic Linked Licensed Data
    – Linguistic: language resources such as lexica, corpora, dictionaries, …
    – Linked: published using RDF and standard data models (vocabularies) for lexica and corpora, e.g., NIF (NLP Interchange Format)
    – Licensed: published along with a machine-readable license, e.g., ODRL (Open Digital Rights Language)
  • Challenge: which extensions to LOD are needed to support a new generation of large-scale content analytics applications that overcome language barriers?
    – Expose linguistic resources (metadata and content) in LD format with license information
    – Guidelines for Linguistic Linked Licensed Data (3LD)
    – Specification of a new generation of 3LD-aware NLP services
    – Requirements: keep track of the license information, the provenance and the use of each resource
  • LOD as large background knowledge for NLP (figure): producers; multimedia and multilingual content; metadata generation; consumers; content analytics; metadata as LD; language resources (lexica, corpora, …), some of them FOI (free, open and interoperable), others private; Linguistic LOD generation (metadata and content); language resources as LD; LOD-aware NLP services
  • Figure: industry use cases; 1. roadmap on 3LD for content analytics; 2. guidelines for 3LD; 3. 3LD reference architecture; community building and networking (LD4LT, BP-MLOD W3C CG, OntoLex W3C CG); surveys; requirements
  • Community building
    – Industrial board
    – Open community
    – Events tailored to the different audiences:
      – Roadmapping workshops 2014: 21 March, EDF (Athens); 7–8 May, MultilingualWeb workshop (Madrid); 26–27 May, workshop on emotions (LREC, Reykjavik); 27 May, workshop on LD and linguistics (LREC, Reykjavik); 4–6 June, workshop at Localization World (Dublin); 2 September, workshop at the Semantics Conference (Leipzig)
      – Publication of best-practice material via the W3C community groups LD4LT, BP-MLOD and OntoLex
      – Hackathon in September at the Semantics Conference (Leipzig)
      – Surveys of the localization industry and general web companies
  • Expected contributions from the community
    – Use case definitions from industry, which will be input to the roadmap
    – Linguistic resources for the LLOD
    – Validation of the guidelines and the reference architecture
    – Participation in surveys
    – Participation in events: roadmapping workshops, hackathons, etc.
    LIDER will offer travel grants to participants in the roadmapping workshops.
  • Web channels: www.lider-project.eu; twitter.com/multilingweb; hashtag #LiderEU. Join the community: www.w3c.org/community/ld4lt
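The slides on “red” argue that the sources are complementary but not connected, and that LD allows linguistic data integration. A minimal Python sketch of that idea, under stated assumptions: the entry URI, the source names and the field names are hypothetical stand-ins, not the real schemas of RAE, Wiktionary or the other sites. Once every source describes the same URI, integration reduces to a set union, and keeping the source as a fourth element preserves provenance.

```python
# Hypothetical records that four sources might publish about the Spanish
# word "red" (network). Source names, fields and the URI are assumptions.
ENTRY = "http://example.org/lexicon/es/red"

source_records = {
    "dictionary": {"etymology": "Latin rete", "gender": "feminine"},
    "thesaurus": {"synonym": ["sistema", "malla", "distribución"]},
    "terminology": {"translation_en": "network", "translation_de": "Netzwerk"},
    "grammar": {"plural": "redes", "pronunciation": "[red]"},
}

def to_quads(subject, record, source):
    """Turn one flat record into (subject, predicate, object, source) quads."""
    quads = set()
    for predicate, value in record.items():
        for v in (value if isinstance(value, list) else [value]):
            quads.add((subject, predicate, v, source))
    return quads

def integrate(records, subject):
    """Merge all sources: with a shared subject URI this is just a set union."""
    graph = set()
    for source, record in records.items():
        graph |= to_quads(subject, record, source)
    return graph

def describe(graph, subject):
    """Collect every property of `subject`, whichever source stated it."""
    view = {}
    for s, p, o, _source in sorted(graph):
        if s == subject:
            view.setdefault(p, []).append(o)
    return view

graph = integrate(source_records, ENTRY)
view = describe(graph, ENTRY)
print(view["synonym"])         # ['distribución', 'malla', 'sistema']
print(view["translation_en"])  # ['network']
```

Without the shared URI, each source would need its own mapping step before the records could be combined; that is the manual work the "Limitations" slide describes.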
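The slide “LD as a possible solution” relies on a standard data model (RDF) and a standard query language (SPARQL). The sketch below is not SPARQL. It is a toy matcher for a conjunction of triple patterns, which SPARQL calls a basic graph pattern, written with the standard library only. The ex: resources are hypothetical, and the ontolex: and lexinfo: terms are borrowed from the OntoLex-lemon and LexInfo vocabularies for illustration. The query corresponds roughly to `SELECT ?entry ?written WHERE { ?entry lexinfo:gender lexinfo:feminine ; ontolex:otherForm ?form . ?form lexinfo:number lexinfo:plural ; ontolex:writtenRep ?written }`.

```python
# Toy triple store. ex: resources are hypothetical; ontolex:/lexinfo: terms are
# borrowed from the OntoLex-lemon and LexInfo vocabularies for illustration.
TRIPLES = {
    ("ex:red", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:red", "lexinfo:gender", "lexinfo:feminine"),
    ("ex:red", "ontolex:canonicalForm", "ex:red_sg"),
    ("ex:red", "ontolex:otherForm", "ex:red_pl"),
    ("ex:red_sg", "ontolex:writtenRep", '"red"@es'),
    ("ex:red_sg", "lexinfo:number", "lexinfo:singular"),
    ("ex:red_pl", "ontolex:writtenRep", '"redes"@es'),
    ("ex:red_pl", "lexinfo:number", "lexinfo:plural"),
}

def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.
    Terms starting with '?' are variables; returns None on a mismatch."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def select(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    next_solutions.append(result)
        solutions = next_solutions
    return solutions

# Written representation of the plural form of every feminine entry.
query = [
    ("?entry", "lexinfo:gender", "lexinfo:feminine"),
    ("?entry", "ontolex:otherForm", "?form"),
    ("?form", "lexinfo:number", "lexinfo:plural"),
    ("?form", "ontolex:writtenRep", "?written"),
]
for row in select(query, TRIPLES):
    print(row["?entry"], row["?written"])  # ex:red "redes"@es
```

The point of the design is that the query only depends on the shared vocabulary, so the same query would work over any lexicon published with those terms.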
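The “What is 3LD?” slide names NIF for publishing corpora as linked data and ODRL for machine-readable licenses. A rough sketch, assuming NIF's offset-based string URIs of the form `#char=begin,end` and core properties such as `nif:anchorOf` and `nif:referenceContext`. The document URI and the policy URI are hypothetical, and attaching the policy with `dct:license` is one possible choice, not something the slides prescribe.

```python
import re

BASE = "http://example.org/corpus/doc1"      # hypothetical document URI
LICENSE = "http://example.org/policy/cc-by"  # hypothetical ODRL policy URI

def nif_uri(begin, end):
    """Offset-based fragment identifier in the style NIF uses (#char=begin,end)."""
    return f"{BASE}#char={begin},{end}"

def nif_annotate(text):
    """Return triples for a NIF context plus one string per word token."""
    context = nif_uri(0, len(text))
    triples = [
        (context, "rdf:type", "nif:Context"),
        (context, "nif:isString", text),
        (context, "nif:beginIndex", 0),
        (context, "nif:endIndex", len(text)),
        (BASE, "dct:license", LICENSE),  # the "Licensed" part of 3LD
    ]
    for m in re.finditer(r"\w+", text):
        token = nif_uri(m.start(), m.end())
        triples += [
            (token, "nif:anchorOf", m.group()),
            (token, "nif:beginIndex", m.start()),
            (token, "nif:endIndex", m.end()),
            (token, "nif:referenceContext", context),
        ]
    return triples

for s, p, o in nif_annotate("La red es rápida"):
    if p == "nif:anchorOf":
        print(s.split("#")[1], o)  # char=0,2 La / char=3,6 red / char=7,9 es / char=10,16 rápida
```

Because each token URI encodes its offsets, annotations produced by different NLP tools over the same text can be merged on those URIs.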
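The “Challenge” slide requires keeping track of the license, the provenance and the use of each resource. One way to picture those three requirements together is a heavily simplified, ODRL-inspired policy (sets of permitted and prohibited actions), a `derived_from` list for provenance, and a usage log appended on each permitted use. The class names, resource URIs and policy identifiers are hypothetical. Real ODRL policies also carry constraints, duties and parties that this sketch ignores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Policy:
    """Simplified ODRL-style policy: sets of permitted and prohibited actions."""
    uid: str
    permissions: set
    prohibitions: set = field(default_factory=set)

    def allows(self, action):
        return action in self.permissions and action not in self.prohibitions

@dataclass
class Resource:
    uri: str
    policy: Policy                                    # license information
    derived_from: list = field(default_factory=list)  # provenance
    usage_log: list = field(default_factory=list)     # record of uses

def use(resource, action, agent):
    """Check the license, then record who used the resource and how."""
    if not resource.policy.allows(action):
        raise PermissionError(f"{action} not permitted by {resource.policy.uid}")
    resource.usage_log.append((agent, action, datetime.now(timezone.utc).isoformat()))
    return resource

def lineage(resource):
    """Follow provenance links back to the original sources."""
    chain = []
    for parent in resource.derived_from:
        chain.append(parent.uri)
        chain.extend(lineage(parent))
    return chain

lexicon_a = Resource("http://example.org/lr/lexicon-a",
                     Policy("ex:policy-open", {"odrl:use", "odrl:distribute"}))
lexicon_b = Resource("http://example.org/lr/lexicon-b",
                     Policy("ex:policy-no-redistribution", {"odrl:use"}, {"odrl:distribute"}),
                     derived_from=[lexicon_a])

use(lexicon_b, "odrl:use", "ex:analytics-service")  # allowed, so it is logged
print(lineage(lexicon_b))  # ['http://example.org/lr/lexicon-a']
```

Calling `use(lexicon_b, "odrl:distribute", ...)` would raise `PermissionError` and leave the usage log unchanged.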
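The “LOD as large background knowledge for NLP” slide places LOD-aware NLP services on top of language resources published as LD. A minimal sketch of such a service, assuming a tiny lexicon flattened into triples with hypothetical `ex:form` and `ex:translation` predicates (not a real vocabulary): it tokenizes the input, looks each token up among the lexicon's forms, and returns character offsets plus the translations found.

```python
import re

# Hypothetical lexicon flattened into triples; ex:form and ex:translation are
# made-up predicates standing in for a real lexicon vocabulary.
LEXICON = {
    ("ex:red", "ex:form", "red"),
    ("ex:red", "ex:form", "redes"),
    ("ex:red", "ex:translation", "en:network"),
    ("ex:red", "ex:translation", "de:Netzwerk"),
}

def annotate(text, lexicon):
    """Return (begin, end, token, translations) for each token found in the lexicon."""
    results = []
    for m in re.finditer(r"\w+", text):
        token = m.group()
        # Entries that list this token (case-insensitively) among their forms.
        entries = {s for s, p, o in lexicon if p == "ex:form" and o == token.lower()}
        translations = sorted(o for s, p, o in lexicon
                              if p == "ex:translation" and s in entries)
        if translations:
            results.append((m.start(), m.end(), token, translations))
    return results

for begin, end, token, translations in annotate("La red y las redes", LEXICON):
    print(begin, end, token, translations)
# 3 6 red ['de:Netzwerk', 'en:network']
# 13 18 redes ['de:Netzwerk', 'en:network']
```

Because the plural "redes" is linked to the same entry as "red", the service finds its translations without any morphological analysis of its own; the knowledge comes from the published lexicon.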